# me²TARA – SmolLM2-360M-Instruct Spark (GGUF)
Summary: A GGUF-quantized LLM ready for Apache Spark: run inference as a step in your data pipeline. Same model, multiple tasks (predict, summarize, classify, extract, DQ, anomaly) via SparkSQL UDFs. Data stays on your cluster; the model runs where the data lives.
## Why GGUF-Spark changes how you do analytics
Your analytics stack runs on Spark over tables and files. You want to add LLM-derived features — sentiment, summaries, entities, quality checks — without shipping rows to an API or standing up a separate serving tier. GGUF-Spark turns the LLM into a transformation stage: you register a UDF, point it at this model, and use it in SQL or DataFrame code like any other function. Inference runs on the same executors that hold the data, so you get feature-rich analytics in one pipeline, with no external calls and no data leaving the cluster. That’s the story: analytics and LLM features in one place, at scale.
```mermaid
flowchart LR
    subgraph Data[" "]
        T[Tables / Files]
    end
    subgraph Spark["Spark"]
        DF[DataFrame]
        P[Partitions]
        DF --> P
    end
    subgraph Exec["Executors"]
        UDF[GGUF UDF]
        M[(This model)]
        UDF --> M
    end
    subgraph Out[" "]
        E[Enriched table]
    end
    T --> DF
    P --> UDF
    M --> E
    UDF --> E
```

Data flows in; each partition runs the GGUF UDF (this model) where the data lives; out comes an enriched table for analytics or downstream pipelines.
## Spark capabilities

- **Embedded metadata:** Spark deployment info is stored inside the GGUF (`spark.quantization`, `spark.memory.recommended_ram_gb`, `spark.llama_cpp.n_ctx/n_threads/n_batch`). The runtime uses embedded metadata only; no sidecar file is required.
- **Memory-aware deployment:** GGUF-Spark's `QuantizationSelector` reads the embedded metadata to pick the optimal quantization per executor; user-provided config overrides it when set.
- **Partition-Resident Model (PRM):** the model is loaded once per Spark partition for efficient batch inference.
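The PRM idea, load once per partition and reuse for every row, can be sketched in plain Python without Spark. `FakeModel` and `run_partition` are hypothetical names for illustration; they are not part of the gguf_spark API:

```python
# Sketch of the Partition-Resident Model (PRM) pattern: the expensive model
# load happens once per partition, not once per row.

load_count = 0

class FakeModel:
    """Stand-in for a llama.cpp model handle; counts how often it is loaded."""
    def __init__(self, path):
        global load_count
        load_count += 1          # in real use, this is the expensive GGUF load
        self.path = path

    def generate(self, prompt):
        return f"echo:{prompt}"

def run_partition(rows, model_path):
    """Spark mapPartitions-style function: one model load, many inferences."""
    model = FakeModel(model_path)    # loaded once per partition
    for row in rows:
        yield model.generate(row)

partitions = [["a", "b", "c"], ["d", "e"]]   # two partitions, five rows
results = [out for part in partitions
           for out in run_partition(part, "model.gguf")]

print(load_count)   # 2 loads (one per partition), not 5 (one per row)
```

In real GGUF-Spark usage the registered UDFs handle this per-partition caching for you; this sketch only shows why it matters for batch throughput.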
## Model info
| Field | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM2-360M-Instruct |
| Format | GGUF |
| Quantizations | Q4_K_M, Q8_0 |
| Use case | Batch inference, ETL pipelines, distributed data processing |
## Available files
| File | Quantization | Size |
|---|---|---|
| meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf | Q4_K_M | 258 MB |
| meeTARA-smollm2-360m-instruct-spark-Q8_0.gguf | Q8_0 | 369 MB |
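A rough memory-aware selection rule like the one `QuantizationSelector` applies can be sketched as follows. The function name, thresholds, and overhead factor here are illustrative assumptions, not the library's actual values (the real selector reads `spark.memory.recommended_ram_gb` from the embedded metadata):

```python
# Hypothetical sketch of memory-aware quantization selection: prefer the
# higher-quality Q8_0 when the executor has headroom, else fall back to Q4_K_M.

FILES = {
    "Q8_0":   {"size_mb": 369},
    "Q4_K_M": {"size_mb": 258},
}

def pick_quantization(available_ram_gb, overhead_factor=4.0):
    """Pick the largest quantization whose estimated footprint fits in RAM.

    overhead_factor is a crude allowance for KV cache and runtime overhead.
    """
    for quant in ("Q8_0", "Q4_K_M"):           # best quality first
        needed_gb = FILES[quant]["size_mb"] / 1024 * overhead_factor
        if needed_gb <= available_ram_gb:
            return quant
    raise MemoryError("not enough RAM for any available quantization")

print(pick_quantization(8.0))    # ample RAM -> Q8_0
print(pick_quantization(1.2))    # tight RAM -> Q4_K_M
```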
## Usage with GGUF-Spark
Option 1 — GGUFSparkContext (batch / single inference):
```python
from gguf_spark import GGUFSparkContext

ctx = GGUFSparkContext(
    spark=spark,
    models=[{"name": "default", "path": "path/to/meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf"}],  # or the Q8_0 file
    driver_only=True,  # set False for distributed inference
)
ctx.initialize()
result = ctx.generate(prompt="Your prompt here")
```
Option 2 — SparkSQL UDFs (predict, summarize, label, classify, DQ, extract, anomaly):
```python
from gguf_spark import (
    register_gguf_udf,
    register_gguf_summarize_udf,
    register_gguf_label_udf,
    register_gguf_classify_udf,
    register_gguf_dq_check_udf,
    register_gguf_extract_udf,
    register_gguf_anomaly_udf,
)

# Register the UDFs you need (same GGUF, different prompt tasks)
register_gguf_udf(spark, model_path, udf_name="gguf_predict")
register_gguf_summarize_udf(spark, model_path)
register_gguf_label_udf(spark, model_path)
register_gguf_classify_udf(spark, model_path)  # gguf_classify(text, labels)
register_gguf_dq_check_udf(spark, model_path)
register_gguf_extract_udf(spark, model_path)
register_gguf_anomaly_udf(spark, model_path)

# Use in SQL
spark.sql("SELECT gguf_predict(text) AS answer, gguf_summarize(body) AS summary FROM my_table")
spark.sql("SELECT gguf_classify(feedback, 'positive|negative|neutral') AS sentiment FROM surveys")
```
Runtime uses embedded GGUF metadata for n_ctx, n_threads, n_batch when not overridden. For distributed mode, use driver_only=False and ensure executors have sufficient RAM (see embedded metadata or optional sidecar in repo).
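The precedence described above, explicit user settings win and embedded metadata fills the gaps, amounts to a simple merge. The key names follow the embedded-metadata fields listed earlier; `resolve_llama_config` is a hypothetical helper name for this sketch, not a gguf_spark function:

```python
# Sketch of config resolution: user-provided settings override values read
# from the embedded GGUF metadata (spark.llama_cpp.n_ctx / n_threads / n_batch).

def resolve_llama_config(embedded, user_overrides=None):
    """Merge embedded-metadata defaults with user-provided overrides."""
    config = dict(embedded)              # start from embedded metadata
    config.update(user_overrides or {})  # user settings take precedence
    return config

embedded = {"n_ctx": 2048, "n_threads": 4, "n_batch": 512}  # illustrative values
print(resolve_llama_config(embedded))                       # embedded defaults only
print(resolve_llama_config(embedded, {"n_ctx": 4096}))      # user override wins
```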
## Usage with llama.cpp

```sh
./llama-cli -m meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf -p "Your prompt" -n 256  # or the Q8_0 file
```
## Requirements (to run the Quick test)
| Requirement | Version |
|---|---|
| Java | 8 or 11 (required by Apache Spark) |
| Python | 3.11+ |
Packages (minimal set to run examples):
```sh
pip install pyspark psutil llama-cpp-python huggingface_hub
```

Or install from the meetara-spark repo: `pip install -e ".[scripts]"` (adds huggingface_hub and helpers).
## Quick test (examples you can run)
1. Download this model and get the path:

   ```python
   from huggingface_hub import hf_hub_download

   model_path = hf_hub_download(
       repo_id="meetara-spark/meeTARA-smollm2-360m-instruct-spark",
       filename="meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf",  # or the Q8_0 file for higher quality
   )
   print(model_path)
   ```
2. Run the GGUF-Spark demo (driver-only, no cluster):

   ```sh
   # From the meetara-spark repo root, with .venv activated
   python examples/demo_gguf_spark.py "<model_path>" --driver-only
   ```
3. Run with a pre-downloaded model (no local conversion):

   ```sh
   python examples/run_with_hf_model.py --example demo
   # Uses meetara-spark models from HF; set REPO_ID/filename in the script if needed.
   ```
4. Minimal UDF test (summarize):

   ```python
   from pyspark.sql import SparkSession
   from pyspark.sql.functions import expr
   from gguf_spark import register_gguf_summarize_udf

   spark = SparkSession.builder.appName("test").master("local[1]").getOrCreate()
   model_path = "path/to/meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf"  # or from hf_hub_download in step 1
   register_gguf_summarize_udf(spark, model_path, driver_only=True)

   df = spark.createDataFrame([("The product arrived on time and works well.",)], ["text"])
   df.withColumn("summary", expr("gguf_summarize(text)")).show(truncate=False)
   spark.stop()
   ```
## Prompt format

ChatML chat template (the same format used by Qwen models):

```
<|im_start|>system
You are me²TARA, an empathetic AI assistant.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
```
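A small helper for rendering this template programmatically might look like the following. `build_chatml_prompt` is an illustrative name for this sketch, not part of the gguf_spark API:

```python
# Illustrative helper that renders the ChatML prompt shown above.

SYSTEM = "You are me²TARA, an empathetic AI assistant."

def build_chatml_prompt(user_message, system=SYSTEM):
    """Wrap a user message in the ChatML template, ending at the assistant turn."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user_message}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("Summarize: the product arrived on time.")
print(prompt)
```

Note that when you go through the GGUF-Spark UDFs, prompt construction for each task is handled for you; build the prompt yourself only for raw `llama-cli` or direct `generate` calls.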
## Research & citation
These models support the research "Distributed Inference of Quantized Large Language Models on Apache Spark: A Memory-Aware Architectural Analysis" (Basina, 2026). For citation and full manuscript, see the meetara-spark repo (docs/GGUF-SPARK-RESEARCH-PAPER.md).
## Credits
- Base model: HuggingFaceTB/SmolLM2-360M-Instruct
- Quantization & Spark integration: meetara-spark