me²TARA – SmolLM2-360M-Instruct Spark (GGUF)

Summary: A GGUF-quantized LLM ready for Apache Spark: run inference as a step in your data pipeline. One model serves multiple tasks (predict, summarize, classify, extract, data-quality checks, anomaly detection) via SparkSQL UDFs. Data stays on your cluster; the model runs where the data lives.

Why GGUF-Spark changes how you do analytics

Your analytics stack runs on Spark over tables and files. You want to add LLM-derived features — sentiment, summaries, entities, quality checks — without shipping rows to an API or standing up a separate serving tier. GGUF-Spark turns the LLM into a transformation stage: you register a UDF, point it at this model, and use it in SQL or DataFrame code like any other function. Inference runs on the same executors that hold the data, so you get feature-rich analytics in one pipeline, with no external calls and no data leaving the cluster. That’s the story: analytics and LLM features in one place, at scale.

flowchart LR
  subgraph Data[" "]
    T[Tables / Files]
  end
  subgraph Spark["Spark"]
    DF[DataFrame]
    P[Partitions]
    DF --> P
  end
  subgraph Exec["Executors"]
    UDF[GGUF UDF]
    M[(This model)]
    UDF --> M
  end
  subgraph Out[" "]
    E[Enriched table]
  end
  T --> DF
  P --> UDF
  UDF --> E

Data flows in; each partition runs the GGUF UDF (this model) where the data lives; out comes an enriched table for analytics or downstream pipelines.
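The partition-resident pattern in the diagram can be sketched in plain Python. This is an illustrative stand-in, not the gguf_spark implementation: `load_model` and `FakeModel` are hypothetical, plain lists stand in for Spark partitions, and in real Spark code the equivalent would run inside something like `rdd.mapPartitions`.

```python
# Sketch of the Partition-Resident Model (PRM) idea: load the model once per
# partition iterator, not once per row. All names here are illustrative.

LOAD_COUNT = 0  # tracks how many times the "model" is loaded


class FakeModel:
    def generate(self, text: str) -> str:
        return f"summary({text})"


def load_model() -> FakeModel:
    global LOAD_COUNT
    LOAD_COUNT += 1          # expensive in reality: mmap the GGUF once
    return FakeModel()


def enrich_partition(rows):
    model = load_model()     # once per partition, not once per row
    for row in rows:
        yield model.generate(row)


partitions = [["a", "b", "c"], ["d", "e"]]
enriched = [list(enrich_partition(p)) for p in partitions]
print(LOAD_COUNT)  # 2 loads for 5 rows: one per partition
```

The same shape is why the UDF approach scales: the per-partition load cost is amortized over every row the executor processes.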

Spark capabilities

  • Embedded metadata: Spark deployment info is stored inside the GGUF (spark.quantization, spark.memory.recommended_ram_gb, spark.llama_cpp.n_ctx/n_threads/n_batch). Runtime uses embedded metadata only (no sidecar required).
  • Memory-aware deployment: GGUF-Spark's QuantizationSelector reads embedded metadata to pick optimal quantization per executor; user-provided config overrides when set.
  • Partition-Resident Model (PRM): Load once per Spark partition for efficient batch inference.

Model info

Field          Value
Base model     HuggingFaceTB/SmolLM2-360M-Instruct
Format         GGUF
Architecture   llama
Quantizations  Q4_K_M, Q8_0
Use case       Batch inference, ETL pipelines, distributed data processing

Available files

File                                             Quantization  Size
meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf  Q4_K_M        258 MB
meeTARA-smollm2-360m-instruct-spark-Q8_0.gguf    Q8_0          369 MB

Usage with GGUF-Spark

Option 1 — GGUFSparkContext (batch / single inference):

from gguf_spark import GGUFSparkContext

ctx = GGUFSparkContext(
    spark=spark,
    models=[{"name": "default", "path": "path/to/meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf"}],  # or the Q8_0 file
    driver_only=True,  # or False for distributed
)
ctx.initialize()
result = ctx.generate(prompt="Your prompt here")

Option 2 — SparkSQL UDFs (predict, summarize, label, classify, DQ, extract, anomaly):

from gguf_spark import (
    register_gguf_udf,
    register_gguf_summarize_udf,
    register_gguf_label_udf,
    register_gguf_classify_udf,
    register_gguf_dq_check_udf,
    register_gguf_extract_udf,
    register_gguf_anomaly_udf,
)

# Register the UDFs you need (same GGUF, different prompt tasks)
register_gguf_udf(spark, model_path, udf_name="gguf_predict")
register_gguf_summarize_udf(spark, model_path)
register_gguf_label_udf(spark, model_path)
register_gguf_classify_udf(spark, model_path)  # gguf_classify(text, labels)
register_gguf_dq_check_udf(spark, model_path)
register_gguf_extract_udf(spark, model_path)
register_gguf_anomaly_udf(spark, model_path)

# Use in SQL
spark.sql("SELECT gguf_predict(text) AS answer, gguf_summarize(body) AS summary FROM my_table")
spark.sql("SELECT gguf_classify(feedback, 'positive|negative|neutral') AS sentiment FROM surveys")

Runtime uses embedded GGUF metadata for n_ctx, n_threads, n_batch when not overridden. For distributed mode, use driver_only=False and ensure executors have sufficient RAM (see embedded metadata or optional sidecar in repo).
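A minimal sketch of that precedence: embedded metadata supplies per-key defaults, and user config overrides a key only when it is actually set. The concrete values here are illustrative, not this model's embedded metadata.

```python
# Hypothetical values: embedded GGUF metadata provides defaults, user config
# overrides per key when set (None means "not set").
embedded = {"n_ctx": 2048, "n_threads": 4, "n_batch": 512}
user = {"n_ctx": 4096, "n_threads": None, "n_batch": None}

effective = {k: (user.get(k) if user.get(k) is not None else v)
             for k, v in embedded.items()}
print(effective)  # n_ctx overridden, the rest from embedded metadata
```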

Usage with llama.cpp

./llama-cli -m meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf -p "Your prompt" -n 256  # or Q8_0.gguf, etc.

Requirements (to run the Quick test)

Requirement Version
Java 8 or 11 (required by Apache Spark)
Python 3.11+

Packages (minimal set to run examples):

pip install pyspark psutil llama-cpp-python huggingface_hub

Or install from the meetara-spark repo: pip install -e ".[scripts]" (adds huggingface_hub and helpers).

Quick test (examples you can run)

1. Download this model and get the path:

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="meetara-spark/meeTARA-smollm2-360m-instruct-spark",
    filename="meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf",  # or Q8_0.gguf for higher quality
)
print(model_path)

2. Run GGUF-Spark demo (driver-only, no cluster):

# From meetara-spark repo root, with .venv activated
python examples/demo_gguf_spark.py "<model_path>" --driver-only

3. Run with pre-downloaded model (no local convert):

python examples/run_with_hf_model.py --example demo
# Uses meetara-spark models from HF; set REPO_ID/filename in script if needed.

4. Minimal UDF test (summarize):

from pyspark.sql import SparkSession
from pyspark.sql.functions import expr
from gguf_spark import register_gguf_summarize_udf

spark = SparkSession.builder.appName("test").master("local[1]").getOrCreate()
model_path = "path/to/meeTARA-smollm2-360m-instruct-spark-Q4_K_M.gguf"  # or from hf_hub_download in step 1

register_gguf_summarize_udf(spark, model_path, driver_only=True)
df = spark.createDataFrame([("The product arrived on time and works well.",)], ["text"])
df.withColumn("summary", expr("gguf_summarize(text)")).show(truncate=False)
spark.stop()

Prompt format

ChatML chat template (the Qwen-style format, also used by SmolLM2-Instruct):

<|im_start|>system
You are me²TARA, an empathetic AI assistant.
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
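
Building that prompt string programmatically is straightforward; `build_prompt` below is a small helper sketch, not part of the gguf_spark API (llama-cpp-python can also apply the chat template embedded in the GGUF via its chat-completion interface).

```python
# Format a user message with the ChatML template shown above.
def build_prompt(user_message: str,
                 system: str = "You are me²TARA, an empathetic AI assistant.") -> str:
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user_message}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


prompt = build_prompt("Summarize this review: great battery life.")
print(prompt)
```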

Research & citation

These models support the research "Distributed Inference of Quantized Large Language Models on Apache Spark: A Memory-Aware Architectural Analysis" (Basina, 2026). For citation and full manuscript, see the meetara-spark repo (docs/GGUF-SPARK-RESEARCH-PAPER.md).

Credits

  • Base model: HuggingFaceTB/SmolLM2-360M-Instruct
  • Quantization & Spark integration: meetara-spark