Distil Siemens S7-1200 Docs — Llama 3.2 1B

A fine-tuned Llama 3.2 1B Instruct model, distilled for question answering over Siemens SIMATIC S7-1200 PLC documentation. It is designed to be paired with a RAG pipeline so that dense technical manuals (alarm codes, signal addresses, parameter tables) are accessible at the point of use, even on edge hardware without a GPU.

Why this model

Industrial environments with strict security requirements (e.g. Purdue model network segmentation) cannot easily call cloud LLMs, and hosting large open-source models on-prem requires expensive GPUs. This model demonstrates that fine-tuned small language models offer a practical alternative: a 1B-parameter model that, after distillation, matches a 3B base model on domain-specific technical QA (61.1% vs. 60.4% pass rate in the evaluation below).

Evaluation

All models were evaluated on 144 held-out questions from the S7-1200 system manual using an LLM-as-a-Judge binary score.

Model                                Parameters   LLM-as-a-Judge pass rate
Llama 3.2 1B Instruct (base)         1B           45.1%
Llama 3.2 3B Instruct (base)         3B           60.4%
Llama 3.2 1B Instruct (this model)   1B           61.1%

Fine-tuning improved the 1B model by 16 percentage points (45.1% to 61.1%), bringing it to parity with the 3B base model (60.4%), which has roughly three times as many parameters. On 6 of the 144 test questions, this model answered correctly where both the 1B and 3B base models failed.
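
As a quick sanity check on the figures above, the reported pass rates can be converted back into question counts. This is a minimal sketch: the counts are inferred from the rounded percentages and the 144-question test set, not taken from the evaluation logs, and the variable names are illustrative.

```python
# Back out the implied number of passed questions from the reported pass rates.
# Assumption: each rate is (passed / 144) rounded to one decimal place.
TOTAL_QUESTIONS = 144

reported_pass_rates = {
    "Llama 3.2 1B Instruct (base)": 45.1,
    "Llama 3.2 3B Instruct (base)": 60.4,
    "Llama 3.2 1B Instruct (this model)": 61.1,
}

# Nearest whole number of questions consistent with each reported rate.
implied_passes = {
    name: round(rate / 100 * TOTAL_QUESTIONS)
    for name, rate in reported_pass_rates.items()
}

for name, passed in implied_passes.items():
    recomputed = round(passed / TOTAL_QUESTIONS * 100, 1)
    print(f"{name}: ~{passed}/{TOTAL_QUESTIONS} passed ({recomputed}%)")

# Improvement of the fine-tuned 1B model over the 1B base, in percentage points.
gain = round(
    reported_pass_rates["Llama 3.2 1B Instruct (this model)"]
    - reported_pass_rates["Llama 3.2 1B Instruct (base)"],
    1,
)
print(f"Gain from fine-tuning: {gain} percentage points")
```

Under that assumption, the gap between the fine-tuned 1B model and the 3B base model amounts to a single question, which is consistent with describing the result as parity.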

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "distil-labs/distil-siemens-s7-1200-docs-llama-1b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

system_prompt = """You are a problem solving model working on task_description XML block:
<task_description>Answer technical questions about an industrial automation system (The S7-1200 Programmable controller) using information provided in the context passage. Make sure to provide answers that are complete and include all relevant details from the context; do not miss critical information from the context.</task_description>
You will be given a single question and a context passage. Answer the question based on the context."""

# In a RAG pipeline, `context` comes from your retriever
context = "The maximum cold junction error is ±1.5°C..."
question = "What is the maximum cold junction error for the SM 1231 Thermocouple module?"

user_message = f"""Now for the real task, solve the task in question block based on the context in context block.
Generate only the solution, do not generate anything else
<context>{context}</context>
<question>{question}</question>"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_message},
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

with torch.no_grad():
    # do_sample=True makes explicit that temperature and top_p are used for sampling
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)

response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
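
When the model is called repeatedly inside a RAG pipeline, it may be convenient to wrap the prompt format above in a small helper. The following is a sketch rather than part of the official usage: `build_messages` is a hypothetical name, and joining several retrieved passages with blank lines is an assumption, since the example above passes a single passage.

```python
# Hypothetical helper that reproduces the prompt format from the usage example.
SYSTEM_PROMPT = """You are a problem solving model working on task_description XML block:
<task_description>Answer technical questions about an industrial automation system (The S7-1200 Programmable controller) using information provided in the context passage. Make sure to provide answers that are complete and include all relevant details from the context; do not miss critical information from the context.</task_description>
You will be given a single question and a context passage. Answer the question based on the context."""

USER_TEMPLATE = """Now for the real task, solve the task in question block based on the context in context block.
Generate only the solution, do not generate anything else
<context>{context}</context>
<question>{question}</question>"""


def build_messages(question, passages):
    """Return chat messages for one question and one or more context passages."""
    if isinstance(passages, str):
        passages = [passages]
    # Assumption: multiple passages are concatenated into a single context block.
    context = "\n\n".join(p.strip() for p in passages)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_TEMPLATE.format(context=context, question=question)},
    ]


messages = build_messages(
    "What is the maximum cold junction error for the SM 1231 Thermocouple module?",
    "The maximum cold junction error is ±1.5°C...",
)
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as the hand-built `messages` list above.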

Model Details

  • Architecture: LlamaForCausalLM (1.24B parameters)
  • Base model: meta-llama/Llama-3.2-1B-Instruct
  • Context length: 131,072 tokens
  • Precision: bfloat16
  • Training method: Distillation via Distil Labs
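
For sizing edge hardware, the weight footprint follows directly from the parameter count and precision listed above. The sketch below covers weights only; KV cache, activations, and runtime overhead are not included, so the figure is an estimate rather than a measured value.

```python
# Rough weight-memory estimate from the model details above.
NUM_PARAMETERS = 1.24e9   # 1.24B parameters
BYTES_PER_PARAM = 2       # bfloat16 stores each parameter in 2 bytes

weight_bytes = NUM_PARAMETERS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"Approximate weight size: {weight_gb:.2f} GB ({weight_gib:.2f} GiB)")
```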

Intended Use

This model is intended to be used as part of a RAG pipeline over Siemens S7-1200 documentation. Provide relevant context passages from the manual alongside user questions. The model was not trained for general-purpose chat or tasks outside this documentation domain.
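
To illustrate where the context passages come from, here is a deliberately naive retriever that ranks manual passages by word overlap with the question. It is a stand-in under stated assumptions: a real pipeline would more likely use an embedding model or a BM25 index, `retrieve` and `tokenize` are hypothetical functions, and the sample passages are for illustration only.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=1):
    """Return the k passages sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


# Illustration only: the second passage is the example context from the usage
# section; the other two are invented placeholders, not quotes from the manual.
passages = [
    "Placeholder passage about wiring digital inputs.",
    "The maximum cold junction error is ±1.5°C...",
    "Placeholder passage about communication ports.",
]
question = "What is the maximum cold junction error for the SM 1231 Thermocouple module?"

top_passages = retrieve(question, passages, k=1)
print(top_passages[0])
```

The selected passages would then be placed in the `<context>` block of the user message, alongside the question.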

Licenses
