---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- industrial
- maintenance
- fault-diagnosis
- sop-generation
- qwen
base_model: Qwen/Qwen3-4B
pipeline_tag: text-generation
---
# emsLLM: Industrial Equipment Fault Diagnosis & Maintenance Assistant
## Model Summary
**emsLLM** is a Large Language Model (LLM) fine-tuned specifically for the Electronic Manufacturing Services (EMS) and industrial maintenance sectors. Built upon the **Qwen3-4B** architecture, this model has been trained on equipment operation manuals, troubleshooting guides, and standardized maintenance documents. It is designed to assist engineers with **equipment fault diagnosis** and **SOP (Standard Operating Procedure) generation**.
This is a **Merged Version**: the fine-tuned LoRA weights have been merged into the base model, so it can be loaded directly for inference without additional adapters, which simplifies deployment and integration.
## Evaluation & Performance
We evaluated **emsLLM** against state-of-the-art general-purpose models (**Llama-3.3-70B** and **Qwen3-32B**) using an "LLM-as-a-Judge" approach. The evaluation focused on industrial fault diagnosis accuracy and inference speed.
### 1. Diagnosis Accuracy
Despite their significantly smaller parameter counts, our fine-tuned models (trained on public technical documents) outperformed the larger base models at retrieving and generating correct maintenance solutions.
| Model | Public Test Set Accuracy | Private Test Set Accuracy |
| :--- | :---: | :---: |
| **emsLLM-8B (Ours)** | **95.0%** | **100.0%** |
| **emsLLM-4B (Ours)** | **95.0%** | **96.0%** |
| Qwen3-32B (Base) | 90.0% | 92.0% |
| Llama-3.3-70B (Base) | 87.5% | 92.0% |
> **Key Finding:** The **emsLLM-8B** achieves **95% accuracy** on public technical queries and **100%** on private domain tasks, surpassing the Llama-3.3-70B base model while using significantly fewer resources.
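For readers who want to run a comparable evaluation, the sketch below shows one minimal way an "LLM-as-a-Judge" accuracy loop can be structured. The judge model, prompt wording, and rubric actually used for emsLLM are not published, so `JUDGE_TEMPLATE`, `candidate_fn`, and `judge_fn` are purely illustrative placeholders.

```python
# Hypothetical sketch of an LLM-as-a-Judge accuracy loop.
# The real judge model, prompt, and rubric for emsLLM are not published.

JUDGE_TEMPLATE = """You are grading an industrial fault-diagnosis answer.
Question: {question}
Reference solution: {reference}
Candidate answer: {candidate}
Reply with exactly one word: CORRECT or INCORRECT."""

def accuracy(examples, candidate_fn, judge_fn):
    """examples: list of (question, reference) pairs.
    candidate_fn: callable that answers a question with the model under test.
    judge_fn: callable that sends a grading prompt to the judge model."""
    correct = 0
    for question, reference in examples:
        candidate = candidate_fn(question)
        verdict = judge_fn(JUDGE_TEMPLATE.format(
            question=question, reference=reference, candidate=candidate))
        # Count answers the judge labels CORRECT
        correct += verdict.strip().upper().startswith("CORRECT")
    return correct / len(examples)
```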
### 2. Inference Speed
For industrial applications requiring real-time response, our models offer ultra-low latency.
| Model | Time to First Token (TTFT) | Time Per Output Token (TPOT) |
| :--- | :---: | :---: |
| **emsLLM-4B** | **30.46 ms** | **3.20 ms** |
| **emsLLM-8B** | **58.92 ms** | **6.40 ms** |
| Qwen3-32B | 253.49 ms | 20.91 ms |
| Llama-3.3-70B | 428.14 ms | 36.02 ms |
> **Key Finding:** Per the TPOT column, **emsLLM-8B** generates tokens roughly **5.6x faster** than Llama-3.3-70B (6.40 ms vs. 36.02 ms per token), and **emsLLM-4B** over **11x faster**, making both models highly suitable for edge deployment and rapid interaction.
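TTFT measures the delay before the first token appears; TPOT is the average time between subsequent tokens. The hardware and serving stack behind the table above are not specified, so the snippet below is only a rough, illustrative way to estimate both metrics with Transformers' `TextIteratorStreamer` (the `measure_latency` helper is hypothetical).

```python
# Illustrative TTFT/TPOT measurement; not the authors' benchmark setup.
import time
from threading import Thread
from transformers import TextIteratorStreamer

def measure_latency(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, max_new_tokens=max_new_tokens, streamer=streamer))
    start = time.perf_counter()
    thread.start()
    first_token_time, chunks = None, 0
    for _ in streamer:  # yields decoded text pieces, roughly one per token
        if first_token_time is None:
            first_token_time = time.perf_counter()
        chunks += 1
    end = time.perf_counter()
    thread.join()
    ttft_ms = (first_token_time - start) * 1000
    tpot_ms = (end - first_token_time) * 1000 / max(chunks - 1, 1)
    return ttft_ms, tpot_ms
```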
## Key Features
* **Fault Diagnosis Assistant**: Provides analysis of potential causes and maintenance suggestions for common production line equipment (e.g., robotic arms, dispensing machines).
* **SOP Content Generation**: Understands industrial documentation logic and assists in drafting standardized SOPs, including operational procedures and responsibility assignments.
* **Domain Optimization**: Enhanced understanding of maintenance terminology and logic compared to generic models, offering responses more aligned with engineering field requirements.
* **Ready-to-Deploy**: Weights are fully merged, supporting direct loading with standard inference frameworks (e.g., Hugging Face Transformers, vLLM); see the vLLM sketch below.
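As an illustration of the vLLM path, the sketch below loads the merged checkpoint for offline inference. The model ID is a placeholder and the prompt is only an example; adapt both to your deployment.

```python
# Illustrative vLLM usage; "Your-Username/emsLLM" is a placeholder model ID.
from vllm import LLM, SamplingParams

llm = LLM(model="Your-Username/emsLLM")
params = SamplingParams(temperature=0.1, max_tokens=512)

# chat() applies the model's chat template before generating
outputs = llm.chat(
    [{"role": "user", "content": "The dispensing machine output is unstable. What are likely causes?"}],
    params,
)
print(outputs[0].outputs[0].text)
```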
## Quick Start
Since the weights are already merged, you can load the model just like any standard Transformers model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace with your actual Hugging Face model ID
model_id = "Your-Username/emsLLM"

# 1. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# 3. Example: fault diagnosis
messages = [
    {"role": "system", "content": "You are a professional industrial equipment maintenance assistant."},
    {"role": "user", "content": "The dispensing machine on the production line has unstable output. What are the possible causes and solutions?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,      # passes input_ids and attention_mask together
    max_new_tokens=512,
    do_sample=True,      # temperature only takes effect when sampling is enabled
    temperature=0.1      # low temperature is recommended for stability in industrial tasks
)

# Decode only the newly generated tokens, not the echoed prompt
output_ids = generated_ids[0][model_inputs.input_ids.shape[1]:]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```
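The same loaded model covers the SOP-generation use case; only the prompt changes. The request below is an illustrative example, not an official prompt format:

```python
# SOP generation, reusing the model and tokenizer loaded above (prompt is illustrative)
messages = [
    {"role": "system", "content": "You are a professional industrial equipment maintenance assistant."},
    {"role": "user", "content": "Draft an SOP for weekly preventive maintenance of a dispensing machine, including procedure steps and responsibility assignments."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
print(tokenizer.decode(generated_ids[0][model_inputs.input_ids.shape[1]:], skip_special_tokens=True))
```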
## Contact Info
- Email: fred_tung@pegatroncorp.com