# mmBERT-32K Fact-Check Classifier (LoRA)

Part of the MoM (Mixture of Models) family for vLLM Semantic Router.

This adapter is fine-tuned from `llm-semantic-router/mmbert-32k-yarn` to decide whether a user query should be routed to a fact-checking or factual-verification path.

## Labels
| Label | ID | Meaning | Example |
|---|---|---|---|
| NO_FACT_CHECK_NEEDED | 0 | Creative, opinion, brainstorming, or other prompts that do not require factual verification | "Write me a poem about the ocean." |
| FACT_CHECK_NEEDED | 1 | Factual questions or claims that should be verified against external knowledge | "What is the capital of France?" |
## Usage

### Load the LoRA adapter with PEFT
```python
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

base_model = "llm-semantic-router/mmbert-32k-yarn"
adapter = "llm-semantic-router/mmbert32k-factcheck-classifier-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

id2label = {
    0: "NO_FACT_CHECK_NEEDED",
    1: "FACT_CHECK_NEEDED",
}

queries = [
    "What is the capital of France?",
    "Write me a poem about the ocean.",
]

for query in queries:
    inputs = tokenizer(
        query,
        return_tensors="pt",
        truncation=True,
        max_length=32768,
    )
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred = torch.argmax(probs).item()
    print(
        {
            "text": query,
            "label": id2label[pred],
            "score": float(probs[pred]),
        }
    )
```
### Use the merged model for production

If you do not want a PEFT dependency at inference time, use the merged checkpoint:

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert32k-factcheck-classifier-merged",
)
print(pipe("Who is the current president of France?"))
```
## Model Details

- Base model: `llm-semantic-router/mmbert-32k-yarn`
- Architecture: ModernBERT with YaRN RoPE scaling
- Context length: 32,768 tokens
- Task: binary sequence classification
- Adaptation method: LoRA via PEFT
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Saved label mapping: `0 -> NO_FACT_CHECK_NEEDED`, `1 -> FACT_CHECK_NEEDED`
The base model is multilingual, but the supervised training mixture for this adapter is primarily composed of public English-language question-answering and instruction datasets. Validate carefully on your target languages before using it as a hard routing gate.
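For local retraining, the LoRA hyperparameters listed above map onto a PEFT `LoraConfig` roughly as follows. This is a sketch, not the repository's exact configuration; in particular, `target_modules` is intentionally left out, since the correct attention-projection names should be taken from the repository training config:

```python
from peft import LoraConfig, TaskType

# Hyperparameters taken from the Model Details list above.
# target_modules is deliberately omitted -- set it to the attention
# projections used in the repository training config for ModernBERT.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
)
```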
## Training Summary

The published adapter is documented as trained on a balanced 8,500-example mixture with:
- Training samples: 6,800
- Validation samples: 1,700
- Epochs: 5
- Method: LoRA fine-tuning on mmBERT-32K-YaRN
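The sample counts correspond to an 80/20 train/validation split of the 8,500-example mixture. A minimal, purely illustrative sketch of such a split (the seed and record shape are assumptions):

```python
import random

# Placeholder records standing in for the labeled 8,500-example mixture.
examples = [{"id": i} for i in range(8500)]
random.Random(42).shuffle(examples)  # seed chosen arbitrarily for illustration

split = int(0.8 * len(examples))
train, val = examples[:split], examples[split:]
print(len(train), len(val))  # 6800 1700
```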
In the repository training pipeline, the best checkpoint is selected by validation F1, and the reported evaluation metrics are:
- Accuracy
- F1
- Precision
- Recall
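A `compute_metrics` function of the following shape reproduces these four metrics with scikit-learn. This is a hedged sketch of the usual Hugging Face `Trainer` pattern, not the repository's exact implementation:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Compute accuracy, F1, precision, and recall from (logits, labels)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```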
## Training Data

The current model card for the published adapter describes the following source mixture.

### FACT_CHECK_NEEDED
- SQuAD: factual question answering prompts
- TriviaQA: factual trivia questions
- TruthfulQA: high-risk factual questions and misconceptions
- HotpotQA: multi-hop factual reasoning questions
- CoQA: conversational factual questions
- HaluEval: factual QA prompts from hallucination evaluation
- RAG datasets: retrieval-oriented factual queries
### NO_FACT_CHECK_NEEDED
- Dolly: creative writing, brainstorming, and opinion-style instructions
- WritingPrompts: creative writing prompts
- Alpaca: non-factual instruction-following prompts
The repository training code also supports broader mixtures for local retraining, including NISQ-style information-seeking labels and additional fact-check-oriented corpora. If you plan to reproduce or extend this model, use the repository training script and inspect the dataset-loading logic directly.
## Intended Use
This model is intended for query-level routing, such as:
- deciding whether a prompt should trigger a fact-check or verification subsystem
- routing requests in RAG or knowledge-grounded generation systems
- filtering which prompts need retrieval before answer generation
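As a sketch of the first use case, a routing gate can wrap the classifier behind a confidence threshold. The `classify` callable, the path names, and the 0.6 threshold are illustrative assumptions, not part of the published model:

```python
from typing import Callable, Dict

def route(
    query: str,
    classify: Callable[[str], Dict[str, float]],
    threshold: float = 0.6,  # illustrative; tune on validation data
) -> str:
    """Send a query to the fact-check path only when the classifier is
    confident that verification is needed; otherwise answer directly."""
    probs = classify(query)
    if probs["FACT_CHECK_NEEDED"] >= threshold:
        return "fact_check_path"
    return "direct_path"

# Stub classifier standing in for the model loaded in the Usage section.
stub = lambda q: (
    {"FACT_CHECK_NEEDED": 0.9, "NO_FACT_CHECK_NEEDED": 0.1}
    if "capital" in q
    else {"FACT_CHECK_NEEDED": 0.1, "NO_FACT_CHECK_NEEDED": 0.9}
)

print(route("What is the capital of France?", stub))    # fact_check_path
print(route("Write me a poem about the ocean.", stub))  # direct_path
```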
## Out-of-Scope Use
- answer-level factuality grading or hallucination detection on model outputs
- content moderation or safety classification
- multi-turn conversation reasoning without additional context handling
- high-stakes factual decisions without downstream verification
## Related Models

- Merged model: `llm-semantic-router/mmbert32k-factcheck-classifier-merged`
- Base model: `llm-semantic-router/mmbert-32k-yarn`
- Collection: `llm-semantic-router/mom-multilingual-class`
## License
Apache 2.0