# mmBERT-32K Fact-Check Classifier (LoRA)

Part of the MoM (Mixture of Models) family for vLLM Semantic Router.

This adapter is fine-tuned from `llm-semantic-router/mmbert-32k-yarn` to decide whether a user query should be routed to a fact-checking or factual-verification path.

## Labels
| Label | ID | Meaning | Example |
|---|---|---|---|
| NO_FACT_CHECK_NEEDED | 0 | Creative, opinion, brainstorming, or other prompts that do not require factual verification | "Write me a poem about the ocean." |
| FACT_CHECK_NEEDED | 1 | Factual questions or claims that should be verified against external knowledge | "What is the capital of France?" |
## Usage

### Load the LoRA adapter with PEFT
```python
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

base_model = "llm-semantic-router/mmbert-32k-yarn"
adapter = "llm-semantic-router/mmbert32k-factcheck-classifier-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

id2label = {
    0: "NO_FACT_CHECK_NEEDED",
    1: "FACT_CHECK_NEEDED",
}

queries = [
    "What is the capital of France?",
    "Write me a poem about the ocean.",
]

for query in queries:
    inputs = tokenizer(
        query,
        return_tensors="pt",
        truncation=True,
        max_length=32768,
    )
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred = torch.argmax(probs).item()
    print(
        {
            "text": query,
            "label": id2label[pred],
            "score": float(probs[pred]),
        }
    )
```
### Use the merged model for production

If you do not want a PEFT dependency at inference time, use the merged checkpoint:

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert32k-factcheck-classifier-merged",
)
print(pipe("Who is the current president of France?"))
```
## Model Details

- Base model: `llm-semantic-router/mmbert-32k-yarn`
- Architecture: ModernBERT with YaRN RoPE scaling
- Context length: 32,768 tokens
- Task: binary sequence classification
- Adaptation method: LoRA via PEFT
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Saved label mapping: `0 -> NO_FACT_CHECK_NEEDED`, `1 -> FACT_CHECK_NEEDED`
The base model is multilingual, but the supervised training mixture for this adapter is primarily composed of public English-language question-answering and instruction datasets. Validate carefully on your target languages before using it as a hard routing gate.
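For local retraining, the LoRA hyperparameters listed above map onto a PEFT `LoraConfig` roughly as follows. This is a sketch, not the repository's exact configuration; in particular, `target_modules` is intentionally left out, since the correct attention-projection names should be taken from the repository training config:

```python
from peft import LoraConfig, TaskType

# Hyperparameters taken from the Model Details list above.
# target_modules is deliberately omitted -- set it to the attention
# projections used in the repository training config for ModernBERT.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
)
```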
## Training Summary

The published adapter is documented as trained on a balanced 8,500-example mixture with:
- Training samples: 6,800
- Validation samples: 1,700
- Epochs: 5
- Method: LoRA fine-tuning on mmBERT-32K-YaRN
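The sample counts correspond to an 80/20 train/validation split of the 8,500-example mixture. A minimal, purely illustrative sketch of such a split (the seed and record shape are assumptions):

```python
import random

# Placeholder records standing in for the labeled 8,500-example mixture.
examples = [{"id": i} for i in range(8500)]
random.Random(42).shuffle(examples)  # seed chosen arbitrarily for illustration

split = int(0.8 * len(examples))
train, val = examples[:split], examples[split:]
print(len(train), len(val))  # 6800 1700
```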
In the repository training pipeline, the best checkpoint is selected by validation F1, and the reported evaluation metrics are:
- Accuracy
- F1
- Precision
- Recall
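A `compute_metrics` function of the following shape reproduces these four metrics with scikit-learn. This is a hedged sketch of the usual Hugging Face `Trainer` pattern, not the repository's exact implementation:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Compute accuracy, F1, precision, and recall from (logits, labels)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```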
## Training Data

The current model card for the published adapter describes the following source mixture.

### FACT_CHECK_NEEDED
- SQuAD: factual question answering prompts
- TriviaQA: factual trivia questions
- TruthfulQA: high-risk factual questions and misconceptions
- HotpotQA: multi-hop factual reasoning questions
- CoQA: conversational factual questions
- HaluEval: factual QA prompts from hallucination evaluation
- RAG datasets: retrieval-oriented factual queries
### NO_FACT_CHECK_NEEDED
- Dolly: creative writing, brainstorming, and opinion-style instructions
- WritingPrompts: creative writing prompts
- Alpaca: non-factual instruction-following prompts
The repository training code also supports broader mixtures for local retraining, including NISQ-style information-seeking labels and additional fact-check-oriented corpora. If you plan to reproduce or extend this model, use the repository training script and inspect the dataset-loading logic directly.
## Intended Use
This model is intended for query-level routing, such as:
- deciding whether a prompt should trigger a fact-check or verification subsystem
- routing requests in RAG or knowledge-grounded generation systems
- filtering which prompts need retrieval before answer generation
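As a sketch of the first use case, a routing gate can wrap the classifier behind a confidence threshold. The `classify` callable, the path names, and the 0.6 threshold are illustrative assumptions, not part of the published model:

```python
from typing import Callable, Dict

def route(
    query: str,
    classify: Callable[[str], Dict[str, float]],
    threshold: float = 0.6,  # illustrative; tune on validation data
) -> str:
    """Send a query to the fact-check path only when the classifier is
    confident that verification is needed; otherwise answer directly."""
    probs = classify(query)
    if probs["FACT_CHECK_NEEDED"] >= threshold:
        return "fact_check_path"
    return "direct_path"

# Stub classifier standing in for the model loaded in the Usage section.
stub = lambda q: (
    {"FACT_CHECK_NEEDED": 0.9, "NO_FACT_CHECK_NEEDED": 0.1}
    if "capital" in q
    else {"FACT_CHECK_NEEDED": 0.1, "NO_FACT_CHECK_NEEDED": 0.9}
)

print(route("What is the capital of France?", stub))    # fact_check_path
print(route("Write me a poem about the ocean.", stub))  # direct_path
```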
## Out-of-Scope Use
- answer-level factuality grading or hallucination detection on model outputs
- content moderation or safety classification
- multi-turn conversation reasoning without additional context handling
- high-stakes factual decisions without downstream verification
## Related Models

- Merged model: `llm-semantic-router/mmbert32k-factcheck-classifier-merged`
- Base model: `llm-semantic-router/mmbert-32k-yarn`
- Collection: `llm-semantic-router/mom-multilingual-class`
## License
Apache 2.0