# News Article Event Classifier — Cross-Encoder (ModernBERT)
A fine-tuned ModernBERT-base cross-encoder for binary classification of news article pairs. Given two articles (by headline, or headline + content), the model predicts whether they refer to the same real-world event.
This model is used in Stage 2 of a news article grouping research pipeline.
## Model Description
The model classifies whether a pair of news articles refers to the same underlying real-world event — not just general semantic similarity, but whether both articles report on the same specific occurrence, potentially from different sources or perspectives.
Input structure:

```
(Article A, Article B) → [1 = Same Event | 0 = Different Event]
```
The model is built on top of answerdotai/ModernBERT-base, fine-tuned using task-specific layer unfreezing and a focal loss function to handle class imbalance.
## How to Use

### Installation

```bash
pip install transformers torch
```
### Predict with the Model

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "Juanillaberia/articles-pairs-event-detection"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def predict_same_event(headline_a: str, headline_b: str) -> dict:
    """
    Predicts whether two article headlines refer to the same real-world event.

    Args:
        headline_a: Headline of the first article.
        headline_b: Headline of the second article.

    Returns:
        Dictionary with predicted label and probabilities.
    """
    inputs = tokenizer(
        text=headline_a,
        text_pair=headline_b,
        return_tensors="pt",
        truncation=True,
        max_length=128,
    )

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = F.softmax(logits, dim=-1)

    predicted_class = torch.argmax(probs, dim=-1).item()
    labels = {0: "Different Event", 1: "Same Event"}

    return {
        "label": labels[predicted_class],
        "score": probs[0][predicted_class].item(),
        "probabilities": {
            "Different Event": probs[0][0].item(),
            "Same Event": probs[0][1].item(),
        },
    }


# Example usage
headline_a = "Government announces new climate policy targeting carbon emissions"
headline_b = "New climate bill signed into law by administration"

result = predict_same_event(headline_a, headline_b)
print(result)
# {'label': 'Same Event', 'score': 0.87, 'probabilities': {'Different Event': 0.13, 'Same Event': 0.87}}
```
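In a deduplication or clustering pipeline you typically need to score many candidate pairs at once rather than one at a time. A minimal batched variant of the helper above (the `predict_batch` and `chunk_pairs` names and the default batch size are illustrative, not part of the released API):

```python
import torch
import torch.nn.functional as F


def chunk_pairs(pairs, batch_size):
    # Yield successive batches of (headline_a, headline_b) pairs
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]


def predict_batch(model, tokenizer, pairs, batch_size=32):
    # Score many headline pairs at once; returns the same-event
    # probability for each pair, in input order
    scores = []
    model.eval()
    with torch.no_grad():
        for batch in chunk_pairs(pairs, batch_size):
            headlines_a, headlines_b = zip(*batch)
            inputs = tokenizer(
                list(headlines_a),
                list(headlines_b),
                return_tensors="pt",
                padding=True,
                truncation=True,
                max_length=128,
            )
            probs = F.softmax(model(**inputs).logits, dim=-1)
            scores.extend(probs[:, 1].tolist())
    return scores
```

Padding within each batch lets the tokenizer return a single tensor, and `probs[:, 1]` selects the Same Event column for every pair in the batch.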
### Optional: Apply Temporal Adjustment

If you have access to the publication dates of both articles, you can apply a post-hoc temporal adjustment to improve precision:

```python
from datetime import datetime


def predict_with_date_adjustment(
    headline_a: str,
    headline_b: str,
    date_a: str,
    date_b: str,
    lambda_: float = 0.20,
    threshold: float = 0.45,
) -> dict:
    """
    Predicts same-event with temporal adjustment based on publication date difference.

    Args:
        headline_a: Headline of the first article.
        headline_b: Headline of the second article.
        date_a: Publication date of article A (format: 'YYYY-MM-DD').
        date_b: Publication date of article B (format: 'YYYY-MM-DD').
        lambda_: Decay factor for temporal adjustment (default: 0.20).
        threshold: Classification threshold after adjustment (default: 0.45).
    """
    inputs = tokenizer(
        text=headline_a,
        text_pair=headline_b,
        return_tensors="pt",
        truncation=True,
        max_length=128,
    )

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        prob_same_event = F.softmax(logits, dim=-1)[0][1]

    diff_days = abs(
        (datetime.strptime(date_a, "%Y-%m-%d") - datetime.strptime(date_b, "%Y-%m-%d")).days
    )
    adjusted_prob = prob_same_event * torch.exp(torch.tensor(-lambda_ * diff_days))
    predicted = int(adjusted_prob.item() >= threshold)

    labels = {0: "Different Event", 1: "Same Event"}
    return {
        "label": labels[predicted],
        "adjusted_score": adjusted_prob.item(),
        "raw_score": prob_same_event.item(),
        "diff_days": diff_days,
    }


# Example usage
result = predict_with_date_adjustment(
    headline_a="Government announces new climate policy",
    headline_b="New climate bill signed into law",
    date_a="2024-03-01",
    date_b="2024-03-02",
)
print(result)
```
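The adjustment simply multiplies the raw same-event probability by an exponential decay in the publication-date gap. A worked example of the numbers involved, isolated from the model (the 0.87 raw score is illustrative):

```python
import math


def temporal_adjust(p_same: float, diff_days: int, lambda_: float = 0.20) -> float:
    # adjusted = p * exp(-lambda * days): the further apart two articles
    # were published, the less likely they cover the same event
    return p_same * math.exp(-lambda_ * diff_days)


# Raw score 0.87 with a 1-day gap:
# 0.87 * exp(-0.20 * 1) ≈ 0.87 * 0.8187 ≈ 0.712  → above the 0.45 threshold
adjusted = temporal_adjust(0.87, diff_days=1)
```

With λ = 0.20, the same 0.87 raw score drops below the 0.45 threshold at roughly a four-day gap, which is why the adjustment mainly suppresses false positives between temporally distant articles.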
## Training Details

### Architecture

- Base model: `answerdotai/ModernBERT-base` (149M parameters)
- Task head: standard Hugging Face sequence classification head
- Fine-tuning strategy: embeddings and lower transformer layers are frozen; only the top 3 layers (19–21) and the classification head are trained
- Loss function: focal loss with class weights `[0.18, 0.82]` to address class imbalance
- Max sequence length: 128 tokens (headlines only)
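The exact focal-loss implementation used in training is not published with this card; the following is a minimal sketch of a class-weighted focal loss consistent with the `[0.18, 0.82]` weights above (the γ = 2 focusing parameter is an assumption, being the common default):

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, alpha=(0.18, 0.82), gamma=2.0):
    """Class-weighted focal loss for binary classification (sketch)."""
    weight = torch.tensor(alpha, dtype=logits.dtype)
    # Unweighted per-sample cross-entropy recovers p_t, the probability
    # the model assigns to the true class
    ce_raw = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce_raw)
    # Class-weighted per-sample cross-entropy
    ce_weighted = F.cross_entropy(logits, targets, weight=weight, reduction="none")
    # Down-weight easy examples (high p_t) by the (1 - p_t)^gamma factor
    return ((1.0 - pt) ** gamma * ce_weighted).mean()
```

With γ = 0 this reduces to plain weighted cross-entropy; increasing γ shifts the gradient toward hard, misclassified pairs, which complements the class weights on an imbalanced same/different split.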
### Parameter Summary
| Parameter Type | Count |
|---|---|
| Total | 149,606,402 |
| Trainable | 15,638,018 |
| Frozen | 133,968,384 |
### Hyperparameters (selected via Optuna HPO)
| Hyperparameter | Value |
|---|---|
| Epochs | 2 |
| Train Batch Size | 32 |
| Gradient Accumulation | 2 |
| Learning Rate | 3.538e-05 |
| Weight Decay | 0.002508 |
| Warmup Ratio | 0.1014 |
HPO was run on an NVIDIA L4 GPU (22.5 GB VRAM, 53 GB system RAM) using 10 Optuna trials with the TPE sampler and median pruning.
## Evaluation Results

### Experiment 1: Headlines-Only (~20k samples) — Best Configuration
After threshold tuning (threshold = 0.55):
| Metric | Value |
|---|---|
| Eval Loss | 0.0261 |
| Precision | 0.8927 |
| Recall | 0.8789 |
| F1-Score | 0.8838 |
| Accuracy | 0.91 |
Per-class breakdown:
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Different Event | 0.94 | 0.95 | 0.94 | 1656 |
| Same Event | 0.74 | 0.72 | 0.73 | 350 |
| Weighted Avg | 0.91 | 0.91 | 0.91 | 2006 |
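The 0.55 decision threshold can be reproduced with a simple grid search over held-out predictions. A minimal sketch, assuming you have the model's same-event probabilities and gold labels for a validation set (function names are illustrative):

```python
def f1_at_threshold(probs, labels, threshold):
    # Binarize same-event probabilities and compute F1 for the positive class
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs, labels, grid=None):
    # Sweep a coarse grid and keep the threshold with the best F1
    grid = grid or [t / 100 for t in range(5, 100, 5)]
    return max(grid, key=lambda t: f1_at_threshold(probs, labels, t))
```

Raising the threshold above the 0.5 default trades recall for precision on the minority Same Event class, which is the direction the per-class numbers above reward.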
### Experiment 5: Post-Hoc Date Adjustment (λ=0.20, threshold=0.45)
Incorporating the publication date difference as a post-hoc feature further improves performance:
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Different Event | 0.95 | 0.95 | 0.95 | 1656 |
| Same Event | 0.75 | 0.76 | 0.76 | 350 |
| Accuracy | | | 0.92 | 2006 |
| Weighted Avg | 0.92 | 0.92 | 0.92 | 2006 |
The date adjustment yields roughly a 3-point F1 gain on the Same Event class (0.73 → 0.76) while reducing both false positives and false negatives.
### Experiment Summary
| Experiment | F1-Score | Notes |
|---|---|---|
| Headlines-Only (~20k) | 0.91 | Best standalone model |
| Headlines + Content (~8k) | 0.83 | Content adds noise, not signal |
| Content-Only (~8k) | 0.41 | Confirms headlines are key |
| Headlines-Only reduced (~8k) | ~0.87 | Dataset size has minor effect |
| Headlines + Date Adjustment | 0.92 | Best overall — recommended configuration |
## Dataset
The model was trained on a custom dataset of article pairs collected from approximately 40 news outlets, with binary labels indicating whether each pair refers to the same real-world event.
- Headlines-only dataset: ~20,056 pairs
- Headlines + Content dataset: ~8,284 pairs
```bibtex
@inproceedings{Laban2021NewsHG,
  title     = {News Headline Grouping as a Challenging NLU Task},
  author    = {Laban, Philippe and Bandarkar, Lucas and Hearst, Marti A.},
  booktitle = {NAACL 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021}
}
```
## Intended Use & Limitations

**Intended for:**
- News deduplication and clustering pipelines
- Event-centric article grouping
- Research on media coverage analysis
**Limitations:**
- Trained primarily on English-language news headlines
- Performance may degrade on non-English or domain-specific content
- Full article content was found to reduce performance in experiments; headlines-only input is recommended
- The model may struggle with events that are semantically very similar but distinct (e.g., recurring political debates)