News Article Event Classifier — Cross-Encoder (ModernBERT)

A fine-tuned ModernBERT-base cross-encoder for binary classification of news article pairs. Given two articles (by headline, or headline + content), the model predicts whether they refer to the same real-world event.

This model is used in Stage 2 of a news article grouping research pipeline.


Model Description

The model classifies whether a pair of news articles refers to the same underlying real-world event — not just general semantic similarity, but whether both articles report on the same specific occurrence, potentially from different sources or perspectives.

Input structure:

(Article A, Article B) → [1 = Same Event | 0 = Different Event]

The model is built on top of answerdotai/ModernBERT-base and fine-tuned by unfreezing only the top transformer layers and the classification head, using a focal loss to handle class imbalance.


How to Use

Installation

pip install transformers torch

Predict with the Model

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "Juanillaberia/articles-pairs-event-detection"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def predict_same_event(headline_a: str, headline_b: str) -> dict:
    """
    Predicts whether two article headlines refer to the same real-world event.

    Args:
        headline_a: Headline of the first article.
        headline_b: Headline of the second article.

    Returns:
        Dictionary with predicted label and probabilities.
    """
    inputs = tokenizer(
        text=headline_a,
        text_pair=headline_b,
        return_tensors="pt",
        truncation=True,
        max_length=128
    )

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = F.softmax(logits, dim=-1)

    predicted_class = torch.argmax(probs, dim=-1).item()
    labels = {0: "Different Event", 1: "Same Event"}

    return {
        "label": labels[predicted_class],
        "score": probs[0][predicted_class].item(),
        "probabilities": {
            "Different Event": probs[0][0].item(),
            "Same Event": probs[0][1].item()
        }
    }

# Example usage
headline_a = "Government announces new climate policy targeting carbon emissions"
headline_b = "New climate bill signed into law by administration"

result = predict_same_event(headline_a, headline_b)
print(result)
# {'label': 'Same Event', 'score': 0.87, 'probabilities': {'Different Event': 0.13, 'Same Event': 0.87}}
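
Batched Inference

For scoring many candidate pairs in a clustering or deduplication pipeline, the tokenizer accepts parallel lists of first and second segments, so pairs can be processed in batches rather than one call per pair. A minimal sketch, taking the `tokenizer` and `model` loaded above as arguments so it is self-contained (the batching scheme is illustrative, not part of the released model code):

```python
import torch
import torch.nn.functional as F

def predict_same_event_batch(pairs, tokenizer, model, batch_size=32):
    """Return P(Same Event) for each (headline_a, headline_b) pair."""
    scores = []
    for i in range(0, len(pairs), batch_size):
        batch = pairs[i:i + batch_size]
        inputs = tokenizer(
            text=[a for a, _ in batch],
            text_pair=[b for _, b in batch],
            return_tensors="pt",
            padding=True,       # pad to the longest pair in the batch
            truncation=True,
            max_length=128,
        )
        with torch.no_grad():
            probs = F.softmax(model(**inputs).logits, dim=-1)
        scores.extend(probs[:, 1].tolist())  # column 1 = Same Event
    return scores
```

Thresholding the returned scores (e.g., at 0.55, as in the evaluation below) yields the binary labels.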

Optional: Apply Temporal Adjustment

If you have access to the publication dates of both articles, you can apply a post-hoc temporal adjustment to improve precision:

from datetime import datetime

def predict_with_date_adjustment(
    headline_a: str,
    headline_b: str,
    date_a: str,
    date_b: str,
    lambda_: float = 0.20,
    threshold: float = 0.45
) -> dict:
    """
    Predicts same-event with temporal adjustment based on publication date difference.

    Args:
        headline_a: Headline of the first article.
        headline_b: Headline of the second article.
        date_a: Publication date of article A (format: 'YYYY-MM-DD').
        date_b: Publication date of article B (format: 'YYYY-MM-DD').
        lambda_: Decay factor for temporal adjustment (default: 0.20).
        threshold: Classification threshold after adjustment (default: 0.45).
    """
    inputs = tokenizer(
        text=headline_a,
        text_pair=headline_b,
        return_tensors="pt",
        truncation=True,
        max_length=128
    )

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        prob_same_event = F.softmax(logits, dim=-1)[0][1]

    diff_days = abs((
        datetime.strptime(date_a, "%Y-%m-%d") - datetime.strptime(date_b, "%Y-%m-%d")
    ).days)

    adjusted_prob = prob_same_event * torch.exp(torch.tensor(-lambda_ * diff_days))
    predicted = int(adjusted_prob.item() >= threshold)
    labels = {0: "Different Event", 1: "Same Event"}

    return {
        "label": labels[predicted],
        "adjusted_score": adjusted_prob.item(),
        "raw_score": prob_same_event.item(),
        "diff_days": diff_days
    }

# Example usage
result = predict_with_date_adjustment(
    headline_a="Government announces new climate policy",
    headline_b="New climate bill signed into law",
    date_a="2024-03-01",
    date_b="2024-03-02"
)
print(result)
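
To see how the decay behaves: with λ = 0.20, a one-day gap multiplies the raw score by exp(-0.20) ≈ 0.82, while a seven-day gap multiplies it by exp(-1.4) ≈ 0.25. The adjustment in isolation:

```python
import math

def adjust_score(raw_prob: float, diff_days: int, lambda_: float = 0.20) -> float:
    """Exponentially decay P(Same Event) by the publication-date gap in days."""
    return raw_prob * math.exp(-lambda_ * diff_days)

# A raw score of 0.87 one day apart stays well above the 0.45 threshold
# (~0.71), but the same score seven days apart falls below it (~0.21),
# so distant-in-time pairs are pushed toward "Different Event".
```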

Training Details

Architecture

  • Base model: answerdotai/ModernBERT-base (149M parameters)
  • Task head: Standard Hugging Face sequence classification head
  • Fine-tuning strategy: Embeddings and lower transformer layers are frozen; only the top 3 layers (19–21) and the classification head are trained
  • Loss function: Focal loss with class weights [0.18, 0.82] to address class imbalance
  • Max sequence length: 128 tokens (headlines only)
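
The focal loss down-weights easy, well-classified examples so training focuses on the hard minority-class pairs. A scalar sketch, assuming γ = 2 and that the weights [0.18, 0.82] map to classes 0 and 1 respectively (both assumptions; the card states neither γ nor the weight order):

```python
import math

def focal_loss(p1: float, target: int, alpha=(0.18, 0.82), gamma: float = 2.0) -> float:
    """Binary focal loss for one example.

    p1: predicted probability of class 1 (Same Event); target: 0 or 1.
    The (1 - p_t)**gamma factor shrinks the loss of confident, correct
    predictions, and alpha re-weights the two classes.
    """
    p_t = p1 if target == 1 else 1.0 - p1  # probability assigned to the true class
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 2, a confidently correct pair (p_t = 0.9) contributes over 100× less loss than an ambiguous one (p_t = 0.5), which keeps the abundant Different Event pairs from dominating the gradient.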

Parameter Summary

Parameter Type   Count
Total            149,606,402
Trainable        15,638,018
Frozen           133,968,384
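
The freeze can be reproduced by toggling `requires_grad` on parameter names. A sketch for a Hugging Face sequence-classification model, where the name patterns `layers.{i}.` and `classifier` are assumptions about ModernBERT's parameter naming (verify against `model.named_parameters()` before relying on them):

```python
def freeze_except_top(model, top_layers=(19, 20, 21)) -> int:
    """Freeze all parameters except the given encoder layers and the
    classification head; return the number of trainable parameter tensors."""
    trainable = 0
    for name, param in model.named_parameters():
        keep = "classifier" in name or any(f"layers.{i}." in name for i in top_layers)
        param.requires_grad = keep
        trainable += int(keep)
    return trainable
```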

Hyperparameters (selected via Optuna HPO)

Hyperparameter          Value
Epochs                  2
Train Batch Size        32
Gradient Accumulation   2
Learning Rate           3.538e-05
Weight Decay            0.002508
Warmup Ratio            0.1014

HPO was run on an NVIDIA L4 GPU (22.5 GB VRAM, 53 GB system RAM) using 10 Optuna trials with the TPE sampler and median pruner.


Evaluation Results

Experiment 1: Headlines-Only (~20k samples) — Best Configuration

After threshold tuning (threshold = 0.55):

Metric      Value
Eval Loss   0.0261
Precision   0.8927
Recall      0.8789
F1-Score    0.8838
Accuracy    0.91

Per-class breakdown:

Class             Precision   Recall   F1     Support
Different Event   0.94        0.95     0.94   1656
Same Event        0.74        0.72     0.73   350
Weighted Avg      0.91        0.91     0.91   2006
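
The reported threshold of 0.55 comes from tuning on validation scores rather than using the default 0.5. The procedure can be sketched as a simple sweep over candidate thresholds that maximizes F1 for the Same Event class (a generic sketch, not the authors' exact tuning code):

```python
def tune_threshold(probs, labels, grid=None):
    """Sweep decision thresholds over validation P(Same Event) scores and
    return (best_threshold, best_f1) for the positive class."""
    grid = grid or [t / 100 for t in range(5, 100, 5)]
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [int(p >= t) for p in probs]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Raising the threshold above 0.5 trades a little recall for precision, which suits deduplication pipelines where false merges are costlier than missed matches.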

Experiment 5: Post-Hoc Date Adjustment (λ=0.20, threshold=0.45)

Incorporating the publication date difference as a post-hoc feature further improves performance:

Class             Precision   Recall   F1     Support
Different Event   0.95        0.95     0.95   1656
Same Event        0.75        0.76     0.76   350
Accuracy                               0.92   2006
Weighted Avg      0.92        0.92     0.92   2006

The date adjustment improves Same Event F1 by roughly 3 points (0.73 → 0.76) while simultaneously reducing both false positives and false negatives.

Experiment Summary

Experiment                     F1-Score   Notes
Headlines-Only (~20k)          0.91       Best standalone model
Headlines + Content (~8k)      0.83       Content adds noise, not signal
Content-Only (~8k)             0.41       Confirms headlines are key
Headlines-Only reduced (~8k)   ~0.87      Dataset size has a minor effect
Headlines + Date Adjustment    0.92       Best overall; recommended configuration

Dataset

The model was trained on a custom dataset of article pairs collected from approximately 40 news outlets, with binary labels indicating whether each pair refers to the same real-world event.

  • Headlines-only dataset: ~20,056 pairs
  • Headlines + Content dataset: ~8,284 pairs

Citation

@inproceedings{Laban2021NewsHG,
  title     = {News Headline Grouping as a Challenging NLU Task},
  author    = {Laban, Philippe and Bandarkar, Lucas and Hearst, Marti A},
  booktitle = {NAACL 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021}
}


Intended Use & Limitations

Intended for:

  • News deduplication and clustering pipelines
  • Event-centric article grouping
  • Research on media coverage analysis

Limitations:

  • Trained primarily on English-language news headlines
  • Performance may degrade on non-English or domain-specific content
  • Full article content was found to reduce performance in experiments — headlines only is recommended
  • The model may struggle with events that are semantically very similar but distinct (e.g., recurring political debates)