# LLaMA 3.2 3B Fine-Tuned for TLINK Classification
## Overview
This model is a fully fine-tuned version of meta-llama/Llama-3.2-3B-Instruct for temporal relation classification (TLINK task).
It predicts the temporal relationship between events in text.
Labels:
- BEFORE
- AFTER
- OTHER
- NONE
## Task

**Temporal relation classification.** Given a sentence, the model predicts the temporal relationship between the events it mentions.
## Dataset

- Name: fahmidiqbal/tlink-classification
- Format: JSONL (text + label)
- Labels: BEFORE, AFTER, OTHER, NONE
- Distribution: balanced (50 samples per class in the test set)
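Records in this text + label JSONL format can be read with the standard library alone. A minimal sketch; the two example records below are hypothetical and only illustrate the schema, they are not taken from the dataset:

```python
import json

# Hypothetical records illustrating the JSONL schema (text + label);
# the real data lives in fahmidiqbal/tlink-classification.
raw_lines = [
    '{"text": "He ate lunch after the meeting ended.", "label": "AFTER"}',
    '{"text": "The alarm rang before she woke up.", "label": "BEFORE"}',
]

# One JSON object per line, parsed independently.
records = [json.loads(line) for line in raw_lines]
label_inventory = sorted({r["label"] for r in records})
print(label_inventory)
```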
## Training Details
- Model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuning: Full fine-tuning (all parameters updated)
- Epochs: 3
- Learning Rate: 2e-5
- Batch Size: 1 (with gradient accumulation)
- Optimizer: AdamW
- Precision: bfloat16
- Tracking: Weights & Biases (wandb)
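The listed hyperparameters roughly correspond to the following `transformers.TrainingArguments` sketch. Note the assumptions: the card only says "gradient accumulation" without a step count, so `gradient_accumulation_steps=8` is a placeholder, and `output_dir` is a hypothetical path:

```python
from transformers import TrainingArguments

# Sketch of the reported configuration; not the authors' actual script.
args = TrainingArguments(
    output_dir="./llama32-3b-tlink",   # hypothetical path
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,     # assumption: exact value not reported
    optim="adamw_torch",               # AdamW
    bf16=True,                         # bfloat16 precision
    report_to="wandb",                 # Weights & Biases tracking
)
```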
## Evaluation

### Validation Performance
| Metric | Score |
|---|---|
| Accuracy | 0.8480 |
| Macro-F1 | 0.8285 |
| Precision | 0.8327 |
| Recall | 0.8246 |
### Test Performance
| Metric | Score |
|---|---|
| Accuracy | 0.7950 |
| Macro-F1 | 0.7973 |
| Precision | 0.8100 |
| Recall | 0.7950 |
### Per-Class Performance (Test Set)
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| BEFORE | 0.7826 | 0.7200 | 0.7500 |
| AFTER | 0.6613 | 0.8200 | 0.7321 |
| OTHER | 0.8462 | 0.8800 | 0.8627 |
| NONE | 0.9500 | 0.7600 | 0.8444 |
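The macro-averaged test metrics above are unweighted means of the per-class rows, which makes them easy to sanity-check:

```python
# Per-class test metrics copied from the table above.
per_class = {
    "BEFORE": {"precision": 0.7826, "recall": 0.7200, "f1": 0.7500},
    "AFTER":  {"precision": 0.6613, "recall": 0.8200, "f1": 0.7321},
    "OTHER":  {"precision": 0.8462, "recall": 0.8800, "f1": 0.8627},
    "NONE":   {"precision": 0.9500, "recall": 0.7600, "f1": 0.8444},
}

def macro(metric: str) -> float:
    """Macro average: unweighted mean of the metric over classes."""
    return sum(c[metric] for c in per_class.values()) / len(per_class)

print(round(macro("precision"), 4))  # 0.81
print(round(macro("recall"), 4))     # 0.795
print(round(macro("f1"), 4))         # 0.7973
```

These reproduce the Macro-F1 (0.7973), precision (0.8100), and recall (0.7950) reported in the test table.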
## Analysis
- The model achieves strong overall performance with a Macro-F1 of ~0.80 on the test set.
- OTHER and NONE classes show the highest performance.
- The AFTER class has lower precision but strong recall, indicating some over-prediction.
- The drop from validation (0.83 F1) to test (0.79 F1) suggests a mild generalization gap, but overall stable performance.
- Errors are mainly observed in:
  - BEFORE vs AFTER confusion
  - NONE vs OTHER boundary ambiguity
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "AnirbanSaha/llama32-3b-tlink-full-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "The patient developed fever before taking the medication."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
pred = outputs.logits.argmax(dim=-1).item()

labels = ["BEFORE", "AFTER", "OTHER", "NONE"]
print(labels[pred])
```
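Classification logits can also be turned into a confidence score with a softmax. A minimal pure-Python sketch using a hypothetical logits vector (real values come from `outputs.logits`):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                       # shift for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["BEFORE", "AFTER", "OTHER", "NONE"]
logits = [4.1, 0.3, -1.2, 0.8]            # hypothetical values for illustration
probs = softmax(logits)
best = max(range(len(labels)), key=probs.__getitem__)
print(labels[best], round(probs[best], 3))
```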
## Important Notice (LLaMA License)

This model is based on meta-llama/Llama-3.2-3B-Instruct. You must request access and accept the license at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct; otherwise, loading this model will fail.
## Training Logs

Weights & Biases run: https://wandb.ai/anirbansaha002-univeristy-of-north-texas/llama32-3b-full-finetune/runs/mrdwp04g
## Limitations
- Performance depends on the training data distribution
- May not generalize well to unseen domains
- Sensitive to truncation of long inputs
- Some confusion remains between temporally similar classes (BEFORE vs AFTER)
## Research Context
This model was developed as part of research on:
- Temporal reasoning in NLP
- Relation classification using large language models
## Contact
Author: Anirban Saha Anik
Affiliation: University of North Texas
## Citation
If you use this model in your research, please cite appropriately.