LLaMA 3.2 3B Fine-Tuned for TLINK Classification

🧠 Overview

This model is a fully fine-tuned version of meta-llama/Llama-3.2-3B-Instruct for temporal relation classification (TLINK task).

It predicts the temporal relationship between events in text.

Labels:

  • BEFORE
  • AFTER
  • OTHER
  • NONE

πŸ“Š Task

Temporal Relation Classification: given a sentence, the model predicts the temporal relationship between the events it mentions.


πŸ“š Dataset

  • Name: fahmidiqbal/tlink-classification
  • Format: JSONL (text + label)
  • Labels: BEFORE, AFTER, OTHER, NONE
  • Distribution: Balanced (50 samples per class in test set)
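
The card describes the format as JSONL with text and label fields. A minimal parsing sketch; the exact field names ("text", "label") and the example record are assumptions based on the "JSONL (text + label)" description above, not taken from the actual dataset:

```python
import json

# Hypothetical record; field names "text" and "label" are assumed
# from the card's "JSONL (text + label)" description.
line = '{"text": "The patient developed fever before taking the medication.", "label": "BEFORE"}'

record = json.loads(line)

# The four TLINK labels listed on this card.
LABELS = ["BEFORE", "AFTER", "OTHER", "NONE"]
assert record["label"] in LABELS

# Map the string label to an integer class id for training.
label_id = LABELS.index(record["label"])
print(record["text"], "->", label_id)
```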

βš™οΈ Training Details

  • Model: meta-llama/Llama-3.2-3B-Instruct
  • Fine-tuning: Full fine-tuning (all parameters updated)
  • Epochs: 3
  • Learning Rate: 2e-5
  • Batch Size: 1 (with gradient accumulation)
  • Optimizer: AdamW
  • Precision: bfloat16
  • Tracking: Weights & Biases (wandb)
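
The hyperparameters above map directly onto Hugging Face `TrainingArguments`. A hedged config sketch; the `output_dir` and the gradient-accumulation step count are assumptions (the card only says "batch size 1 with gradient accumulation"), not values stated on this card:

```python
from transformers import TrainingArguments

# Sketch of a training config matching the listed hyperparameters.
args = TrainingArguments(
    output_dir="llama32-3b-tlink-full-finetune",  # assumed
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # assumed value, not stated on the card
    optim="adamw_torch",            # AdamW
    bf16=True,                      # bfloat16 precision
    report_to="wandb",              # Weights & Biases tracking
)
```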

πŸ“ˆ Evaluation

Validation Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.8480 |
| Macro-F1  | 0.8285 |
| Precision | 0.8327 |
| Recall    | 0.8246 |

Test Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.7950 |
| Macro-F1  | 0.7973 |
| Precision | 0.8100 |
| Recall    | 0.7950 |

Per-Class Performance (Test Set)

| Class  | Precision | Recall | F1-score |
|--------|-----------|--------|----------|
| BEFORE | 0.7826    | 0.7200 | 0.7500   |
| AFTER  | 0.6613    | 0.8200 | 0.7321   |
| OTHER  | 0.8462    | 0.8800 | 0.8627   |
| NONE   | 0.9500    | 0.7600 | 0.8444   |
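
Because the test set is balanced (50 samples per class), the headline macro metrics are plain unweighted means of the per-class columns, and overall accuracy coincides with macro recall. A quick sanity check using only the numbers from the table above:

```python
# Per-class test metrics from the table above (BEFORE, AFTER, OTHER, NONE).
precision = [0.7826, 0.6613, 0.8462, 0.9500]
recall    = [0.7200, 0.8200, 0.8800, 0.7600]
f1        = [0.7500, 0.7321, 0.8627, 0.8444]

mean = lambda xs: sum(xs) / len(xs)

# Macro scores are unweighted means over the four classes.
macro_p, macro_r, macro_f1 = mean(precision), mean(recall), mean(f1)

# With exactly 50 examples per class, accuracy equals the mean of recalls.
accuracy = macro_r

print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4))
```

These recover the headline test numbers (precision ≈ 0.8100, recall/accuracy 0.7950, macro-F1 0.7973).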

πŸ”¬ Analysis

  • The model achieves strong overall performance with a Macro-F1 of ~0.80 on the test set.
  • OTHER and NONE classes show the highest performance.
  • The AFTER class has lower precision but strong recall, indicating some over-prediction.
  • The drop from validation (0.8285 macro-F1) to test (0.7973) suggests a mild generalization gap, though performance remains stable overall.
  • Errors are mainly observed in:
    • BEFORE vs AFTER confusion
    • NONE vs OTHER boundary ambiguity

πŸš€ Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "AnirbanSaha/llama32-3b-tlink-full-finetune"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "The patient developed fever before taking the medication."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Classify without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
pred = outputs.logits.argmax(dim=-1).item()

labels = ["BEFORE", "AFTER", "OTHER", "NONE"]
print(labels[pred])
```

⚠️ Important Notice (LLaMA License)

This model is based on:

πŸ‘‰ meta-llama/Llama-3.2-3B-Instruct

You must request access and accept the license:

https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct

Otherwise, loading this model will fail.


πŸ“Š Training Logs

Weights & Biases run: πŸ‘‰ https://wandb.ai/anirbansaha002-univeristy-of-north-texas/llama32-3b-full-finetune/runs/mrdwp04g


🧩 Limitations

  • Performance depends on dataset distribution
  • May not generalize well to unseen domains
  • Sensitive to long input truncation
  • Some confusion between temporally similar classes (BEFORE vs AFTER)

πŸ”¬ Research Context

This model was developed as part of research on:

  • Temporal reasoning in NLP
  • Relation classification using large language models

πŸ“¬ Contact

Author: Anirban Saha Anik
Affiliation: University of North Texas


⭐ Citation

If you use this model in your research, please cite appropriately.
