metaphor-cat-roberta-large-weights
This model is a fine-tuned version of projecte-aina/roberta-large-ca-v2 on the Catalan metaphor detection dataset metaphor-catalan.
It achieves the following results on the evaluation set:
- Precision: 0.6897
- Recall: 0.5556
- F1: 0.6154
- Accuracy: 0.9713
Model description
This model is a RoBERTa-large transformer trained for Catalan and fine-tuned for token-level metaphor detection.
The model performs sequence labeling to identify metaphorical expressions using a BIO tagging scheme:
- O – non-metaphorical token
- B-METAPHOR – beginning of a metaphorical expression
- I-METAPHOR – continuation of a metaphorical expression
The base model was pretrained as part of the AINA project for Catalan NLP.
During fine-tuning, class-weighted cross-entropy loss was applied to mitigate the strong class imbalance in the dataset, where metaphor tokens are much less frequent than literal tokens.
This model is suitable for research in figurative language detection, computational linguistics, and Catalan NLP applications.
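The class-weighted cross-entropy described above can be sketched as follows. This is a minimal illustration, not the training code: the weight values and the dummy batch are made up, and only the mechanism (up-weighting rare metaphor labels, skipping padded positions) reflects what the card describes.

```python
import torch
import torch.nn as nn

# Hypothetical class weights that up-weight the rare metaphor classes
# (label order assumed: 0 = O, 1 = B-METAPHOR, 2 = I-METAPHOR).
class_weights = torch.tensor([1.0, 10.0, 10.0])

# ignore_index=-100 skips padding / special-token positions, matching the
# usual Hugging Face token-classification convention.
weighted_loss = nn.CrossEntropyLoss(weight=class_weights, ignore_index=-100)
plain_loss = nn.CrossEntropyLoss(ignore_index=-100)

# Dummy batch: 4 token positions, 3 labels, last position ignored.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 1.8],
                       [1.0, 1.0, 1.0]])
labels = torch.tensor([0, 1, 2, -100])

# Errors on metaphor tokens now cost more than errors on literal tokens.
print(weighted_loss(logits, labels).item(), plain_loss(logits, labels).item())
```

In a `Trainer`-based setup this loss would typically replace the default one inside an overridden `compute_loss`.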
Intended uses & limitations
Intended uses:
- Detecting metaphorical expressions in Catalan text.
- Supporting linguistic research on figurative language.
- Assisting annotation workflows for metaphor datasets.
- Integrating metaphor detection into Catalan NLP pipelines.
Limitations:
- The dataset used for training is relatively small and domain-limited.
- The model may not generalize well to:
  - highly informal language
  - social media text
  - poetry or highly creative figurative language
- Predictions are performed at the token level, so additional processing may be required to reconstruct full metaphor spans.
- Metaphor detection is inherently subjective, and annotation inconsistencies may affect predictions.
This model should be used as a support tool rather than a definitive metaphor detection system.
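As a sketch of the post-processing mentioned in the limitations, token-level BIO tags can be merged back into metaphor spans. The function name and output format (token-index spans, end-exclusive) are illustrative choices, not part of this model's API:

```python
def bio_to_spans(tags):
    """Merge B-/I-METAPHOR tags into (start, end) token-index spans, end-exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-METAPHOR":
            if start is not None:      # close a span running into a new B- tag
                spans.append((start, i))
            start = i
        elif tag == "I-METAPHOR":
            if start is None:          # tolerate a stray I- tag (annotation noise)
                start = i
        else:                          # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:              # span running to the end of the sentence
        spans.append((start, len(tags)))
    return spans

tags = ["O", "B-METAPHOR", "I-METAPHOR", "O", "B-METAPHOR", "O"]
print(bio_to_spans(tags))  # → [(1, 3), (4, 5)]
```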
Training and evaluation data
Training dataset:
metaphor-catalan
The dataset contains Catalan sentences annotated for metaphorical language at the token level.
Example dataset structure:
- tokens: tokenized sentence
- tags: BIO labels identifying metaphor spans
Label set used during training:
- O
- B-METAPHOR
- I-METAPHOR
The dataset is highly imbalanced, with many more literal tokens than metaphor tokens.
To address this imbalance, class weights were applied during training.
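One common way to derive such class weights is inverse frequency. The sketch below uses made-up tag counts, since the card does not list the actual dataset statistics:

```python
from collections import Counter

# Hypothetical tag counts; the real dataset statistics are not published here.
counts = Counter({"O": 9000, "B-METAPHOR": 600, "I-METAPHOR": 400})

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency weighting: rarer classes get proportionally larger weights.
weights = {tag: total / (num_classes * n) for tag, n in counts.items()}
print(weights)
```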
Training procedure
Hyperparameters
- Learning rate: 3e-5
- Train batch size: 4
- Evaluation batch size: 4
- Gradient accumulation steps: 2
- Weight decay: 0.01
- Warmup steps: 50
- Epochs: 15
- LR scheduler: linear
- Optimizer: AdamW
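The hyperparameters above map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch: the output path is a placeholder, and any settings not listed in the card are left at their defaults.

```python
from transformers import TrainingArguments

# Configuration implied by the hyperparameter list above (sketch only).
args = TrainingArguments(
    output_dir="metaphor-cat-roberta-large-weights",  # placeholder path
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    weight_decay=0.01,
    warmup_steps=50,
    num_train_epochs=15,
    lr_scheduler_type="linear",
    optim="adamw_torch",
)
```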
Framework versions
- Transformers: 4.57.3
- PyTorch: 2.9.0
- Datasets: 4.0.0
- Tokenizers: 0.22.1