MT5 Biomedical Translator (Mode 1 with NE Embeddings)
This is a fine-tuned MT5 model for English-to-Portuguese biomedical translation. It uses named entity-aware embeddings to prioritise the translation of biomedical terms.
Model Highlights
- Based on google/mt5-base
- Trained with additional Named Entity (NE) tag embeddings
- Mode 1: injects NE embeddings during both training and inference
- Achieved a BLEU score of 52.27 on the validation set
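The idea of NE-aware embeddings can be illustrated with a small, dependency-free sketch. It is not the model's actual implementation: the tag set, the table sizes, and the choice to sum (rather than concatenate) the token and NE tag vectors are all assumptions made for illustration, and the names `NE_TAGS`, `make_table`, and `embed_with_ne` are hypothetical. In a real model these lookup tables would be learned embedding layers.

```python
import random

# Hypothetical tag inventory; the model card does not list the actual NE tag set.
NE_TAGS = ["O", "DISEASE", "CHEMICAL"]
EMBED_DIM = 4  # toy size for illustration only

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def make_table(num_rows, dim):
    """Create a small random lookup table with one vector per row id."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


# Stand-ins for learned embedding matrices (assumption: both share one dimension).
token_table = make_table(num_rows=10, dim=EMBED_DIM)
ne_table = make_table(num_rows=len(NE_TAGS), dim=EMBED_DIM)


def embed_with_ne(token_ids, ne_tag_ids):
    """Return one vector per position: token embedding plus NE tag embedding."""
    if len(token_ids) != len(ne_tag_ids):
        raise ValueError("token_ids and ne_tag_ids must have the same length")
    return [
        [t + n for t, n in zip(token_table[tok], ne_table[tag])]
        for tok, tag in zip(token_ids, ne_tag_ids)
    ]


# Three toy token ids; the middle token is tagged as a disease mention.
token_ids = [3, 7, 1]
ne_tag_ids = [NE_TAGS.index("O"), NE_TAGS.index("DISEASE"), NE_TAGS.index("O")]
combined = embed_with_ne(token_ids, ne_tag_ids)
print(len(combined), len(combined[0]))  # 3 4
```

Under this assumption, every position in the input carries both lexical and entity-type information, which is one way a model could be steered to treat tagged biomedical terms differently from ordinary tokens.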
Files
This repository includes:
- pytorch_model.bin or model.safetensors
- config.json
- tokenizer.json + sentencepiece files
Usage
from transformers import MT5ForConditionalGeneration, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = MT5ForConditionalGeneration.from_pretrained("AnaluRRamos/mt5-biomedical-translation-mode1")
tokenizer = AutoTokenizer.from_pretrained("AnaluRRamos/mt5-biomedical-translation-mode1")

# Tokenize the English source text into PyTorch tensors
input_text = "Translate this English biomedical abstract."
inputs = tokenizer(input_text, return_tensors="pt")

# generate() can stop after a short default length; raise the limit for full abstracts
outputs = model.generate(**inputs, max_new_tokens=256)
translated = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated)