# Medical Translation Model: Vietnamese → English

## Model Description
This is a LoRA fine-tuned model based on Qwen/Qwen2.5-0.5B for medical text translation from Vietnamese to English.
The model was fine-tuned on the VLSP 2025 Medical Translation dataset using LoRA (Low-Rank Adaptation) for efficient training and deployment.
## Key Features
- 🏥 Specialized for medical domain: Trained on medical terminology and clinical texts
- 🚀 Efficient LoRA fine-tuning: Only trains a small subset of parameters
- 📊 High quality translations: Optimized for medical accuracy and fluency
- 💾 Lightweight: LoRA adapters are much smaller than full model weights
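To see why the adapters are lightweight, a quick back-of-envelope count helps. The sketch below compares a full-rank update of one weight matrix with a rank-`r` LoRA update; the hidden size and rank are illustrative assumptions, not this model's published configuration.

```python
# LoRA replaces a full d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in). The dimensions below are
# illustrative, not the actual Qwen2.5-0.5B shapes.

def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a full-rank update of one weight matrix."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds for the same matrix."""
    return d_out * r + r * d_in

d = 896  # hypothetical hidden size
r = 16   # hypothetical LoRA rank
print(full_params(d, d))     # 802816 parameters for the full update
print(lora_params(d, d, r))  # 28672 parameters for the LoRA factors
```

At rank 16 the adapter holds roughly 3–4% of the parameters of the matrix it modifies, which is why LoRA checkpoints are a small fraction of the base model's size.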
## Model Details
- Base Model: Qwen/Qwen2.5-0.5B
- Language Pair: Vietnamese → English
- Domain: Medical/Clinical
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Task: Machine Translation
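The card does not publish the exact LoRA hyperparameters. For orientation, a typical PEFT configuration for a causal-LM base like Qwen2.5-0.5B looks like the sketch below; the rank, alpha, dropout, and target modules are all assumptions, not the values used for this model.

```python
from peft import LoraConfig, TaskType

# Hypothetical adapter configuration: r, lora_alpha, lora_dropout, and
# target_modules are illustrative defaults, not this model's settings.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```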
## Performance
- BLEU Score: 41.20
## Usage

### Requirements

```bash
pip install transformers peft torch
```

### Loading the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model and tokenizer
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "MothMalone/medical-vi2en-qwen0.5b")
model.eval()
```
### Translation Example
```python
def translate(text: str, model, tokenizer, max_new_tokens: int = 256):
    """Translate Vietnamese medical text to English."""
    prompt = f"Translate the following Vietnamese medical text to English:\n\n{text}\n\nTranslation:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_beams=5,
            do_sample=False,  # deterministic beam search; temperature only applies when sampling
            repetition_penalty=1.1,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens so the prompt is not repeated
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()


# Example usage
vi_text = "Bệnh nhân được chẩn đoán mắc bệnh tiểu đường type 2."
translation = translate(vi_text, model, tokenizer)
print(translation)
```
### Batch Translation
```python
def translate_batch(texts: list, model, tokenizer, batch_size: int = 8):
    """Translate multiple texts in batches."""
    # Decoder-only models need left padding so generation starts
    # immediately after each prompt
    tokenizer.padding_side = "left"
    translations = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        prompts = [
            f"Translate the following Vietnamese medical text to English:\n\n{text}\n\nTranslation:"
            for text in batch
        ]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                num_beams=5,
                pad_token_id=tokenizer.pad_token_id,
            )
        # With left padding, every row's new tokens start at the same index
        prompt_len = inputs["input_ids"].shape[1]
        for output in outputs:
            translation = tokenizer.decode(output[prompt_len:], skip_special_tokens=True).strip()
            translations.append(translation)
    return translations
```
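Both `translate` and `translate_batch` rely on the same prompt template; the model was fine-tuned against this format, so inference prompts should match it exactly. Keeping the template in one helper avoids drift between the two code paths. This is a convenience sketch, not part of any published API:

```python
# Prompt template taken from the usage examples in this card.
PROMPT_TEMPLATE = (
    "Translate the following Vietnamese medical text to English:\n\n"
    "{text}\n\n"
    "Translation:"
)

def build_prompt(text: str) -> str:
    """Format a Vietnamese source sentence with the card's prompt template."""
    return PROMPT_TEMPLATE.format(text=text)

print(build_prompt("Bệnh nhân ổn định."))
```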
## Training Data
- Dataset: VLSP 2025 Medical Translation
- Domain: Medical and clinical texts
- Languages: Vietnamese-English parallel corpus
- Preprocessing: Filtered for quality and medical relevance
## Limitations and Considerations

⚠️ **Important Limitations:**
- This model is designed for medical domain translation. Performance on general domain text may be suboptimal.
- The model should be used as an assistive tool and translations should be reviewed by medical professionals.
- Not validated for clinical use - human expert review is essential for medical translations.
- May not handle very rare medical terms or newly coined terminology.
- Performance may vary with different medical specialties.
## Ethical Considerations
- Medical translations require high accuracy and should be verified by qualified professionals
- This model should not be used as the sole translation source for critical medical information
- Always consult with medical experts for important clinical decisions
## Citation

If you use this model, please cite:

```bibtex
@misc{medical-translation-vien,
  author       = {VLSP 2025 Team},
  title        = {Medical Translation Model: Vietnamese to English},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MothMalone/medical-vi2en-qwen0.5b}}
}
```
## License
This model inherits the license from the base model (Qwen/Qwen2.5-0.5B). Please refer to the base model's license for usage terms.
## Acknowledgments
- Base model: Qwen/Qwen2.5-0.5B
- Training framework: HuggingFace Transformers + PEFT
- Dataset: VLSP 2025 Medical Translation
Created: 2025-12-15
For questions or issues, please open an issue on the model repository.