🧠 MarianMT-Text-Translation-AI-Model-"en-fr"

A sequence-to-sequence translation model fine-tuned on English–French sentence pairs. This model translates English text into French and is built using the Hugging Face MarianMTModel. It’s ideal for general-purpose translation, educational use, and light regulatory or formal communication tasks between English and French.


✨ Model Highlights

  • 📌 Based on Helsinki-NLP/opus-mt-en-fr
  • 🔁 Fine-tuned on a cleaned parallel corpus of English-French sentence pairs
  • ⚡ Translates from English → French
  • 🧠 Built using Hugging Face Transformers and PyTorch

🧠 Intended Uses

  • ✅ Translating English feedback, emails, or documents into French
  • ✅ Cross-lingual support for customer service or regulatory communication
  • ✅ Educational platforms and language learning

🚫 Limitations

  • ❌ Not suitable for informal slang or code-mixed inputs
  • 📏 Inputs longer than 128 tokens will be truncated
  • 🤔 May produce less accurate translations for highly specialized or domain-specific language
  • ⚠️ Not intended for legal, medical, or safety-critical translations without expert review
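
Because inputs longer than 128 tokens are truncated, longer documents need to be split before translation. The following is a minimal sketch of one way to do that, assuming sentence boundaries can be found with a simple punctuation rule. The function name `split_for_translation` and the `count_tokens` callback are hypothetical and not part of this repository.

```python
import re

def split_for_translation(text, count_tokens, max_tokens=128):
    """Group sentences into chunks whose token count stays within max_tokens.

    count_tokens is any callable that returns the token count of a string.
    A single sentence longer than max_tokens is kept as its own chunk and
    would still be truncated by the model.
    """
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With the tokenizer loaded as in the Usage section, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`. Each chunk can then be translated separately and the results joined.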

πŸ‹οΈβ€β™‚οΈ Training Details

Attribute           Value
Base Model          Helsinki-NLP/opus-mt-en-fr
Dataset             Parallel English-French corpus
Task Type           Translation
Max Token Length    128
Epochs              3
Batch Size          16
Optimizer           AdamW
Loss Function       CrossEntropyLoss
Framework           PyTorch + Transformers
Hardware            CUDA-enabled GPU
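
To get a feel for what these settings imply, the number of optimizer updates can be estimated from the corpus size. This is only a sketch: the corpus size used below is hypothetical (the card does not state it), and it assumes one update per batch with no gradient accumulation.

```python
import math

def total_optimizer_steps(num_pairs, batch_size=16, epochs=3):
    """Number of optimizer updates, assuming one update per batch."""
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    return steps_per_epoch * epochs

# Hypothetical corpus of 100,000 sentence pairs.
print(total_optimizer_steps(100_000))  # 18750
```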

📊 Evaluation Metrics

Metric        Score
BLEU Score    27.82
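
The card does not say which tool or test set produced this score. For readers unfamiliar with the metric, the sketch below shows a simplified corpus-level BLEU: clipped n-gram precision up to 4-grams combined with a brevity penalty. It assumes whitespace tokenization, one reference per hypothesis, and no smoothing, so its numbers will not match those from standard tools such as sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as it
            # appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Here `hypotheses` would be the model's French outputs and `references` the human translations of the same English sentences.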

🔎 Output Details

  • Input: English text string
  • Output: Translated French text string

🚀 Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "AventIQ-AI/MarianMT-Text-Translation-AI-Model-en-fr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Run on the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def translate(text):
    # Inputs longer than 128 tokens are truncated, matching the training limit
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
print(translate("Hello, how are you?"))
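
When many sentences need translating, sending them through the model in batches is usually faster than calling it once per sentence. The following is a hedged sketch of such a helper. The name `translate_batch` and its default batch size are assumptions for illustration and are not part of this repository. It expects a tokenizer and model loaded with `AutoTokenizer` and `AutoModelForSeq2SeqLM`.

```python
def translate_batch(texts, tokenizer, model, batch_size=16, max_length=128):
    """Translate a list of English strings in fixed-size batches."""
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest item in the batch and truncate at max_length.
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        ).to(model.device)
        outputs = model.generate(**inputs)
        # batch_decode turns every generated sequence back into text.
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

A call such as `translate_batch(["Good morning.", "Thank you."], tokenizer, model)` returns one French string per input, in the same order.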

📁 Repository Structure

finetuned-model/
├── config.json               ✅ Model architecture & config
├── pytorch_model.bin         ✅ Model weights
├── tokenizer_config.json     ✅ Tokenizer settings
├── tokenizer.json            ✅ Tokenizer vocabulary (JSON format)
├── source.spm                ✅ SentencePiece model for source language
├── target.spm                ✅ SentencePiece model for target language
├── special_tokens_map.json   ✅ Special tokens mapping
├── generation_config.json    ✅ (Optional) Generation defaults
└── README.md                 ✅ Model card
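
Before loading a local copy of the model, it can help to confirm that the files listed above are present. The sketch below is one way to do that. The helper `missing_model_files` is hypothetical, and the file names are taken from the listing, with `generation_config.json` treated as optional. A checkpoint saved in safetensors format would use a different weights file name, so adjust the list accordingly.

```python
from pathlib import Path

# File names copied from the repository listing; README.md and the
# optional generation_config.json are not required for loading.
REQUIRED_FILES = [
    "config.json",
    "pytorch_model.bin",
    "tokenizer_config.json",
    "tokenizer.json",
    "source.spm",
    "target.spm",
    "special_tokens_map.json",
]

def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means every required file was found.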

🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request to improve the model, training scripts, or documentation.

Model size: 74.7M params · Tensor type: F32 · Format: Safetensors