# English → Hindi Transformer
A from-scratch PyTorch encoder-decoder Transformer for English → Hindi machine translation, trained on a raw Tatoeba EN-HI export (13 186 sentence pairs, including multiple Hindi translations per English sentence).
This repository provides two versioned checkpoints:
| Version | Description | BLEU | Epochs | Weights file |
|---|---|---|---|---|
| v1.0.0 | Baseline — fixed hyperparameters | 0.7566 | 100 | `v1.0.0/transformer_translation_final.pth` |
| v1.1.0 ✔ recommended | Ray Tune + Optuna optimised | 0.8369 | 50 | `v1.1.0/m25csa023_ass_4_best_model.pth` |
v1.1.0 achieves a +10.6% relative BLEU improvement over v1.0.0 in half the training epochs.
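The headline figure can be checked with a line of arithmetic, using the BLEU values from the table:

```python
# BLEU scores for v1.0.0 and v1.1.0, taken from the version table
v1_bleu, v2_bleu = 0.7566, 0.8369

# Relative improvement of the tuned model over the baseline, in percent
rel_gain_pct = (v2_bleu - v1_bleu) / v1_bleu * 100
print(f"{rel_gain_pct:.1f}%")  # → 10.6%
```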
## Training Summary

Summary plots (`assets/summary.png`):

- **(a)** Training-loss curves: baseline (100 epochs) vs tuned (50 epochs).
- **(b)** BLEU progression across epochs.
- **(c)** All 20 Ray Tune trial loss curves (grey = pruned by ASHA, orange = best).
- **(d)** Hyperparameter importance (Spearman ρ): batch size and dropout matter most.
- **(e–g)** Scatter plots: learning rate / dropout / batch size vs final loss across all trials.
- **(h)** Final comparison bar chart: time, loss, and BLEU for v1.0.0 vs v1.1.0.
## Dataset

**Source:** Raw export from [tatoeba.org/en/downloads](https://tatoeba.org/en/downloads) — English-Hindi sentence pairs.

> **Note:** This is the unprocessed Tatoeba dump, not the Helsinki-NLP filtered version.

File used during training: `English-Hindi.tsv`
### TSV Column Structure
| Column | Content | Example |
|---|---|---|
| 1 | English sentence ID (Tatoeba) | 1282 |
| 2 | English sentence | Muiriel is 20 now. |
| 3 | Hindi sentence ID (Tatoeba) | 485968 |
| 4 | Hindi sentence | म्यूरियल अब बीस साल की हो गई है। |
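Rows in this four-column layout can be read with the standard `csv` module. The helper below is an illustrative sketch, not the project's actual loading code; the sample row is the one shown in the table above:

```python
import csv
import io

# One sample row in the 4-column Tatoeba export layout
# (English ID, English sentence, Hindi ID, Hindi sentence).
sample = "1282\tMuiriel is 20 now.\t485968\tम्यूरियल अब बीस साल की हो गई है।\n"

def load_pairs(fh):
    """Yield (english, hindi) pairs from a 4-column Tatoeba TSV."""
    reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 4:  # skip malformed rows
            _, en, _, hi = row
            yield en.strip(), hi.strip()

pairs = list(load_pairs(io.StringIO(sample)))
```

`quoting=csv.QUOTE_NONE` matters for Tatoeba exports, since sentences may contain unbalanced quote characters that would otherwise confuse the parser.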
### Statistics
| Property | Value |
|---|---|
| Total sentence pairs | 13 186 |
| Unique English sentences | 11 109 (2 077 have multiple Hindi translations) |
| Mean English length | 5.6 words |
| Mean Hindi length | 6.3 words |
| Max English length | 53 tokens |
| Max Hindi length | 57 tokens |
| English ID range | 1 277 – 12 886 231 (Tatoeba IDs) |
| Hindi ID range | 440 811 – 13 125 624 (Tatoeba IDs) |
| Tokenisation | Whitespace split, lowercased |
| Min word frequency (vocab) | 2 |
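The tokenisation and min-frequency rows above imply a vocabulary builder along these lines. This is an illustrative sketch (the toy sentences and id assignment are assumptions, not the repository's actual code):

```python
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(sentences, min_freq=2):
    """Whitespace-split, lowercase, and keep tokens seen >= min_freq times."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}  # specials get ids 0..3
    for tok, c in counts.items():
        if c >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

# "tom" and "." appear twice → kept; the other words fall below min_freq
vocab = build_vocab(["Tom runs .", "tom eats .", "Mary sleeps"])
```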
## Repository File Structure

```
en-hi-transformer/
├── README.md                             ← model card (this page)
├── config.json                           ← shared architecture config
├── assets/
│   └── summary.png                       ← training & evaluation plots
├── v1.0.0/
│   ├── transformer_translation_final.pth ← baseline weights (~192 MB)
│   └── config.json                       ← v1.0.0 hyperparameters
├── v1.1.0/
│   ├── m25csa023_ass_4_best_model.pth    ← optimised weights (~216 MB, recommended)
│   └── config.json                       ← v1.1.0 hyperparameters + search config
└── vocab/
    ├── en_vocab.pkl                      ← English vocabulary (4 117 tokens)
    └── hi_vocab.pkl                      ← Hindi vocabulary (4 044 tokens)
```
## Model Architecture

Built from scratch following Vaswani et al. (2017), "Attention Is All You Need"; no Hugging Face Transformers library is used internally.
| Property | Value |
|---|---|
| Architecture | Encoder-Decoder Transformer |
| d_model | 512 |
| num_layers | 6 encoder + 6 decoder |
| num_heads | 8 |
| d_ff | 2048 (v1.0.0) / 2560 (v1.1.0) |
| Dropout | 0.10 (v1.0.0) / 0.081 (v1.1.0) |
| Max sequence length | 50 tokens |
| Positional encoding | Sinusoidal (fixed) |
| Source vocabulary | 4 117 English tokens |
| Target vocabulary | 4 044 Hindi tokens |
| Special tokens | `<pad>` `<sos>` `<eos>` `<unk>` |
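The fixed sinusoidal positional encoding follows the standard Vaswani et al. formulation, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A dependency-free sketch (pure Python rather than the model's torch implementation):

```python
import math

def positional_encoding(max_len, d_model):
    """Fixed sinusoidal encodings: sin on even dims, cos on odd dims."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Matches this model's settings: max_len = 50, d_model = 512
pe = positional_encoding(50, 512)
```

Because the encodings are fixed rather than learned, they add no parameters and extrapolate deterministically to any position up to `max_len`.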
## Versions

### v1.0.0 - Baseline

Trained for 100 epochs with manually chosen hyperparameters on an NVIDIA A100 80 GB (BF16 autocast + `torch.compile` + `cudnn.benchmark`).
| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-4 |
| Batch size | 60 |
| d_ff | 2048 |
| Dropout | 0.10 |
| Gradient clipping | none |
Results: BLEU 0.7566 · Loss 0.0998 · Training time 12.3 min
### v1.1.0 - Ray Tune + Optuna Optimised ✔
Hyperparameters discovered automatically using Ray Tune 2.x with OptunaSearch (TPE) and an ASHA early-stopping scheduler (20 trials, ~65% pruned early).
| Hyperparameter | Optimised Value |
|---|---|
| Learning rate | 1.112e-4 |
| Batch size | 32 |
| d_ff | 2560 |
| Dropout | 0.081 |
| Gradient clipping | max_norm = 1.0 |
Results: BLEU 0.8369 · Loss 0.1264 · Training time 13.5 min · Epochs 50
The winning configuration first surpassed the v1.0.0 BLEU at epoch 10 during the search sweep.
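The exact search space is recorded in `v1.1.0/config.json`. As a rough illustration only, a space consistent with the reported optimum might look like the sketch below; the bounds and choices here are assumptions, and with Ray Tune these would be `tune.loguniform` / `tune.choice` entries sampled by the TPE searcher rather than plain `random` calls:

```python
import random

# Assumed search space, consistent with the tuned values reported above.
SEARCH_SPACE = {
    "lr":         lambda: 10 ** random.uniform(-5, -3),   # log-uniform over 1e-5..1e-3
    "batch_size": lambda: random.choice([32, 48, 60, 64]),
    "d_ff":       lambda: random.choice([2048, 2560, 3072]),
    "dropout":    lambda: random.uniform(0.05, 0.30),
}

def sample_config(seed=None):
    """Draw one random trial configuration (TPE would sample adaptively)."""
    random.seed(seed)
    return {name: draw() for name, draw in SEARCH_SPACE.items()}

cfg = sample_config(seed=0)
```

Under ASHA, trials drawn from such a space are trained in stages and the worst performers are stopped early, which is how ~65% of the 20 trials were pruned before completion.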
## How to Use

### 1. Clone the repo & install dependencies

```bash
git lfs install
git clone https://huggingface.co/priyadip/en-hi-transformer
pip install torch
```
### 2. Load a checkpoint

```python
import torch, pickle

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load vocabularies
with open("en-hi-transformer/vocab/en_vocab.pkl", "rb") as f:
    en_vocab = pickle.load(f)
with open("en-hi-transformer/vocab/hi_vocab.pkl", "rb") as f:
    hi_vocab = pickle.load(f)

# Instantiate the model (Transformer class from the training script)
model = Transformer(
    src_vocab_size=len(en_vocab),
    tgt_vocab_size=len(hi_vocab),
    d_model=512,
    num_layers=6,
    num_heads=8,
    d_ff=2560,       # use 2048 for v1.0.0
    max_len=50,
    dropout=0.081,   # use 0.10 for v1.0.0
).to(DEVICE)

# Load weights - pick the version you need
model.load_state_dict(
    torch.load("en-hi-transformer/v1.1.0/m25csa023_ass_4_best_model.pth", map_location=DEVICE)
    # or: "en-hi-transformer/v1.0.0/transformer_translation_final.pth"
)
model.eval()
```
### 3. Translate a sentence

```python
def translate(model, sentence, max_len=50):
    tokens = encode_sentence(sentence, en_vocab, max_len)
    src = torch.tensor(tokens).unsqueeze(0).to(DEVICE)
    tgt = [hi_vocab["<sos>"]]
    with torch.no_grad():
        for _ in range(max_len):
            out = model(src, torch.tensor(tgt).unsqueeze(0).to(DEVICE),
                        en_vocab["<pad>"], hi_vocab["<pad>"])
            nxt = out[0, -1].argmax().item()  # greedy decoding
            tgt.append(nxt)
            if nxt == hi_vocab["<eos>"]:
                break
    return " ".join(hi_vocab.itos[i] for i in tgt[1:-1])

print(translate(model, "How are you?"))        # → तुम कैसी हो?
print(translate(model, "I love you."))         # → मैं तुमसे प्यार करती हूँ।
print(translate(model, "What is your name?"))  # → आपका नाम क्या है?
```
`Transformer` and `encode_sentence` are defined in the training script available in the linked GitHub repository.
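For reference, a plausible reimplementation of `encode_sentence`, consistent with the preprocessing described above (lowercasing, whitespace split, `<sos>`/`<eos>` wrapping, padding to `max_len`). The real helper and vocabulary object in the training script may differ:

```python
def encode_sentence(sentence, vocab, max_len=50):
    """Lowercase, whitespace-split, map OOV tokens to <unk>,
    wrap with <sos>/<eos>, and pad with <pad> up to max_len."""
    tokens = sentence.lower().split()[: max_len - 2]  # reserve room for <sos>/<eos>
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    ids = [vocab["<sos>"]] + ids + [vocab["<eos>"]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Toy vocabulary for illustration only (the real pickled vocab is larger)
toy_vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3, "how": 4, "are": 5}
ids = encode_sentence("How are you?", toy_vocab, max_len=8)
# "you?" is out-of-vocabulary → <unk>
```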
## Sample Outputs (v1.1.0)
| English | Hindi |
|---|---|
| How are you? | तुम कैसी हो? |
| I love you. | मैं तुमसे प्यार करती हूँ। |
| What is your name? | आपका नाम क्या है? |
| The weather is nice today. | आज मौसम अच्छा है। |
| She is a good teacher. | वह अच्छा शिक्षक है। |
## Limitations

- Vocabulary of ~4 K tokens; unknown words map to `<unk>`.
- Optimised for short sentences (≤ 10 words); quality degrades on longer input.
- Greedy decoding only — no beam search.
- BLEU evaluated on a small held-out set; treat scores as indicative.
## Citation

If you use this model, please cite:

**This model:**

```bibtex
@misc{en_hi_transformer_2026,
  author       = {priyadip},
  title        = {English to Hindi Transformer (v1.0.0 / v1.1.0)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/priyadip/en-hi-transformer}},
  note         = {v1.0.0: BLEU 0.7566 / 100 epochs.
                  v1.1.0: BLEU 0.8369 / 50 epochs via Ray Tune + Optuna (+10.6\%).}
}
```
**Architecture — Attention Is All You Need:**

```bibtex
@inproceedings{vaswani2017attention,
  title     = {Attention Is All You Need},
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and
               Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and
               Kaiser, Lukasz and Polosukhin, Illia},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  year      = {2017},
  url       = {https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf}
}
```

- Paper: https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf
- Papers with Code: https://paperswithcode.com/paper/attention-is-all-you-need
**Dataset — Tatoeba:**

```bibtex
@misc{tatoeba,
  title        = {Tatoeba: A multilingual sentence collection},
  author       = {{Tatoeba contributors}},
  howpublished = {\url{https://tatoeba.org}},
  note         = {Raw EN-HI export used; 13 186 pairs including multiple
                  Hindi translations per English sentence.}
}
```
