English → Hindi Transformer

A from-scratch PyTorch encoder-decoder Transformer for English → Hindi machine translation, trained on a raw Tatoeba EN-HI export (13 186 sentence pairs, including multiple Hindi translations per English sentence).

This repository provides two versioned checkpoints:

| Version | Description | BLEU | Epochs | Weights file |
|---|---|---|---|---|
| v1.0.0 | Baseline (fixed hyperparameters) | 0.7566 | 100 | `v1.0.0/transformer_translation_final.pth` |
| v1.1.0 (recommended) | Ray Tune + Optuna optimised | 0.8369 | 50 | `v1.1.0/m25csa023_ass_4_best_model.pth` |

v1.1.0 achieves +10.6% BLEU in half the epochs compared to v1.0.0.


Training & Evaluation Summary

![Training and evaluation summary](assets/summary.png)

(a) Training loss curves: baseline (100 ep) vs tuned (50 ep). (b) BLEU progression across epochs. (c) All 20 Ray Tune trial loss curves (grey = pruned by ASHA, orange = best). (d) Hyperparameter importance (Spearman ρ): batch size and dropout matter most. (e–g) Scatter plots: LR / dropout / batch size vs final loss across all trials. (h) Final comparison bar chart: time, loss, and BLEU for v1.0.0 vs v1.1.0.


Dataset

Source: raw export from tatoeba.org/en/downloads (English-Hindi sentence pairs). Note: this is the unprocessed Tatoeba dump, not the Helsinki-NLP filtered version. File used during training: `English-Hindi.tsv`

TSV Column Structure

| Column | Content | Example |
|---|---|---|
| 1 | English sentence ID (Tatoeba) | 1282 |
| 2 | English sentence | Muiriel is 20 now. |
| 3 | Hindi sentence ID (Tatoeba) | 485968 |
| 4 | Hindi sentence | म्यूरियल अब बीस साल की हो गई है। |
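The four-column layout above can be parsed with the standard `csv` module. A minimal sketch (the helper name `load_pairs` is my own; during training you would pass `open("English-Hindi.tsv", encoding="utf-8")` instead of the in-memory sample):

```python
import csv
import io

# One sample row in the documented 4-column layout:
# en_id \t english \t hi_id \t hindi
sample = "1282\tMuiriel is 20 now.\t485968\tम्यूरियल अब बीस साल की हो गई है।\n"

def load_pairs(fileobj):
    """Yield (english, hindi) sentence pairs from the 4-column Tatoeba TSV."""
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 4:          # skip malformed lines defensively
            en_id, en, hi_id, hi = row
            yield en, hi

pairs = list(load_pairs(io.StringIO(sample)))
print(pairs[0][0])  # Muiriel is 20 now.
```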

Statistics

| Property | Value |
|---|---|
| Total sentence pairs | 13 186 |
| Unique English sentences | 11 109 (2 077 have multiple Hindi translations) |
| Mean English length | 5.6 words |
| Mean Hindi length | 6.3 words |
| Max English length | 53 tokens |
| Max Hindi length | 57 tokens |
| English ID range | 1 277 – 12 886 231 (Tatoeba IDs) |
| Hindi ID range | 440 811 – 13 125 624 (Tatoeba IDs) |
| Tokenisation | Whitespace split, lowercased |
| Min word frequency (vocab) | 2 |
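Given whitespace tokenisation, lowercasing, and a minimum frequency of 2, vocabulary construction can be sketched as follows. This is illustrative only: the actual pickled vocabularies may use a different structure; here a plain token → index dict with the four special tokens is assumed.

```python
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(sentences, min_freq=2):
    """Whitespace-split, lowercase, keep tokens seen at least min_freq times."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, c in counts.items():
        if c >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

vocab = build_vocab(["I love you .", "I love tea .", "rare word"])
# "you", "tea", "rare", "word" occur only once, so they fall below
# min_freq and will map to <unk> at encoding time.
print(sorted(t for t in vocab if t not in SPECIALS))  # ['.', 'i', 'love']
```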

Repository File Structure

```
en-hi-transformer/
├── README.md                    ← model card (this page)
├── config.json                  ← shared architecture config
├── assets/
│   └── summary.png              ← training & evaluation plots
├── v1.0.0/
│   ├── transformer_translation_final.pth   ← baseline weights  (~192 MB)
│   └── config.json              ← v1.0.0 hyperparameters
├── v1.1.0/
│   ├── m25csa023_ass_4_best_model.pth      ← optimised weights (~216 MB)  ← recommended
│   └── config.json              ← v1.1.0 hyperparameters + search config
└── vocab/
    ├── en_vocab.pkl             ← English vocabulary (4 117 tokens)
    └── hi_vocab.pkl             ← Hindi vocabulary   (4 044 tokens)
```

Model Architecture

Built from scratch following Vaswani et al. (2017), "Attention Is All You Need"; the Hugging Face Transformers library is not used internally.

| Property | Value |
|---|---|
| Architecture | Encoder-decoder Transformer |
| d_model | 512 |
| num_layers | 6 encoder + 6 decoder |
| num_heads | 8 |
| d_ff | 2048 (v1.0.0) / 2560 (v1.1.0) |
| Dropout | 0.10 (v1.0.0) / 0.081 (v1.1.0) |
| Max sequence length | 50 tokens |
| Positional encoding | Sinusoidal (fixed) |
| Source vocabulary | 4 117 English tokens |
| Target vocabulary | 4 044 Hindi tokens |
| Special tokens | `<pad>` `<sos>` `<eos>` `<unk>` |
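The fixed sinusoidal positional encoding follows Vaswani et al.; a pure-Python sketch for max length 50 and d_model = 512 (the training code presumably precomputes this as a buffer and adds it to the embeddings):

```python
import math

def sinusoidal_pe(max_len=50, d_model=512):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_pe()
print(pe[0][:4])  # position 0: sin terms are 0.0, cos terms are 1.0
```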

Versions

v1.0.0 - Baseline

Trained for 100 epochs with manually chosen hyperparameters on an NVIDIA A100 80 GB (BF16 autocast + torch.compile + cudnn.benchmark).

| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-4 |
| Batch size | 60 |
| d_ff | 2048 |
| Dropout | 0.10 |
| Gradient clipping | none |

Results: BLEU 0.7566 · Loss 0.0998 · Training time 12.3 min


v1.1.0 - Ray Tune + Optuna Optimised ✔

Hyperparameters discovered automatically using Ray Tune 2.x with OptunaSearch (TPE) and an ASHA early-stopping scheduler (20 trials, ~65% pruned early).
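The actual search space is not published in this card. Purely as an illustrative stand-in, the tuned parameters could be described as below; all bounds and choice sets are hypothetical, and only the winning values are from the reported run:

```python
# Hypothetical stand-in for the Ray Tune search space. In real code these
# would be ray.tune distributions (tune.loguniform, tune.choice, ...) passed
# to OptunaSearch, with the ASHA scheduler pruning unpromising trials early.
search_space = {
    "learning_rate": ("loguniform", 1e-5, 1e-3),      # hypothetical bounds
    "batch_size":    ("choice", [32, 48, 64]),        # hypothetical choices
    "d_ff":          ("choice", [2048, 2560, 3072]),  # hypothetical choices
    "dropout":       ("uniform", 0.05, 0.20),         # hypothetical bounds
}

# Winning values reported for v1.1.0:
best = {"learning_rate": 1.112e-4, "batch_size": 32, "d_ff": 2560, "dropout": 0.081}

# Sanity check: the reported optimum lies inside the illustrative space.
for name, (kind, *args) in search_space.items():
    if kind == "choice":
        assert best[name] in args[0]
    else:
        lo, hi = args
        assert lo <= best[name] <= hi
print("best config lies inside the illustrative bounds")
```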

| Hyperparameter | Optimised value |
|---|---|
| Learning rate | 1.112e-4 |
| Batch size | 32 |
| d_ff | 2560 |
| Dropout | 0.081 |
| Gradient clipping | max_norm = 1.0 |

Results: BLEU 0.8369 · Loss 0.1264 · Training time 13.5 min · Epochs 50

The winning configuration first surpassed the v1.0.0 BLEU at epoch 10 during the search sweep.


How to Use

1. Clone the repo & install dependencies

```bash
git lfs install
git clone https://huggingface.co/priyadip/en-hi-transformer
pip install torch
```

2. Load a checkpoint

```python
import torch, pickle

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load vocabularies
with open("en-hi-transformer/vocab/en_vocab.pkl", "rb") as f:
    en_vocab = pickle.load(f)
with open("en-hi-transformer/vocab/hi_vocab.pkl", "rb") as f:
    hi_vocab = pickle.load(f)

# Instantiate the model (Transformer class from the training script)
model = Transformer(
    src_vocab_size = len(en_vocab),
    tgt_vocab_size = len(hi_vocab),
    d_model    = 512,
    num_layers = 6,
    num_heads  = 8,
    d_ff       = 2560,   # use 2048 for v1.0.0
    max_len    = 50,
    dropout    = 0.081,  # use 0.10 for v1.0.0
).to(DEVICE)

# Load weights - pick the version you need
model.load_state_dict(
    torch.load("en-hi-transformer/v1.1.0/m25csa023_ass_4_best_model.pth", map_location=DEVICE)
    # or: "en-hi-transformer/v1.0.0/transformer_translation_final.pth"
)
model.eval()
```

3. Translate a sentence

```python
def translate(model, sentence, max_len=50):
    tokens = encode_sentence(sentence, en_vocab, max_len)
    src = torch.tensor(tokens).unsqueeze(0).to(DEVICE)
    tgt = [hi_vocab["<sos>"]]
    with torch.no_grad():
        for _ in range(max_len):
            out = model(src, torch.tensor(tgt).unsqueeze(0).to(DEVICE),
                        en_vocab["<pad>"], hi_vocab["<pad>"])
            nxt = out[0, -1].argmax().item()  # greedy decoding
            tgt.append(nxt)
            if nxt == hi_vocab["<eos>"]:
                break
    # Drop <sos>; drop the trailing <eos> only if decoding actually produced one
    ids = tgt[1:]
    if ids and ids[-1] == hi_vocab["<eos>"]:
        ids = ids[:-1]
    return " ".join(hi_vocab.itos[i] for i in ids)

print(translate(model, "How are you?"))         # → तुम कैसी हो?
print(translate(model, "I love you."))          # → मैं तुमसे प्यार करती हूँ।
print(translate(model, "What is your name?"))   # → आपका नाम क्या है?
```

`Transformer` and `encode_sentence` are defined in the training script available in the linked GitHub repository.
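If you need a stand-in for `encode_sentence`, here is a sketch consistent with the whitespace/lowercase tokenisation and special tokens described above. It assumes the vocab behaves like a plain token → index dict (`toy` below is a made-up vocabulary for demonstration; the actual pickled object may be a torchtext-style `Vocab`, in which case lookup differs):

```python
def encode_sentence(sentence, vocab, max_len=50):
    """Lowercase, whitespace-split, map OOV to <unk>, wrap with <sos>/<eos>, pad."""
    ids = [vocab["<sos>"]]
    for tok in sentence.lower().split():
        ids.append(vocab.get(tok, vocab["<unk>"]))
    ids.append(vocab["<eos>"])
    ids = ids[:max_len]                              # truncate long inputs
    ids += [vocab["<pad>"]] * (max_len - len(ids))   # right-pad short inputs
    return ids

toy = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3, "how": 4, "are": 5, "you?": 6}
print(encode_sentence("How are you?", toy, max_len=8))
# [1, 4, 5, 6, 2, 0, 0, 0]
```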


Sample Outputs (v1.1.0)

| English | Hindi |
|---|---|
| How are you? | तुम कैसी हो? |
| I love you. | मैं तुमसे प्यार करती हूँ। |
| What is your name? | आपका नाम क्या है? |
| The weather is nice today. | आज मौसम अच्छा है। |
| She is a good teacher. | वह अच्छा शिक्षक है। |

Limitations

- Vocabulary of ~4 K tokens; unknown words map to `<unk>`.
- Optimised for short sentences (≤ 10 words); quality degrades on longer input.
- Greedy decoding; no beam search.
- BLEU evaluated on a small held-out set; treat scores as indicative.

Citation

If you use this model, please cite:

This model:

```bibtex
@misc{en_hi_transformer_2026,
  author       = {priyadip},
  title        = {English to Hindi Transformer (v1.0.0 / v1.1.0)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/priyadip/en-hi-transformer}},
  note         = {v1.0.0: BLEU 0.7566 / 100 epochs.
                  v1.1.0: BLEU 0.8369 / 50 epochs via Ray Tune + Optuna (+10.6\%).}
}
```

Architecture - Attention Is All You Need:

```bibtex
@inproceedings{vaswani2017attention,
  title     = {Attention Is All You Need},
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and
               Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and
               Kaiser, Lukasz and Polosukhin, Illia},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  year      = {2017},
  url       = {https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf}
}
```

Dataset - Tatoeba:

```bibtex
@misc{tatoeba,
  title        = {Tatoeba: A multilingual sentence collection},
  author       = {Tatoeba contributors},
  howpublished = {\url{https://tatoeba.org}},
  note         = {Raw EN-HI export used; 13 186 pairs including multiple
                  Hindi translations per English sentence.}
}
```
Evaluation results

- BLEU (NLTK method4, ×100) on Tatoeba EN-HI (raw export, 13 186 pairs), self-reported: 75.66 (v1.0.0)
- BLEU (NLTK method4, ×100) on Tatoeba EN-HI (raw export, 13 186 pairs), self-reported: 83.69 (v1.1.0)