English → Hindi Transformer

A from-scratch PyTorch encoder-decoder Transformer for English → Hindi machine translation, trained on a raw Tatoeba EN-HI export (13 186 sentence pairs, including multiple Hindi translations per English sentence).

This repository provides two versioned checkpoints:

| Version | Description | BLEU | Epochs | Weights file |
|---|---|---|---|---|
| v1.0.0 | Baseline (fixed hyperparameters) | 0.7566 | 100 | `v1.0.0/transformer_translation_final.pth` |
| v1.1.0 (recommended) | Ray Tune + Optuna optimised | 0.8369 | 50 | `v1.1.0/m25csa023_ass_4_best_model.pth` |

v1.1.0 achieves +10.6% BLEU in half the epochs compared to v1.0.0.


Training & Evaluation Summary

![Training and evaluation summary](assets/summary.png)

(a) Training loss curves: baseline (100 ep) vs tuned (50 ep). (b) BLEU progression across epochs. (c) All 20 Ray Tune trial loss curves (grey = pruned by ASHA, orange = best). (d) Hyperparameter importance (Spearman ρ): batch size and dropout matter most. (e–g) Scatter plots: LR / dropout / batch size vs final loss across all trials. (h) Final comparison bar chart: time, loss, and BLEU for v1.0.0 vs v1.1.0.


Dataset

Source: raw export from tatoeba.org/en/downloads (English-Hindi sentence pairs). Note: this is the unprocessed Tatoeba dump, not the Helsinki-NLP filtered version. File used during training: `English-Hindi.tsv`

TSV Column Structure

| Column | Content | Example |
|---|---|---|
| 1 | English sentence ID (Tatoeba) | 1282 |
| 2 | English sentence | Muiriel is 20 now. |
| 3 | Hindi sentence ID (Tatoeba) | 485968 |
| 4 | Hindi sentence | म्यूरियल अब बीस साल की हो गई है। |
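The four-column layout above can be parsed with the standard `csv` module. A minimal sketch (the helper name `load_pairs` is my own; during training you would pass `open("English-Hindi.tsv", encoding="utf-8")` instead of the in-memory sample):

```python
import csv
import io

# One sample row in the documented 4-column layout:
# en_id \t english \t hi_id \t hindi
sample = "1282\tMuiriel is 20 now.\t485968\tम्यूरियल अब बीस साल की हो गई है।\n"

def load_pairs(fileobj):
    """Yield (english, hindi) sentence pairs from the 4-column Tatoeba TSV."""
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 4:          # skip malformed lines defensively
            en_id, en, hi_id, hi = row
            yield en, hi

pairs = list(load_pairs(io.StringIO(sample)))
print(pairs[0][0])  # Muiriel is 20 now.
```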

Statistics

| Property | Value |
|---|---|
| Total sentence pairs | 13 186 |
| Unique English sentences | 11 109 (2 077 have multiple Hindi translations) |
| Mean English length | 5.6 words |
| Mean Hindi length | 6.3 words |
| Max English length | 53 tokens |
| Max Hindi length | 57 tokens |
| English ID range | 1 277 – 12 886 231 (Tatoeba IDs) |
| Hindi ID range | 440 811 – 13 125 624 (Tatoeba IDs) |
| Tokenisation | Whitespace split, lowercased |
| Min word frequency (vocab) | 2 |
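Given whitespace tokenisation, lowercasing, and a minimum frequency of 2, vocabulary construction can be sketched as follows. This is illustrative only: the actual pickled vocabularies may use a different structure; here a plain token → index dict with the four special tokens is assumed.

```python
from collections import Counter

SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]

def build_vocab(sentences, min_freq=2):
    """Whitespace-split, lowercase, keep tokens seen at least min_freq times."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, c in counts.items():
        if c >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

vocab = build_vocab(["I love you .", "I love tea .", "rare word"])
# "you", "tea", "rare", "word" occur only once, so they fall below
# min_freq and will map to <unk> at encoding time.
print(sorted(t for t in vocab if t not in SPECIALS))  # ['.', 'i', 'love']
```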

Repository File Structure

```
en-hi-transformer/
├── README.md                    ← model card (this page)
├── config.json                  ← shared architecture config
├── assets/
│   └── summary.png              ← training & evaluation plots
├── v1.0.0/
│   ├── transformer_translation_final.pth   ← baseline weights  (~192 MB)
│   └── config.json              ← v1.0.0 hyperparameters
├── v1.1.0/
│   ├── m25csa023_ass_4_best_model.pth      ← optimised weights (~216 MB)  ← recommended
│   └── config.json              ← v1.1.0 hyperparameters + search config
└── vocab/
    ├── en_vocab.pkl             ← English vocabulary (4 117 tokens)
    └── hi_vocab.pkl             ← Hindi vocabulary   (4 044 tokens)
```

Model Architecture

Built from scratch following Vaswani et al. (2017), "Attention Is All You Need"; the Hugging Face Transformers library is not used internally.

| Property | Value |
|---|---|
| Architecture | Encoder-decoder Transformer |
| d_model | 512 |
| num_layers | 6 encoder + 6 decoder |
| num_heads | 8 |
| d_ff | 2048 (v1.0.0) / 2560 (v1.1.0) |
| Dropout | 0.10 (v1.0.0) / 0.081 (v1.1.0) |
| Max sequence length | 50 tokens |
| Positional encoding | Sinusoidal (fixed) |
| Source vocabulary | 4 117 English tokens |
| Target vocabulary | 4 044 Hindi tokens |
| Special tokens | `<pad>` `<sos>` `<eos>` `<unk>` |
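The fixed sinusoidal positional encoding follows Vaswani et al.; a pure-Python sketch for max length 50 and d_model = 512 (the training code presumably precomputes this as a buffer and adds it to the embeddings):

```python
import math

def sinusoidal_pe(max_len=50, d_model=512):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same angle)."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_pe()
print(pe[0][:4])  # position 0: sin terms are 0.0, cos terms are 1.0
```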

Versions

v1.0.0 - Baseline

Trained for 100 epochs with manually chosen hyperparameters on an NVIDIA A100 80 GB (BF16 autocast + torch.compile + cudnn.benchmark).

| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-4 |
| Batch size | 60 |
| d_ff | 2048 |
| Dropout | 0.10 |
| Gradient clipping | none |

Results: BLEU 0.7566 · Loss 0.0998 · Training time 12.3 min


v1.1.0 - Ray Tune + Optuna Optimised ✔

Hyperparameters discovered automatically using Ray Tune 2.x with OptunaSearch (TPE) and an ASHA early-stopping scheduler (20 trials, ~65% pruned early).
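The actual search space is not published in this card. Purely as an illustrative stand-in, the tuned parameters could be described as below; all bounds and choice sets are hypothetical, and only the winning values are from the reported run:

```python
# Hypothetical stand-in for the Ray Tune search space. In real code these
# would be ray.tune distributions (tune.loguniform, tune.choice, ...) passed
# to OptunaSearch, with the ASHA scheduler pruning unpromising trials early.
search_space = {
    "learning_rate": ("loguniform", 1e-5, 1e-3),      # hypothetical bounds
    "batch_size":    ("choice", [32, 48, 64]),        # hypothetical choices
    "d_ff":          ("choice", [2048, 2560, 3072]),  # hypothetical choices
    "dropout":       ("uniform", 0.05, 0.20),         # hypothetical bounds
}

# Winning values reported for v1.1.0:
best = {"learning_rate": 1.112e-4, "batch_size": 32, "d_ff": 2560, "dropout": 0.081}

# Sanity check: the reported optimum lies inside the illustrative space.
for name, (kind, *args) in search_space.items():
    if kind == "choice":
        assert best[name] in args[0]
    else:
        lo, hi = args
        assert lo <= best[name] <= hi
print("best config lies inside the illustrative bounds")
```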

| Hyperparameter | Optimised value |
|---|---|
| Learning rate | 1.112e-4 |
| Batch size | 32 |
| d_ff | 2560 |
| Dropout | 0.081 |
| Gradient clipping | max_norm = 1.0 |

Results: BLEU 0.8369 · Loss 0.1264 · Training time 13.5 min · Epochs 50

The winning configuration first surpassed the v1.0.0 BLEU at epoch 10 during the search sweep.


How to Use

1. Clone the repo & install dependencies

```bash
git lfs install
git clone https://huggingface.co/priyadip/en-hi-transformer
pip install torch
```

2. Load a checkpoint

```python
import torch, pickle

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load vocabularies
with open("en-hi-transformer/vocab/en_vocab.pkl", "rb") as f:
    en_vocab = pickle.load(f)
with open("en-hi-transformer/vocab/hi_vocab.pkl", "rb") as f:
    hi_vocab = pickle.load(f)

# Instantiate the model (Transformer class from the training script)
model = Transformer(
    src_vocab_size = len(en_vocab),
    tgt_vocab_size = len(hi_vocab),
    d_model    = 512,
    num_layers = 6,
    num_heads  = 8,
    d_ff       = 2560,   # use 2048 for v1.0.0
    max_len    = 50,
    dropout    = 0.081,  # use 0.10 for v1.0.0
).to(DEVICE)

# Load weights - pick the version you need
model.load_state_dict(
    torch.load("en-hi-transformer/v1.1.0/m25csa023_ass_4_best_model.pth", map_location=DEVICE)
    # or: "en-hi-transformer/v1.0.0/transformer_translation_final.pth"
)
model.eval()
```

3. Translate a sentence

```python
def translate(model, sentence, max_len=50):
    tokens = encode_sentence(sentence, en_vocab, max_len)
    src = torch.tensor(tokens).unsqueeze(0).to(DEVICE)
    tgt = [hi_vocab["<sos>"]]
    with torch.no_grad():
        for _ in range(max_len):
            out = model(src, torch.tensor(tgt).unsqueeze(0).to(DEVICE),
                        en_vocab["<pad>"], hi_vocab["<pad>"])
            nxt = out[0, -1].argmax().item()  # greedy decoding
            tgt.append(nxt)
            if nxt == hi_vocab["<eos>"]:
                break
    # Drop <sos>; drop the trailing <eos> only if decoding actually produced one
    ids = tgt[1:]
    if ids and ids[-1] == hi_vocab["<eos>"]:
        ids = ids[:-1]
    return " ".join(hi_vocab.itos[i] for i in ids)

print(translate(model, "How are you?"))         # → तुम कैसी हो?
print(translate(model, "I love you."))          # → मैं तुमसे प्यार करती हूँ।
print(translate(model, "What is your name?"))   # → आपका नाम क्या है?
```

`Transformer` and `encode_sentence` are defined in the training script available in the linked GitHub repository.
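If you need a stand-in for `encode_sentence`, here is a sketch consistent with the whitespace/lowercase tokenisation and special tokens described above. It assumes the vocab behaves like a plain token → index dict (`toy` below is a made-up vocabulary for demonstration; the actual pickled object may be a torchtext-style `Vocab`, in which case lookup differs):

```python
def encode_sentence(sentence, vocab, max_len=50):
    """Lowercase, whitespace-split, map OOV to <unk>, wrap with <sos>/<eos>, pad."""
    ids = [vocab["<sos>"]]
    for tok in sentence.lower().split():
        ids.append(vocab.get(tok, vocab["<unk>"]))
    ids.append(vocab["<eos>"])
    ids = ids[:max_len]                              # truncate long inputs
    ids += [vocab["<pad>"]] * (max_len - len(ids))   # right-pad short inputs
    return ids

toy = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3, "how": 4, "are": 5, "you?": 6}
print(encode_sentence("How are you?", toy, max_len=8))
# [1, 4, 5, 6, 2, 0, 0, 0]
```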


Sample Outputs (v1.1.0)

| English | Hindi |
|---|---|
| How are you? | तुम कैसी हो? |
| I love you. | मैं तुमसे प्यार करती हूँ। |
| What is your name? | आपका नाम क्या है? |
| The weather is nice today. | आज मौसम अच्छा है। |
| She is a good teacher. | वह अच्छा शिक्षक है। |

Limitations

- Vocabulary of ~4 K tokens; unknown words map to `<unk>`.
- Optimised for short sentences (≤ 10 words); quality degrades on longer input.
- Greedy decoding; no beam search.
- BLEU evaluated on a small held-out set; treat scores as indicative.

Citation

If you use this model, please cite:

This model:

```bibtex
@misc{en_hi_transformer_2026,
  author       = {priyadip},
  title        = {English to Hindi Transformer (v1.0.0 / v1.1.0)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/priyadip/en-hi-transformer}},
  note         = {v1.0.0: BLEU 0.7566 / 100 epochs.
                  v1.1.0: BLEU 0.8369 / 50 epochs via Ray Tune + Optuna (+10.6\%).}
}
```

Architecture - Attention Is All You Need:

```bibtex
@inproceedings{vaswani2017attention,
  title     = {Attention Is All You Need},
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and
               Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and
               Kaiser, Lukasz and Polosukhin, Illia},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  year      = {2017},
  url       = {https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf}
}
```

Dataset - Tatoeba:

```bibtex
@misc{tatoeba,
  title        = {Tatoeba: A multilingual sentence collection},
  author       = {Tatoeba contributors},
  howpublished = {\url{https://tatoeba.org}},
  note         = {Raw EN-HI export used; 13 186 pairs including multiple
                  Hindi translations per English sentence.}
}
```
Evaluation results

- BLEU (NLTK method4, ×100) on Tatoeba EN-HI (raw export, 13 186 pairs), self-reported: 75.66 (v1.0.0)
- BLEU (NLTK method4, ×100) on Tatoeba EN-HI (raw export, 13 186 pairs), self-reported: 83.69 (v1.1.0)