Automatic Speech Recognition
PEFT
Safetensors
Tamil
whisper
tamil
indic
lora
script-fidelity

Praxy-STT-TA-r2: Whisper-large-v3 + Per-Language LoRA for Tamil

Companion to the paper The TTS↔STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail (preprint forthcoming).

This is the LoRA variant built on openai/whisper-large-v3 and trained on the EDSA entity-dense corpus. The paper's headline recommendation is Praxel/praxy-stt-ta-rb (vasista22 base), which has substantially better read-prose performance.

⚠️ Honest status: contraindicated for production

Same finding as for Hindi: vanilla Whisper-large-v3 already achieves SFR $\geq 0.98$ on Tamil, so there is no Script Collapse to fix. The r2 LoRA causes net regressions on the read-prose holdouts (paper §5.3, Table 3):

| Holdout   | Vanilla v3 WER | Ta-r2 WER |
|-----------|----------------|-----------|
| FLEURS-Ta | 0.56           | 0.75      |
| CV25-Ta   | 0.67           | 0.89      |
| IV-Ta     | 0.82           | 0.98      |

Use Praxel/praxy-stt-ta-rb instead (vasista22 base + EDSA LoRA) for entity-dense Tamil.

Published for reproducibility of the language-conditional finding (paper §5.3); not for production use.
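For context, word error rate on a holdout row can be checked with the jiwer library. This is a minimal sketch, not necessarily the paper's scoring pipeline (text normalization choices shift WER noticeably); the transcript strings are placeholders.

import jiwer

# Placeholders: substitute a gold Tamil transcript and the model output
# produced by the Usage snippet below.
reference = "..."
hypothesis = "..."
print(f"WER: {jiwer.wer(reference, hypothesis):.2f}")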

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import torch, librosa

# Load the vanilla Whisper-large-v3 base model and processor, forcing Tamil transcription.
base = "openai/whisper-large-v3"
processor = WhisperProcessor.from_pretrained(base, language="tamil", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(base, torch_dtype=torch.bfloat16).to("cuda")
model.generation_config.language = "tamil"
model.generation_config.task = "transcribe"
model.generation_config.forced_decoder_ids = None
model.generation_config.suppress_tokens = []

# Attach the Ta-r2 LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "Praxel/praxy-stt-ta-r2")
model.eval()

# Load 16 kHz mono audio, extract log-mel features, and decode greedily.
audio, _ = librosa.load("path/to/audio.wav", sr=16000, mono=True)
feats = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_features.to("cuda", dtype=torch.bfloat16)
pred_ids = model.generate(feats, max_new_tokens=400, num_beams=1, language="tamil", task="transcribe")
print(processor.tokenizer.decode(pred_ids[0], skip_special_tokens=True).strip())
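If the PEFT wrapper overhead at inference time is a concern, the adapter can be folded into the base weights with peft's merge_and_unload(). A minimal sketch (optional; exact behavior depends on the installed peft version):

# Optional: merge the LoRA weights into the base model once, then generate
# without the PeftModel wrapper. Not required by the snippet above.
merged = model.merge_and_unload()
pred_ids = merged.generate(feats, max_new_tokens=400, num_beams=1, language="tamil", task="transcribe")
print(processor.tokenizer.decode(pred_ids[0], skip_special_tokens=True).strip())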

Training

LoRA rank 16, alpha 32, applied to the q/k/v/out_proj attention projections of openai/whisper-large-v3. Trained for 6000 steps on a Modal A10G. Per-language decoder prefix <|ta|> (no Hindi-proxy). Pinned environment: transformers==4.49.0, peft>=0.13, torch==2.4.0.
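The training script itself is not included here; a minimal peft sketch matching the stated hyperparameters looks roughly like the following (the dropout value is an assumption, not from this card).

from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

# Adapter configuration implied by the hyperparameters above:
# rank 16, alpha 32, applied to the q/k/v/out attention projections.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    lora_dropout=0.05,  # assumption: not stated on the card
    bias="none",
)

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
peft_model = get_peft_model(base_model, lora_cfg)
peft_model.print_trainable_parameters()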

Companion artefacts

- Praxel/praxy-stt-ta-rb — the recommended entity-dense Tamil model (vasista22 base + EDSA LoRA) referenced above.

License: Apache-2.0.
