The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
Paper: 2605.03073
LoRA adapter on top of vasista22/whisper-tamil-large-v2 trained on the EDSA (Entity-Dense Synthetic Audio) corpus.
| System | EHR (entity hit rate; higher is better) |
|---|---|
| vasista22 (open SOTA) | 0.025 |
| Deepgram Nova-3 (commercial) | 0.025 |
| Praxy-STT-TA-rb (this model) | 0.543 |
That is a 22× improvement over both the open SOTA and the commercial system: vasista22 and Deepgram both effectively fail (EHR 0.025) on entity-dense Tamil audio. β-Ta is the cleanest demonstration of the flywheel's value.
Read-prose regression vs vasista22 base: +9 pp on FLEURS, +3 pp on CV25, tied on IV.
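The card does not define how EHR is computed. A plausible, purely illustrative definition, assuming EHR means the fraction of gold entities recovered verbatim in the hypothesis transcript (the function name and example data below are hypothetical):

```python
def entity_hit_rate(gold_entities, hypothesis):
    """Fraction of gold entities appearing verbatim in the hypothesis.

    Assumed definition: the card does not spell out how EHR is computed,
    so treat this as a sketch, not the benchmark's actual scorer.
    """
    if not gold_entities:
        return 0.0
    hyp = hypothesis.lower()
    hits = sum(1 for entity in gold_entities if entity.lower() in hyp)
    return hits / len(gold_entities)

# Hypothetical example: 2 of 3 entities recovered.
gold = ["metformin", "500 mg", "Dr. Subramanian"]
hyp = "patient takes metformin 500 mg daily"
print(round(entity_hit_rate(gold, hyp), 3))  # → 0.667
```

A real scorer would likely normalize script, whitespace, and numerals before matching; exact substring matching is the simplest baseline.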
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import torch, librosa

base_model = "vasista22/whisper-tamil-large-v2"

# Processor handles feature extraction and tokenization for Tamil transcription.
processor = WhisperProcessor.from_pretrained(base_model, language="tamil", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(
    base_model, torch_dtype=torch.bfloat16
).to("cuda")

# Force Tamil transcription and disable default token suppression.
forced = processor.tokenizer.get_decoder_prompt_ids(language="tamil", task="transcribe")
model.config.forced_decoder_ids = forced
model.generation_config.forced_decoder_ids = forced
model.generation_config.suppress_tokens = []

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "Praxel/praxy-stt-ta-rb")
model.eval()

# Load audio at Whisper's expected 16 kHz mono.
audio, _ = librosa.load("path/to/audio.wav", sr=16000, mono=True)
feats = processor.feature_extractor(
    audio, sampling_rate=16000, return_tensors="pt"
).input_features.to("cuda", dtype=torch.bfloat16)

# Pass input_features by keyword so PeftModel.generate forwards it correctly.
pred_ids = model.generate(input_features=feats, max_new_tokens=400, num_beams=1)
print(processor.tokenizer.decode(pred_ids[0], skip_special_tokens=True).strip())
```
LoRA rank 16, alpha 32, applied to the q/k/v/out_proj projections of vasista22/whisper-tamil-large-v2. Trained for 4,000 steps on a Modal A10G. Cartesia rows were held out for evaluation. Pinned versions: transformers==4.36.2, peft==0.10.0.
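The full training script is not published; a minimal `peft` configuration consistent with the stated hyperparameters (rank 16, alpha 32, q/k/v/out_proj targets) might look like the sketch below. Dropout and bias settings are assumptions, as the card does not state them:

```python
from peft import LoraConfig

# r and lora_alpha come from the card; lora_dropout and bias are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    lora_dropout=0.05,  # assumption: not stated on the card
    bias="none",
)

# Applied with: model = get_peft_model(base_whisper_model, lora_config)
```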
License: Apache-2.0.
Base model: vasista22/whisper-tamil-large-v2