For usage instructions, follow openai/whisper-large-v3-turbo.
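A minimal sketch of the standard transformers recipe from that card, with this repo's id swapped in; `audio.wav` is a placeholder for your own short-form clip:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "efwkjn/whisper-ja-anime-v0.2"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

print(pipe("audio.wav")["text"])
```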

Turbo finetune with a Japanese tokenizer. Trained on ~60M sequences, with the model progressively unfrozen in stages: embeddings, then decoder, then full model (see the sketch below). The smaller vocabulary packs ~1.6x bytes/token, so even with 4 decoder layers (a ~10% larger decoder) it decodes faster than a 2-layer distil model.
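A minimal sketch of that staged-unfreezing idea, assuming a transformers training loop; the actual schedule, optimizer, and vocabulary-swap code are not published here:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")
# (Swapping in the Japanese tokenizer and resizing the embeddings is omitted here.)

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: embeddings only. Whisper ties proj_out to embed_tokens,
# so this also trains the output projection for the new vocabulary.
set_trainable(model, False)
set_trainable(model.model.decoder.embed_tokens, True)
# ... train ...

# Stage 2: the whole decoder.
set_trainable(model.model.decoder, True)
# ... train ...

# Stage 3: the full model, encoder included.
set_trainable(model, True)
# ... train ...
```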

Quality is mixed: SOTA on short-form general Japanese, but long-form has degraded too much and there are hallucination problems. I rescued it a little from a much worse state, but it has probably gone too far to fully fix. (Reazon needs filtering.)

Note for faster-whisper: the vocabulary changes make `model.is_multilingual` and `suppress_tokens` wrong. You shouldn't be using this with faster-whisper at all, since long-form quality is bad, but if you do, adjust the code as required (see the sketch below).
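If you run it there anyway, a hedged sketch of the kind of adjustment meant, assuming a CTranslate2-converted copy of the model at the hypothetical path `./whisper-ja-anime-ct2`: force the language so detection never consults `model.is_multilingual`, and pass an explicit `suppress_tokens` list instead of the default `[-1]`, which expands to ids from the original multilingual vocabulary. Deeper tokenizer patches may still be needed.

```python
from faster_whisper import WhisperModel

# Hypothetical local path to a CTranslate2 conversion of this model.
model = WhisperModel("./whisper-ja-anime-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.wav",          # placeholder input
    language="ja",        # skip language detection (is_multilingual is wrong)
    suppress_tokens=[],   # the default [-1] resolves to stale token ids
)
for seg in segments:
    print(seg.text)
```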

Acknowledgements

  • Train sets: OOPPEENN, Reazon, Common Voice 20, 小虫哥_, deepghs
  • Test sets: KitsuneX07, TEDxJP, kotoba-tech, Saruwatari-lab, grider-withourai