Whisper Medium Uzbek v1

Developed by Akbarali Salohiddinov

Uzbek Automatic Speech Recognition (ASR) model fine-tuned from Whisper Medium.

Model page: https://huggingface.co/akbaralihah/uzbek-stt

Model Description

  • Base Model: OpenAI Whisper Medium (769M parameters)
  • Language: Uzbek (uz)
  • Training Data: ~1,600 hours of Uzbek audio
  • Precision: BF16
  • Script: Latin (Russian loanwords are transcribed in Latin script, e.g. brat, davay, prosto)
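
As a rough guide to what the parameter count and precision above imply, the sketch below estimates the size of the weights alone. It only multiplies the stated 769M parameters by the bytes per value; the helper name `weight_memory_gb` is illustrative, and activations, the decoding cache, and framework overhead are not included.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}

NUM_PARAMS = 769_000_000  # Whisper Medium parameter count stated in this card


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate weight size in gigabytes (10**9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("bf16", "fp32"):
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
# bf16: ~1.54 GB
# fp32: ~3.08 GB
```

Actual memory use during inference will be higher than these figures.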

Evaluation Results

Category          WER
Overall           16.7%
Clean Speech      ~6-11%
Noisy/Augmented   ~12-24%
Dialects          ~16-25%

Evaluated on 1,864 samples across 8 diverse test sets.
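
For readers unfamiliar with the metric, here is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. This is the standard textbook definition, not necessarily the exact script or text normalization used for the numbers above, and the Uzbek sentences are made-up illustrations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative example: one substituted word out of four.
print(word_error_rate("men bugun ishga bordim", "men bugun ishda bordim"))
# 0.25
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.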

Usage

Transformers

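from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained("akbaralihah/uzbek-stt")
model = WhisperForConditionalGeneration.from_pretrained("akbaralihah/uzbek-stt")

# Load the audio as mono and resample it to 16 kHz, the rate Whisper expects
audio, _ = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-Mel input features.
# Note: this path handles a single 30-second window; use the pipeline
# below for longer recordings.
input_features = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt"
).input_features

# Generate token IDs, forcing Uzbek transcription
predicted_ids = model.generate(
    input_features,
    language="uz",
    task="transcribe"
)

# Decode the token IDs back to text, dropping special tokens
transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True
)[0]
print(transcription)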

Pipeline

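import torch
from transformers import pipeline

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="akbaralihah/uzbek-stt",
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=device
)

result = pipe(
    "audio.wav",
    generate_kwargs={"language": "uz", "task": "transcribe"}
)
print(result["text"])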
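
To make the chunk_length_s=30 setting more concrete, the sketch below computes where 30-second windows fall in a 16 kHz recording. It is a simplified illustration with a hypothetical helper (`chunk_bounds`): the real pipeline also overlaps neighbouring chunks and merges their transcriptions, which this version does not attempt.

```python
SAMPLE_RATE = 16_000   # samples per second, as used in the examples above
CHUNK_LENGTH_S = 30    # same value as chunk_length_s in the pipeline call


def chunk_bounds(num_samples, chunk_length_s=CHUNK_LENGTH_S, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices of consecutive, non-overlapping chunks."""
    chunk_size = chunk_length_s * sample_rate
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# A 70-second recording yields two full chunks and one 10-second remainder.
print(chunk_bounds(70 * SAMPLE_RATE))
# [(0, 480000), (480000, 960000), (960000, 1120000)]
```

Each full chunk holds 480,000 samples, which matches the 30-second input window Whisper operates on.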

Training

Trained in 3 stages using curriculum learning:

Stage               Hours
Foundation          725h
Robustness          394h
Domain Adaptation   474h
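
As a quick consistency check, the stage hours can be summed and compared with the "~1,600 hours" quoted in the model description. The snippet below is plain arithmetic on the figures in the table; the variable names are illustrative.

```python
# Hours per training stage, copied from the table above.
stage_hours = {
    "Foundation": 725,
    "Robustness": 394,
    "Domain Adaptation": 474,
}

total_hours = sum(stage_hours.values())
print(f"Total: {total_hours}h")
# Total: 1593h

# Share of the total training audio used in each stage.
for stage, hours in stage_hours.items():
    print(f"{stage}: {hours / total_hours:.1%}")
```

The total of 1,593 hours agrees with the rounded figure of about 1,600 hours.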

Intended Use

  • Uzbek speech-to-text transcription
  • Voice assistants and dictation
  • Media transcription and subtitling

Limitations

  • Performance degrades on very noisy audio (WER rises from ~6-11% on clean speech to ~12-24% on the noisy/augmented test sets)
  • May struggle with heavy code-switching
  • Optimized for Uzbek only

License

Apache 2.0
