Model Description

This model is a fine-tuned version of Whisper Large v3 Turbo for automatic speech recognition of Upper Sorbian.

Training Data

The model was fine-tuned on over 98 hours of transcribed Upper Sorbian speech, including colloquial speech. Part of the corpus was augmented, yielding an additional 89 hours of training data.
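
The card does not state which augmentation techniques were used. Purely as an illustration of audio augmentation, the sketch below applies speed perturbation with torchaudio; the file path and rate factors are placeholders.

```python
import torchaudio
from torchaudio.transforms import SpeedPerturbation

# Load one utterance; "recording.wav" is a placeholder path.
waveform, sample_rate = torchaudio.load("recording.wav")

# Resample at a randomly chosen rate factor, yielding an additional
# training utterance from the same recording.
perturb = SpeedPerturbation(sample_rate, factors=[0.9, 1.1])
augmented_waveform, _ = perturb(waveform)
```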

Training Details

  • Hyperparameters:
    • Batch size: 8
    • Gradient accumulation steps: 4
    • Learning rate: 5e-6, linear decay
    • Warmup: 2000 steps
  • Additional techniques: BF16 training, first 15 layers frozen (see the sketch below)
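
The training script itself is not published. Purely as a sketch, this is how the listed hyperparameters map onto the transformers library; freezing is applied to the first 15 encoder layers here, which is an assumption, since the card does not say which layers were frozen.

```python
from transformers import Seq2SeqTrainingArguments, WhisperForConditionalGeneration

# Start from the base checkpoint the fine-tune is derived from.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

# Freeze the first 15 layers. Interpreted as encoder layers (an assumption;
# the card does not specify encoder vs. decoder).
for layer in model.model.encoder.layers[:15]:
    for param in layer.parameters():
        param.requires_grad = False

# Training arguments matching the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-turbo-hsb-aug",  # illustrative path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size of 32
    learning_rate=5e-6,
    lr_scheduler_type="linear",     # linear decay after warmup
    warmup_steps=2000,
    bf16=True,                      # BF16 mixed-precision training
)
```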

Performance

Metrics

  • Model checkpoint: 8000
  • Word Error Rate (WER): 4.5%

For a later checkpoint with a better WER but worse robustness to noise, see this branch: https://huggingface.co/zalozbadev/whisper-large-v3-turbo-hsb-aug/tree/longest_trained
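
The reported WER can be recomputed with, e.g., the jiwer package; a minimal sketch with placeholder transcripts, assuming the metric above is a percentage:

```python
import jiwer

# Placeholder transcripts; in practice, references come from the test set
# and hypotheses from the model's output.
references = ["this is a reference transcript"]
hypotheses = ["this is the reference transcript"]

# WER = (substitutions + deletions + insertions) / words in the reference,
# reported as a percentage here.
print(f"WER: {100 * jiwer.wer(references, hypotheses):.1f}%")
```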

Usage

  • Set the transcription language to "czech"; the model was fine-tuned with this language setting (see the sketch below).
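
A minimal transcription sketch using the transformers pipeline; the audio path is a placeholder, and device/dtype should be adapted to your hardware.

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="zalozbadev/whisper-large-v3-turbo-hsb-aug",
    torch_dtype=torch.float16,
    device="cuda:0",  # use device="cpu" if no GPU is available
)

# "audio.wav" is a placeholder path to an Upper Sorbian recording.
# The language must be "czech", matching how the model was fine-tuned.
result = asr("audio.wav", generate_kwargs={"language": "czech"})
print(result["text"])
```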

Model Details

  • Model Name: zalozbadev/whisper-large-v3-turbo-hsb-aug
  • Publisher: Załožba za serbski lud
  • Model Version: 1.0.0
  • Model Date: 2026-01-16
  • License: CC-BY-4.0
  • Architecture: Whisper Large v3 Turbo (0.8B parameters, F32 safetensors)
  • Task: Automatic Speech Recognition