Whisper Large V3 Turbo Swiss German
This model is a fine-tuned version of openai/whisper-large-v3-turbo for Swiss German automatic speech recognition. It is intended to transcribe Swiss German speech into Standard German text.
The current upload corresponds to the best checkpoint from the March 20, 2026 retraining run: checkpoint-750.
Summary
- Base model: `openai/whisper-large-v3-turbo`
- Task: Swiss German speech recognition
- Output language: Standard German text
- Best uploaded checkpoint: `checkpoint-750`
- Training data for this checkpoint: about 301 hours of curated private Swiss German training audio
- Training infrastructure: 4x A100 80GB GPUs
- Checkpoint format: `safetensors`
Training Data
The training data for this model is private.
For readers, the important part is the data mix and scale:
- about 301 hours of training audio were used for this published checkpoint
- about 335 hours are included across the corresponding train, validation, and test splits
- the curated subset consists primarily of Swiss German parliamentary and other semi-formal speech, plus read or prompted Swiss German speech
- transcripts are in Standard German
Because the corpus is private, this model card does not list internal dataset names, split names, or source identifiers. The March 20, 2026 retraining run used a filtered subset of the broader private Swiss German corpus after earlier experiments showed that some internal sources reduced validation quality.
Public corpus references that help explain the broader data provenance are:
- SwissDial Dataset (ETH Zurich): around 24 hours across 8 major Swiss German dialects, with Swiss German and High German transcripts
- Swiss Parliaments Corpus V2 (FHNW): 293 hours of Swiss German parliamentary speech with Standard German transcripts
- All Swiss German Dialects Test Set (FHNW): 13 hours with a dialect distribution intended to be close to real-world Swiss German
- ArchiMob Release 2 (UZH / SWISSUbase): a transcribed spoken Swiss German corpus covering linguistic varieties across Switzerland
Those public references are included here for reader context and provenance. They should not be read as a verbatim public listing of the exact filtered private subset used for this published checkpoint.
Why This Checkpoint
The training run improved steadily up to checkpoint-750, then degraded afterward. The uploaded model is therefore the best checkpoint from the run, not the final checkpoint.
Validation trajectory during the successful run:
| Step | WER | Normalized WER |
|---|---|---|
| 250 | 41.05 | 40.21 |
| 500 | 39.63 | 38.86 |
| 750 | 37.96 | 37.25 |
| 1000 | 42.92 | 42.20 |
| 1250 | 43.64 | 42.92 |
| 1500 | 43.64 | 42.89 |
This is why checkpoint-750 is the shipped model.
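Model selection was done by validation word error rate. As a reader aid, here is a minimal, self-contained sketch of how WER is typically computed (word-level edit distance divided by reference length); this is an illustration, not the internal evaluation script, which may also apply text normalization before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # (len(ref)+1) x (len(hyp)+1) edit-distance table
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of four reference words -> WER 0.25
print(wer("das ist ein test", "das ist test"))  # 0.25
```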
Comparison To Base Whisper V3 Turbo
The tuned model was compared against the base openai/whisper-large-v3-turbo on a large random sample from the same private training corpus regime used for this retraining run.
Comparison setup:
- split evaluated: training split
- sample size: 16,384
Results:
| Model | WER | Normalized WER |
|---|---|---|
| Base openai/whisper-large-v3-turbo | 45.71 | 44.52 |
| This model | 39.18 | 38.48 |
Absolute improvement over base on that sampled training slice:
- WER: -6.54
- normalized WER: -6.04
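The absolute deltas above can also be read as relative error reduction, which some readers find easier to compare across models. A quick check using the numbers from the table:

```python
# WER figures from the comparison table above
base_wer, tuned_wer = 45.71, 39.18
base_norm, tuned_norm = 44.52, 38.48

# Relative error reduction = (base - tuned) / base
rel_wer = (base_wer - tuned_wer) / base_wer * 100
rel_norm = (base_norm - tuned_norm) / base_norm * 100

print(f"relative WER reduction: {rel_wer:.1f}%")             # ~14.3%
print(f"relative normalized WER reduction: {rel_norm:.1f}%")  # ~13.6%
```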
Intended Use
This model is intended for Swiss German ASR workloads where the target transcription is Standard German text.
It is the right version to try if:
- you want a Whisper Turbo model adapted for Swiss German speech
- your audio is reasonably clean conversational or semi-formal speech
- you want a stronger Swiss German starting point than zero-shot base Whisper Turbo
Limitations
- The training data is private, so the reported metrics are self-reported from internal evaluation.
- The reported best validation metric is from the curated private validation slice used for model selection.
- The run overfit after checkpoint-750; later checkpoints were worse.
- Performance can still vary by dialect, speaker population, audio quality, and domain.
Usage
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "Flurin17/whisper-large-v3-turbo-swiss-german"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Load the fine-tuned checkpoint (safetensors format)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("path/to/audio.wav")
print(result["text"])
```
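Whisper models operate on 16 kHz audio; the pipeline's feature extractor handles resampling of decodable inputs, but it can still be useful to inspect a WAV file's format up front when debugging unexpected transcriptions. The helper below is a hypothetical convenience using only the standard library `wave` module, not part of this model's API:

```python
import wave

def check_wav(path: str) -> tuple[int, int]:
    """Return (sample_rate, channels) of a WAV file.

    Whisper models expect 16 kHz mono features; other rates and channel
    counts are resampled/downmixed by the feature extractor upstream.
    """
    with wave.open(path, "rb") as f:
        return f.getframerate(), f.getnchannels()
```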
Technical Notes
- architecture: Whisper Large V3 Turbo
- framework: PyTorch + Transformers
- optimizer: adamw_torch_fused
- scheduler: cosine
- learning rate: 1e-5
- epochs configured: 5
- model selection: best checkpoint by validation WER
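For readers unfamiliar with the cosine scheduler named above, the sketch below shows the shape of the decay from the peak learning rate of 1e-5. It ignores warmup (which the Transformers cosine schedule also supports; warmup settings for this run are not reported here), so it is an illustration of the schedule's shape, not the exact run configuration:

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5) -> float:
    """Cosine decay from peak_lr at step 0 to ~0 at total_steps (no warmup)."""
    return 0.5 * peak_lr * (1 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0, 1500))    # peak: 1e-05
print(cosine_lr(750, 1500))  # halfway: ~5e-06
```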
License
This model is distributed under the Creative Commons Attribution-NonCommercial 4.0 license (cc-by-nc-4.0).
Evaluation results
- Word Error Rate on private Swiss German validation split (self-reported): 37.963
- Normalized Word Error Rate on private Swiss German validation split (self-reported): 37.250