Whisper Large V3 Turbo Swiss German

This model is a fine-tuned version of openai/whisper-large-v3-turbo for Swiss German automatic speech recognition. It is intended to transcribe Swiss German speech into Standard German text.

The current upload corresponds to the best checkpoint from the March 20, 2026 retraining run: checkpoint-750.

Summary

  • Base model: openai/whisper-large-v3-turbo
  • Task: Swiss German speech recognition
  • Output language: Standard German text
  • Best uploaded checkpoint: checkpoint-750
  • Training data for this checkpoint: about 301 hours of curated private Swiss German training audio
  • Training infrastructure: 4x A100 80GB GPUs
  • Checkpoint format: safetensors

Training Data

The training data for this model is private.

What matters for readers is the data mix and scale:

  • about 301 hours of training audio were used for this published checkpoint
  • about 335 hours are included across the corresponding train, validation, and test splits
  • the curated subset consists primarily of Swiss German parliamentary and other semi-formal speech, plus read or prompted Swiss German speech
  • transcripts are in Standard German

Because the corpus is private, this model card does not list internal dataset names, split names, or source identifiers. The March 20, 2026 retraining run used a filtered subset of the broader private Swiss German corpus after earlier experiments showed that some internal sources reduced validation quality.

Public corpus references that help explain the broader data provenance are:

Those public references are included here for reader context and provenance. They should not be read as a verbatim public listing of the exact filtered private subset used for this published checkpoint.

Why This Checkpoint

The training run improved steadily up to checkpoint-750, then degraded afterward. The uploaded model is therefore the best checkpoint from the run, not the final checkpoint.

Validation trajectory during the successful run:

Step    WER     Normalized WER
250     41.05   40.21
500     39.63   38.86
750     37.96   37.25
1000    42.92   42.20
1250    43.64   42.92
1500    43.64   42.89

This is why checkpoint-750 is the shipped model.
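The selection rule is simple enough to state as code; a trivial sketch using the validation WER values from the table above:

```python
# Validation WER per checkpoint step, from the table above.
wer_by_step = {250: 41.05, 500: 39.63, 750: 37.96,
               1000: 42.92, 1250: 43.64, 1500: 43.64}

# Select the checkpoint with the lowest validation WER.
best_step = min(wer_by_step, key=wer_by_step.get)
print(best_step)  # 750
```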

Comparison To Base Whisper V3 Turbo

The tuned model was compared against the base openai/whisper-large-v3-turbo on a large random sample drawn from the same private training corpus used for this retraining run.

Comparison setup:

  • split evaluated: training split
  • sample size: 16,384

Results:

Model                                 WER     Normalized WER
Base openai/whisper-large-v3-turbo    45.71   44.52
This model                            39.18   38.48

Absolute improvement over base on that sampled training slice:

  • WER: -6.54 points (45.71 to 39.18)
  • normalized WER: -6.04 points (44.52 to 38.48)

Note that this comparison uses audio from the training split, so it measures adaptation to the training distribution rather than held-out generalization.
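WER, as used in these tables, is the word-level edit distance between hypothesis and reference divided by the number of reference words. A minimal sketch (not the internal evaluation code; the normalized variant additionally lowercases and strips punctuation before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic program over the edit-distance table.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev, row[j] = row[j], min(row[j] + 1,        # deletion
                                       row[j - 1] + 1,    # insertion
                                       prev + (r != h))   # substitution / match
    return row[-1] / len(ref)

print(wer("es regnet heute", "es regnet heute"))  # 0.0
```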

Intended Use

This model is intended for Swiss German ASR workloads where the target transcription is Standard German text.

It is the right version to try if:

  • you want a Whisper Turbo model adapted for Swiss German speech
  • your audio is reasonably clean conversational or semi-formal speech
  • you want a stronger Swiss German starting point than zero-shot base Whisper Turbo

Limitations

  • The training data is private, so the reported metrics are self-reported from internal evaluation.
  • The reported best validation metric is from the curated private validation slice used for model selection.
  • The run overfit after checkpoint-750; later checkpoints were worse.
  • Performance can still vary by dialect, speaker population, audio quality, and domain.

Usage

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "Flurin17/whisper-large-v3-turbo-swiss-german"

# Prefer GPU with fp16; fall back to CPU with fp32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

# The processor bundles the tokenizer and the feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe a local audio file; the output text is Standard German.
result = pipe("path/to/audio.wav")
print(result["text"])
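The pipeline also accepts in-memory audio as a dict with raw samples and a sampling rate. A hedged helper sketch (`to_whisper_input` is an illustrative name, not part of this model's API; Whisper models expect mono audio):

```python
import numpy as np

def to_whisper_input(samples: np.ndarray, sample_rate: int) -> dict:
    """Package raw samples for the ASR pipeline (illustrative helper)."""
    if samples.ndim == 2:
        # Stereo -> mono by averaging the channels.
        samples = samples.mean(axis=1)
    return {"raw": samples.astype(np.float32), "sampling_rate": sample_rate}

# result = pipe(to_whisper_input(samples, 44_100))
```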

Technical Notes

  • architecture: Whisper Large V3 Turbo
  • framework: PyTorch + Transformers
  • optimizer: adamw_torch_fused
  • scheduler: cosine
  • learning rate: 1e-5
  • epochs configured: 5
  • model selection: best checkpoint by validation WER
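For context on the scheduler entry above, a minimal sketch of a warmup-free cosine decay from the configured peak learning rate (not the exact Trainer implementation, which typically adds a warmup phase):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5) -> float:
    # Cosine decay from peak_lr at step 0 to 0 at total_steps.
    progress = min(step / total_steps, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1500))     # 1e-05 at the start
print(cosine_lr(1500, 1500))  # ~0 at the end
```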

License

This model is distributed under the Creative Commons Attribution-NonCommercial 4.0 license (cc-by-nc-4.0).
