
Whisper Large v3 German - Faster Whisper

Overview

This repository contains a high-performance German speech recognition model based on OpenAI's Whisper Large v3 architecture. The model has been optimized using CTranslate2 for faster inference and reduced memory usage, making it ideal for production deployments.

Model Details

  • Architecture: Whisper Large v3
  • Language: German (de) primary, with multilingual support inherited from Whisper Large v3
  • Parameters: ~1.5B
  • Format: CTranslate2 optimized (int8_float16)
  • License: CC BY-NC 4.0

While this model is optimized for German, it can also transcribe multiple languages supported by Whisper Large v3 (French, Spanish, Italian, etc.), though accuracy may vary depending on the language.

Original Model

This model is based on the work from primeline/whisper-large-v3-german and has been converted to CTranslate2 format for optimal performance with faster-whisper.
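
The conversion can be reproduced with CTranslate2's converter CLI. The command below is an illustrative sketch, assuming the `ct2-transformers-converter` tool that ships with the `ctranslate2` package; the output directory name is arbitrary:

```shell
# Illustrative conversion of the base model to CTranslate2 format.
# Downloads the full ~3 GB checkpoint from the Hugging Face Hub.
pip install transformers ctranslate2
ct2-transformers-converter \
  --model primeline/whisper-large-v3-german \
  --output_dir whisper-large-v3-german-ct2 \
  --copy_files tokenizer.json preprocessor_config.json \
  --quantization int8_float16
```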

Performance

The model achieves strong German speech recognition performance, with a Word Error Rate (WER) of 2.628% reported on the base model's test datasets (the conversion to CTranslate2 preserves this accuracy).

User Benchmark: NVIDIA GeForce RTX 4070 Laptop GPU

  • Audio Duration: 198.16 seconds
  • Transcription Time: 5.83 seconds
  • Real-time Factor (RTF): 0.0294
  • Speedup: ~34x faster than real-time
  • Compute Type: int8_float16
  • Batch Size: 1

Note: The model is extremely efficient, processing over 3 minutes of audio in under 6 seconds.
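
The real-time factor is simply processing time divided by audio duration; the benchmark numbers above can be checked directly:

```python
# Real-time factor (RTF): transcription time / audio duration.
# Values taken from the benchmark table above.
audio_duration_s = 198.16
transcription_time_s = 5.83

rtf = transcription_time_s / audio_duration_s
speedup = 1 / rtf

print(f"RTF: {rtf:.4f}")                      # ~0.0294
print(f"Speedup: ~{speedup:.0f}x real-time")  # ~34x
```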

Use Cases

This model is designed for various German speech recognition applications:

  • Real-time Transcription: Live audio transcription for meetings, lectures, and conferences
  • Media Processing: Automatic subtitle generation for German video content
  • Voice Assistants: Speech-to-text conversion for voice-controlled applications
  • Call Center Analytics: Transcription and analysis of customer service calls
  • Accessibility Tools: Converting spoken German to text for hearing-impaired users
  • Document Creation: Voice-to-text dictation for content creation
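
The subtitle-generation use case above can be sketched as a small helper that renders (start, end, text) segments, as produced by faster-whisper, into SRT format. The function names are illustrative, not part of any library:

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render an iterable of (start, end, text) tuples as SRT subtitle blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Feeding it the `(segment.start, segment.end, segment.text)` tuples from a transcription run yields a ready-to-save `.srt` file.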

Installation and Usage

Prerequisites

pip install faster-whisper

Basic Usage

from faster_whisper import WhisperModel

# Load the model
model = WhisperModel(
    "TheChola/whisper-large-v3-german-faster-whisper",
    device="cuda",
    compute_type="int8_float16"
)

# Transcribe an audio file (language=None auto-detects; pass language="de" to force German)
segments, info = model.transcribe("audio.mp3", language=None, task="transcribe")

print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

Advanced Usage (Optimized)

segments, info = model.transcribe(
    "audio.mp3",
    language=None,                  # Auto-detect language
    task="transcribe",
    beam_size=5,
    vad_filter=True,                # Filter silence
    vad_parameters=dict(min_silence_duration_ms=500),
    condition_on_previous_text=False # Prevent loops/skipping
)
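
Note that `segments` is a lazy generator of objects exposing `.start`, `.end`, and `.text`; it is consumed as you iterate. A minimal sketch for collecting the full transcript follows (the `Seg` stand-in is illustrative only, not faster-whisper's actual segment class):

```python
from collections import namedtuple
from typing import Iterable

def collect_transcript(segments: Iterable) -> str:
    """Join segment objects (anything with a .text attribute) into one transcript string."""
    return "".join(seg.text for seg in segments).strip()

# Illustrative stand-in for faster-whisper's segment objects:
Seg = namedtuple("Seg", ["start", "end", "text"])
demo = [Seg(0.0, 1.2, " Guten"), Seg(1.2, 2.0, " Tag.")]
print(collect_transcript(demo))  # Guten Tag.
```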

Model Files

This repository contains the following files:

  • model.bin - Main model weights in CTranslate2 format
  • config.json - Model configuration
  • tokenizer.json - Tokenizer configuration
  • vocabulary.json - Vocabulary mapping
  • preprocessor_config.json - Preprocessing details

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
