Parakeet-TDT-0.6B Dutch

A Dutch automatic speech recognition (ASR) model fine-tuned from nvidia/parakeet-tdt-0.6b-v3.

Model Details

Property       Value
Base model     nvidia/parakeet-tdt-0.6b-v3
Architecture   FastConformer-TDT (600M parameters)
Language       Dutch (nl)
Input          16 kHz mono audio
Output         Dutch text with punctuation and capitalization
License        CC-BY-4.0
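The model expects 16 kHz mono input. The sketch below is a minimal pre-flight check, assuming your inputs are uncompressed PCM WAV files that Python's standard `wave` module can read. `check_wav_format` is a hypothetical helper, not part of NeMo. This card does not say whether other formats are converted automatically, so converting files in advance (for example with `ffmpeg -i in.mp3 -ar 16000 -ac 1 out.wav`) is the safer choice.

```python
import wave


def check_wav_format(path: str, expected_rate: int = 16_000) -> list[str]:
    """Return a list of format problems for a WAV file (empty if it matches).

    Hypothetical helper: reads only the header via the standard-library
    `wave` module, which supports uncompressed PCM WAV. Other containers
    (MP3, FLAC, ...) raise wave.Error and are out of scope here.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

`check_wav_format("audio.wav")` returns an empty list when the file is already 16 kHz mono, and one message per mismatch otherwise.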

Evaluation Results

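Evaluated on the Common Voice 17.0 Dutch validation and test splits. Scores are computed on raw text with no normalization, so casing and punctuation mistakes count as errors: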

Split        WER     CER     Samples
Validation   3.73%   1.02%   9,062
Test         5.33%   1.46%   11,266
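The exact evaluation script is not given in this card. As a minimal sketch, corpus-level WER and CER can be computed from the Levenshtein edit distance between reference and hypothesis, assuming whitespace tokenization for words; `edit_distance`, `wer` and `cer` are hypothetical helpers written for illustration.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(refs: list[str], hyps: list[str]) -> float:
    """Corpus WER: total word edits divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus CER: total character edits divided by total reference characters."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)
```

Because scoring uses raw text, `wer(["Het regent vandaag."], ["het regent vandaag"])` is 2/3 even though every word was recognized: the lowercase "het" and the missing final period each count as a word substitution.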

Training

Fine-tuned on a combination of:

Training Configuration

Parameter            Value
Optimizer            AdamW
Learning rate        5e-5 (cosine annealing)
Warmup               10% of total steps
Batch size           64
Precision            bf16-mixed
Gradient clipping    1.0
Early stopping       patience of 10 epochs on validation WER
Best epoch           21
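The table gives the peak learning rate and the warmup fraction, but not the warmup shape or the final learning rate. The sketch below assumes linear warmup from zero over the first 10% of steps followed by cosine decay to `min_lr=0`; `lr_at_step` is a hypothetical helper for illustration, not the NeMo scheduler used in training.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 5e-5,
               warmup_ratio: float = 0.10, min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape).

    Linear warmup from 0 to peak_lr over the first warmup_ratio of training,
    then cosine decay from peak_lr down to min_lr at total_steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr past the end of the schedule
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With 1,000 total steps, the rate climbs to 5e-5 by step 100, falls to 2.5e-5 halfway through the decay (step 550), and reaches 0 at step 1,000.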

Usage

Installation

pip install "nemo_toolkit[asr]"

Transcribe Audio

import nemo.collections.asr as nemo_asr

# Load model
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="yuriyvnv/parakeet-tdt-0.6b-dutch"
)

# Transcribe
output = asr_model.transcribe(["audio.wav"])
print(output[0].text)

Transcribe with Timestamps

output = asr_model.transcribe(["audio.wav"], timestamps=True)

for stamp in output[0].timestamp["segment"]:
    print(f"{stamp['start']:.1f}s - {stamp['end']:.1f}s : {stamp['segment']}")
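Segment timestamps can be turned into subtitles. The sketch below writes SubRip (SRT) text from a list of dicts with `start`, `end` (in seconds) and `segment` keys, which is the shape the loop above reads from `output[0].timestamp["segment"]`. `segments_to_srt` and `to_srt_time` are hypothetical helpers, not NeMo functions.

```python
def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build numbered SRT cues, one per segment, separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
            f"{seg['segment'].strip()}\n"
        )
    return "\n".join(cues)
```

For example, `open("audio.srt", "w", encoding="utf-8").write(segments_to_srt(output[0].timestamp["segment"]))` saves subtitles next to the audio file.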

Long-Form Audio

Full attention handles audio up to 24 minutes. For longer recordings, switch to local attention before transcribing:

asr_model.change_attention_model(
    self_attention_model="rel_pos_local_attn",
    att_context_size=[256, 256],
)
output = asr_model.transcribe(["long_audio.wav"])
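To decide per file whether the 24-minute threshold stated in this card is exceeded, you can read the duration from the WAV header. This is a minimal sketch assuming PCM WAV input readable by Python's standard `wave` module; `wav_duration_seconds` and `needs_local_attention` are hypothetical helpers.

```python
import wave

# Full-attention limit stated in this model card: 24 minutes.
FULL_ATTENTION_LIMIT_S = 24 * 60


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_local_attention(path: str, limit_s: float = FULL_ATTENTION_LIMIT_S) -> bool:
    """True if the file is longer than the full-attention limit."""
    return wav_duration_seconds(path) > limit_s
```

Call `needs_local_attention("long_audio.wav")` and run the `change_attention_model` step only when it returns `True`, so short files keep the default full-attention setup.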

Intended Use

This model is designed for transcribing Dutch speech to text. It works best on:

  • Read speech and conversational Dutch
  • Audio recorded at 16 kHz or higher
  • Segments up to 24 minutes (or longer with local attention enabled)

Limitations

  • Trained primarily on Dutch read speech from Common Voice; performance may vary on regional dialects or heavily accented speech
  • Synthetic training data was generated with OpenAI TTS voices, which may not fully represent natural speech variability
  • Not suitable for real-time streaming without additional configuration