Parakeet-DE-Med — GGUF (ggml-quantised)

GGUF / ggml conversions of johannhartmann/parakeet_de_med for use with the parakeet-main CLI from CrispStrobe/CrispASR@parakeet.

parakeet_de_med is Johann Hartmann's PEFT decoder+joint fine-tune of nvidia/parakeet-tdt-0.6b-v3 specialised for German medical documentation (Arztbriefe). On the German medical test set it scores 3.28% WER vs the base model's 11.73% — a 72% relative reduction.

The fine-tune freezes the encoder and trains only the TDT decoder + joint head (18.1M out of 627M parameters, 2.89%). This means:

The architecture is identical to parakeet-tdt-0.6b-v3 (24-layer FastConformer encoder, 2-layer LSTM predictor, 8198-class TDT joint head)
The same GGUF converter, runtime, and CLI work as-is
The frozen encoder still uses the base model's auto-language detection — for clean German speech this works well, for accented or noisy audio you may want to fall back to a different runtime (see comparison table)

Files

File	Size	Notes
`parakeet_de_med.gguf`	1.26 GB	F16, full precision
`parakeet_de_med-q8_0.gguf`	711 MB	Q8_0, near-lossless
`parakeet_de_med-q5_0.gguf`	516 MB	Q5_0
`parakeet_de_med-q4_k.gguf`	467 MB	Q4_K — recommended default

All quantisations produce the same text on the German verification clip:

Leider zu spät. Leider zu spät.

Quick start

# 1. Build the runtime
git clone -b parakeet https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target parakeet-main

# 2. Download a quantisation
huggingface-cli download cstr/parakeet_de_med-GGUF \
    parakeet_de_med-q4_k.gguf --local-dir .

# 3. Transcribe German audio
./build/bin/parakeet-main \
    -m parakeet_de_med-q4_k.gguf \
    -f german_audio.wav -t 8

The runtime is the same parakeet-main binary used for the base parakeet-tdt-0.6b-v3. All the usual flags work: -vad-model for Silero VAD slicing, -ck N for fixed chunking, -ml N for max chars per line, -osrt/-ovtt/-ot for subtitle output, -v for per-token timestamps via the TDT duration head.

Word-level timestamps

Like the base parakeet model, this fine-tune emits TDT durations as part of decoding, so word-level timestamps come for free at one encoder frame = 80 ms granularity. No separate forced alignment model needed:

$ ./build/bin/parakeet-main -m parakeet_de_med-q4_k.gguf -f german.wav -t 8 -v
[ 0.32s →  0.64s]  Der
[ 0.64s →  1.04s]  Patient
[ 1.04s →  1.32s]  klagt
[ 1.32s →  1.92s]  über
...

Which runtime should I use?

For German speech specifically:

Use case	Right tool
German medical documentation	`parakeet_de_med-q4_k.gguf` ← this repo
General German ASR with explicit language control	`canary-1b-v2-q4_k.gguf` (`-sl de -tl de`)
German → English translation	`canary-1b-v2-q4_k.gguf` (`-sl de -tl en`)
General multilingual ASR (auto-detect)	`parakeet-tdt-0.6b-v3-q4_k.gguf`
Lowest English WER	`cohere-transcribe-q4_k.gguf`

Architecture (inherited from base)

Component	Details
Encoder	24-layer FastConformer (frozen), d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9
Subsampling	Conv2d dw_striding stack, 8× temporal (100 → 12.5 fps)
Predictor	2-layer LSTM, hidden 640, embed 8193 × 640 (fine-tuned)
Joint head	enc(1024 → 640) + pred(640 → 640) → ReLU → linear(640 → 8198) (fine-tuned)
Vocab	8192 SentencePiece tokens (multilingual, but generation biased toward German medical)
Audio	16 kHz mono, 128 mel bins, n_fft=512, hop=160, win=400
Parameters	627M total, 18.1M trained (2.89%)

Attribution

Base model: nvidia/parakeet-tdt-0.6b-v3 (CC-BY-4.0). NVIDIA NeMo team.
Fine-tune: johannhartmann/parakeet_de_med (CC-BY-4.0). Johann Hartmann. Trained on 976 German medical documentation samples for 5 epochs with PEFT decoder+joint strategy.
GGUF conversion + ggml runtime: CrispStrobe/CrispASR@parakeet.

C++ runtime: CrispStrobe/CrispASR@parakeet
Base multilingual model (auto-detect): cstr/parakeet-tdt-0.6b-v3-GGUF
Encoder–decoder companion (canary, with explicit language control + speech translation): cstr/canary-1b-v2-GGUF
Cohere Transcribe (lowest English WER): cstr/cohere-transcribe-03-2026-GGUF

License

CC-BY-4.0, inherited from both the base model and the fine-tune. Use of these GGUF files must comply with the CC-BY-4.0 license including attribution to NVIDIA NeMo team and Johann Hartmann.

Downloads last month: 205

GGUF

Model size

0.6B params

Architecture

parakeet

Hardware compatibility

5-bit

8-bit

View +1 variant

Model tree for cstr/parakeet_de_med-GGUF

Base model

nvidia/parakeet-tdt-0.6b-v3

Finetuned

johannhartmann/parakeet_de_med

Quantized

(1)

this model

cstr
/

parakeet_de_med-GGUF