Parakeet TDT 0.6B v2 β€” GGUF (ggml-quantised)

GGUF / ggml conversions of nvidia/parakeet-tdt-0.6b-v2 for use with the crispasr CLI from CrispStrobe/CrispASR.

Parakeet TDT 0.6B v2 is NVIDIA's English-only 600 M-parameter ASR model β€” the original Open ASR Leaderboard topper before v3 spread capacity across 25 European languages. On plain English, v2 is often stronger than v3 since it didn't have to share encoder capacity with 24 other languages.

  • English-only, mixed-case + punctuation output
  • Built-in word-level timestamps from the TDT (Token-and-Duration Transducer) decoder β€” no separate CTC alignment model required
  • CC-BY-4.0 licence (friendlier than most ASR models)

This repo provides three quantisations, all converted from the same .nemo checkpoint via the convert-parakeet-to-gguf.py script and quantised with crispasr-quantize.

Files

File Size Notes
parakeet-tdt-0.6b-v2.gguf 1.24 GB F16, full precision
parakeet-tdt-0.6b-v2-q8_0.gguf 735 MB Q8_0, near-lossless
parakeet-tdt-0.6b-v2-q4_k.gguf 468 MB Q4_K β€” recommended default

All three precisions produce the same text on samples/jfk.wav:

And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

Quick Start

# 1. Build the runtime
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr

# 2a. Auto-download via the registry key
./build/bin/crispasr -m parakeet-v2 --auto-download -f your-audio.wav

# 2b. Or explicit download + load
hf download cstr/parakeet-tdt-0.6b-v2-GGUF \
    parakeet-tdt-0.6b-v2-q4_k.gguf --local-dir .
./build/bin/crispasr -m parakeet-tdt-0.6b-v2-q4_k.gguf -f your-audio.wav

When to pick v2 over v3

Scenario Pick
English only, want best WER v2 (this repo)
Multilingual, 25 EU languages v3 β€” cstr/parakeet-tdt-0.6b-v3-GGUF
Tight RAM, English smaller hybrid β€” cstr/parakeet-tdt_ctc-110m-GGUF
Long-tail English vocab, willing to pay 2x compute larger β€” cstr/parakeet-tdt-1.1b-GGUF

Model architecture

Component Details
Encoder 24-layer FastConformer, d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9
Subsampling Conv2d dw_striding stack, 8Γ— temporal (50 β†’ 12.5 fps)
Predictor 2-layer LSTM, hidden 640
Joint head enc(1024 β†’ 640) + pred(640 β†’ 640) β†’ ReLU β†’ linear(640 β†’ 1029)
Vocab 1024 SentencePiece tokens (English, mixed case + punctuation)
Audio 16 kHz mono, 128 mel bins, n_fft=512, hop=160, win=400
Parameters ~600 M

Same FastConformer encoder + TDT decoder as v3 β€” just trained on English-only data with an English-only BPE.

How this was made

  1. The .nemo checkpoint was unpacked, NeMo state-dict keys were remapped to ggml-friendly names, and weights were written to GGUF F16 (matmul tensors) + F32 (norms / biases / mel filterbank).
  2. Quantised variants are produced by crispasr-quantize (the same llama.cpp-style quantiser used for the other GGUF releases).
  3. Inference uses src/parakeet.{h,cpp}: FastConformer encoder runs as a single ggml graph (BN folded out), LSTM predictor + joint head run as CPU F32 loops, TDT greedy decode alternates "advance encoder frame" / "emit token + advance predictor" using the duration head's argmax.

Attribution

License

CC-BY-4.0, inherited from the base model. Use of these GGUF files must comply with the CC-BY-4.0 license including attribution.

Downloads last month
170
GGUF
Model size
0.6B params
Architecture
parakeet
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for cstr/parakeet-tdt-0.6b-v2-GGUF

Quantized
(14)
this model