Parakeet TDT+CTC 1.1B β€” GGUF (ggml-quantised)

GGUF / ggml conversions of nvidia/parakeet-tdt_ctc-1.1b for use with the crispasr CLI from CrispStrobe/CrispASR.

The largest hybrid Parakeet β€” 1.1 B parameters, 42-layer FastConformer encoder with both TDT and CTC heads. The hybrid head gives you two decode strategies on the same encoder: native TDT word timestamps (default), or CTC if you need shallow-fusion biasing.

  • English, mixed-case + punctuation output (vocab includes uppercase + punctuation tokens, unlike the pure parakeet-tdt-1.1b)
  • Hybrid TDT+CTC β€” default decode is TDT; pass --parakeet-decoder ctc for the CTC head
  • CC-BY-4.0 licence

This repo provides three quantisations, all converted from the same .nemo checkpoint via the convert-parakeet-to-gguf.py script and quantised with crispasr-quantize.

Files

File Size Notes
parakeet-tdt_ctc-1.1b.gguf 2.15 GB F16, full precision
parakeet-tdt_ctc-1.1b-q8_0.gguf 1.27 GB Q8_0, near-lossless
parakeet-tdt_ctc-1.1b-q4_k.gguf 810 MB Q4_K β€” recommended default

Smoke test on samples/jfk.wav (11 s clip, M1 Metal):

Quant Time Realtime Output
F16 0.74 s 14.8Γ— "And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
Q8_0 2.12 s 5.2Γ— (identical)
Q4_K 2.67 s 4.1Γ— (identical)

Note: this checkpoint's Q4_K/Q8_0 run slower than the pure parakeet-tdt-1.1b quants on M1 (CTC + TDT both wired in, plus a per-tensor q4_0 fallback on the joint head). F16 is the fastest precision here.

Quick Start

# 1. Build the runtime
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr

# 2a. Auto-download via the registry key
./build/bin/crispasr -m parakeet-tdt_ctc-1.1b --auto-download -f your-audio.wav

# 2b. Or explicit download + load
hf download cstr/parakeet-tdt_ctc-1.1b-GGUF \
    parakeet-tdt_ctc-1.1b-q4_k.gguf --local-dir .
./build/bin/crispasr -m parakeet-tdt_ctc-1.1b-q4_k.gguf -f your-audio.wav

# 2c. Switch to the CTC head (e.g. when adding hotword biasing)
./build/bin/crispasr -m parakeet-tdt_ctc-1.1b --parakeet-decoder ctc -f your-audio.wav

When to pick this over the other Parakeet variants

Scenario Pick
English 1.1B with proper casing + punctuation in output tdt_ctc-1.1b (this repo)
English 1.1B, lowercase output, faster Q4_K/Q8_0 cstr/parakeet-tdt-1.1b-GGUF
English, best WER per FLOP cstr/parakeet-tdt-0.6b-v2-GGUF
Multilingual (25 EU languages) cstr/parakeet-tdt-0.6b-v3-GGUF
Tight RAM cstr/parakeet-tdt_ctc-110m-GGUF

Model architecture

Component Details
Encoder 42-layer FastConformer, d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9
Subsampling Conv2d dw_striding stack, 8Γ— temporal (100 β†’ 12.5 fps)
Predictor 2-layer LSTM, hidden 640
Joint head enc(1024 β†’ 640) + pred(640 β†’ 640) β†’ ReLU β†’ linear(640 β†’ 1029) β€” TDT, 5 durations
CTC head linear(1024 β†’ 1025)
Vocab 1024 SentencePiece tokens (English, mixed case + punctuation) + blank
Audio 16 kHz mono, 80 mel bins, n_fft=512, hop=160, win=400
Parameters ~1.1 B

Same 42-layer encoder as parakeet-tdt-1.1b, but with an added CTC head and a mixed-case + punctuated vocab.

Attribution

License

CC-BY-4.0, inherited from the base model.

Downloads last month
160
GGUF
Model size
1B params
Architecture
parakeet
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for cstr/parakeet-tdt_ctc-1.1b-GGUF

Quantized
(1)
this model