Parakeet TDT+CTC 1.1B: GGUF (ggml-quantised)
GGUF / ggml conversions of nvidia/parakeet-tdt_ctc-1.1b for use with the crispasr CLI from CrispStrobe/CrispASR.
The largest hybrid Parakeet: 1.1B parameters, with a 42-layer FastConformer encoder feeding both a TDT and a CTC head. The two heads give you two decoding strategies on the same encoder: TDT with native word timestamps (the default), or CTC if you need shallow-fusion biasing.
- English, mixed-case + punctuation output (the vocab includes uppercase and punctuation tokens, unlike the pure parakeet-tdt-1.1b)
- Hybrid TDT+CTC: the default decoder is TDT; pass `--parakeet-decoder ctc` to use the CTC head
- CC-BY-4.0 licence
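Because both heads share one encoder, it can be useful to run the same file through each and compare the transcripts. The sketch below uses only the `crispasr` flags shown in the Quick Start; `compare_heads`, `CRISPASR` and `MODEL` are hypothetical names, and the defaults assume the Quick Start build path and the Q4_K file.

```bash
# Hypothetical wrapper: decode one file with the default TDT head, then with
# the CTC head, using only the crispasr flags from the Quick Start.
# CRISPASR and MODEL are illustrative defaults; override them as needed.
CRISPASR=${CRISPASR:-./build/bin/crispasr}
MODEL=${MODEL:-parakeet-tdt_ctc-1.1b-q4_k.gguf}

compare_heads() {
  local wav=$1
  echo "== TDT (default) =="
  "$CRISPASR" -m "$MODEL" -f "$wav"
  echo "== CTC =="
  "$CRISPASR" -m "$MODEL" --parakeet-decoder ctc -f "$wav"
}

# Usage:
#   compare_heads your-audio.wav
```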
This repo provides three files (F16, Q8_0, Q4_K), all converted from the same .nemo checkpoint with the convert-parakeet-to-gguf.py script; the Q8_0 and Q4_K files were then quantised with crispasr-quantize.
Files
| File | Size | Notes |
|---|---|---|
| parakeet-tdt_ctc-1.1b.gguf | 2.15 GB | F16, unquantised |
| parakeet-tdt_ctc-1.1b-q8_0.gguf | 1.27 GB | Q8_0, near-lossless |
| parakeet-tdt_ctc-1.1b-q4_k.gguf | 810 MB | Q4_K, recommended default |
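If memory is the deciding factor, a small helper can pick the largest file from the table that fits a budget. This is a sketch: `pick_quant` is a hypothetical name, and it compares only the on-disk sizes listed above, which are a lower bound on what the runtime will actually use, so leave some headroom.

```bash
# Hypothetical helper: print the largest GGUF from the Files table whose
# on-disk size (in MB) fits the given budget. File size is only a lower
# bound on memory use, so leave headroom for activations and audio buffers.
pick_quant() {
  local budget_mb=$1 entry
  # name:size_mb pairs, largest first (sizes from the Files table)
  for entry in \
    "parakeet-tdt_ctc-1.1b.gguf:2150" \
    "parakeet-tdt_ctc-1.1b-q8_0.gguf:1270" \
    "parakeet-tdt_ctc-1.1b-q4_k.gguf:810"; do
    if [ "${entry##*:}" -le "$budget_mb" ]; then
      echo "${entry%%:*}"
      return 0
    fi
  done
  echo "no file fits in ${budget_mb} MB" >&2
  return 1
}

pick_quant 1500   # prints parakeet-tdt_ctc-1.1b-q8_0.gguf
```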
Smoke test on samples/jfk.wav (11 s clip, M1 Metal):
| Quant | Time | Speed (× realtime) | Output |
|---|---|---|---|
| F16 | 0.74 s | 14.8× | "And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country." |
| Q8_0 | 2.12 s | 5.2× | (identical) |
| Q4_K | 2.67 s | 4.1× | (identical) |
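The speed column is the clip length divided by the wall-clock decode time. A minimal awk helper (the name `speed_x` is hypothetical) reproduces it for your own runs, shown here on the Q8_0 and Q4_K rows.

```bash
# Speed as a multiple of realtime = audio duration / wall-clock decode time.
# Arguments: audio seconds, decode seconds. Output is rounded to one decimal.
speed_x() {
  awk -v audio="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", audio / wall }'
}

speed_x 11 2.12   # Q8_0 row: prints 5.2
speed_x 11 2.67   # Q4_K row: prints 4.1
```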
Note: this checkpoint's Q4_K and Q8_0 quants run slower on M1 than the pure parakeet-tdt-1.1b quants (both the CTC and TDT heads are wired in, plus there is a per-tensor q4_0 fallback on the joint head). F16 is the fastest precision here.
Quick Start
```bash
# 1. Build the runtime
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr

# 2a. Auto-download via the registry key
./build/bin/crispasr -m parakeet-tdt_ctc-1.1b --auto-download -f your-audio.wav

# 2b. Or explicit download + load
hf download cstr/parakeet-tdt_ctc-1.1b-GGUF \
  parakeet-tdt_ctc-1.1b-q4_k.gguf --local-dir .
./build/bin/crispasr -m parakeet-tdt_ctc-1.1b-q4_k.gguf -f your-audio.wav

# 2c. Switch to the CTC head (e.g. when adding hotword biasing)
./build/bin/crispasr -m parakeet-tdt_ctc-1.1b --parakeet-decoder ctc -f your-audio.wav
```
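The commands above take a WAV file, and the model expects 16 kHz mono audio (see the architecture table). Whether crispasr resamples other inputs itself is not covered here, so one conservative option is to convert first with ffmpeg. `to_16k_mono` is a hypothetical helper; the ffmpeg options are standard.

```bash
# Hypothetical helper: convert any ffmpeg-readable input to 16 kHz, mono,
# 16-bit PCM WAV, matching the audio format in the architecture table.
to_16k_mono() {
  local in=$1 out=$2
  ffmpeg -hide_banner -loglevel error -y -i "$in" \
    -ar 16000 -ac 1 -c:a pcm_s16le "$out"
}

# Usage (requires ffmpeg on PATH):
#   to_16k_mono interview.mp3 your-audio.wav
#   ./build/bin/crispasr -m parakeet-tdt_ctc-1.1b-q4_k.gguf -f your-audio.wav
```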
When to pick this over the other Parakeet variants
| Scenario | Pick |
|---|---|
| English 1.1B with proper casing + punctuation in output | tdt_ctc-1.1b (this repo) |
| English 1.1B, lowercase output, faster Q4_K/Q8_0 | cstr/parakeet-tdt-1.1b-GGUF |
| English, best WER per FLOP | cstr/parakeet-tdt-0.6b-v2-GGUF |
| Multilingual (25 EU languages) | cstr/parakeet-tdt-0.6b-v3-GGUF |
| Tight RAM | cstr/parakeet-tdt_ctc-110m-GGUF |
Model architecture
| Component | Details |
|---|---|
| Encoder | 42-layer FastConformer, d=1024, 8 heads, head_dim=128, FFN=4096, conv kernel=9 |
| Subsampling | Conv2d dw_striding stack, 8× temporal (100 → 12.5 fps) |
| Predictor | 2-layer LSTM, hidden 640 |
| Joint head | enc(1024 → 640) + pred(640 → 640) → ReLU → linear(640 → 1029); TDT with 5 durations |
| CTC head | linear(1024 β 1025) |
| Vocab | 1024 SentencePiece tokens (English, mixed case + punctuation) + blank |
| Audio | 16 kHz mono, 80 mel bins, n_fft=512, hop=160, win=400 |
| Parameters | ~1.1 B |
Same 42-layer encoder architecture as parakeet-tdt-1.1b, but with an added CTC head and a mixed-case, punctuated vocab.
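The Audio and Subsampling rows fix the time resolution: a 160-sample hop at 16 kHz gives 100 mel frames per second, and 8× subsampling leaves 12.5 encoder frames per second, i.e. one encoder frame per 80 ms. The sketch below estimates frame counts for a clip. It assumes a center-padded STFT (floor(samples / hop) + 1 frames) and ceiling rounding at each of the three stride-2 stages; these are assumptions, not read from the CrispASR source, so real counts may differ by a frame.

```bash
# Rough frame-count arithmetic for the Audio and Subsampling rows.
# Assumptions: center-padded STFT -> floor(samples / HOP) + 1 mel frames;
# three stride-2 convs that each round up -> ceil(mel / 8) encoder frames.
SAMPLE_RATE=16000
HOP=160
SUBSAMPLE=8

frame_counts() {
  local seconds=$1 samples mel enc
  # Round seconds * sample rate to the nearest whole sample.
  samples=$(awk -v s="$seconds" -v sr="$SAMPLE_RATE" \
    'BEGIN { printf "%d", int(s * sr + 0.5) }')
  mel=$(( samples / HOP + 1 ))
  enc=$(( (mel + SUBSAMPLE - 1) / SUBSAMPLE ))
  echo "samples=$samples mel_frames=$mel encoder_frames=$enc"
}

frame_counts 11   # jfk.wav-length clip: samples=176000 mel_frames=1101 encoder_frames=138
```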
Attribution
- Original model: nvidia/parakeet-tdt_ctc-1.1b (CC-BY-4.0), NVIDIA NeMo team.
- GGUF conversion + ggml runtime: CrispStrobe/CrispASR.
License
CC-BY-4.0, inherited from the base model.