OmniASR CTC-1B-v2 β€” GGUF

GGUF conversion of aadel4/omniASR-CTC-1B-v2 for use with CrispASR.

OmniASR is Meta's multilingual ASR model family supporting 1600+ languages. Apache-2.0 license.

Recommended CTC model. Q4_K uses a mixed-quantization recipe (first 4 of 48 encoder layers kept at F16) that recovers nearly all of Q8_0's quality at ~65% of the size. Plain uniform Q4_K is preserved as *_old.gguf for reference.

Files

File Size Notes
omniasr-ctc-1b-v2-q4_k.gguf 658 MB Recommended. Mixed Q4_K (first 4 encoder layers F16, rest Q4_K). 5% WER on JFK (one-word artefact americans→americas, model-internal).
omniasr-ctc-1b-v2-q8_0.gguf 1007 MB Byte-perfect against the FP32 transformers reference. Use this if 5% WER on edge cases is unacceptable.
omniasr-ctc-1b-v2.gguf 1.8 GB F16 source β€” feed to crispasr-quantize for custom recipes.
omniasr-ctc-1b-v2-q4_k_old.gguf 551 MB Legacy. Uniform Q4_K. Drops characters under CTC argmax pressure (~22.7% WER on JFK). Kept for reproducibility / size-vs-quality study; do not use for production.

Why mixed Q4_K?

CTC argmax decoding is structurally sensitive to weight drift β€” small perturbations to encoder weights flip frame-level argmax decisions toward the blank token, which manifests as missing characters in the output (e.g. "americans" β†’ "amercans" β†’ "americas"). Per-layer activation analysis (via OMNIASR_DUMP_DIR=...) shows quantization noise enters at every encoder layer's matmul and compounds through the residual stream, peaking at layers 36-47 even though the cause is upstream.

Counter-intuitively, keeping the late layers at F16 made things worse β€” F16 math preserves accumulated upstream noise more faithfully than Q4_K math does (Q4_K rounding occasionally lands back near the right bin). The fix is to keep the first 4 layers at F16, stopping noise from entering the residual stream. This is automatic in crispasr-quantize for any GGUF whose general.architecture is omniasr-ctc. Override via env vars:

CRISPASR_OMNIASR_KEEP_F16_HEAD=N   # default 4; 0 = uniform Q4_K
CRISPASR_OMNIASR_KEEP_F16_TAIL=N   # default 0; >0 is counter-productive
CRISPASR_OMNIASR_QUANT_ALL=1       # full quant, smaller, ~22% WER

Full diagnosis in LEARNINGS.md under "Q4_K is too lossy as the default for CTC-decoded ASR" and its follow-up "mixed Q4_K head-skip recovers nearly all Q8_0 quality".

Quick Start

git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)

./build/bin/crispasr --backend omniasr -m auto --auto-download -f audio.wav

Conversion

Converted using CrispASR's converter scripts with fixed positional conv weight normalization (per-kernel-position norm, not per-output-channel).

Downloads last month
444
GGUF
Model size
1.0B params
Architecture
omniasr-ctc
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for cstr/omniASR-CTC-1B-v2-GGUF

Quantized
(1)
this model