IndicConformer 600M: ONNX export for Vernacula
A repackaged ONNX export of AI4Bharat's 22-language
ai4bharat/indic-conformer-600m-multilingual,
laid out on disk the way Vernacula's
desktop ASR app expects. Only the CTC head is included; the RNNT components from the
source repo are not shipped here.
- Conversion script: scripts/indicconformer_export/
- Vernacula: github.com/christopherthompson81/vernacula
- Upstream model: ai4bharat/indic-conformer-600m-multilingual
Numerical behavior matches the upstream encoder + CTC graph to within FP32 tolerance (see the parity figures under Highlights); only the on-disk packaging differs.
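A parity claim like this is usually checked by running the same input through both graphs and comparing logits element-wise. The sketch below is a minimal illustration, assuming both sets of logits are already available as arrays (for example, one from the upstream PyTorch model and one from this ONNX export); `check_parity` is a hypothetical helper, not part of the conversion script, and its default tolerance simply mirrors the 1e-3 figure quoted under Highlights.

```python
import numpy as np


def check_parity(reference, candidate, tol=1e-3):
    """Return (max_abs_delta, passed) for two logit arrays of the same shape.

    Hypothetical helper: `reference` would come from the upstream model and
    `candidate` from the ONNX export, both run on the same input.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Largest absolute difference anywhere in the logit tensor.
    delta = float(np.max(np.abs(reference - candidate)))
    return delta, delta <= tol
```

A single max-abs number is a deliberately strict summary: one outlier logit fails the check even if the argmax path, and therefore the transcript, is unchanged.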
Highlights
- 22 languages, one shared CTC head. Encoder dim → 5633 logits, with the shared blank at id 5632; per-language vocab spans live in `language_spans.json` as `{start, length}` pairs (22 × 256 tokens). Language selection is a C#-side mask over the logits before argmax, not an ONNX input, so one model serves every language.
- Phase 1 parity (CPU-vs-CPU, FP32): max-abs logit delta of 6.87e-5 against a 1e-3 tolerance. CUDA-vs-CPU cross-device drift reached the 1e-2 scale, which is typical for a 17-layer Conformer; CPU is the numerically exact path.
- Real-audio parity on a 9.4 s hi-IN FLEURS clip: about 2 word edits against an 11-word reference, confirming vocab lookup, SentencePiece detokenisation, and language-span masking end-to-end.
- Repackaged 2.43 GB of AI4Bharat external data from 366 per-tensor blobs into a single `.data` sidecar. The repackaging walks initialisers and node attributes recursively, so nothing is left behind.
- Reused the Parakeet DFT preprocessor (no `STFT` op), with a `getattr()` shim for NeMo 1.23.0rc0 compatibility (`exact_pad` and `stft_pad_amount` are 2.x-only field names).
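A DFT preprocessor can avoid the `STFT` op because each STFT frame is just a dot product of the windowed samples with fixed cosine and sine kernels, which a strided Conv1d (equivalently, a matmul over frames) computes. The NumPy sketch below illustrates that identity and nothing more: the function names are hypothetical, there is no centre padding, pre-emphasis, mel filterbank or log step, and the `n_fft`, hop and window values used to exercise it are illustrative assumptions rather than the preprocessor's actual settings.

```python
import numpy as np


def dft_kernels(n_fft, window):
    """Real and imaginary DFT kernels for bins 0..n_fft//2, pre-multiplied by the window.

    Shape [n_fft // 2 + 1, n_fft]: the weights a Conv1d with kernel_size=n_fft
    and stride=hop would carry in a DFT-style preprocessor.
    """
    k = np.arange(n_fft // 2 + 1)[:, None]
    n = np.arange(n_fft)[None, :]
    angle = 2.0 * np.pi * k * n / n_fft
    return np.cos(angle) * window, -np.sin(angle) * window


def power_spectrum_dft(x, n_fft, hop, window):
    """|STFT|^2 without an STFT op: frame the signal, then project onto the kernels."""
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]  # [T, n_fft]
    real_k, imag_k = dft_kernels(n_fft, window)
    real = frames @ real_k.T  # [T, n_fft // 2 + 1]
    imag = frames @ imag_k.T
    return real ** 2 + imag ** 2
```

Because the kernels are ordinary constant weights, the graph exports with only Conv/MatMul-style ops, which is the property the highlight relies on.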
Contents
| File | Purpose |
|---|---|
| `encoder-model.onnx` (+ `.data`) | Conformer encoder, `[features, features_lens] -> [encoded, encoded_lens]` |
| `ctc_decoder-model.onnx` | Single Conv1d → 5633-dim logits (22 × 256 language tokens + 1 shared CTC blank at id 5632) |
| `nemo128.onnx` | DFT-conv1d 80-mel preprocessor, `[waveforms, waveforms_lens] -> [features, features_lens]` |
| `vocab.txt` | Flat 5632-line vocab, id = line index; the shared CTC blank is implicit at id 5632 |
| `language_spans.json` | 22 × `{start, length}`: which slice of `vocab.txt` each language's 256 tokens occupy |
| `config.json` | Preprocessor frontend params + CTC blank id |
| `manifest.json` | Per-file MD5 hashes (used by Vernacula's download verifier) |
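The table implies a fixed layout: 5632 vocab lines, 22 spans of 256 tokens each, and the blank one past the end at id 5632. The sketch below is a minimal sanity check of that layout. It assumes `language_spans.json` is either an object keyed by language code or a list of `{"start": ..., "length": ...}` objects (the card does not spell out the container), and all function names are hypothetical.

```python
import json

BLANK_ID = 5632         # shared CTC blank, one past the last vocab line
TOKENS_PER_LANG = 256
NUM_LANGS = 22


def load_vocab(path):
    """vocab.txt: one token per line, id = line index."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_spans(path):
    """language_spans.json -> {language_key: (start, length)}.

    Assumption: the file is an object keyed by language code, or a list of
    {"start": ..., "length": ...} objects (list index used as the key).
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    items = raw.items() if isinstance(raw, dict) else enumerate(raw)
    return {key: (int(s["start"]), int(s["length"])) for key, s in items}


def check_layout(vocab, spans):
    """Raise ValueError if vocab/spans disagree with the documented layout."""
    if len(vocab) != BLANK_ID:
        raise ValueError(f"expected {BLANK_ID} vocab lines, got {len(vocab)}")
    if len(spans) != NUM_LANGS:
        raise ValueError(f"expected {NUM_LANGS} language spans, got {len(spans)}")
    covered = set()
    for key, (start, length) in spans.items():
        if length != TOKENS_PER_LANG or start < 0 or start + length > BLANK_ID:
            raise ValueError(f"bad span for {key!r}: start={start}, length={length}")
        ids = set(range(start, start + length))
        if covered & ids:
            raise ValueError(f"span for {key!r} overlaps another language")
        covered |= ids
    # 22 non-overlapping 256-token spans inside [0, 5632) necessarily tile the
    # whole vocabulary, so no separate coverage check is needed.
```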
Export provenance
Exported via scripts/indicconformer_export/
in the Vernacula repo. The export uses AI4Bharat's NeMo fork
(kept in a venv isolated from the main NeMo export tooling, since the fork pins
different NeMo internals).
License
MIT, inherited from the upstream
ai4bharat/indic-conformer-600m-multilingual
model.
Using these files
In Vernacula, select IndicConformer as the ASR backend in Settings and the
package will be downloaded and verified automatically. Outside Vernacula,
pull with huggingface_hub and load with onnxruntime:
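from huggingface_hub import snapshot_download
import onnxruntime as ort
path = snapshot_download(repo_id="christopherthompson81/indicconformer-600m-onnx")
sessions = {name: ort.InferenceSession(f"{path}/{name}.onnx", providers=["CPUExecutionProvider"]) for name in ("nemo128", "encoder-model", "ctc_decoder-model")}  # CPU is the numerically exact path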
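Vernacula verifies the download against `manifest.json` automatically. Outside Vernacula, a comparable check might look like the sketch below. It assumes the manifest is a flat `{"file name": "md5 hex"}` object (the card only says it holds per-file MD5 hashes), and `verify_package` is a hypothetical helper. Files are hashed in chunks so the 2.43 GB sidecar is never read into memory at once.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(root):
    """Return names of files that are missing or whose MD5 differs from manifest.json.

    Assumption: manifest.json is a flat {"file name": "md5 hex"} object.
    """
    root = Path(root)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    return [
        name
        for name, expected in manifest.items()
        if not (root / name).is_file() or md5_of(root / name) != expected.lower()
    ]
```

Called as `verify_package(path)` on the directory returned by `snapshot_download`, an empty list means every listed file matched.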
CTC decoding is performed against vocab.txt with the blank id at 5632.
The language_spans.json file lets you mask the logits to a specific
language's 256-token span before greedy / beam decoding. See scripts/indicconformer_export/README.md
for details.
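A minimal NumPy sketch of greedy decoding with language-span masking follows. Assumptions: the CTC output for one utterance is already a `[time, 5633]` array (transpose or squeeze first if your output is laid out as `[5633, time]` or carries a batch axis), vocab pieces use SentencePiece's `▁` (U+2581) word-boundary marker, and `greedy_decode` is a hypothetical name, not a Vernacula API. The mask keeps the chosen language's span plus the shared blank; leaving the blank out of the mask would stop CTC from ever emitting it.

```python
import numpy as np

BLANK_ID = 5632  # shared CTC blank, one past the last vocab.txt line


def greedy_decode(logits, vocab, span):
    """Greedy CTC decoding restricted to one language's vocab span.

    logits: [time, 5633] array for a single utterance (assumed layout).
    vocab:  5632 tokens from vocab.txt (id = line index).
    span:   (start, length) for the chosen language from language_spans.json.
    """
    logits = np.asarray(logits)
    start, length = span
    # Keep the language's tokens and the shared blank; suppress everything else.
    allowed = np.zeros(logits.shape[-1], dtype=bool)
    allowed[start:start + length] = True
    allowed[BLANK_ID] = True
    best = np.where(allowed, logits, -np.inf).argmax(axis=-1)

    # Standard CTC collapse: merge consecutive repeats, then drop blanks.
    ids = []
    prev = None
    for token_id in best.tolist():
        if token_id != prev and token_id != BLANK_ID:
            ids.append(token_id)
        prev = token_id

    # SentencePiece detokenisation: U+2581 marks the start of a word.
    text = "".join(vocab[i] for i in ids)
    return text.replace("\u2581", " ").strip()
```

Beam search would replace the argmax-and-collapse step, but it can reuse the same `allowed` mask.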
Limitations
Covers 22 official Indian languages (listed in frontmatter). Accuracy and known failure modes are inherited from the upstream model; see the AI4Bharat model card. Only the CTC path ships; the source model's RNNT head is not included. This trades a small amount of accuracy for substantially simpler decoding.
Citation
For the underlying model, see the upstream model card for the canonical citation.
Acknowledgments
- Original model: AI4Bharat (IIT Madras)
- ONNX repackaging: Chris Thompson for Vernacula
Issues with the ONNX export specifically: open an issue on the Vernacula repo. Issues with the underlying model: see the upstream model card.
See also
- Vernacula on GitHub: the speech pipeline app this package is built for
- Conversion script (scripts/indicconformer_export/): the export pipeline that produced these files
- ai4bharat/indic-conformer-600m-multilingual: upstream model card
- AI4Bharat: upstream research group at IIT Madras
- Other Vernacula model packages