IndicConformer 600M: ONNX export for Vernacula

Re-packaged ONNX export of AI4Bharat's 22-language ai4bharat/indic-conformer-600m-multilingual, in the on-disk shape that Vernacula's desktop ASR app expects. Only the CTC head is included; the RNNT components from the source repo are not shipped here.

Numerical behavior matches the upstream encoder + CTC graph to within the parity tolerance reported under Highlights; only the on-disk packaging differs.

Highlights

  • 22 languages, one shared CTC head. Encoder dim → 5633 logits with the shared blank at id 5632; per-language vocab spans live in language_spans.json as {start, length} pairs (22 × 256 tokens). Language selection is a C# post-argmax mask, not an ONNX input, so one model serves every language.
  • Phase 1 parity (CPU-CPU FP32): max-abs logits delta 6.87e-5 at 1e-3 tolerance. CUDA-vs-CPU cross-device drift reached the 1e-2 scale, which is typical for a 17-layer Conformer; CPU is the numerically exact path.
  • Real-audio parity on a 9.4 s hi-IN Fleurs clip: about 2 word edits against an 11-word reference, which confirmed vocab, SentencePiece detokenisation, and language-span masking end-to-end.
  • Repackaged 2.43 GB of AI4Bharat external-data from 366 per-tensor blobs into a single .data sidecar. The repackaging walks initialisers + node attributes recursively so nothing is left behind.
  • Reused the Parakeet DFT preprocessor (no STFT op), with a getattr() shim for NeMo 1.23.0rc0 compatibility (exact_pad, stft_pad_amount are 2.x-only field names).
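
The parity figure quoted above is a max-abs delta between two logit tensors compared against a tolerance. A minimal sketch of that check follows; the helper names `max_abs_delta` and `within_tolerance` are illustrative and are not taken from the export scripts.

```python
import numpy as np


def max_abs_delta(reference, candidate):
    """Largest element-wise absolute difference between two logit tensors."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    return float(np.max(np.abs(reference - candidate)))


def within_tolerance(reference, candidate, tol=1e-3):
    """True when the two tensors agree to within `tol` (1e-3 in the bullet above)."""
    return max_abs_delta(reference, candidate) <= tol
```

Under this definition a 6.87e-5 delta passes the 1e-3 tolerance, while drift at the 1e-2 scale would not.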

Contents

| File | Purpose |
|---|---|
| encoder-model.onnx (+ .data) | Conformer encoder, [features, features_lens] -> [encoded, encoded_lens] |
| ctc_decoder-model.onnx | Single Conv1d → 5633-dim logits (22 × 256 language tokens + 1 shared CTC blank at id 5632) |
| nemo128.onnx | DFT-conv1d 80-mel preprocessor, [waveforms, waveforms_lens] -> [features, features_lens] |
| vocab.txt | Flat 5632-line vocab, id = line index; shared CTC blank is implicit at id 5632 |
| language_spans.json | 22 × {start, length} entries: which slice of vocab.txt each language's 256 tokens occupy |
| config.json | Preprocessor frontend params + CTC blank id |
| manifest.json | Per-file MD5 hashes (used by Vernacula's download verifier) |
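
Outside Vernacula, the per-file MD5 hashes can be checked by hand. The sketch below assumes manifest.json is a flat JSON object mapping file names to hex MD5 digests; the actual schema is not documented here, so adjust the parsing if the file is shaped differently. `verify_manifest` is a hypothetical helper, not Vernacula's verifier.

```python
import hashlib
import json
from pathlib import Path


def verify_manifest(model_dir, manifest_name="manifest.json"):
    """Return the file names whose MD5 does not match the manifest.

    ASSUMPTION: the manifest is a flat {"file name": "md5 hex"} mapping.
    """
    model_dir = Path(model_dir)
    expected = json.loads((model_dir / manifest_name).read_text(encoding="utf-8"))
    mismatches = []
    for name, want in expected.items():
        file_path = model_dir / name
        if not file_path.is_file():
            mismatches.append(name)  # a missing file counts as a mismatch
            continue
        digest = hashlib.md5()
        with open(file_path, "rb") as handle:
            # Read in 1 MiB chunks so the multi-GB .data sidecar is never fully in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != str(want).lower():
            mismatches.append(name)
    return mismatches
```

An empty list means every file listed in the manifest was found and matched.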

Export provenance

Exported via scripts/indicconformer_export/ in the Vernacula repo. The export uses AI4Bharat's NeMo fork (kept in an isolated venv from the main NeMo export tooling, since the fork pins different NeMo internals).

License

MIT, inherited from the upstream ai4bharat/indic-conformer-600m-multilingual model.

Using these files

In Vernacula, select IndicConformer as the ASR backend in Settings and the package will be downloaded and verified automatically. Outside Vernacula, pull with huggingface_hub and load with onnxruntime:

from huggingface_hub import snapshot_download
path = snapshot_download(repo_id="christopherthompson81/indicconformer-600m-onnx")
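
Once the files are downloaded, the three graphs chain in the order given in the Contents table. The following is a rough sketch rather than Vernacula's actual loader: it uses the tensor names from that table, reads the CTC head's input name from the graph because it is not documented here, and assumes float32 waveforms with int64 lengths and a single-input CTC head. `run_pipeline` is a hypothetical helper.

```python
import numpy as np


def run_pipeline(preproc, encoder, ctc_head, waveform):
    """Chain preprocessor -> encoder -> CTC head for one mono utterance.

    Each of the first three arguments is an onnxruntime.InferenceSession.
    ASSUMPTIONS: float32 audio, int64 lengths, single-input CTC head.
    """
    wav = np.asarray(waveform, dtype=np.float32)[None, :]  # [1, samples]
    wav_len = np.array([wav.shape[1]], dtype=np.int64)
    features, features_lens = preproc.run(
        ["features", "features_lens"],
        {"waveforms": wav, "waveforms_lens": wav_len},
    )
    encoded, encoded_lens = encoder.run(
        ["encoded", "encoded_lens"],
        {"features": features, "features_lens": features_lens},
    )
    # The CTC head's input name is not listed above, so look it up.
    ctc_input_name = ctc_head.get_inputs()[0].name
    logits = ctc_head.run(None, {ctc_input_name: encoded})[0]
    return logits, encoded_lens


# Usage, given `path` from snapshot_download and onnxruntime installed:
#   import onnxruntime as ort
#   cpu = ["CPUExecutionProvider"]
#   preproc = ort.InferenceSession(f"{path}/nemo128.onnx", providers=cpu)
#   encoder = ort.InferenceSession(f"{path}/encoder-model.onnx", providers=cpu)
#   ctc_head = ort.InferenceSession(f"{path}/ctc_decoder-model.onnx", providers=cpu)
#   logits, lens = run_pipeline(preproc, encoder, ctc_head, samples)
```

Check the returned logits' shape before decoding, since the axis order of the CTC output is not specified in this card.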

CTC decoding is performed against vocab.txt with the blank id at 5632. The language_spans.json file lets you mask the logits to a specific language's 256-token span before greedy / beam decoding. See scripts/indicconformer_export/README.md for details.
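
As an illustration of the masking step, here is a minimal greedy-decoding sketch. It assumes the logits for one utterance are laid out as [frames, 5633] and that vocab.txt uses the usual SentencePiece "▁" word-boundary marker; neither is confirmed in this card, and `greedy_ctc_decode` is a hypothetical helper rather than the Vernacula implementation.

```python
import numpy as np

BLANK_ID = 5632  # shared CTC blank, one past the last vocab.txt line


def greedy_ctc_decode(logits, span_start, span_length, vocab, blank_id=BLANK_ID):
    """Mask logits to one language span, then greedy CTC decode.

    ASSUMPTION: logits has shape [frames, 5633]; vocab is a list of tokens
    indexed by id, as read line by line from vocab.txt.
    """
    logits = np.asarray(logits, dtype=np.float32)
    span = slice(span_start, span_start + span_length)
    masked = np.full_like(logits, -np.inf)
    masked[:, span] = logits[:, span]              # keep the chosen language
    masked[:, blank_id] = logits[:, blank_id]      # always keep the blank
    ids = masked.argmax(axis=-1)

    pieces = []
    previous = blank_id
    for token_id in ids:
        # CTC collapse: skip blanks and immediate repeats.
        if token_id != blank_id and token_id != previous:
            pieces.append(vocab[token_id])
        previous = token_id
    # SentencePiece-style detokenisation (assumed marker: U+2581).
    return "".join(pieces).replace("\u2581", " ").strip()
```

The `span_start` and `span_length` arguments correspond to the {start, length} pair for the chosen language in language_spans.json.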

Limitations

Covers 22 official Indian languages (listed in frontmatter). Accuracy and known failure modes inherit from the upstream AI4Bharat model card. The RNNT head from the source model is not included, only the CTC path, which trades a small amount of accuracy for substantially simpler decoding.

Citation

For the underlying model, see the upstream model card for the canonical citation.

Acknowledgments

Issues with the ONNX export specifically: open an issue on the Vernacula repo. Issues with the underlying model: see the upstream model card.
