Omnilingual ASR CTC 300M - LiteRT (INT8)
Meta's 1600-language wav2vec2 CTC ASR, INT8 weight-only quantization.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
Use cases on soniqo.audio
Meta's Omnilingual ASR (Wav2Vec2-CTC, 300 M parameters) exported to LiteRT for Android. Supports 1600+ languages and doubles as a forced-alignment model via standard CTC Viterbi decoding on the output logits.
Model
| Property | Value |
|---|---|
| Architecture | Wav2Vec2 temporal CNN frontend + 24-layer Transformer + CTC head |
| Parameters | ~300 M |
| Format | LiteRT (TFLite) |
| Quantization | INT8 dynamic weights (fp32 activations) |
| Sample rate | 16 000 Hz |
| Input length | 160 000 samples (10 s, fixed) |
| Frame rate | 50 Hz (320× downsample) |
| Vocab size | 10 288 (SentencePiece) |
Files
| File | Size | Description |
|---|---|---|
| omnilingual-ctc-300m.tflite | 315.2 MB | Full model, INT8 |
| tokenizer.model | 89 KB | SentencePiece tokenizer |
| config.json | 1 KB | Model + fbank specs |
Signature
Inputs:
audio [1, 160000] float32 z-score normalized 10 s @ 16 kHz
Outputs:
logits [1, 500, 10288] float32 per-frame CTC logits (50 Hz)
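The input contract above (fixed 160 000 samples, z-score normalized) can be met with a small preprocessing step. The following is a minimal sketch, not the bundle's reference preprocessing: `prepareInput`, `MODEL_SAMPLES`, and the `eps` value are hypothetical names and choices, and computing the statistics over the real samples only (leaving zero padding untouched) is an assumption the card does not confirm.

```kotlin
import kotlin.math.sqrt

// Fixed model input length: 10 s at 16 kHz (from the Signature section).
val MODEL_SAMPLES = 160_000

/**
 * Pads with zeros or truncates mono 16 kHz PCM to MODEL_SAMPLES, then
 * z-score normalizes it. Assumption: mean and variance are computed over
 * the real samples only, and the padded tail stays at zero.
 */
fun prepareInput(pcm: FloatArray, eps: Double = 1e-7): Array<FloatArray> {
    val n = minOf(pcm.size, MODEL_SAMPLES)
    val out = FloatArray(MODEL_SAMPLES)
    if (n == 0) return arrayOf(out)

    var sum = 0.0
    for (i in 0 until n) sum += pcm[i]
    val mean = sum / n

    var sq = 0.0
    for (i in 0 until n) {
        val d = pcm[i] - mean
        sq += d * d
    }
    // eps keeps the division finite for silent (constant) input.
    val std = sqrt(sq / n + eps)

    for (i in 0 until n) out[i] = ((pcm[i] - mean) / std).toFloat()
    return arrayOf(out) // shape [1, 160000]
}
```

Audio longer than 10 s is simply truncated here; splitting it into several windows is left to the caller.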
Capabilities
1. Greedy / beam ASR
Standard CTC decoding: argmax per frame, collapse repeats, remove blanks,
decode with SentencePiece. Works across 1600+ languages.
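The greedy steps above (argmax per frame, collapse repeats, remove blanks) can be sketched as follows. `greedyCtcDecode` is a hypothetical helper, and `blankId = 0` is an assumption: this card does not state the blank index, so check `config.json` or the tokenizer for the real value.

```kotlin
/**
 * Greedy CTC decoding over [T][vocab] logits.
 * blankId is an assumption (not documented in this card).
 */
fun greedyCtcDecode(logits: Array<FloatArray>, blankId: Int = 0): List<Int> {
    val ids = ArrayList<Int>()
    var prev = -1
    for (frame in logits) {
        // Argmax over the vocabulary for this frame.
        var best = 0
        for (v in 1 until frame.size) if (frame[v] > frame[best]) best = v
        // Emit only when the label changes and is not the blank.
        if (best != prev && best != blankId) ids.add(best)
        prev = best
    }
    return ids
}
```

The returned ids still need to be turned into text with a SentencePiece decoder loaded from `tokenizer.model`; no particular decoder API is assumed here.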
2. Forced alignment
Given a target transcription, run the CTC Viterbi forced-alignment algorithm
over the [T, vocab_size] posteriors to recover per-token start/end frame
positions. Convert frame indices to seconds with frame_rate = 50 Hz
(1 frame = 20 ms).
Reference implementations you can port directly:
- torchaudio.functional.forced_align
- ctc-forced-aligner (MahmoudAshraf97)
- The CTC alignment pseudocode in the Wav2Vec2 paper
The full DP is ~100 lines of straightforward code and runs in microseconds per utterance on mobile CPUs.
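As an illustration of that DP, here is a minimal Viterbi sketch over the blank-interleaved target sequence. It is not the code shipped in speech-android or any of the references above: `TokenSpan`, the shape of `ctcForcedAlign`, and `blankId = 0` are assumptions made for this example. It works on raw logits because the per-frame log-softmax constant is the same for every path and therefore does not change the best path.

```kotlin
data class TokenSpan(val tokenId: Int, val startFrame: Int, val endFrame: Int) {
    // 50 Hz frame rate: one frame covers 20 ms.
    val startSec get() = startFrame * 0.02
    val endSec get() = (endFrame + 1) * 0.02
}

/** Viterbi forced alignment; blankId is an assumption (not documented in the card). */
fun ctcForcedAlign(logits: Array<FloatArray>, targets: IntArray, blankId: Int = 0): List<TokenSpan> {
    val numFrames = logits.size
    if (targets.isEmpty() || numFrames == 0) return emptyList()
    // States: blank, t0, blank, t1, ..., blank (even = blank, odd = token).
    val numStates = 2 * targets.size + 1
    fun label(s: Int) = if (s % 2 == 0) blankId else targets[s / 2]

    val negInf = Float.NEGATIVE_INFINITY
    var prev = FloatArray(numStates) { negInf }
    prev[0] = logits[0][blankId]
    prev[1] = logits[0][targets[0]]
    val back = Array(numFrames) { IntArray(numStates) }

    for (t in 1 until numFrames) {
        val cur = FloatArray(numStates) { negInf }
        for (s in 0 until numStates) {
            var bestPrev = s // stay in the same state
            if (s >= 1 && prev[s - 1] > prev[bestPrev]) bestPrev = s - 1
            // Skipping a blank is only legal between two different tokens.
            val canSkip = s >= 3 && s % 2 == 1 && targets[s / 2] != targets[s / 2 - 1]
            if (canSkip && prev[s - 2] > prev[bestPrev]) bestPrev = s - 2
            cur[s] = prev[bestPrev] + logits[t][label(s)]
            back[t][s] = bestPrev
        }
        prev = cur
    }

    // A valid path ends in the last token or the trailing blank.
    var s = if (prev[numStates - 2] > prev[numStates - 1]) numStates - 2 else numStates - 1
    require(prev[s] != negInf) { "too few frames for this transcript" }

    // Backtrack, recording the first and last frame of every token.
    val start = IntArray(targets.size) { -1 }
    val end = IntArray(targets.size) { -1 }
    for (t in numFrames - 1 downTo 0) {
        if (s % 2 == 1) {
            val k = s / 2
            start[k] = t
            if (end[k] < 0) end[k] = t
        }
        if (t > 0) s = back[t][s]
    }
    return targets.indices.map { TokenSpan(targets[it], start[it], end[it]) }
}
```

Each span reports inclusive frame indices; `startSec` and `endSec` apply the 20 ms frame duration, so a token covering frames 1 to 2 spans roughly 0.02 s to 0.06 s.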
Usage
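import org.tensorflow.lite.Interpreter

// loadModelFile, ctcForcedAlign and tokenize are app-side helpers, not LiteRT APIs.
val model = Interpreter(loadModelFile("omnilingual-ctc-300m.tflite"))
// 10 s of z-score normalized audio at 16 kHz, shaped [1, 160000]
val audio = arrayOf(FloatArray(160_000))
val logits = Array(1) { Array(500) { FloatArray(10_288) } }
model.run(audio, logits)
// Greedy CTC -> per-frame token ids (collapse repeats and drop blanks before SentencePiece decoding)
val tokens = logits[0].map { frame -> frame.indices.maxByOrNull { frame[it] } ?: 0 }
// Or: CTC forced alignment for timestamps
val alignment = ctcForcedAlign(logits[0], tokenize("hello world"))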
Source
Upstream: Meta Omnilingual ASR project (CC BY-NC 4.0). Paper: Omnilingual ASR Technical Report (2026).
Links
- speech-android: Android SDK
- soniqo.audio: website
- blog: blog
Ecosystem
- soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library for voice agents. Abstract STTInterface/TTSInterface/VADInterface/EnhancerInterface; LiteRT implementations plug straight into the interfaces.
- speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Parakeet TDT 0.6B v3 - LiteRT (INT8)
- Nemotron Speech Streaming 0.6B - LiteRT
- Omnilingual ASR CTC 300M - LiteRT
- Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (cc-by-nc-4.0). See the
linked base_model repository for the full terms.