---
license: apache-2.0
language:
- zh
- yue
- en
- multilingual
tags:
- automatic-speech-recognition
- qwen
- qwen3
- chinese
- cantonese
- litert
- tflite
- on-device
- soniqo
- speech-cloud
- speech-core
base_model: Qwen/Qwen3-ASR-0.6B
library_name: litert
pipeline_tag: automatic-speech-recognition
---

# Qwen3 ASR 0.6B Encoder: LiteRT (INT8)

Qwen3-ASR audio encoder (zh / yue / en), INT8 weight-only quantization.

> Part of the [**soniqo.audio**](https://soniqo.audio) speech toolkit:
> an open, runtime-portable stack for speech AI. This bundle is the
> **LiteRT** export, designed to plug into the abstract interfaces in
> [`speech-core`](https://github.com/soniqo/speech-core) (a C++ voice-agent
> orchestration library). Browse all LiteRT bundles in the
> [**soniqo LiteRT collection**](https://huggingface.co/collections/soniqo/litert-6a08268e11d5a47d7aacc02b).

## Use cases on soniqo.audio

- [Multilingual transcription](https://soniqo.audio/transcription/)

This is the audio encoder of Qwen3-ASR-0.6B. The model is focused on Chinese and supports
30 languages plus 22 Chinese dialects. The encoder is exported to LiteRT for Android.
The text decoder is a Qwen3-0.6B LLM and is intended to run separately through LiteRT-LM.

## Model

| Property | Value |
|---|---|
| Component | Audio encoder only |
| Parameters | ~180M (encoder); the decoder is a separate 0.6B LLM |
| Format | LiteRT (TFLite) |
| Quantization | INT8 weights, dynamic-range (fp32 activations) |
| Sample rate | 16 000 Hz |
| Input | 128-bin log mel, 1000 frames (10 s, fixed) |
| Output | 125 audio embedding tokens, 1024-dim each |
| Languages | 30 languages + 22 Chinese dialects (Cantonese, Shanghainese, Sichuanese, …) |

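The encoder input is fixed at 10 s, so longer recordings must be split before encoding. Each window yields 125 tokens, which works out to 80 ms of audio per token. The minimal sketch below assumes non-overlapping windows with the last one zero-padded. The card does not specify a chunking strategy, and `chunk_audio` and `token_duration_ms` are illustrative names, not part of any soniqo API.

```python
# Constants taken from the Model table above.
SAMPLE_RATE = 16_000                            # Hz
WINDOW_SECONDS = 10                             # fixed encoder input length
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 160 000 samples
TOKENS_PER_WINDOW = 125                         # embeddings per 10 s window


def chunk_audio(samples: list[float]) -> list[list[float]]:
    """Split mono 16 kHz audio into non-overlapping 10 s windows.

    Hypothetical helper: the last window is zero-padded so every chunk
    matches the encoder's fixed input size. No overlap is used.
    """
    chunks = []
    for start in range(0, len(samples), WINDOW_SAMPLES):
        chunk = samples[start:start + WINDOW_SAMPLES]
        chunk = chunk + [0.0] * (WINDOW_SAMPLES - len(chunk))  # pad the tail
        chunks.append(chunk)
    return chunks


def token_duration_ms() -> float:
    """Audio time covered by one embedding token (10 s / 125 tokens)."""
    return WINDOW_SECONDS * 1000 / TOKENS_PER_WINDOW
```

Overlapping windows reduce words cut at chunk boundaries but cost extra encoder calls. Which trade-off fits best depends on the application.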
## Files

| File | Size | Description |
|---|---|---|
| `qwen3-asr-encoder.tflite` | 180.5 MB | Audio encoder, INT8 |
| `config.json` | 1 KB | Architecture + I/O specs |

## Signature

```
Inputs:
  mel               [1, 128, 1000]  float32  10 s log mel spectrogram
Outputs:
  audio_embeddings  [1, 125, 1024]  float32  Audio context consumed by the decoder
```

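The signature maps directly onto the LiteRT Python interpreter. The minimal sketch below makes a few assumptions:

- The `ai-edge-litert` pip package, which provides `ai_edge_litert.interpreter.Interpreter`, and NumPy are installed.
- You have already computed a normalized log mel array.
- `run_encoder` and `check_mel_shape` are illustrative helpers, not files or APIs shipped in this bundle.

```python
EXPECTED_INPUT_SHAPE = (1, 128, 1000)    # "mel", float32 (see Signature)
EXPECTED_OUTPUT_SHAPE = (1, 125, 1024)   # "audio_embeddings", float32


def check_mel_shape(shape) -> None:
    """Raise if a mel tensor shape does not match the fixed encoder input."""
    if tuple(shape) != EXPECTED_INPUT_SHAPE:
        raise ValueError(f"expected mel shape {EXPECTED_INPUT_SHAPE}, got {tuple(shape)}")


def run_encoder(model_path: str, mel):
    """Run the encoder once on a [1, 128, 1000] float32 log mel array.

    NumPy and ai-edge-litert are imported lazily so the shape helper
    above can be used without them installed.
    """
    import numpy as np
    from ai_edge_litert.interpreter import Interpreter

    mel = np.asarray(mel, dtype=np.float32)
    check_mel_shape(mel.shape)

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    # The Signature lists exactly one input and one output tensor.
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    interpreter.set_tensor(input_index, mel)
    interpreter.invoke()
    embeddings = interpreter.get_tensor(output_index)
    if embeddings.shape != EXPECTED_OUTPUT_SHAPE:
        raise RuntimeError(f"unexpected output shape {embeddings.shape}")
    return embeddings
```

Usage, with `mel` built as described under Audio preprocessing: `emb = run_encoder("qwen3-asr-encoder.tflite", mel)`.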
## Architecture

```
mel [1, 128, 1000]
├── 3× Conv2d(stride=2) + GELU → [1, 480, 16, 125]
├── reshape → Linear(7680→896) → [1, 125, 896]
├── + sinusoidal pos embed
├── 18× pre-norm Transformer → [1, 125, 896]
├── LayerNorm → Linear(896) → GELU
└── Linear(896→1024) → [1, 125, 1024]
```

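The front-end shapes in the diagram can be checked with the standard convolution output-length formula `floor((n + 2p - k) / s) + 1`. This sketch assumes kernel size 3 and padding 1 for each stride-2 Conv2d. The card gives only the stride, but these values reproduce the diagram's 16 × 125 grid and the 7680-wide flatten (480 channels × 16 frequency bins).

```python
def conv_out(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length along one Conv2d axis: floor((n + 2p - k) / s) + 1.

    kernel=3 and padding=1 are assumptions; only stride=2 is given in the card.
    """
    return (n + 2 * padding - kernel) // stride + 1


def encoder_token_shape(mel_bins: int = 128, frames: int = 1000, channels: int = 480):
    """Trace the convolutional front end from the Architecture diagram."""
    freq, time = mel_bins, frames
    for _ in range(3):                 # 3× Conv2d(stride=2), same params on both axes
        freq, time = conv_out(freq), conv_out(time)
    flat = channels * freq             # reshape: channels × freq per time step
    return (channels, freq, time), flat
```

Each stride-2 layer halves both axes, so three layers give an 8× reduction. That maps 100 mel frames per second to 12.5 embedding tokens per second.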
## Why encoder only

The text decoder is a full Qwen3-0.6B language model with GQA, RoPE,
SwiGLU and RMSNorm. As an autoregressive decoder with a growing KV cache,
it doesn't fit cleanly into a single static `.tflite` graph. The right
runtime for LLM decoders on Android is
[LiteRT-LM](https://github.com/google-ai-edge/litert-lm) or a comparable
LLM executor, with this encoder's audio embeddings fed to the decoder as
audio context. Qwen3-0.6B is decoder-only, so the embeddings enter through
its input embedding sequence rather than through cross-attention layers.

To run ASR without the LLM, pair this encoder with a CTC or transducer head
fine-tuned on your target languages.

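A CTC head emits one distribution over the vocabulary plus a blank symbol for each encoder token, giving 125 distributions per 10 s window. The simplest decoder is greedy:

1. Take the argmax for each frame.
2. Merge consecutive repeats.
3. Drop blanks.

The sketch below assumes blank id 0 and a hypothetical head that has already produced per-frame logits. The head, its vocabulary and its training are not part of this bundle.

```python
def ctc_greedy_decode(logits: list[list[float]], blank_id: int = 0) -> list[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    `logits` is one row of scores per encoder token, e.g. 125 rows per 10 s
    window from a hypothetical CTC head on top of the 1024-dim embeddings.
    """
    decoded = []
    prev = None
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax
        # A blank between two identical labels keeps both (prev resets to blank).
        if best != prev and best != blank_id:
            decoded.append(best)
        prev = best
    return decoded
```

Beam search or a language model usually improves accuracy over greedy decoding, but greedy decoding is enough to sanity-check a freshly trained head.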
## Audio preprocessing

- 16 kHz mono, float32
- 128 log mel bins
- `n_fft=400`, `hop_length=160`, `win_length=400`, `pad_mode="reflect"`
  (25 ms window, 10 ms hop: 100 frames per second, matching the 1000-frame, 10 s input)
- Log mel, with mean/std normalization per utterance

The exact reference is in the upstream Qwen3-ASR tokenizer config.

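Below is a minimal plain-Python sketch of the per-utterance normalization step. The card does not say whether mean and std are computed globally or per mel bin. This version assumes a single mean and std over the whole `[128][1000]` matrix, plus a small `eps` to guard against silent input. Verify it against the upstream reference before relying on it.

```python
import math


def normalize_utterance(mel: list[list[float]], eps: float = 1e-5) -> list[list[float]]:
    """Mean/std-normalize a [bins][frames] log mel matrix for one utterance.

    Assumption: global statistics over all bins and frames, not per bin.
    """
    values = [v for row in mel for v in row]
    if not values:
        return []
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    # eps keeps the division finite when the input is constant (e.g. silence).
    return [[(v - mean) / (std + eps) for v in row] for row in mel]
```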
## Source

Upstream: [Qwen/Qwen3-ASR-0.6B](https://huggingface.co/Qwen/Qwen3-ASR-0.6B)
(Apache 2.0). Released January 2026 as part of the Qwen3 audio family.

## Links

- [speech-android](https://github.com/soniqo/speech-android): Android SDK
- [soniqo.audio](https://soniqo.audio): website
- [blog](https://soniqo.audio/blog): blog

## Ecosystem

- [**soniqo.audio**](https://soniqo.audio): use-case explorer (transcription, voice cloning, live ASR, voice agents).
- [**speech-core**](https://github.com/soniqo/speech-core): C++ orchestration library for voice agents. Abstract `STTInterface` / `TTSInterface` / `VADInterface` / `EnhancerInterface`; LiteRT implementations plug straight into the interfaces.
- [**speech-swift**](https://github.com/soniqo/speech-swift): Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- [**speech-android**](https://github.com/soniqo/speech-android): Android SDK consuming on-device LiteRT bundles.

## Other LiteRT models in this collection

**ASR / Transcription**

- [Parakeet TDT 0.6B v3: LiteRT (INT8)](https://huggingface.co/soniqo/Parakeet-TDT-0.6B-v3-LiteRT-INT8)
- [Nemotron Speech Streaming 0.6B: LiteRT](https://huggingface.co/soniqo/Nemotron-Speech-Streaming-LiteRT)
- [Omnilingual ASR CTC 300M: LiteRT](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT)
- [Omnilingual ASR CTC 300M: LiteRT (INT8)](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT-INT8)

**VAD / Diarization**

- [Silero VAD v5: LiteRT](https://huggingface.co/soniqo/Silero-VAD-v5-LiteRT)
- [Pyannote Segmentation 3.0: LiteRT](https://huggingface.co/soniqo/Pyannote-Segmentation-LiteRT)
- [WeSpeaker ResNet34-LM: LiteRT](https://huggingface.co/soniqo/WeSpeaker-ResNet34-LM-LiteRT)

**TTS / Voice Cloning**

- [VoxCPM2: LiteRT (INT8)](https://huggingface.co/soniqo/VoxCPM2-LiteRT-INT8)

## License

This bundle inherits the upstream model license (**apache-2.0**). See the
linked `base_model` repository for the full terms.