---
license: apache-2.0
language:
- zh
- yue
- en
- multilingual
tags:
- automatic-speech-recognition
- qwen
- qwen3
- chinese
- cantonese
- litert
- tflite
- on-device
- soniqo
- speech-cloud
- speech-core
base_model: Qwen/Qwen3-ASR-0.6B
library_name: litert
pipeline_tag: automatic-speech-recognition
---
# Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
Qwen3-ASR audio encoder (zh / yue / en). INT8 weight-only.
> Part of the [**soniqo.audio**](https://soniqo.audio) speech toolkit,
> an open, runtime-portable stack for speech AI. This bundle is the
> **LiteRT** export, designed to plug into the abstract interfaces in
> [`speech-core`](https://github.com/soniqo/speech-core) (C++ voice-agent
> orchestration library). Browse all LiteRT bundles in the
> [**soniqo LiteRT collection**](https://huggingface.co/collections/soniqo/litert-6a08268e11d5a47d7aacc02b).
## Use cases on soniqo.audio
- [Multilingual transcription](https://soniqo.audio/transcription/)
Audio encoder of Qwen3-ASR-0.6B, specialized for Chinese (including 22
Chinese dialects) and 30 additional languages. Exported to LiteRT for
Android. The text decoder is a Qwen3-0.6B LLM and is intended to run
through LiteRT-LM as a separate runtime.
## Model
| Property | Value |
|---|---|
| Component | Audio encoder only |
| Parameters | ~180 M (encoder), decoder is a separate 0.6B LLM |
| Format | LiteRT (TFLite) |
| Quantization | INT8 dynamic weights (fp32 activations) |
| Sample rate | 16 000 Hz |
| Input | 128-bin log mel, 1000 frames (10 s, fixed) |
| Output | 125 audio embedding tokens, 1024-dim each |
| Languages | 30 + 22 Chinese dialects (Cantonese, Shanghainese, Sichuan, …) |
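
Because the input window is fixed at 10 s, longer recordings have to be cut into windows before encoding. The sketch below shows one way to do that. The non-overlapping split and the zero padding of the last window are assumptions, not something this model card specifies, and `split_into_windows` is a hypothetical helper name.

```python
SAMPLE_RATE = 16_000                            # from the Model table
WINDOW_SECONDS = 10                             # fixed encoder window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 160 000 samples


def split_into_windows(samples):
    """Split mono samples into 10 s windows, zero-padding the last one.

    Assumption: windows do not overlap. An overlapping scheme may give
    better results at window boundaries.
    """
    windows = []
    for start in range(0, len(samples), WINDOW_SAMPLES):
        chunk = list(samples[start:start + WINDOW_SAMPLES])
        chunk += [0.0] * (WINDOW_SAMPLES - len(chunk))  # pad the tail
        windows.append(chunk)
    return windows


# 25 s of silence becomes three windows; the last one is half padding.
windows = split_into_windows([0.0] * (25 * SAMPLE_RATE))
print(len(windows), len(windows[-1]))
```

Each window then goes through mel extraction and the encoder independently, yielding 125 embedding tokens per window.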
## Files
| File | Size | Description |
|---|---|---|
| `qwen3-asr-encoder.tflite` | 180.5 MB | Audio encoder, INT8 |
| `config.json` | 1 KB | Architecture + I/O specs |
## Signature
```
Inputs:
mel [1, 128, 1000] float32 10 s log mel spectrogram
Outputs:
audio_embeddings [1, 125, 1024] float32 For cross-attention into the decoder
```
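
A minimal sketch of calling the encoder from Python with this signature, assuming the `ai-edge-litert` package is installed and `mel` is a NumPy array. `run_encoder` is a hypothetical helper, and the on-device path through `speech-android` may look different.

```python
MEL_SHAPE = (1, 128, 1000)         # input shape from the signature
EMBEDDING_SHAPE = (1, 125, 1024)   # output shape from the signature


def run_encoder(model_path, mel):
    """Run one 10 s window through the encoder and return the embeddings.

    `mel` is assumed to be a NumPy array holding the log mel features.
    """
    if tuple(mel.shape) != MEL_SHAPE:
        raise ValueError(f"expected mel of shape {MEL_SHAPE}, got {tuple(mel.shape)}")

    # Imported lazily so the shape check above works without the runtime.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]

    interpreter.set_tensor(input_info["index"], mel.astype(input_info["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(output_info["index"])
```

A call such as `run_encoder("qwen3-asr-encoder.tflite", mel)` should return an array matching `EMBEDDING_SHAPE`, given the signature above.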
## Architecture
```
mel [1, 128, 1000]
├── 3× Conv2d(stride=2) + GELU → [1, 480, 16, 125]
├── reshape → Linear(7680→896) → [1, 125, 896]
├── + sinusoidal pos embed
├── 18× pre-norm Transformer → [1, 125, 896]
├── LayerNorm → Linear(896) → GELU
└── Linear(896→1024) → [1, 125, 1024]
```
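
The shapes in the diagram can be checked with a little arithmetic. The sketch below assumes each convolution uses a 3×3 kernel with padding 1, which the diagram does not state (only `stride=2` is given). `conv_out` and `encoder_shapes` are hypothetical helper names.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of one conv along one axis (kernel/padding assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(n_mels=128, n_frames=1000, channels=480, n_convs=3):
    """Trace tensor shapes through the conv stem and the projections."""
    freq, time = n_mels, n_frames
    for _ in range(n_convs):
        freq, time = conv_out(freq), conv_out(time)
    conv_shape = (1, channels, freq, time)
    flattened = channels * freq          # features per token before Linear
    return conv_shape, flattened, (1, time, 896), (1, time, 1024)


conv_shape, flattened, hidden, output = encoder_shapes()
print(conv_shape, flattened, hidden, output)
# (1, 480, 16, 125) 7680 (1, 125, 896) (1, 125, 1024)
```

Three stride-2 stages reduce both axes by a factor of 8, so the 1000 input frames become 125 tokens and each token covers 80 ms of audio.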
## Why encoder only
The text decoder is a full Qwen3-0.6B language model with GQA, RoPE,
SwiGLU and RMSNorm. It doesn't fit cleanly into a single `.tflite`; the
right runtime for LLM decoders on Android is
[LiteRT-LM](https://github.com/google-ai-edge/litert-lm) or a comparable
LLM executor, with the audio embeddings from this encoder wired in as
cross-attention context.
For ASR-only (no LLM), pair this encoder with a CTC or transducer head
fine-tuned on your target languages.
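
If you take the CTC route, decoding can be as simple as a greedy collapse over the per-token predictions. This is a sketch of standard greedy CTC decoding, not something shipped in this bundle. The head that produces `frame_ids` is hypothetical, and the blank index of 0 is an assumption.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated labels, then drop blanks (greedy CTC decoding).

    `frame_ids` holds the argmax label id for each of the 125 encoder
    tokens, as produced by a CTC head you would train yourself.
    """
    decoded = []
    previous = None
    for label in frame_ids:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# A blank between two identical labels keeps both of them.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```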
## Audio preprocessing
- 16 kHz mono, float32
- 128 log mel bins
- `n_fft=400`, `hop_length=160`, `win_length=400`, `pad_mode="reflect"`
- log mel, mean/std normalization per utterance
The exact reference is in the upstream Qwen3-ASR tokenizer config.
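
Two details of the list above are easy to get wrong: the frame count and the normalization. The sketch below assumes a centered STFT (suggested by `pad_mode="reflect"`), which yields one frame more than the 1000 the model expects for 10 s of audio, so the extra frame would have to be trimmed. It also assumes the mean and standard deviation are taken over all bins and frames of the utterance. Both points should be checked against the upstream reference, and the helper names are hypothetical.

```python
import math

SAMPLE_RATE = 16_000
HOP_LENGTH = 160
TARGET_FRAMES = 1000   # fixed model input length


def centered_frame_count(num_samples, hop_length=HOP_LENGTH):
    """Frame count of a centered STFT: 1 + num_samples // hop_length."""
    return 1 + num_samples // hop_length


def normalize_utterance(log_mel, eps=1e-5):
    """Mean/std normalize a [bins][frames] log mel over the whole utterance.

    Assumption: one global mean and std, not per-bin statistics.
    """
    values = [v for row in log_mel for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance) + eps   # eps guards against silent input
    return [[(v - mean) / std for v in row] for row in log_mel]


frames = centered_frame_count(10 * SAMPLE_RATE)
print(frames, frames - TARGET_FRAMES)  # 1001 1
```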
## Source
Upstream: [Qwen/Qwen3-ASR-0.6B](https://huggingface.co/Qwen/Qwen3-ASR-0.6B)
(Apache 2.0). Released January 2026 as part of the Qwen3 audio family.
## Links
- [speech-android](https://github.com/soniqo/speech-android) - Android SDK
- [soniqo.audio](https://soniqo.audio) - website
- [blog](https://soniqo.audio/blog) - blog
## Ecosystem
- [**soniqo.audio**](https://soniqo.audio) - use-case explorer (transcription, voice cloning, live ASR, voice agents).
- [**speech-core**](https://github.com/soniqo/speech-core) - C++ orchestration library for voice agents. Abstract `STTInterface` / `TTSInterface` / `VADInterface` / `EnhancerInterface`; LiteRT implementations plug straight into the interfaces.
- [**speech-swift**](https://github.com/soniqo/speech-swift) - Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- [**speech-android**](https://github.com/soniqo/speech-android) - Android SDK consuming on-device LiteRT bundles.
## Other LiteRT models in this collection
**ASR / Transcription**
- [Parakeet TDT 0.6B v3 - LiteRT (INT8)](https://huggingface.co/soniqo/Parakeet-TDT-0.6B-v3-LiteRT-INT8)
- [Nemotron Speech Streaming 0.6B - LiteRT](https://huggingface.co/soniqo/Nemotron-Speech-Streaming-LiteRT)
- [Omnilingual ASR CTC 300M - LiteRT](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT)
- [Omnilingual ASR CTC 300M - LiteRT (INT8)](https://huggingface.co/soniqo/Omnilingual-ASR-CTC-300M-LiteRT-INT8)
**VAD / Diarization**
- [Silero VAD v5 - LiteRT](https://huggingface.co/soniqo/Silero-VAD-v5-LiteRT)
- [Pyannote Segmentation 3.0 - LiteRT](https://huggingface.co/soniqo/Pyannote-Segmentation-LiteRT)
- [WeSpeaker ResNet34-LM - LiteRT](https://huggingface.co/soniqo/WeSpeaker-ResNet34-LM-LiteRT)
**TTS / Voice Cloning**
- [VoxCPM2 - LiteRT (INT8)](https://huggingface.co/soniqo/VoxCPM2-LiteRT-INT8)
## License
This bundle inherits the upstream model license (**apache-2.0**). See the
linked `base_model` repository for the full terms.