---
frameworks:
- ""
language:
- zh
license: apache-2.0
tags:
- speech
- asr
tasks: []
---
# Dolphin-CN-Dialect
[Paper](https://arxiv.org/abs/2605.08961)
[Github](https://github.com/DataoceanAI/Dolphin)
[Huggingface](https://huggingface.co/DataoceanAI)
[Modelscope](https://www.modelscope.cn/organization/DataoceanAI)
# Repository Notice
This model is officially maintained by **Dataocean AI**.
To ensure compatibility with existing user code and download links, we keep two official repositories for the same model:
- Original / legacy repository: DataoceanAI
- Organization / enterprise repository: DataoceanAI1
Both repositories are maintained by the same team and contain the same model files.
DataoceanAI1 is the newly created enterprise organization account, while DataoceanAI is kept to avoid breaking existing user download scripts and links.
Please do not regard either repository as an unofficial copy or unauthorized redistribution.
**Dolphin-CN-Dialect** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment scenarios. Compared with the previous Dolphin series, Dolphin-CN-Dialect introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.
The model supports Mandarin Chinese and 22 Chinese dialects, while also maintaining multilingual ASR capability inherited from Dolphin. Dolphin-CN-Dialect supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.
## Approach
Dolphin-CN-Dialect is built upon the Dolphin architecture and follows a joint CTC-Attention framework with:
* Encoder: E-Branchformer
* Decoder: Transformer Decoder
* Training Objective: Joint CTC + Attention loss
Compared to Dolphin, Dolphin-CN-Dialect introduces several important improvements:
* Temperature-based data sampling for balancing standard Mandarin and low-resource dialects
* Redesigned tokenizer with:
  * character-level modeling for Chinese
  * BPE-based subword modeling for English
  * extensible dialect tokens
* Streaming ASR support
* Hotword-biased decoding, including:
  * encoder-level contextual biasing
  * prompt-based decoder biasing
Experimental results show that Dolphin-CN-Dialect achieves:
* 38% improvement in dialect recognition accuracy
* 16.3% relative CER reduction over Dolphin
* Competitive performance with recent large-scale ASR systems while maintaining a smaller model size

See details in the [Paper](https://arxiv.org/abs/2605.08961).
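The temperature-based sampling mentioned above can be sketched as follows. This is an illustration of the general technique, not Dolphin's released training code; the helper name, the temperature value, and the corpus sizes are all assumptions. Sampling probabilities are made proportional to n_i^(1/T), so a higher temperature flattens the distribution and up-weights low-resource dialects relative to standard Mandarin:

```python
def temperature_sampling_weights(counts, temperature=3.0):
    """Per-dialect sampling probabilities p_i ∝ n_i^(1/T).

    temperature=1.0 reproduces proportional sampling; larger values
    flatten the distribution toward uniform.
    """
    scaled = {k: v ** (1.0 / temperature) for k, v in counts.items()}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}

# Hours of audio per variety (illustrative numbers, not the real corpus).
hours = {"mandarin": 10000, "sichuan": 500, "wenzhou": 50}
probs = temperature_sampling_weights(hours, temperature=3.0)
```

With T = 3, the low-resource `wenzhou` split is sampled far more often than its raw 0.5% share of the data, without fully discarding the proportional signal.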
## Setup
Dolphin-CN-Dialect requires FFmpeg to convert audio files into WAV format. Please install FFmpeg first if it is not already installed on your system.
```shell
# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Windows
choco install ffmpeg
```
Install Dolphin with pip:
```shell
pip install -U dolphin
```
Alternatively, install from source:
```shell
pip install git+https://github.com/DataoceanAI/Dolphin.git
```
## Available Models
Currently, Dolphin-CN-Dialect provides multiple model sizes optimized for different deployment scenarios.
| Model | Parameters | Hotwords |
|:------:|:----------:|:----------:|
| base.cn | 0.1 B | ❌ |
| base.cn.streaming | 0.1 B | ❌ |
| small.cn | 0.4 B | Encoder-biased Hotwords |
| small.cn.streaming | 0.4 B | Encoder-biased Hotwords |
| small.cn.prompt | 0.4 B | Prompt-based Hotwords |
## Hotword Biasing
Dolphin-CN-Dialect supports two hotword biasing approaches.
**Encoder-Level Contextual Biasing**
* Supports both streaming and non-streaming models
* Integrates contextual embeddings into encoder representations
* Efficient adaptation without retraining the full model
**Prompt-Based Hotword Biasing**
* Designed for non-streaming models
* Injects hotwords directly into decoder prompts
* Particularly effective for long-tail and rare phrases
Experimental results show significant reductions in hotword error rates while maintaining strong overall ASR performance.
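The encoder-level mechanism can be illustrated with a toy attention sketch. This is a conceptual illustration only, not Dolphin's implementation: each encoder frame attends over a small set of hotword embeddings, and the attended context is added back to the frame representation, nudging frames that resemble a hotword toward it:

```python
import numpy as np

def contextual_bias(encoder_out, hotword_emb, scale=0.1):
    """Toy encoder-level contextual biasing (illustrative only).

    encoder_out: (T, d) frame representations
    hotword_emb: (K, d) hotword embeddings
    """
    d = encoder_out.shape[-1]
    # Scaled dot-product attention of each frame over the K hotwords.
    logits = encoder_out @ hotword_emb.T / np.sqrt(d)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    context = attn @ hotword_emb            # (T, d) attended hotword context
    return encoder_out + scale * context    # residual bias, shape preserved

rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 8))     # 6 frames, dimension 8
hotwords = rng.standard_normal((3, 8))   # 3 hotword embeddings
biased = contextual_bias(frames, hotwords)
```

Because the bias enters as a residual term, the encoder's output shape is unchanged and the hotword list can vary per utterance.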
## Supported Languages and Dialects
Dolphin-CN-Dialect primarily focuses on:
* Mandarin Chinese
* 22 Chinese dialects
* Regional accented Mandarin
Supported dialects include:
* Sichuan
* Wu
* Minnan
* Shanghai
* Gansu
* Guangdong
* Wenzhou
* Hunan
* Anhui
* Henan
* Fujian
* Hebei
* Liaoning
* Shaanxi
* Tianjin
* and more
For the complete language and dialect list, see [languages.md](./languages.md).
## Supported Devices
| Device Type | Support Status |
|:-----------:|:--------------:|
| **CUDA** | ✅ Supported |
| **MPS (Apple)** | ✅ Supported |
| **CPU** | ✅ Supported |
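When targeting heterogeneous machines, a tiny helper (an illustration, not part of the Dolphin API) can pick the best available device in the order listed above:

```python
def pick_device(available):
    """Return the preferred device string: CUDA, then MPS, then CPU."""
    for dev in ("cuda", "mps"):
        if dev in available:
            return dev
    return "cpu"

# e.g. dolphin.load_model("small.cn", device=pick_device({"mps", "cpu"}))
```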
## Usage
### Command-line usage
```shell
# Basic usage with the default model
dolphin audio.wav
# Download model and specify the model path
dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/
# Specify language and region
dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"
# Specify the hotwords file with Encoder-biased method
dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true
# Using prompt-based model
dolphin audio.wav --model small.cn.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true
```
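The file passed via `--hotword_list_path` is assumed here to be a plain-text list with one phrase per line (consult the GitHub repository for the authoritative format), for example:

```text
诺香丹青牌科研胶囊
```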
### Python usage
```python
import dolphin
from dolphin import transcribe

# Load a non-streaming model onto the GPU
model = dolphin.load_model('small.cn', device="cuda")
result = transcribe(model, 'audio.wav')
print(result.text)

# Specify language
result = transcribe(model, 'audio.wav', lang_sym="zh")
print(result.text)

# Specify language, region, and encoder-biased hotwords
result = transcribe(model, 'audio.wav', lang_sym="zh", region_sym="CN",
                    hotwords=['诺香丹青牌科研胶囊'], use_deep_biasing=True,
                    use_two_stage_filter=True)
print(result.text)

# Prompt-based hotwords (note the prompt model and attention decoding)
model = dolphin.load_model('small.cn.prompt', device="cuda")
result = transcribe(model, 'audio.wav', hotwords=['诺香丹青牌科研胶囊'],
                    use_prompt_hotword=True, use_two_stage_filter=True,
                    decoding_method='attention')
print(result.text)
```
## License
Dolphin-CN-Dialect is released under the Apache 2.0 License.