---
license: apache-2.0
language:
- zh
tags:
- speech
- asr
frameworks:
- pytorch
---

# Dolphin-Fangyan

[Paper](https://arxiv.org/abs/2503.20212) | [Github](https://github.com/DataoceanAI/Dolphin) | [Huggingface](https://huggingface.co/DataoceanAI) | [Modelscope](https://www.modelscope.cn/organization/DataoceanAI) | [Openi](https://openi.pcl.ac.cn/DataoceanAI/Dolphin) | [Wisemodel](https://wisemodel.cn/models/lijp22/dolphin-base)

**Dolphin-Fangyan** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment. Compared with the previous Dolphin series, Dolphin-Fangyan introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.

The model supports Mandarin Chinese and 22 Chinese dialects, while retaining the multilingual ASR capability inherited from Dolphin. Dolphin-Fangyan supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.

## Approach

Dolphin-Fangyan is built on the Dolphin architecture and follows a joint CTC-Attention framework with:

* Encoder: E-Branchformer
* Decoder: Transformer decoder
* Training objective: joint CTC + attention loss

Compared to Dolphin, Dolphin-Fangyan introduces several important improvements:

* Temperature-based data sampling to balance standard Mandarin and low-resource dialects
* A redesigned tokenizer with:
  * character-level modeling for Chinese
  * BPE-based subword modeling for English
  * extensible dialect tokens
* Streaming ASR support
* Hotword-biased decoding, including:
  * encoder-level contextual biasing
  * prompt-based decoder biasing

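To illustrate the temperature-based sampling idea above (a sketch of the general technique, not the actual Dolphin training code): each corpus can be sampled with probability proportional to its size raised to 1/T, so a higher temperature flattens the distribution toward low-resource dialects. The corpus sizes below are hypothetical.

```python
def sampling_probs(sizes, temperature):
    """Temperature-based sampling: p_i proportional to n_i ** (1/T)."""
    weights = [n ** (1.0 / temperature) for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical corpus sizes in hours: Mandarin vs. two dialects.
sizes = [10000, 500, 100]
print(sampling_probs(sizes, 1.0))  # T=1: proportional sampling, Mandarin dominates
print(sampling_probs(sizes, 5.0))  # T=5: flatter, dialects are sampled far more often
```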
Experimental results show that Dolphin-Fangyan achieves:

* a 38% improvement in dialect recognition accuracy
* a 16.3% relative CER reduction over Dolphin
* performance competitive with recent large-scale ASR systems at a smaller model size

![Dolphin-Fangyan feature poster](dolphin_fangyan_feature_poster_v3.png)

See details in the [Paper](https://arxiv.org/abs/2503.20212).

## Setup

Dolphin-Fangyan requires FFmpeg to convert audio files to WAV format. Please install FFmpeg first if it is not already available on your system.

```shell
# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Windows
choco install ffmpeg
```

Install Dolphin with pip:

```shell
pip install -U dolphin
```

Alternatively, install from source:

```shell
pip install git+https://github.com/DataoceanAI/Dolphin.git
```

## Available Models

Dolphin-Fangyan currently provides multiple model sizes optimized for different deployment scenarios.

| Model | Parameters | Hotwords |
|:------:|:----------:|:----------:|
| base.fangyan | 74 M | ❌ |
| base.fangyan.streaming | 74 M | ❌ |
| small.fangyan | 0.4 B | Encoder-biased hotwords |
| small.fangyan.streaming | 0.4 B | Encoder-biased hotwords |
| small.fangyan.prompt | 0.4 B | Prompt-based hotwords |

## Hotword Biasing

Dolphin-Fangyan supports two hotword biasing approaches.

**Encoder-Level Contextual Biasing**

* Supports both streaming and non-streaming models
* Integrates contextual embeddings into encoder representations
* Enables efficient adaptation without retraining the full model

**Prompt-Based Hotword Biasing**

* Designed for non-streaming models
* Injects hotwords directly into decoder prompts
* Particularly effective for long-tail and rare phrases

Experimental results show significant reductions in hotword error rates while maintaining strong overall ASR performance.

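Both biasing modes take a hotword list file (`hotwords.txt` in the usage examples). The file format is not specified in this README; a common convention, assumed here, is one hotword or phrase per line. The first entry below is taken from the usage examples, the second is a made-up placeholder:

```text
诺香丹青牌科研胶囊
人工智能实验室
```
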
## Supported Languages and Dialects

Dolphin-Fangyan primarily focuses on:

* Mandarin Chinese
* 22 Chinese dialects
* Regionally accented Mandarin

Supported dialects include:

* Sichuan
* Wu
* Minnan
* Shanghai
* Gansu
* Guangdong
* Wenzhou
* Hunan
* Anhui
* Henan
* Fujian
* Hebei
* Liaoning
* Shaanxi
* Tianjin
* and more

For the complete language and dialect list, see [languages.md](./languages.md).

## Supported Devices

| Device Type | Support Status |
|:-------------:|:----------------:|
| **CUDA** | ✅ Supported |
| **MPS (Apple)** | ✅ Supported |
| **Ascend NPU (Huawei)** | ✅ Supported |
| **CPU** | ✅ Supported |

To run Dolphin on an Ascend NPU, install the matching `torch_npu` package and set the environment variable `ASCEND_RT_VISIBLE_DEVICES`. The tested configuration is `CANN==8.0.1`, `torch==2.2.0`, `torch_npu==2.2.0`; with this setup, the model has been verified to run inference correctly on the Ascend NPU.

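A minimal environment sketch for the Ascend setup described above, using the tested versions. The device index is an arbitrary choice, and whether additional CANN environment scripts must be sourced depends on your installation; both are assumptions here, not part of this README.

```shell
# Install the tested torch / torch_npu versions (CANN 8.0.1 must already be installed)
pip install torch==2.2.0 torch_npu==2.2.0

# Make NPU card 0 visible to the process (index 0 is an arbitrary choice)
export ASCEND_RT_VISIBLE_DEVICES=0

# Then run inference as usual, e.g.:
dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/
```
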
## Usage

### Command-line usage

```shell
dolphin audio.wav

# Download the model and specify the model path
dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/

# Specify language and region
dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"

# Specify the hotword file, using the encoder-biased method
dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true

# Use the prompt-based model
dolphin audio.wav --model small.fangyan.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true
```

### Python usage

```python
import dolphin
from dolphin import transcribe

model_name = 'small.fangyan'
model = dolphin.load_model(model_name, f"/home/duhu/.cache/dolphin/{model_name}", "cpu")
# model = dolphin.load_model(model_name, f"/data/models/dolphin/{model_name}", "cpu")

result = transcribe(model, 'audio.wav')
print(result.text)

# Specify language
result = transcribe(model, 'audio.wav', lang_sym="zh")
print(result.text)

# Specify language and region, with encoder-biased hotwords
result = transcribe(model, 'audio.wav', lang_sym="zh", region_sym="CN", hotwords=['诺香丹青牌科研胶囊'], use_deep_biasing=True, use_two_stage_filter=True)
print(result.text)

# Prompt-based hotwords
model_name = 'small.fangyan.prompt'
model = dolphin.load_model(model_name, f"/home/duhu/.cache/dolphin/{model_name}", "cpu")
result = transcribe(model, 'audio.wav', hotwords=['诺香丹青牌科研胶囊'], use_prompt_hotword=True, use_two_stage_filter=True, decoding_method='attention')
print(result.text)
```

## License

Dolphin-Fangyan is released under the Apache 2.0 License.