DataoceanAI committed 09e1a61 (verified) · 1 parent: ae11800

Update README.md

Files changed (1): README.md (+29 -37)
@@ -1,44 +1,36 @@
 ---
-
-license: apache-2.0
-
+frameworks:
+- ""
 language:
-
 - zh
-
+license: apache-2.0
 tags:
-
 - speech
-
 - asr
-
-frameworks:
-
-- pytorch
-
+tasks: []
 ---

-# Dolphin-Fangyan
+# Dolphin-CN-Dialect

 [Paper](https://arxiv.org/abs/2503.20212)
 [Github](https://github.com/DataoceanAI/Dolphin)
 [Huggingface](https://huggingface.co/DataoceanAI)
 [Modelscope](https://www.modelscope.cn/organization/DataoceanAI)

-**Dolphin-Fangyan** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment scenarios. Compared with the previous Dolphin series, Dolphin-Fangyan introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.
+**Dolphin-CN-Dialect** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment scenarios. Compared with the previous Dolphin series, Dolphin-CN-Dialect introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.

-The model supports Mandarin Chinese and 22 Chinese dialects, while also maintaining multilingual ASR capability inherited from Dolphin. Dolphin-Fangyan supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.
+The model supports Mandarin Chinese and 22 Chinese dialects, while also maintaining multilingual ASR capability inherited from Dolphin. Dolphin-CN-Dialect supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.


 ## Approach

-Dolphin-Fangyan is built upon the Dolphin architecture and follows a joint CTC-Attention framework with:
+Dolphin-CN-Dialect is built upon the Dolphin architecture and follows a joint CTC-Attention framework with:

 * Encoder: E-Branchformer
 * Decoder: Transformer Decoder
 * Training Objective: Joint CTC + Attention loss

-Compared to Dolphin, Dolphin-Fangyan introduces several important improvements:
+Compared to Dolphin, Dolphin-CN-Dialect introduces several important improvements:

 * Temperature-based data sampling for balancing standard Mandarin and low-resource dialects
 * Redesigned tokenizer with:
@@ -50,13 +42,13 @@ Compared to Dolphin, Dolphin-Fangyan introduces several important improvements:
 * encoder-level contextual biasing
 * prompt-based decoder biasing

-Experimental results show that Dolphin-Fangyan achieves:
+Experimental results show that Dolphin-CN-Dialect achieves:

 * 38% improvement in dialect recognition accuracy
 * 16.3% relative CER reduction over Dolphin
 * Competitive performance with recent large-scale ASR systems while maintaining a smaller model size

-![Dolphin-FangYan 特色海报](dolphin_fangyan_feature_poster_v3.png)
+![Dolphin-CN-Dialect 特色海报](dolphin_fangyan_feature_poster_v3.png)


 See details in the [Paper](https://arxiv.org/abs/2503.20212).
@@ -64,7 +56,7 @@ See details in the [Paper](https://arxiv.org/abs/2503.20212).

 ## Setup

-Dolphin-Fangyan requires FFmpeg to convert audio files into WAV format. Please install FFmpeg first if it is not already installed on your system.
+Dolphin-CN-Dialect requires FFmpeg to convert audio files into WAV format. Please install FFmpeg first if it is not already installed on your system.

 ```shell
 # Ubuntu / Debian
@@ -89,20 +81,20 @@ pip install git+https://github.com/DataoceanAI/Dolphin.git

 ## Available Models

-Currently, Dolphin-Fangyan provides multiple model sizes optimized for different deployment scenarios.
+Currently, Dolphin-CN-Dialect provides multiple model sizes optimized for different deployment scenarios.

 | Model | Parameters | Hotwords |
 |:------:|:----------:|:----------:|
-| base.fangyan | 0.1 B | ❌ |
-| base.fangyan.streaming | 0.1 B | ❌ |
-| small.fangyan | 0.4 B | Encoder-biased Hotwords |
-| small.fangyan.streaming | 0.4 B | Encoder-biased Hotwords |
-| small.fangyan.prompt | 0.4 B | Prompt-based Hotwords |
+| base.cn | 0.1 B | ❌ |
+| base.cn.streaming | 0.1 B | ❌ |
+| small.cn | 0.4 B | Encoder-biased Hotwords |
+| small.cn.streaming | 0.4 B | Encoder-biased Hotwords |
+| small.cn.prompt | 0.4 B | Prompt-based Hotwords |


 ## Hotword Biasing

-Dolphin-Fangyan supports two hotword biasing approaches.
+Dolphin-CN-Dialect supports two hotword biasing approaches.

 **Encoder-Level Contextual Biasing**

@@ -122,7 +114,7 @@ Experimental results show significant reductions in hotword error rates while ma

 ## Supported Languages and Dialects

-Dolphin-Fangyan primarily focuses on:
+Dolphin-CN-Dialect primarily focuses on:

 * Mandarin Chinese
 * 22 Chinese dialects
@@ -170,16 +162,16 @@ To run Dolphin on Ascend NPU, you need to install the corresponding `torch_npu`
 dolphin audio.wav

 # Download model and specify the model path
-dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/
+dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/

 # Specify language and region
-dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"
+dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"

 # Specify the hotwords file with Encoder-biased method
-dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true
+dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true

 # Using prompt-based model
-dolphin audio.wav --model small.fangyan.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true
+dolphin audio.wav --model small.cn.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true

 ```
@@ -189,8 +181,8 @@ dolphin audio.wav --model small.fangyan.prompt --model_dir /data/models/dolphin/
 import dolphin
 from dolphin import transcribe

-model_name = 'small.fangyan'
-model = dolphin.load_model(model_name, f"/data/models/dolphin/{model_name}", "cuda")
+model_name = 'small.cn'
+model = dolphin.load_model(model_name, device="cuda")

 result = transcribe(model, 'audio.wav')
 print(result.text)
@@ -205,8 +197,8 @@ print(result.text)

 ## prompt-based hotwords

-model_name = 'small.fangyan.prompt'
-model = dolphin.load_model(model_name, f"/data/models/dolphin/{model_name}", "cuda")
+model_name = 'small.cn.prompt'
+model = dolphin.load_model(model_name, device="cuda")

 result = transcribe(model, 'audio.wav', hotwords=['诺香丹青牌科研胶囊'], use_prompt_hotword=True, use_two_stage_filter=True, decoding_method='attention')

@@ -217,4 +209,4 @@ print(result.text)

 ## License

-Dolphin-Fangyan is released under the Apache 2.0 License.
+Dolphin-CN-Dialect is released under the Apache 2.0 License.
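The temperature-based data sampling named in the Approach section can be illustrated with a short standalone sketch. This is not the model's training code; the dataset sizes below are invented. The idea: sample dataset *i* with probability proportional to *n_i*^(1/T), so a temperature T > 1 flattens the distribution and low-resource dialects are seen more often during training.

```python
# Temperature-based sampling sketch: p_i ∝ n_i ** (1 / T).
# Illustrative only -- the utterance counts below are made up, not Dolphin's data.

def sampling_probs(sizes, temperature):
    """Return per-dataset sampling probabilities p_i ∝ n_i ** (1/T)."""
    weights = [n ** (1.0 / temperature) for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical counts: Mandarin vs. two low-resource dialects.
sizes = [1_000_000, 10_000, 1_000]

raw = sampling_probs(sizes, temperature=1.0)   # T = 1: proportional to size
flat = sampling_probs(sizes, temperature=5.0)  # T > 1: flattened toward dialects

print(raw)
print(flat)
```

With T = 1 the largest corpus dominates almost completely; raising T boosts the low-resource dialects' share by orders of magnitude without discarding any Mandarin data.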
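The accuracy figures quoted in the README (e.g. the 16.3% relative CER reduction over Dolphin) use character error rate, the standard ASR metric for Chinese. A minimal reference implementation via Levenshtein distance, plus the relative-reduction arithmetic, might look like this; the example strings and the 0.10/0.0837 values are invented for illustration.

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    # Standard dynamic-programming Levenshtein distance, one row at a time.
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, sub)
        prev = cur
    return prev[n] / m

print(cer("今天天气很好", "今天天气真好"))  # one substitution over six characters

# Relative CER reduction, the form of the 16.3% figure (values hypothetical):
old_cer, new_cer = 0.10, 0.0837
print((old_cer - new_cer) / old_cer)
```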
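The prompt-based hotword path (`--use_prompt_hotword` with `--use_two_stage_filter`) can be sketched conceptually: first filter a large hotword list down to candidates that plausibly occur in the utterance, then pack the survivors into a decoder prompt. This is emphatically not Dolphin's implementation; the character-overlap matching rule and the prompt format below are invented to show the two-stage shape only.

```python
# Conceptual two-stage hotword-prompt sketch (NOT Dolphin's actual algorithm).

def filter_hotwords(first_pass: str, hotwords: list[str], min_overlap: float = 0.5) -> list[str]:
    """Stage 1: keep hotwords sharing enough characters with a first-pass hypothesis."""
    kept = []
    for hw in hotwords:
        overlap = sum(ch in first_pass for ch in hw) / len(hw)
        if overlap >= min_overlap:
            kept.append(hw)
    return kept

def build_prompt(hotwords: list[str]) -> str:
    """Stage 2: pack surviving hotwords into a decoder prompt string."""
    return "热词: " + "、".join(hotwords) if hotwords else ""

first_pass = "请服用诺香丹青牌科研胶囊"
candidates = ["诺香丹青牌科研胶囊", "完全无关的词"]
print(build_prompt(filter_hotwords(first_pass, candidates)))
```

Filtering first keeps the prompt short even when the hotword list is large, which is the practical point of a two-stage design: biasing strength is spent only on hotwords that the audio could actually contain.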