DataoceanAI committed 09e1a61 (verified) · 1 parent: ae11800

Update README.md

Files changed (1): README.md (+29 -37)
@@ -1,44 +1,36 @@
 ---
-
-license: apache-2.0
-
+frameworks:
+- ""
 language:
-
 - zh
-
+license: apache-2.0
 tags:
-
 - speech
-
 - asr
-
-frameworks:
-
-- pytorch
-
+tasks: []
 ---

-# Dolphin-Fangyan
+# Dolphin-CN-Dialect

 [Paper](https://arxiv.org/abs/2503.20212)
 [Github](https://github.com/DataoceanAI/Dolphin)
 [Huggingface](https://huggingface.co/DataoceanAI)
 [Modelscope](https://www.modelscope.cn/organization/DataoceanAI)

-**Dolphin-Fangyan** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment scenarios. Compared with the previous Dolphin series, Dolphin-Fangyan introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.
+**Dolphin-CN-Dialect** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment scenarios. Compared with the previous Dolphin series, Dolphin-CN-Dialect introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.

-The model supports Mandarin Chinese and 22 Chinese dialects, while also maintaining multilingual ASR capability inherited from Dolphin. Dolphin-Fangyan supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.
+The model supports Mandarin Chinese and 22 Chinese dialects, while also maintaining multilingual ASR capability inherited from Dolphin. Dolphin-CN-Dialect supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.


 ## Approach

-Dolphin-Fangyan is built upon the Dolphin architecture and follows a joint CTC-Attention framework with:
+Dolphin-CN-Dialect is built upon the Dolphin architecture and follows a joint CTC-Attention framework with:

 * Encoder: E-Branchformer
 * Decoder: Transformer Decoder
 * Training Objective: Joint CTC + Attention loss

-Compared to Dolphin, Dolphin-Fangyan introduces several important improvements:
+Compared to Dolphin, Dolphin-CN-Dialect introduces several important improvements:

 * Temperature-based data sampling for balancing standard Mandarin and low-resource dialects
 * Redesigned tokenizer with:
@@ -50,13 +42,13 @@ Compared to Dolphin, Dolphin-Fangyan introduces several important improvements:
 * encoder-level contextual biasing
 * prompt-based decoder biasing

-Experimental results show that Dolphin-Fangyan achieves:
+Experimental results show that Dolphin-CN-Dialect achieves:

 * 38% improvement in dialect recognition accuracy
 * 16.3% relative CER reduction over Dolphin
 * Competitive performance with recent large-scale ASR systems while maintaining a smaller model size

-![Dolphin-FangYan 特色海报](dolphin_fangyan_feature_poster_v3.png)
+![Dolphin-CN-Dialect 特色海报](dolphin_fangyan_feature_poster_v3.png)


 See details in the [Paper](https://arxiv.org/abs/2503.20212).
@@ -64,7 +56,7 @@ See details in the [Paper](https://arxiv.org/abs/2503.20212).

 ## Setup

-Dolphin-Fangyan requires FFmpeg to convert audio files into WAV format. Please install FFmpeg first if it is not already installed on your system.
+Dolphin-CN-Dialect requires FFmpeg to convert audio files into WAV format. Please install FFmpeg first if it is not already installed on your system.

 ```shell
 # Ubuntu / Debian
@@ -89,20 +81,20 @@ pip install git+https://github.com/DataoceanAI/Dolphin.git

 ## Available Models

-Currently, Dolphin-Fangyan provides multiple model sizes optimized for different deployment scenarios.
+Currently, Dolphin-CN-Dialect provides multiple model sizes optimized for different deployment scenarios.

 | Model | Parameters | Hotwords |
 |:------:|:----------:|:----------:|
-| base.fangyan | 0.1 B | ❌ |
-| base.fangyan.streaming | 0.1 B | ❌ |
-| small.fangyan | 0.4 B | Encoder-biased Hotwords |
-| small.fangyan.streaming | 0.4 B | Encoder-biased Hotwords |
-| small.fangyan.prompt | 0.4 B | Prompt-based Hotwords |
+| base.cn | 0.1 B | ❌ |
+| base.cn.streaming | 0.1 B | ❌ |
+| small.cn | 0.4 B | Encoder-biased Hotwords |
+| small.cn.streaming | 0.4 B | Encoder-biased Hotwords |
+| small.cn.prompt | 0.4 B | Prompt-based Hotwords |


 ## Hotword Biasing

-Dolphin-Fangyan supports two hotword biasing approaches.
+Dolphin-CN-Dialect supports two hotword biasing approaches.

 **Encoder-Level Contextual Biasing**

@@ -122,7 +114,7 @@ Experimental results show significant reductions in hotword error rates while ma

 ## Supported Languages and Dialects

-Dolphin-Fangyan primarily focuses on:
+Dolphin-CN-Dialect primarily focuses on:

 * Mandarin Chinese
 * 22 Chinese dialects
@@ -170,16 +162,16 @@ To run Dolphin on Ascend NPU, you need to install the corresponding `torch_npu`
 dolphin audio.wav

 # Download model and specify the model path
-dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/
+dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/

 # Specify language and region
-dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"
+dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"

 # Specify the hotwords file with Encoder-biased method
-dolphin audio.wav --model small.fangyan --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true
+dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true

 # Using prompt-based model
-dolphin audio.wav --model small.fangyan.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true
+dolphin audio.wav --model small.cn.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true

 ```
@@ -189,8 +181,8 @@ dolphin audio.wav --model small.fangyan.prompt --model_dir /data/models/dolphin/
 import dolphin
 from dolphin import transcribe

-model_name = 'small.fangyan'
-model = dolphin.load_model(model_name, f"/data/models/dolphin/{model_name}", "cuda")
+model_name = 'small.cn'
+model = dolphin.load_model(model_name, device="cuda")

 result = transcribe(model, 'audio.wav')
 print(result.text)
@@ -205,8 +197,8 @@ print(result.text)

 ## prompt-based hotwords

-model_name = 'small.fangyan.prompt'
-model = dolphin.load_model(model_name, f"/data/models/dolphin/{model_name}", "cuda")
+model_name = 'small.cn.prompt'
+model = dolphin.load_model(model_name, device="cuda")

 result = transcribe(model, 'audio.wav', hotwords=['诺香丹青牌科研胶囊'], use_prompt_hotword=True, use_two_stage_filter=True, decoding_method='attention')

@@ -217,4 +209,4 @@ print(result.text)

 ## License

-Dolphin-Fangyan is released under the Apache 2.0 License.
+Dolphin-CN-Dialect is released under the Apache 2.0 License.
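The temperature-based data sampling named in the Approach section can be illustrated with a short standalone sketch. This is not the model's training code; the dataset sizes below are invented. The idea: sample dataset *i* with probability proportional to *n_i*^(1/T), so a temperature T > 1 flattens the distribution and low-resource dialects are seen more often during training.

```python
# Temperature-based sampling sketch: p_i ∝ n_i ** (1 / T).
# Illustrative only -- the utterance counts below are made up, not Dolphin's data.

def sampling_probs(sizes, temperature):
    """Return per-dataset sampling probabilities p_i ∝ n_i ** (1/T)."""
    weights = [n ** (1.0 / temperature) for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical counts: Mandarin vs. two low-resource dialects.
sizes = [1_000_000, 10_000, 1_000]

raw = sampling_probs(sizes, temperature=1.0)   # T = 1: proportional to size
flat = sampling_probs(sizes, temperature=5.0)  # T > 1: flattened toward dialects

print(raw)
print(flat)
```

With T = 1 the largest corpus dominates almost completely; raising T boosts the low-resource dialects' share by orders of magnitude without discarding any Mandarin data.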
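The accuracy figures quoted in the README (e.g. the 16.3% relative CER reduction over Dolphin) use character error rate, the standard ASR metric for Chinese. A minimal reference implementation via Levenshtein distance, plus the relative-reduction arithmetic, might look like this; the example strings and the 0.10/0.0837 values are invented for illustration.

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    # Standard dynamic-programming Levenshtein distance, one row at a time.
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, sub)
        prev = cur
    return prev[n] / m

print(cer("今天天气很好", "今天天气真好"))  # one substitution over six characters

# Relative CER reduction, the form of the 16.3% figure (values hypothetical):
old_cer, new_cer = 0.10, 0.0837
print((old_cer - new_cer) / old_cer)
```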
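The prompt-based hotword path (`--use_prompt_hotword` with `--use_two_stage_filter`) can be sketched conceptually: first filter a large hotword list down to candidates that plausibly occur in the utterance, then pack the survivors into a decoder prompt. This is emphatically not Dolphin's implementation; the character-overlap matching rule and the prompt format below are invented to show the two-stage shape only.

```python
# Conceptual two-stage hotword-prompt sketch (NOT Dolphin's actual algorithm).

def filter_hotwords(first_pass: str, hotwords: list[str], min_overlap: float = 0.5) -> list[str]:
    """Stage 1: keep hotwords sharing enough characters with a first-pass hypothesis."""
    kept = []
    for hw in hotwords:
        overlap = sum(ch in first_pass for ch in hw) / len(hw)
        if overlap >= min_overlap:
            kept.append(hw)
    return kept

def build_prompt(hotwords: list[str]) -> str:
    """Stage 2: pack surviving hotwords into a decoder prompt string."""
    return "热词: " + "、".join(hotwords) if hotwords else ""

first_pass = "请服用诺香丹青牌科研胶囊"
candidates = ["诺香丹青牌科研胶囊", "完全无关的词"]
print(build_prompt(filter_hotwords(first_pass, candidates)))
```

Filtering first keeps the prompt short even when the hotword list is large, which is the practical point of a two-stage design: biasing strength is spent only on hotwords that the audio could actually contain.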