---
frameworks:
- ""
language:
- zh
license: apache-2.0
tags:
- speech
- asr
tasks: []
---
# Dolphin-CN-Dialect
[Paper](https://arxiv.org/abs/2503.20212) | [GitHub](https://github.com/DataoceanAI/Dolphin) | [Hugging Face](https://huggingface.co/DataoceanAI) | [ModelScope](https://www.modelscope.cn/organization/DataoceanAI)
**Dolphin-CN-Dialect** is a multi-dialect ASR model developed by Dataocean AI and Tsinghua University, with a strong focus on Chinese dialect recognition and real-world deployment scenarios. Compared with the previous Dolphin series, Dolphin-CN-Dialect introduces significant improvements in tokenizer design, dialect-balanced training, streaming capability, hotword biasing, and deployment efficiency.
The model supports Mandarin Chinese and 22 Chinese dialects, while also maintaining multilingual ASR capability inherited from Dolphin. Dolphin-CN-Dialect supports both streaming and non-streaming inference, enabling practical deployment in latency-sensitive applications such as real-time transcription and industrial speech recognition systems.
## Approach
Dolphin-CN-Dialect is built upon the Dolphin architecture and follows a joint CTC-Attention framework with:
* Encoder: E-Branchformer
* Decoder: Transformer Decoder
* Training Objective: Joint CTC + Attention loss
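The joint objective is commonly the standard interpolation of the CTC and attention losses (the weight $\lambda$ used by Dolphin-CN-Dialect is not stated here; see the paper for details):

$$
\mathcal{L} = \lambda \, \mathcal{L}_{\text{CTC}} + (1 - \lambda) \, \mathcal{L}_{\text{Attention}}
$$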
Compared to Dolphin, Dolphin-CN-Dialect introduces several important improvements:
* Temperature-based data sampling for balancing standard Mandarin and low-resource dialects
* Redesigned tokenizer with:
  * character-level modeling for Chinese
  * BPE-based subword modeling for English
  * extensible dialect tokens
* Streaming ASR support
* Hotword-biased decoding, including:
  * encoder-level contextual biasing
  * prompt-based decoder biasing
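Temperature-based data sampling can be sketched as follows. The exact formulation used by Dolphin-CN-Dialect is given in the paper; a common form samples dataset $i$ of size $n_i$ with probability proportional to $n_i^{1/T}$, so a larger temperature $T$ flattens the distribution and up-weights low-resource dialects (the corpus sizes below are illustrative, not the real training data):

```python
def temperature_sampling_probs(sizes, temperature=5.0):
    """Per-dataset sampling probabilities under temperature T.

    With T = 1 sampling is proportional to dataset size; as T grows,
    small (low-resource) datasets are sampled relatively more often.
    """
    weights = [n ** (1.0 / temperature) for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative corpus sizes in hours: Mandarin, Sichuan, Wenzhou
sizes = [10000, 500, 50]
print(temperature_sampling_probs(sizes, temperature=1.0))  # proportional to size
print(temperature_sampling_probs(sizes, temperature=5.0))  # flattened distribution
```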
Experimental results show that Dolphin-CN-Dialect achieves:
* 38% improvement in dialect recognition accuracy
* 16.3% relative CER reduction over Dolphin
* Competitive performance with recent large-scale ASR systems while maintaining a smaller model size
![Dolphin-CN-Dialect feature poster](dolphin_fangyan_feature_poster_v3.png)
See details in the [Paper](https://arxiv.org/abs/2503.20212).
## Setup
Dolphin-CN-Dialect requires FFmpeg to convert audio files into WAV format. Please install FFmpeg first if it is not already installed on your system.
```shell
# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Windows
choco install ffmpeg
```
Install Dolphin with pip:
```shell
pip install -U dolphin
```
Alternatively, install from source:
```shell
pip install git+https://github.com/DataoceanAI/Dolphin.git
```
## Available Models
Currently, Dolphin-CN-Dialect provides multiple model sizes optimized for different deployment scenarios.
| Model | Parameters | Hotwords |
|:------:|:----------:|:----------:|
| base.cn | 0.1 B | ❌ |
| base.cn.streaming | 0.1 B |❌ |
| small.cn | 0.4 B | Encoder-biased Hotwords |
| small.cn.streaming | 0.4 B | Encoder-biased Hotwords |
| small.cn.prompt | 0.4 B | Prompt-based Hotwords |
## Hotword Biasing
Dolphin-CN-Dialect supports two hotword biasing approaches.
**Encoder-Level Contextual Biasing**
* Supports both streaming and non-streaming models
* Integrates contextual embeddings into encoder representations
* Efficient adaptation without retraining the full model
**Prompt-Based Hotword Biasing**
* Designed for non-streaming models
* Injects hotwords directly into decoder prompts
* Particularly effective for long-tail and rare phrases
Experimental results show significant reductions in hotword error rates while maintaining strong overall ASR performance.
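The hotword list passed via `--hotword_list_path` (see the usage examples below) is assumed here to be a plain-text file with one hotword or phrase per line; the entries below are illustrative:

```text
诺香丹青牌科研胶囊
E-Branchformer
Dolphin-CN-Dialect
```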
## Supported Languages and Dialects
Dolphin-CN-Dialect primarily focuses on:
* Mandarin Chinese
* 22 Chinese dialects
* Regionally accented Mandarin
Supported dialects include:
* Sichuan
* Wu
* Minnan
* Shanghai
* Gansu
* Guangdong
* Wenzhou
* Hunan
* Anhui
* Henan
* Fujian
* Hebei
* Liaoning
* Shaanxi
* Tianjin
* and more
For the complete language and dialect list, see [languages.md](./languages.md).
## Supported Devices
| Device Type | Support Status |
|:-------------:|:----------------:|
|**CUDA**|✅Supported|
|**MPS (Apple)**|✅Supported|
|**Ascend NPU (Huawei)**|✅Supported|
|**CPU**|✅Supported|
To run Dolphin on an Ascend NPU, install the corresponding `torch_npu` package and set the `ASCEND_RT_VISIBLE_DEVICES` environment variable. The tested configuration is `CANN==8.0.1`, `torch==2.2.0`, `torch_npu==2.2.0`; with this setup, inference has been verified to run correctly on the Ascend NPU.
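The device visibility can also be set from Python, as long as it happens before `torch_npu` initializes the NPU context. A minimal sketch (the `torch_npu` import only succeeds on an Ascend machine, so it is guarded here):

```python
import os

# The Ascend runtime reads ASCEND_RT_VISIBLE_DEVICES when torch_npu
# initializes, so set it before the import (here: expose device 0 only).
os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "0"

try:
    import torch_npu  # tested with CANN 8.0.1, torch==2.2.0, torch_npu==2.2.0
except ImportError:
    # Not on an Ascend machine; fall back to CPU/CUDA/MPS as usual.
    torch_npu = None
```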
## Usage
### Command-line usage
```shell
# Basic usage
dolphin audio.wav
# Specify the model and the directory to download/load it from
dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/
# Specify language and region
dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"
# Use encoder-biased hotwords from a hotword list file
dolphin audio.wav --model small.cn --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_deep_biasing true
# Use the prompt-based hotword model
dolphin audio.wav --model small.cn.prompt --model_dir /data/models/dolphin/ --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true
```
### Python usage
```python
import dolphin
from dolphin import transcribe
model_name = 'small.cn'
model = dolphin.load_model(model_name, device="cuda")
result = transcribe(model, 'audio.wav')
print(result.text)
# Specify language
result = transcribe(model, 'audio.wav', lang_sym="zh")
print(result.text)
# Specify language, region, and encoder-biased hotwords
result = transcribe(model, 'audio.wav', lang_sym="zh", region_sym="CN", hotwords=['诺香丹青牌科研胶囊'], use_deep_biasing=True, use_two_stage_filter=True)
print(result.text)
# Prompt-based hotwords
model_name = 'small.cn.prompt'
model = dolphin.load_model(model_name, device="cuda")
result = transcribe(model, 'audio.wav', hotwords=['诺香丹青牌科研胶囊'], use_prompt_hotword=True, use_two_stage_filter=True, decoding_method='attention')
print(result.text)
```
## License
Dolphin-CN-Dialect is released under the Apache 2.0 License.