Batuka0901 / wenet-mn


	---
	language:
	- mn
	license: apache-2.0
	tags:
	- automatic-speech-recognition
	- speech
	- wenet
	- conformer
	- mongolian
	- mn
	datasets:
	- google/fleurs-mn
	metrics:
	- cer
	- wer
	model-index:
	- name: wenet-mn-conformer
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      type: google/fleurs-mn
	      name: FLEURS Mongolian
	    metrics:
	    - type: loss
	      value: 374.93737238103694
	      name: cv_loss (best epoch)
	    - type: accuracy
	      value: 0.25305086622635525
	      name: attention accuracy (best epoch)
	    - type: cer
	      value: 0.8696
	      name: CER on 3-example dev set
	    - type: wer
	      value: 1.0000
	      name: WER on 3-example dev set
	---

	# WeNet Conformer — Mongolian (Монгол хэл)

	A WeNet U2++ Conformer model for Mongolian (Cyrillic) automatic speech recognition,
	trained on [`google/fleurs-mn`](https://huggingface.co/datasets/google/fleurs).

	## Model architecture

	- Encoder: Conformer, 12 blocks × 256 dim, 4 heads
	- Decoder: Bi-transformer (U2++), 3 L→R + 3 R→L blocks
	- Tokenizer: char-level (38 Cyrillic tokens)
	- Loss: CTC + Attention hybrid (ctc_weight=0.3, reverse_weight=0.3)
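
The loss bullet above can be read as two nested weighted sums. The following is a minimal sketch, assuming the usual U2++ combination: the attention loss is blended between the left-to-right and right-to-left decoders, then mixed with the CTC loss. The function name and the example loss values are illustrative and are not taken from this repo.

```python
def hybrid_loss(loss_ctc, loss_att_l2r, loss_att_r2l,
                ctc_weight=0.3, reverse_weight=0.3):
    """Combine CTC and bidirectional attention losses (sketch).

    ctc_weight and reverse_weight default to the values listed in
    this model card; the loss inputs are plain floats here.
    """
    # Blend the two attention decoders: 70% L->R, 30% R->L by default.
    loss_att = (1.0 - reverse_weight) * loss_att_l2r + reverse_weight * loss_att_r2l
    # Mix CTC and attention: 30% CTC, 70% attention by default.
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


if __name__ == "__main__":
    # Made-up loss values, only to show the weighting.
    print(hybrid_loss(100.0, 50.0, 60.0))
```

With `ctc_weight=0.3` and `reverse_weight=0.3`, the left-to-right attention loss carries the largest effective weight (0.7 × 0.7 = 0.49), followed by CTC (0.3) and the right-to-left attention loss (0.7 × 0.3 = 0.21).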

	## Training data

	- Dataset: `google/fleurs-mn`
	- Train: 3,074 utterances · ~11.5 h
	- Test: 949 utterances · ~2.85 h
	- Audio: 16 kHz mono

	## Training results

	- Epochs run: 100
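	- Final train loss: N/A
	- Final epoch: 99 (cv_loss N/A, acc N/A)
	- Best epoch: 21 (cv_loss 374.94, attention accuracy 0.2531, as recorded in the model-index metadata)
	- Decoding on a 3-example dev set: CER 0.8696, WER 1.0000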
	- TensorBoard: this repo has a TensorBoard tab (see `runs/`).
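
For readers who want to re-check the CER and WER figures in the metadata, here is a minimal sketch of how the two error rates are commonly computed: Levenshtein distance over characters (CER) or whitespace-separated words (WER), divided by the reference length. The card does not state which scoring script produced its numbers, so the choice to ignore spaces for CER and the function names are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level CER (unit='char') or WER (unit='word')."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "char":
            # Assumption: spaces are not counted as characters.
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        else:
            r, h = ref.split(), hyp.split()
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


if __name__ == "__main__":
    refs = ["сайн байна уу"]
    hyps = ["сайн байн уу"]
    print("CER:", error_rate(refs, hyps, "char"))
    print("WER:", error_rate(refs, hyps, "word"))
```

A WER of 1.0000 means the number of word errors equals the number of reference words. This can happen even when many characters are correct, because a single wrong character makes the whole word count as an error.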



	## Files

	| File | Description |
	|------|-------------|
	| `avg_10.pt` | Best model (averaged top-10 checkpoints by default) |
	| `train.yaml` | Training config |
	| `lang_char.txt` | Character vocabulary (38 tokens) |
	| `global_cmvn` | Feature normalization stats |
	| `train.log` | Full training log |
	| `runs/` | TensorBoard events |
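
The `avg_10.pt` entry refers to checkpoint averaging: the parameters of several saved checkpoints are averaged element-wise into one model. The sketch below shows the idea with plain Python lists standing in for tensors, so it runs without PyTorch. The function name and toy parameters are hypothetical, and the actual averaging script and checkpoint selection used for this repo are not shown in the card.

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of parameters across checkpoints (sketch).

    Each state dict maps a parameter name to a flat list of floats.
    All checkpoints are assumed to share the same names and shapes.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


if __name__ == "__main__":
    ckpt_a = {"encoder.w": [1.0, 2.0], "decoder.b": [0.0]}
    ckpt_b = {"encoder.w": [3.0, 4.0], "decoder.b": [1.0]}
    print(average_checkpoints([ckpt_a, ckpt_b]))
```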

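	## Usage

	```bash
	# Download model files from this repo, then:
	# (flag names follow the WeNet release used for training; run the
	# script with --help to confirm them for your checkout)
	python wenet/bin/recognize.py \
	  --config train.yaml \
	  --checkpoint avg_10.pt \
	  --dict lang_char.txt \
	  --test_data your_data.list \
	  --mode attention_rescoring \
	  --beam_size 10 \
	  --result_file result.txt
	```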
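
The command above expects a data list file (`your_data.list`). A minimal sketch for producing one follows, assuming the "raw" list layout used in WeNet recipes: one JSON object per line with `key`, `wav`, and `txt` fields. The helper names, utterance IDs, and paths are hypothetical, and the layout should be checked against the WeNet version you run.

```python
import json


def build_data_list(entries):
    """Turn (key, wav_path, transcript) tuples into JSON lines."""
    lines = []
    for key, wav, txt in entries:
        record = {"key": key, "wav": str(wav), "txt": txt}
        # ensure_ascii=False keeps Cyrillic text readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def write_data_list(entries, out_path):
    """Write the list to disk, one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for line in build_data_list(entries):
            f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical utterances; replace with your own 16 kHz mono audio.
    demo = [("utt1", "/data/utt1.wav", "сайн байна уу")]
    print("\n".join(build_data_list(demo)))
```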

	## Limitations

	- Trained on only ~11.5 h of FLEURS Mongolian. The reported error rates are already very high (CER 0.8696, WER 1.0000 on a 3-example dev set), so expect poor accuracy, especially on out-of-domain speech.
	- Only Cyrillic script supported; Latin characters and digits are stripped.
	- No language model rescoring applied.
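
Because Latin characters and digits are stripped, reference transcripts should be normalized the same way before scoring. This is a minimal sketch, assuming the allowed set is the 35 lowercase letters of the Mongolian Cyrillic alphabet plus spaces. The normalization actually used in training is not shown in this card, and the mapping to the 38 tokens in `lang_char.txt` is not reproduced here.

```python
import re

# The 35 letters of the Mongolian Cyrillic alphabet (lowercase).
MN_LETTERS = "абвгдеёжзийклмноөпрстуүфхцчшщъыьэюя"
_NOT_ALLOWED = re.compile(f"[^{MN_LETTERS} ]")


def normalize_transcript(text):
    """Lowercase, drop non-Cyrillic symbols, collapse whitespace."""
    text = text.lower()
    # Replace disallowed characters with a space so words do not merge.
    text = _NOT_ALLOWED.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_transcript("Сайн байна уу? Hello 2024!"))
```

Replacing disallowed characters with a space rather than deleting them is a design choice of this sketch: it keeps neighbouring words separate when punctuation or digits sit between them.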