---
language:
- mn
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- wenet
- conformer
- mongolian
- mn
datasets:
- google/fleurs-mn
metrics:
- cer
- wer
model-index:
- name: wenet-mn-conformer
results:
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
type: google/fleurs-mn
name: FLEURS Mongolian
metrics:
- type: loss
value: 374.93737238103694
name: cv_loss (best epoch)
- type: accuracy
value: 0.25305086622635525
name: attention accuracy (best epoch)
- type: cer
value: 0.8696
name: CER on 3-example dev set
- type: wer
value: 1.0000
name: WER on 3-example dev set
---
# WeNet Conformer — Mongolian (Монгол хэл)
WeNet U2++ Conformer model trained on [`google/fleurs-mn`](https://huggingface.co/datasets/google/fleurs)
for Mongolian (Cyrillic) automatic speech recognition.
## Model architecture
- **Encoder**: Conformer, 12 blocks × 256 dim, 4 heads
- **Decoder**: Bi-transformer (U2++), 3 L→R + 3 R→L blocks
- **Tokenizer**: char-level (38 Cyrillic tokens)
- **Loss**: CTC + Attention hybrid (ctc_weight=0.3, reverse_weight=0.3)
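The hybrid objective above can be sketched as a weighted sum: the left-to-right and right-to-left attention losses are mixed via `reverse_weight`, and the result is combined with the CTC loss via `ctc_weight`. A minimal illustration of that arithmetic, not WeNet's actual implementation:

```python
def hybrid_loss(ctc_loss: float, att_loss: float, reverse_att_loss: float,
                ctc_weight: float = 0.3, reverse_weight: float = 0.3) -> float:
    """Illustrative U2++-style loss combination (sketch, not WeNet code).

    The L->R and R->L attention losses are first mixed with
    `reverse_weight`, then interpolated with the CTC loss via
    `ctc_weight`.
    """
    attention = (1.0 - reverse_weight) * att_loss + reverse_weight * reverse_att_loss
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention
```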
## Training data
- **Dataset**: `google/fleurs-mn`
- **Train**: 3,074 utterances · ~11.5 h
- **Test**: 949 utterances · ~2.85 h
- **Audio**: 16 kHz mono
## Training results
- Epochs run: **100**
- Final train loss: **N/A**
- Final epoch: **99** — cv_loss **N/A**, acc **N/A**
- Best epoch: **21** — cv_loss **N/A**, acc **N/A**
- TensorBoard: event files are under `runs/` and can be browsed via this repo's **TensorBoard** tab.
## Files
| File | Description |
|------|-------------|
| `avg_10.pt` | Best model (averaged top-10 checkpoints by default) |
| `train.yaml` | Training config |
| `lang_char.txt` | Character vocabulary (38 tokens) |
| `global_cmvn` | Feature normalization stats |
| `train.log` | Full training log |
| `runs/` | TensorBoard events |
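`global_cmvn` stores accumulated per-dimension feature statistics used to normalize input features at train and inference time. A pure-Python sketch of the usual global-CMVN arithmetic (sum and sum-of-squares over `frame_num` frames); the on-disk format of WeNet's `global_cmvn` file is not parsed here:

```python
import math

def cmvn_from_stats(mean_stat, var_stat, frame_num, eps=1e-20):
    """Turn accumulated sums into per-dimension mean and inverse stddev."""
    mean = [s / frame_num for s in mean_stat]
    istd = [1.0 / math.sqrt(max(sq / frame_num - m * m, eps))
            for sq, m in zip(var_stat, mean)]
    return mean, istd

def apply_cmvn(frames, mean, istd):
    """Normalize each frame: (x - mean) * inverse_stddev, per dimension."""
    return [[(x - m) * s for x, m, s in zip(frame, mean, istd)]
            for frame in frames]
```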
## Usage

```bash
# Download model files from this repo, then:
python wenet/bin/recognize.py \
  --config train.yaml \
  --checkpoint avg_10.pt \
  --dict lang_char.txt \
  --test_data your_data.list \
  --mode attention_rescoring \
  --beam_size 10 \
  --result_file result.txt
```
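WeNet character dictionaries such as `lang_char.txt` typically map one token per line to an integer id (`token id`, with `<blank>` and `<unk>` reserved). A hedged sketch of mapping text to ids with a toy inline vocabulary standing in for the real 38-token file (the ids below are illustrative):

```python
def load_dict(lines):
    """Parse WeNet-style dict lines of the form '<token> <id>'."""
    return {tok: int(idx)
            for tok, idx in (ln.split() for ln in lines if ln.strip())}

# Toy vocabulary standing in for lang_char.txt (ids are illustrative).
toy = ["<blank> 0", "<unk> 1", "с 2", "а 3", "й 4", "н 5"]
vocab = load_dict(toy)

def text_to_ids(text, vocab, unk="<unk>"):
    """Map each character to its id, falling back to <unk>."""
    return [vocab.get(ch, vocab[unk]) for ch in text]
```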
## Limitations
- Trained on ~11.5 h of FLEURS Mongolian — small-scale; WER/CER will be relatively high on out-of-domain speech.
- Only Cyrillic script supported; Latin characters and digits are stripped.
- No language model rescoring applied.
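The CER/WER figures reported above are edit-distance based. For reference, a minimal pure-Python character error rate (Levenshtein distance over characters divided by reference length), independent of WeNet's own scoring scripts:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance via dynamic programming (one row at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits needed / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)
```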