---
language:
- mn
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- wenet
- conformer
- mongolian
- mn
datasets:
- google/fleurs-mn
metrics:
- cer
- wer
model-index:
- name: wenet-mn-conformer
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      type: google/fleurs-mn
      name: FLEURS Mongolian
    metrics:
    - type: loss
      value: 374.93737238103694
      name: cv_loss (best epoch)
    - type: accuracy
      value: 0.25305086622635525
      name: attention accuracy (best epoch)
    - type: cer
      value: 0.8696
      name: CER on 3-example dev set
    - type: wer
      value: 1.0000
      name: WER on 3-example dev set
---

# WeNet Conformer: Mongolian (Монгол хэл)

A WeNet U2++ Conformer model trained on [`google/fleurs-mn`](https://huggingface.co/datasets/google/fleurs) for Mongolian (Cyrillic) automatic speech recognition.

## Model architecture

- **Encoder**: Conformer, 12 blocks × 256 dim, 4 attention heads
- **Decoder**: Bi-transformer (U2++), 3 L→R + 3 R→L blocks
- **Tokenizer**: char-level (38 Cyrillic tokens)
- **Loss**: CTC + attention hybrid (ctc_weight=0.3, reverse_weight=0.3)

## Training data

- **Dataset**: `google/fleurs-mn`
- **Train**: 3,074 utterances · ~11.5 h
- **Test**: 949 utterances · ~2.85 h
- **Audio**: 16 kHz mono

## Training results

- Epochs run: **100**
- Final train loss: **N/A**
- Final epoch: **99**; cv_loss **N/A**, acc **N/A**
- Best epoch: **21**; cv_loss **374.94**, attention accuracy **0.2531**
- TensorBoard: training curves are logged in `runs/` and can be viewed in this repo's **TensorBoard** tab.
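The **Loss** line lists two weights but not how they combine. Assuming the standard U2++ formulation used in WeNet (the attention loss is a `reverse_weight` mix of the two decoder directions, which is then mixed with CTC by `ctc_weight`), the objective behind cv_loss can be sketched as follows. The function and argument names are hypothetical, and the input values in the example are made up for illustration.

```python
def u2pp_hybrid_loss(
    loss_ctc: float,
    loss_att_l2r: float,
    loss_att_r2l: float,
    ctc_weight: float = 0.3,
    reverse_weight: float = 0.3,
) -> float:
    """Combine U2++ loss terms (assumed WeNet-style weighting, hypothetical helper).

    loss_ctc:     CTC loss on the encoder output
    loss_att_l2r: attention loss of the left-to-right decoder
    loss_att_r2l: attention loss of the right-to-left decoder
    """
    # Attention branch: mix the two decoder directions by reverse_weight.
    loss_att = (1.0 - reverse_weight) * loss_att_l2r + reverse_weight * loss_att_r2l
    # Total: CTC weighted by ctc_weight, attention branch by the remainder.
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# With this card's weights, CTC contributes 30% of the total, the L->R decoder
# 0.7 * 0.7 = 49%, and the R->L decoder 0.7 * 0.3 = 21%.
print(u2pp_hybrid_loss(10.0, 20.0, 30.0))
```

Under this assumption the left-to-right decoder still carries the largest share of the gradient, while the right-to-left decoder is trained mainly so it can help rescore hypotheses at inference time.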
## Files

| File | Description |
|------|-------------|
| `avg_10.pt` | Best model (average of the top-10 checkpoints, the default setting) |
| `train.yaml` | Training config |
| `lang_char.txt` | Character vocabulary (38 tokens) |
| `global_cmvn` | Feature normalization stats |
| `train.log` | Full training log |
| `runs/` | TensorBoard events |

## Usage

```bash
# From a WeNet source checkout, after downloading the model files from this repo:
python wenet/bin/recognize.py \
    --config train.yaml \
    --checkpoint avg_10.pt \
    --dict lang_char.txt \
    --test_data your_data.list \
    --mode attention_rescoring \
    --beam_size 10 \
    --result_file result.txt
```

## Limitations

- Trained on only ~11.5 h of FLEURS Mongolian, so the model is small-scale: the reported CER (0.87) and WER (1.00) are high even on the 3-example dev set, and error rates are likely to be higher still on out-of-domain speech.
- Only Cyrillic script is supported; Latin characters and digits are stripped from transcripts, so the model cannot output them.
- No external language model is used; `attention_rescoring` only rescores the CTC beam-search hypotheses with the attention decoder.
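The CER/WER in the metadata come from only three dev utterances. To measure error rates on your own data, for example by pairing reference transcripts with the hypotheses in `result.txt`, a plain Levenshtein-based scorer is sketched below. This is not WeNet's own scoring tool; the helper names are hypothetical, and dropping spaces before computing CER is an assumed convention (the card does not say which convention produced 0.8696).

```python
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions, and deletions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def _units(text: str, unit: str) -> List[str]:
    if unit == "char":
        # Characters with spaces removed (an assumed CER convention).
        return list(text.replace(" ", ""))
    if unit == "word":
        return text.split()
    raise ValueError(f"unit must be 'char' or 'word', got {unit!r}")


def error_rate(pairs: Iterable[Tuple[str, str]], unit: str = "char") -> float:
    """Corpus-level CER/WER: total edits divided by total reference units."""
    errors = total = 0
    for ref, hyp in pairs:
        r, h = _units(ref, unit), _units(hyp, unit)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total if total else 0.0


# Toy (reference, hypothesis) pair, not real model output.
pairs = [("сайн байна уу", "сайн байна")]
print(round(error_rate(pairs, unit="char"), 4))  # 0.1818
print(round(error_rate(pairs, unit="word"), 4))  # 0.3333
```

Summing edits over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates, which is the usual way corpus CER/WER are reported.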