---
language:
- mn
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- wenet
- conformer
- mongolian
- mn
datasets:
- google/fleurs-mn
metrics:
- cer
- wer
model-index:
- name: wenet-mn-conformer
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      type: google/fleurs-mn
      name: FLEURS Mongolian
    metrics:
      - type: loss
        value: 374.93737238103694
        name: cv_loss (best epoch)
      - type: accuracy
        value: 0.25305086622635525
        name: attention accuracy (best epoch)
      - type: cer
        value: 0.8696
        name: CER on 3-example dev set
      - type: wer
        value: 1.0000
        name: WER on 3-example dev set
---

# WeNet Conformer — Mongolian (Монгол хэл)

WeNet U2++ Conformer model trained on [`google/fleurs-mn`](https://huggingface.co/datasets/google/fleurs)
for Mongolian (Cyrillic) automatic speech recognition.

## Model architecture

- **Encoder**: Conformer, 12 blocks × 256 dim, 4 heads
- **Decoder**: Bi-transformer (U2++), 3 L→R + 3 R→L blocks
- **Tokenizer**: char-level (38 Cyrillic tokens)
- **Loss**: CTC + Attention hybrid (ctc_weight=0.3, reverse_weight=0.3)
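For reference, the architecture above corresponds roughly to a `train.yaml` fragment like the following. Field names follow WeNet's usual config schema; exact keys and values should be checked against the shipped `train.yaml`:

```yaml
encoder: conformer
encoder_conf:
    output_size: 256        # attention dimension
    attention_heads: 4
    num_blocks: 12
decoder: bitransformer      # U2++ bidirectional decoder
decoder_conf:
    num_blocks: 3           # left-to-right blocks
    r_num_blocks: 3         # right-to-left blocks
model_conf:
    ctc_weight: 0.3         # CTC share of the hybrid loss
    reverse_weight: 0.3     # right-to-left decoder share during training
```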

## Training data

- **Dataset**: `google/fleurs-mn`
- **Train**: 3,074 utterances · ~11.5 h
- **Test**: 949 utterances · ~2.85 h
- **Audio**: 16 kHz mono

## Training results

- Epochs run: **100**
- Final train loss: **N/A**
- Final epoch: **99** — cv_loss **N/A**, acc **N/A**
- Best epoch: **21** — cv_loss **374.94**, acc **0.2531**
- TensorBoard: event files are in `runs/` and viewable via this repo's **TensorBoard** tab.
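The CER/WER figures in the metadata both reduce to Levenshtein edit distance, over characters and words respectively. A minimal self-contained sketch (the example strings below are illustrative, not taken from the dev set):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, single-row DP."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def cer(ref, hyp):
    """Character error rate: edits per reference character."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word error rate: edits per reference word."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())
```

Note that a WER of 1.0000, as reported above, means essentially no reference word was recovered exactly on that tiny dev set.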



## Files

| File | Description |
|------|-------------|
| `avg_10.pt`   | Best model (average of the top-10 checkpoints) |
| `train.yaml`     | Training config |
| `lang_char.txt`  | Character vocabulary (38 tokens) |
| `global_cmvn`    | Feature normalization stats |
| `train.log`      | Full training log |
| `runs/`          | TensorBoard events |
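`lang_char.txt` is a plain-text mapping with one `<token> <id>` pair per line, including special tokens such as `<blank>` and `<unk>`. A minimal sketch of loading it and char-tokenizing a transcript; the tiny vocabulary used in the comments is a made-up stand-in for the real 38-token file, and whether spaces are dropped or mapped to a special symbol depends on the recipe:

```python
def load_vocab(path):
    """Parse a WeNet dict file: one '<token> <id>' pair per line."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token, idx = line.split()
            vocab[token] = int(idx)
    return vocab

def tokenize(text, vocab, unk="<unk>"):
    """Char-level ids; unknown characters map to the <unk> id.

    Spaces are simply dropped here for illustration.
    """
    return [vocab.get(ch, vocab[unk]) for ch in text if ch != " "]
```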

## Usage

```bash
# Download model files from this repo, then:
python wenet/bin/recognize.py \
    --config train.yaml \
    --checkpoint avg_10.pt \
    --dict lang_char.txt \
    --test_data your_data.list \
    --mode attention_rescoring \
    --beam_size 10 \
    --result_file result.txt
```
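`--test_data` expects WeNet's jsonlines list format: one JSON object per line with `key`, `wav` (path to a 16 kHz mono file), and `txt` fields, for the `raw` data type. A sketch that builds such a file; the path and transcript are placeholders:

```python
import json

def write_data_list(utterances, path):
    """Write a WeNet-style data list.

    utterances: iterable of (key, wav_path, transcript) triples.
    """
    with open(path, "w", encoding="utf-8") as f:
        for key, wav, txt in utterances:
            f.write(json.dumps({"key": key, "wav": wav, "txt": txt},
                               ensure_ascii=False) + "\n")

write_data_list(
    [("utt001", "audio/utt001.wav", "сайн байна уу")],
    "your_data.list",
)
```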

## Limitations

- Trained on ~11.5 h of FLEURS Mongolian — small-scale; WER/CER will be relatively high on out-of-domain speech.
- Only Cyrillic script supported; Latin characters and digits are stripped.
- No language model rescoring applied.
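The script restriction in the second point can be mirrored when preparing text for this model. The exact character set and rules used in training are not documented here, so treat this as an illustrative sketch of that kind of normalization:

```python
import re

# Keep Cyrillic letters and spaces; drop Latin, digits, punctuation.
# Mongolian Cyrillic adds Өө and Үү to the Russian alphabet (а-я, ё).
_NON_CYRILLIC = re.compile(r"[^а-яёөү ]+")

def normalize(text):
    """Lowercase, strip non-Cyrillic characters, collapse whitespace."""
    text = _NON_CYRILLIC.sub(" ", text.lower())
    return " ".join(text.split())
```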