---
language:
  - mn
license: apache-2.0
tags:
  - automatic-speech-recognition
  - speech
  - wenet
  - conformer
  - mongolian
  - mn
datasets:
  - google/fleurs-mn
metrics:
  - cer
  - wer
model-index:
  - name: wenet-mn-conformer
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          type: google/fleurs-mn
          name: FLEURS Mongolian
        metrics:
          - type: loss
            value: 374.93737238103694
            name: cv_loss (best epoch)
          - type: accuracy
            value: 0.25305086622635525
            name: attention accuracy (best epoch)
          - type: cer
            value: 0.8696
            name: CER on 3-example dev set
          - type: wer
            value: 1
            name: WER on 3-example dev set
---

# WeNet Conformer — Mongolian (Монгол хэл)

WeNet U2++ Conformer model trained on google/fleurs-mn for Mongolian (Cyrillic) automatic speech recognition.

## Model architecture

- Encoder: Conformer, 12 blocks × 256 dim, 4 attention heads
- Decoder: Bi-transformer (U2++), 3 left-to-right + 3 right-to-left blocks
- Tokenizer: character-level (38 Cyrillic tokens)
- Loss: hybrid CTC + attention (ctc_weight=0.3, reverse_weight=0.3)
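The hybrid objective above can be sketched as follows. This is a minimal illustration of how the two weights combine the loss terms, not the repo's actual training code; the function name is hypothetical:

```python
def hybrid_loss(loss_ctc, loss_att_fwd, loss_att_rev,
                ctc_weight=0.3, reverse_weight=0.3):
    """Combine CTC and attention losses, U2++-style.

    The bidirectional decoder yields a forward (L-to-R) and a
    reversed (R-to-L) attention loss, mixed by reverse_weight;
    the result is then mixed with the CTC loss by ctc_weight.
    """
    loss_att = (1 - reverse_weight) * loss_att_fwd + reverse_weight * loss_att_rev
    return ctc_weight * loss_ctc + (1 - ctc_weight) * loss_att
```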

## Training data

- Dataset: google/fleurs-mn
- Train: 3,074 utterances · ~11.5 h
- Test: 949 utterances · ~2.85 h
- Audio: 16 kHz mono
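Since the model expects 16 kHz mono input, it is worth verifying your WAV files before decoding. A small stdlib-only check (a hypothetical helper, not part of WeNet):

```python
import wave

def check_wav_format(path, expect_rate=16000, expect_channels=1):
    """Return True if the WAV file matches the expected sample rate and channel count."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == expect_rate
                and w.getnchannels() == expect_channels)
```

Files that fail this check should be resampled and downmixed before being fed to the recognizer.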

## Training results

- Epochs run: 100
- Final train loss: N/A
- Final epoch: 99 — cv_loss N/A, acc N/A
- Best epoch: 21 — cv_loss N/A, acc N/A
- TensorBoard: this repo has a TensorBoard tab (see runs/).

## Files

| File | Description |
|------|-------------|
| `avg_10.pt` | Best model (average of the top-10 checkpoints) |
| `train.yaml` | Training config |
| `lang_char.txt` | Character vocabulary (38 tokens) |
| `global_cmvn` | Feature normalization stats |
| `train.log` | Full training log |
| `runs/` | TensorBoard events |

## Usage

Download the model files from this repo, then:

```bash
python wenet/bin/recognize.py \
  --config train.yaml \
  --checkpoint avg_10.pt \
  --dict lang_char.txt \
  --test_data your_data.list \
  --mode attention_rescoring \
  --beam_size 10 \
  --result_file result.txt
```
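The `--test_data` file is a WeNet data list; in recent WeNet versions the "raw" list format is one JSON object per line with `key`, `wav`, and `txt` fields (check the format expected by your WeNet version). A sketch for building one:

```python
import json

def write_data_list(utterances, out_path):
    """Write a WeNet-style raw data list: one JSON object per line.

    `utterances` is an iterable of (key, wav_path, transcript) tuples;
    the transcript may be empty when you only want to decode.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for key, wav_path, text in utterances:
            line = {"key": key, "wav": wav_path, "txt": text}
            f.write(json.dumps(line, ensure_ascii=False) + "\n")
```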


## Limitations

- Trained on ~11.5 h of FLEURS Mongolian — small-scale; WER/CER will be relatively high on out-of-domain speech.
- Only Cyrillic script supported; Latin characters and digits are stripped.
- No language model rescoring applied.
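Because only Cyrillic characters are in the vocabulary, reference transcripts should be normalized the same way before scoring. A hedged sketch of such normalization (the exact character set and rules used in training may differ; consult `lang_char.txt`):

```python
import re

def normalize_mn(text):
    """Lowercase and keep only Mongolian Cyrillic letters and spaces."""
    text = text.lower()
    # Mongolian Cyrillic adds ө and ү to the Russian alphabet
    text = re.sub(r"[^а-яёөү ]", "", text)
    return re.sub(r" +", " ", text).strip()
```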