---
language:
- mn
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- wenet
- conformer
- mongolian
- mn
datasets:
- google/fleurs-mn
metrics:
- cer
- wer
model-index:
- name: wenet-mn-conformer
results:
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
type: google/fleurs-mn
name: FLEURS Mongolian
metrics:
- type: loss
value: 374.93737238103694
name: cv_loss (best epoch)
- type: accuracy
value: 0.25305086622635525
name: attention accuracy (best epoch)
- type: cer
value: 0.8696
name: CER on 3-example dev set
- type: wer
value: 1
name: WER on 3-example dev set

---

# WeNet Conformer — Mongolian (Монгол хэл)

WeNet U2++ Conformer model trained on google/fleurs-mn for Mongolian (Cyrillic) automatic speech recognition.
## Model architecture
- Encoder: Conformer, 12 blocks × 256 dim, 4 heads
- Decoder: Bi-transformer (U2++), 3 L→R + 3 R→L blocks
- Tokenizer: char-level (38 Cyrillic tokens)
- Loss: CTC + Attention hybrid (ctc_weight=0.3, reverse_weight=0.3)
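
The hyperparameters above would correspond to a `train.yaml` along these lines — an illustrative excerpt in the style of standard WeNet conformer configs, not a copy of the shipped file:

```yaml
# Illustrative sketch; see the shipped train.yaml for authoritative values.
encoder: conformer
encoder_conf:
  output_size: 256        # 256-dim model
  attention_heads: 4
  num_blocks: 12          # 12 conformer blocks
decoder: bitransformer    # U2++ bidirectional decoder
decoder_conf:
  num_blocks: 3           # 3 left-to-right blocks
  r_num_blocks: 3         # 3 right-to-left blocks
model_conf:
  ctc_weight: 0.3
  reverse_weight: 0.3
```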
## Training data
- Dataset: google/fleurs-mn
- Train: 3,074 utterances · ~11.5 h
- Test: 949 utterances · ~2.85 h
- Audio: 16 kHz mono
## Training results
- Epochs run: 100
- Final train loss: N/A
- Final epoch: 99 — cv_loss N/A, acc N/A
- Best epoch: 21 — cv_loss N/A, acc N/A
- TensorBoard: this repo has a TensorBoard tab (see runs/).
## Files
| File | Description |
|---|---|
| `avg_10.pt` | Best model (averaged top-10 checkpoints by default) |
| `train.yaml` | Training config |
| `lang_char.txt` | Character vocabulary (38 tokens) |
| `global_cmvn` | Feature normalization stats |
| `train.log` | Full training log |
| `runs/` | TensorBoard events |
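
`avg_10.pt` comes from checkpoint averaging (WeNet ships this as `wenet/bin/average_model.py`); the core idea is an element-wise mean over the top-N checkpoints' parameters. A minimal pure-Python sketch of that idea, with plain dicts standing in for tensor state dicts:

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of parameter values across checkpoints."""
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n
            for key in state_dicts[0]}

# Toy example: three "checkpoints" with one scalar parameter each.
ckpts = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}]
print(average_checkpoints(ckpts))  # {'w': 2.0}
```

Averaging the best checkpoints smooths out per-epoch noise and typically decodes slightly better than any single checkpoint.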
## Usage

Download model files from this repo, then:

```bash
python wenet/bin/recognize.py \
  --config train.yaml \
  --checkpoint avg_10.pt \
  --dict lang_char.txt \
  --test_data your_data.list \
  --mode attention_rescoring \
  --beam_size 10 \
  --result_file result.txt
```
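
The CER/WER figures in the metadata are edit-distance based. To score decoded hypotheses against references yourself, a self-contained Levenshtein-based CER looks like this (a minimal sketch, not WeNet's own scorer; it assumes a non-empty reference):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,         # deletion
                        dp[j - 1] + 1,     # insertion
                        prev + (r != h))   # substitution (0 if match)
            prev = cur
    return dp[-1]

def cer(ref, hyp):
    """Character error rate: edit operations / reference length."""
    return edit_distance(ref, hyp) / len(ref)

# Hypothesis missing the trailing " уу": 3 edits over 13 reference chars.
print(f"{cer('сайн байна уу', 'сайн байна'):.3f}")  # 0.231
```

For WER, apply the same distance to whitespace-split word lists instead of character strings.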
## Limitations
- Trained on ~11.5 h of FLEURS Mongolian — small-scale; WER/CER will be relatively high on out-of-domain speech.
- Only Cyrillic script is supported; Latin characters and digits are stripped.
- No language model rescoring applied.