# Fongbe ASR model (without diacritics)

## How to use for inference
```python
from speechbrain.inference.ASR import StreamingASR
from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

asr_model = StreamingASR.from_hparams(
    source="whettenr/asr-fon-streaming-conformer-without-diacritics",
    savedir="pretrained_models/asr-fon-streaming-conformer-without-diacritics",
)

asr_model.transcribe_file(
    "whettenr/asr-fon-without-diacritics/example.wav",
    # select a chunk size of ~960 ms with 8 chunks of left context
    DynChunkTrainConfig(24, 8),
    # disable torchaudio streaming so the example file can be fetched from Hugging Face;
    # set this to True for your own files or streams to enable streaming file decoding
    use_torchaudio_streaming=False,
)
# expected output:
# huzuhuzu gɔngɔn ɖe ɖo dandan
```
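As a rough guide to what the `DynChunkTrainConfig(chunk_size, left_context_chunks)` values mean in wall-clock terms, the arithmetic below assumes one encoder frame covers ~40 ms of audio (10 ms feature hop with 4x convolutional downsampling, as in SpeechBrain's streaming conformer recipes); the 40 ms stride is an assumption, not something stated in this card.

```python
# Latency arithmetic for DynChunkTrainConfig(chunk_size, left_context_chunks).
# ASSUMPTION: each encoder frame spans ~40 ms of audio (16 kHz input, 10 ms
# feature hop, 4x convolutional subsampling).

FRAME_MS = 40  # assumed duration of one encoder frame, in milliseconds

def chunk_ms(chunk_size: int) -> int:
    """Duration of one streaming chunk in milliseconds."""
    return chunk_size * FRAME_MS

def left_context_ms(chunk_size: int, left_context_chunks: int) -> int:
    """How much past audio each chunk can attend to."""
    return left_context_chunks * chunk_ms(chunk_size)

print(chunk_ms(24))            # 960  -> ~960 ms per chunk
print(left_context_ms(24, 8))  # 7680 -> ~7.7 s of left context
```

Larger chunks improve accuracy at the cost of latency; more left-context chunks improve accuracy at the cost of compute per step.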
## Details of the model

~100M parameters: a 12-layer Conformer encoder with a Transducer (LSTM) decoder.
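To illustrate what a Transducer decoder does at inference time, here is a minimal greedy RNN-T decoding loop in plain Python. The `toy_joint` function is a made-up stand-in for the real encoder, LSTM prediction network and joint network; only the control flow (emit tokens until blank, then advance one encoder frame) reflects the actual algorithm.

```python
# Minimal greedy Transducer (RNN-T) decoding loop.
# A real model scores each (frame, hypothesis) pair with a joint network over
# the vocabulary plus a blank symbol; here a toy lookup table plays that role.

BLANK = 0

def greedy_transducer_decode(num_frames, joint, max_symbols_per_frame=10):
    """Return the token sequence emitted by greedy RNN-T decoding."""
    hypothesis = []
    for t in range(num_frames):              # advance through encoder frames
        for _ in range(max_symbols_per_frame):
            scores = joint(t, hypothesis)    # scores over vocab (+ blank)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:                # blank: move to the next frame
                break
            hypothesis.append(best)          # non-blank: emit, stay on frame t

    return hypothesis

# Toy joint: emit token 2 on frame 0, token 1 on frame 2, blank otherwise.
def toy_joint(t, hyp):
    table = {(0, 0): 2, (2, 1): 1}
    scores = [0.0] * 4
    scores[table.get((t, len(hyp)), BLANK)] = 1.0
    return scores

print(greedy_transducer_decode(3, toy_joint))  # [2, 1]
```

Unlike CTC, the Transducer conditions each emission on the tokens already produced, which is why it pairs naturally with streaming encoders.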
## Details of training

Pretrained with BEST-RQ on 700 hours of audio for 400k steps:

- 140 hours of Fongbé from:
  - FFSTC 2 + beethogedeon/fongbe-speech (~40 hours)
  - cappfm (~100 hours)
- 140 hours of English and French (from LibriSpeech)
- 140 hours of Hausa and Yoruba from VoxLingua107, CommonVoice 23.0 and BibleTTS
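BEST-RQ pretrains the encoder to predict discrete targets for masked speech frames, where the targets come from a frozen random-projection quantizer. The sketch below shows only that target-generation step, in pure Python with toy dimensions; it is an illustration of the idea, not this model's training code.

```python
import random

# BEST-RQ target generation: a frozen random projection plus a frozen random
# codebook map each feature frame to a discrete label that the model must
# predict at masked positions. Toy dimensions, for illustration only.
random.seed(0)

FEAT_DIM, PROJ_DIM, CODEBOOK_SIZE = 8, 4, 16

# Frozen random projection matrix and codebook (never updated in BEST-RQ).
projection = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(PROJ_DIM)]
codebook = [[random.gauss(0, 1) for _ in range(PROJ_DIM)] for _ in range(CODEBOOK_SIZE)]

def l2_normalize(v):
    norm = sum(x * x for x in v) ** 0.5 or 1.0
    return [x / norm for x in v]

def quantize(frame):
    """Project a feature frame and return the index of the nearest codeword."""
    projected = l2_normalize(
        [sum(w * x for w, x in zip(row, frame)) for row in projection]
    )
    distances = [
        sum((p - c) ** 2 for p, c in zip(projected, l2_normalize(code)))
        for code in codebook
    ]
    return min(range(CODEBOOK_SIZE), key=distances.__getitem__)

frames = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(5)]
targets = [quantize(f) for f in frames]
print(targets)  # five discrete pretraining targets, one per frame
```

Because the quantizer is random and frozen, no codebook learning is needed, which keeps the pretraining recipe simple and stable.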
Finetuned with the Transducer decoder loss on the training sets of:

- FFSTC 2
- beethogedeon/fongbe-speech
- a small portion of automatically generated transcriptions from cappfm audio

Tokenizer: SentencePiece BPE with a vocabulary size of 100.
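A 100-token BPE vocabulary is tiny and stays close to character level, since only a handful of merges fit beyond the base alphabet. The toy BPE trainer below (pure Python, with a made-up word-frequency corpus; the real tokenizer is trained with SentencePiece) shows the merge-counting idea.

```python
from collections import Counter

# Toy BPE: repeatedly merge the most frequent adjacent symbol pair.
# With a vocab size of 100, only a few merges fit beyond the base characters,
# so the resulting tokenizer is nearly character-level.

def train_bpe(word_freqs, num_merges):
    """Return the list of learned merges from a word-frequency corpus."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_corpus = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] = freq
        corpus = new_corpus
    return merges

# Tiny Fongbé-flavoured toy corpus (invented frequencies, illustration only).
merges = train_bpe({"dandan": 5, "gɔngɔn": 3, "ɖo": 4}, num_merges=3)
print(merges)
```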
## Citation

Other citation coming soon. Dataset citation:

```bibtex
@inproceedings{kponou25_interspeech,
  title = {{Extending the Fongbe to French Speech Translation Corpus: resources, models and benchmark}},
  author = {D. Fortuné Kponou and Salima Mdhaffar and Fréjus A. A. Laleye and Eugène C. Ezin and Yannick Estève},
  year = {2025},
  booktitle = {{Interspeech 2025}},
  pages = {4533--4537},
  doi = {10.21437/Interspeech.2025-1801},
  issn = {2958-1796},
}
```