# Fongbe ASR model (without diacritics)

## How to use for inference
```python
from speechbrain.inference.ASR import StreamingASR
from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

asr_model = StreamingASR.from_hparams(
    source="whettenr/asr-fon-streaming-conformer-without-diacritics",
    savedir="pretrained_models/asr-fon-streaming-conformer-without-diacritics",
)

asr_model.transcribe_file(
    "whettenr/asr-fon-without-diacritics/example.wav",
    # select a chunk size of ~960 ms with 8 chunks of left context
    DynChunkTrainConfig(24, 8),
    # disable torchaudio streaming so the example file can be fetched from Hugging Face;
    # set this to True for your own files or streams to enable streaming file decoding
    use_torchaudio_streaming=False,
)
# expected output:
# huzuhuzu gɔngɔn ɖe ɖo dandan
```
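As a rough guide to what the `DynChunkTrainConfig(chunk_size, left_context_chunks)` values mean in wall-clock terms, the arithmetic below assumes one encoder frame covers ~40 ms of audio (10 ms feature hop with 4x convolutional downsampling, as in SpeechBrain's streaming conformer recipes); the 40 ms stride is an assumption, not something stated in this card.

```python
# Latency arithmetic for DynChunkTrainConfig(chunk_size, left_context_chunks).
# ASSUMPTION: each encoder frame spans ~40 ms of audio (16 kHz input, 10 ms
# feature hop, 4x convolutional subsampling).

FRAME_MS = 40  # assumed duration of one encoder frame, in milliseconds

def chunk_ms(chunk_size: int) -> int:
    """Duration of one streaming chunk in milliseconds."""
    return chunk_size * FRAME_MS

def left_context_ms(chunk_size: int, left_context_chunks: int) -> int:
    """How much past audio each chunk can attend to."""
    return left_context_chunks * chunk_ms(chunk_size)

print(chunk_ms(24))            # 960  -> ~960 ms per chunk
print(left_context_ms(24, 8))  # 7680 -> ~7.7 s of left context
```

Larger chunks improve accuracy at the cost of latency; more left-context chunks improve accuracy at the cost of compute per step.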
## Details of the model

~100M parameters: a 12-layer Conformer encoder with a Transducer (LSTM) decoder.
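To illustrate what a Transducer decoder does at inference time, here is a minimal greedy RNN-T decoding loop in plain Python. The `toy_joint` function is a made-up stand-in for the real encoder, LSTM prediction network and joint network; only the control flow (emit tokens until blank, then advance one encoder frame) reflects the actual algorithm.

```python
# Minimal greedy Transducer (RNN-T) decoding loop.
# A real model scores each (frame, hypothesis) pair with a joint network over
# the vocabulary plus a blank symbol; here a toy lookup table plays that role.

BLANK = 0

def greedy_transducer_decode(num_frames, joint, max_symbols_per_frame=10):
    """Return the token sequence emitted by greedy RNN-T decoding."""
    hypothesis = []
    for t in range(num_frames):              # advance through encoder frames
        for _ in range(max_symbols_per_frame):
            scores = joint(t, hypothesis)    # scores over vocab (+ blank)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:                # blank: move to the next frame
                break
            hypothesis.append(best)          # non-blank: emit, stay on frame t

    return hypothesis

# Toy joint: emit token 2 on frame 0, token 1 on frame 2, blank otherwise.
def toy_joint(t, hyp):
    table = {(0, 0): 2, (2, 1): 1}
    scores = [0.0] * 4
    scores[table.get((t, len(hyp)), BLANK)] = 1.0
    return scores

print(greedy_transducer_decode(3, toy_joint))  # [2, 1]
```

Unlike CTC, the Transducer conditions each emission on the tokens already produced, which is why it pairs naturally with streaming encoders.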
## Details of training

Pretrained with BEST-RQ on 700 hours of audio for 400k steps:

- 140 hours of Fongbé from:
  - FFSTC 2 + beethogedeon/fongbe-speech (~40 hours)
  - cappfm (~100 hours)
- 140 hours of English and French (from LibriSpeech)
- 140 hours of Hausa and Yoruba from VoxLingua107, CommonVoice 23.0 and BibleTTS
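BEST-RQ pretrains the encoder to predict discrete targets for masked speech frames, where the targets come from a frozen random-projection quantizer. The sketch below shows only that target-generation step, in pure Python with toy dimensions; it is an illustration of the idea, not this model's training code.

```python
import random

# BEST-RQ target generation: a frozen random projection plus a frozen random
# codebook map each feature frame to a discrete label that the model must
# predict at masked positions. Toy dimensions, for illustration only.
random.seed(0)

FEAT_DIM, PROJ_DIM, CODEBOOK_SIZE = 8, 4, 16

# Frozen random projection matrix and codebook (never updated in BEST-RQ).
projection = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(PROJ_DIM)]
codebook = [[random.gauss(0, 1) for _ in range(PROJ_DIM)] for _ in range(CODEBOOK_SIZE)]

def l2_normalize(v):
    norm = sum(x * x for x in v) ** 0.5 or 1.0
    return [x / norm for x in v]

def quantize(frame):
    """Project a feature frame and return the index of the nearest codeword."""
    projected = l2_normalize(
        [sum(w * x for w, x in zip(row, frame)) for row in projection]
    )
    distances = [
        sum((p - c) ** 2 for p, c in zip(projected, l2_normalize(code)))
        for code in codebook
    ]
    return min(range(CODEBOOK_SIZE), key=distances.__getitem__)

frames = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(5)]
targets = [quantize(f) for f in frames]
print(targets)  # five discrete pretraining targets, one per frame
```

Because the quantizer is random and frozen, no codebook learning is needed, which keeps the pretraining recipe simple and stable.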
Finetuned with the Transducer decoder loss on the training sets of:

- FFSTC 2
- beethogedeon/fongbe-speech
- a small portion of automatically generated transcriptions from cappfm audio

Tokenizer: SentencePiece BPE with a vocabulary size of 100.
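A 100-token BPE vocabulary is tiny and stays close to character level, since only a handful of merges fit beyond the base alphabet. The toy BPE trainer below (pure Python, with a made-up word-frequency corpus; the real tokenizer is trained with SentencePiece) shows the merge-counting idea.

```python
from collections import Counter

# Toy BPE: repeatedly merge the most frequent adjacent symbol pair.
# With a vocab size of 100, only a few merges fit beyond the base characters,
# so the resulting tokenizer is nearly character-level.

def train_bpe(word_freqs, num_merges):
    """Return the list of learned merges from a word-frequency corpus."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_corpus = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] = freq
        corpus = new_corpus
    return merges

# Tiny Fongbé-flavoured toy corpus (invented frequencies, illustration only).
merges = train_bpe({"dandan": 5, "gɔngɔn": 3, "ɖo": 4}, num_merges=3)
print(merges)
```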
## Citation

Other citation coming soon. Dataset citation:

```bibtex
@inproceedings{kponou25_interspeech,
  title = {{Extending the Fongbe to French Speech Translation Corpus: resources, models and benchmark}},
  author = {D. Fortuné Kponou and Salima Mdhaffar and Fréjus A. A. Laleye and Eugène C. Ezin and Yannick Estève},
  year = {2025},
  booktitle = {{Interspeech 2025}},
  pages = {4533--4537},
  doi = {10.21437/Interspeech.2025-1801},
  issn = {2958-1796},
}
```