Tags: twi, kenlm, asr, ghananlp

Twi KenLM 5-gram Language Model (Quantized, Small)

This is a 5-gram KenLM language model trained on a Twi corpus of 278 million tokens, with a vocabulary of 381,218 unique words.

It is specifically designed to be used with Wav2Vec2 or MMS (Massively Multilingual Speech) models for speech-to-text (STT) tasks, where it improves transcription accuracy by rescoring candidate transcriptions and correcting misrecognized words during decoding.
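To see why this helps, here is a toy, pure-Python sketch (not KenLM's actual API) of how a language model breaks ties between acoustically similar hypotheses: the decoder keeps the candidate the n-gram model scores as more plausible Twi. The phrases and probabilities below are illustrative only.

```python
import math

# Toy bigram log-probabilities standing in for the real 5-gram model.
# These Twi word pairs and their scores are invented for illustration.
BIGRAM_LOGPROB = {
    ("me", "pɛ"): math.log(0.4),   # a common, well-formed continuation
    ("me", "pe"): math.log(0.01),  # an unlikely, misrecognized continuation
}
FLOOR = math.log(1e-6)  # back-off score for unseen bigrams

def lm_score(sentence: str) -> float:
    """Sum bigram log-probabilities over the sentence, a stand-in for
    what KenLM's model.score() does with full 5-gram context."""
    words = sentence.split()
    return sum(BIGRAM_LOGPROB.get(pair, FLOOR) for pair in zip(words, words[1:]))

# Two acoustically similar hypotheses coming out of the CTC decoder:
candidates = ["me pɛ", "me pe"]
best = max(candidates, key=lm_score)
print(best)  # → me pɛ (the LM prefers the well-formed phrase)
```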

Model Details

  • Order: 5-gram
  • Type: Quantized (Small)
  • RAM Usage: ~1.3GB
  • Dataset: GhanaNLP Pristine Twi (999k rows)

Usage

This model can be plugged into Hugging Face's Wav2Vec2ProcessorWithLM, which uses pyctcdecode to run KenLM-backed beam-search decoding.

from transformers import Wav2Vec2ProcessorWithLM
# Load your processor and point to this .bin file
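A fuller sketch of the wiring for ghananlpcommunity/twi-kenlm-5gram-small, assuming the quantized binary has been downloaded locally as `twi_5gram_small.bin` and the acoustic model is an MMS checkpoint such as `facebook/mms-1b-all` (both file and checkpoint names are assumptions, not taken from this card):

```python
def sorted_labels(vocab: dict) -> list:
    """Order tokenizer vocabulary entries by token id — the label order
    that pyctcdecode's decoder expects."""
    return [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

def build_twi_processor(kenlm_path: str = "twi_5gram_small.bin"):
    """Attach this KenLM binary to a Wav2Vec2/MMS processor.
    Imports are deferred so the helper above stays usable even without
    transformers/pyctcdecode installed."""
    from transformers import Wav2Vec2Processor, Wav2Vec2ProcessorWithLM
    from pyctcdecode import build_ctcdecoder

    # Assumed acoustic checkpoint; substitute whichever Twi-capable
    # Wav2Vec2/MMS model you fine-tuned.
    processor = Wav2Vec2Processor.from_pretrained("facebook/mms-1b-all")
    decoder = build_ctcdecoder(
        labels=sorted_labels(processor.tokenizer.get_vocab()),
        kenlm_model_path=kenlm_path,  # this model's quantized .bin
    )
    return Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
```

Once built, passing the acoustic model's logits to `processor_with_lm.batch_decode(...)` runs a beam search that consults the 5-gram model to rescore hypotheses.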

Created with love by Mich-Seth Owusu for the Ghana NLP Community.

