Twi KenLM 5-gram Language Model (Quantized, Small)

This is a 5-gram KenLM language model trained on a Twi corpus of 278 million tokens, with a vocabulary of 381,218 unique words.

It is intended for use as the decoding-time language model alongside Wav2Vec2 or MMS (Massively Multilingual Speech) acoustic models in speech-to-text (STT) tasks, where it helps improve transcription accuracy and correct misrecognized words.
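To illustrate how an n-gram model can change which transcription wins, here is a minimal sketch of score fusion during decoding. The weights `alpha` and `beta`, the function name `fused_score`, and all numeric scores are illustrative assumptions, not values measured with this model; real decoders apply the same idea inside beam search.

```python
from typing import List, Tuple


def fused_score(acoustic_logp: float, lm_logp: float, n_words: int,
                alpha: float = 0.5, beta: float = 1.5) -> float:
    """Combine acoustic and language-model log scores for one hypothesis.

    alpha weights the language model; beta is a per-word bonus that offsets
    the LM's tendency to prefer shorter outputs. Both are hypothetical here.
    """
    return acoustic_logp + alpha * lm_logp + beta * n_words


def pick_best(hypotheses: List[Tuple[str, float, float, int]]) -> str:
    """Return the label of the hypothesis with the highest fused score."""
    return max(hypotheses, key=lambda h: fused_score(h[1], h[2], h[3]))[0]


# (label, acoustic log-prob, LM log-prob, word count); made-up numbers.
candidates = [
    ("hypothesis A", -4.0, -12.0, 3),  # best acoustically, unlikely under the LM
    ("hypothesis B", -4.5, -8.0, 3),   # slightly worse acoustically, likelier text
]
best = pick_best(candidates)
```

In this toy case the language model tips the decision toward hypothesis B, which is the kind of correction the model is meant to provide.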

Model Details

Order: 5-gram
Type: Quantized (Small)
RAM Usage: ~1.3GB
Dataset: GhanaNLP Pristine Twi (999k rows)

Usage

This model can be used as the language model behind the decoder of a Hugging Face Wav2Vec2ProcessorWithLM.

from transformers import Wav2Vec2ProcessorWithLM
# Load your processor and point to this .bin file
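The integration described above could look roughly like the sketch below. It assumes `pyctcdecode` and `kenlm` are installed next to `transformers`, that the acoustic model is a CTC checkpoint with a standard Wav2Vec2 processor, and that the `.bin` file has been downloaded locally. The helper names, the model id, and the file path are placeholders, not names taken from this repository.

```python
from typing import Dict, List


def labels_in_index_order(vocab: Dict[str, int]) -> List[str]:
    """Return tokenizer tokens sorted by integer id, the order the CTC decoder expects."""
    return [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]


def build_processor_with_lm(acoustic_model_id: str, kenlm_bin_path: str):
    """Wrap an existing CTC processor with a KenLM-backed beam-search decoder."""
    # Local imports: the heavy dependencies are only needed when this is called.
    from pyctcdecode import build_ctcdecoder
    from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM

    processor = AutoProcessor.from_pretrained(acoustic_model_id)
    labels = labels_in_index_order(processor.tokenizer.get_vocab())
    decoder = build_ctcdecoder(labels=labels, kenlm_model_path=kenlm_bin_path)
    return Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )


# Hypothetical usage (placeholders, not real identifiers):
# processor = build_processor_with_lm("<your-twi-ctc-model>", "path/to/model.bin")
# text = processor.batch_decode(logits.numpy()).text[0]
```

Sorting the vocabulary by id matters because the decoder maps each column of the logits matrix to a label by position. For MMS checkpoints the tokenizer may also need its target language set before the vocabulary is read.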

Created with love by Mich-Seth Owusu for the Ghana NLP Community.
