Twi KenLM 5-gram Language Model (Quantized (Small))
This is a 5-gram KenLM language model trained on a Twi corpus of 278 million tokens, with a vocabulary of 381,218 unique words.
It is designed for use with Wav2Vec2 or MMS (Massively Multilingual Speech) acoustic models in speech-to-text (STT) tasks, where it improves transcription accuracy and helps correct misrecognized words during decoding.
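To illustrate how an n-gram language model can change a transcription, here is a minimal sketch of score fusion: each candidate transcript gets an acoustic score, a weighted language model score, and a word-count bonus, and the highest total wins. The function name `rescore`, the weights `alpha` and `beta`, the placeholder tokens, and the toy score table are all assumptions for illustration; they are not taken from this model or from any decoder's internals.

```python
from typing import Callable, List, Tuple


def rescore(
    hypotheses: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    alpha: float = 0.5,  # illustrative LM weight
    beta: float = 1.5,   # illustrative per-word bonus
) -> str:
    """Return the transcript with the highest fused score."""

    def fused(item: Tuple[str, float]) -> float:
        text, acoustic_logprob = item
        return acoustic_logprob + alpha * lm_logprob(text) + beta * len(text.split())

    return max(hypotheses, key=fused)[0]


# Toy stand-in for a real language model: placeholder tokens, made-up scores.
toy_lm = {"a b c": -2.0, "a b d": -9.0}

# The acoustic model alone slightly prefers "a b d".
hypotheses = [("a b d", -4.0), ("a b c", -4.5)]

best = rescore(hypotheses, lambda text: toy_lm[text])
print(best)
```

In this toy case the language model score outweighs the small acoustic preference, so the more plausible word sequence is selected.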
Model Details
- Order: 5-gram
- Type: Quantized (Small)
- RAM Usage: ~1.3GB
- Dataset: GhanaNLP Pristine Twi (999k rows)
Usage
This model can be used with the Hugging Face Wav2Vec2ProcessorWithLM class from the transformers library.
from transformers import Wav2Vec2ProcessorWithLM
# Load your processor and point to this .bin file
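One possible way to wire the pieces together is sketched below, assuming the `pyctcdecode` and `kenlm` packages are installed alongside `transformers`. The helper names `labels_from_vocab` and `build_processor_with_lm` are hypothetical, and both `acoustic_model_id` and `lm_path` are placeholders: supply your own acoustic checkpoint and the path to the .bin file you downloaded from this repository. MMS checkpoints may need a target language set on the tokenizer first; that step is not shown.

```python
from typing import Dict, List


def labels_from_vocab(vocab: Dict[str, int]) -> List[str]:
    """Order tokens by id so that label i corresponds to logit column i."""
    return [token for token, _ in sorted(vocab.items(), key=lambda kv: kv[1])]


def build_processor_with_lm(acoustic_model_id: str, lm_path: str):
    """Attach a KenLM file to an existing Wav2Vec2 processor (sketch).

    acoustic_model_id: placeholder for your Wav2Vec2/MMS checkpoint.
    lm_path: placeholder for the local path to this model's .bin file.
    """
    # Imported here so the helper above works without these packages.
    from pyctcdecode import build_ctcdecoder
    from transformers import Wav2Vec2Processor, Wav2Vec2ProcessorWithLM

    processor = Wav2Vec2Processor.from_pretrained(acoustic_model_id)
    labels = labels_from_vocab(processor.tokenizer.get_vocab())

    # Build a beam-search CTC decoder backed by the KenLM file.
    decoder = build_ctcdecoder(labels=labels, kenlm_model_path=lm_path)

    return Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
```

The returned processor can then decode model logits with `batch_decode`, which runs beam search with the language model instead of plain greedy decoding.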
Created with love by Mich-Seth Owusu for the Ghana NLP Community.