Whisper Twi v2

Whisper Twi v2 is a fine-tuned version of openai/whisper-small for automatic speech recognition (ASR) in Akan (Twi). Compared with the first release, it improves transcription accuracy on Twi speech by refining the training process and optimizing preprocessing and decoding strategies for a low-resource language setting.

The goal of this model is to help expand speech technology support for African languages, particularly Akan, which is widely spoken in Ghana but currently underrepresented in modern ASR systems.


Model Details

Model Description

Whisper Twi v2 adapts the Whisper transformer-based speech recognition architecture to the Twi language. By fine-tuning the multilingual Whisper model on Twi speech data, the model learns to better represent Twi phonetics and vocabulary patterns.

  • Developed by: Tiffany Degbotse
  • Funded by: Academic research / independent project
  • Shared by: Tiffany Degbotse
  • Model type: Automatic Speech Recognition (Speech-to-Text)
  • Language(s): Twi (Akan)
  • License: Apache-2.0
  • Finetuned from model: openai/whisper-small

Uses

Direct Use

This model can be used for:

  • Transcribing spoken Akan audio into text
  • Building voice interfaces for Twi speakers
  • Speech accessibility tools
  • Linguistic analysis of Akan speech data

Downstream Use

Possible downstream applications include:

  • Fine-tuning on additional Twi datasets
  • Multilingual speech recognition systems
  • Voice assistants supporting Ghanaian languages
  • Speech-enabled applications for education and accessibility

Out-of-Scope Use

The model may perform poorly when:

  • Audio recordings are extremely noisy
  • Speakers frequently code-switch between multiple languages
  • Speech contains dialects not represented in the training data

Bias, Risks, and Limitations

Since the model is trained on a limited dataset, performance may vary across accents, speaking styles, and recording environments. Some dialects of Twi may be underrepresented, which can lead to reduced transcription accuracy for certain speakers.

Recommendations

Users should evaluate the model on their own datasets before deploying it in production environments and remain aware of potential performance differences across speakers and audio conditions.
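A common way to carry out such an evaluation is word error rate (WER) over a held-out set of reference transcripts. Below is a minimal pure-Python sketch of WER via word-level edit distance; libraries such as jiwer or Hugging Face's evaluate provide tested implementations for production use.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("me din de kofi", "me din kofi")` returns 0.25 (one deleted word out of four reference words). Averaging WER across a representative sample of speakers and recording conditions gives a practical picture of where the model holds up and where it degrades.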


How to Get Started with the Model

```python
from transformers import pipeline

# Load the fine-tuned Twi ASR model from the Hugging Face Hub
pipe = pipeline(
    "automatic-speech-recognition",
    model="tiffany101/whisper-twi-v2"
)

# Transcribe an audio file and print the recognized text
result = pipe("audio.wav")
print(result["text"])
```
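The pipeline decodes and resamples audio files internally, but audio sometimes arrives as a raw array at another sample rate (e.g. from a microphone stream). Whisper models expect 16 kHz mono float32 input. The sketch below shows a simple numpy-based conversion using linear-interpolation resampling; it is illustrative only, and dedicated resamplers such as librosa or torchaudio give higher quality.

```python
import numpy as np

def to_whisper_input(audio: np.ndarray, sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Convert raw audio to the 16 kHz mono float32 format Whisper expects."""
    if audio.ndim == 2:              # (samples, channels): stereo or multi-channel
        audio = audio.mean(axis=1)   # down-mix to mono by averaging channels
    if sr != target_sr:
        # Linear-interpolation resample onto a 16 kHz time grid
        duration = audio.shape[0] / sr
        n_out = int(round(duration * target_sr))
        t_in = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
        t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(t_out, t_in, audio)
    return audio.astype(np.float32)
```

The converted array can then be passed to the pipeline as a dict: `pipe({"raw": samples, "sampling_rate": 16000})`.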
Model size: 37.8M parameters · Tensor type: F32 (Safetensors)