> Note: A newer version of this model is available: Phonepadith/whisper-3-large-turbo-lao-finetuned-v9

πŸ—£οΈ Whisper Large V3 Turbo Lao Fine-tuned (Laos V10)

Model Overview

This model is a fine-tuned version of OpenAI's Whisper Large V3 Turbo for Lao (ພາສາລາວ) automatic speech recognition (ASR). It has been fine-tuned on the Phonepadith/laos-speech-dataset, a curated dataset containing Lao speech samples and transcriptions.

🧠 Model Details

| Property | Description |
|----------|-------------|
| Base model | openai/whisper-large-v3-turbo |
| Fine-tuned by | @Phonepadith |
| Language | Lao (lo) |
| Task | Automatic Speech Recognition (ASR) |
| Framework | 🤗 Transformers, PyTorch |
| Dataset | Phonepadith/laos-speech-dataset |
| Sampling rate | 16 kHz |
| License | MIT (same as base model unless otherwise stated) |

📊 Training Details

  • Fine-tuned on: ~100 hours of Lao speech data
  • Input: 16 kHz mono audio
  • Output: Lao text transcription
  • Epochs: 13
  • Batch size: 2
  • Learning rate: 1e-5
  • Optimizer: AdamW
  • Evaluation metric: Word Error Rate (WER)
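
For reference, the evaluation metric can be computed with a minimal word-level WER implementation in pure Python (a standard edit-distance sketch, not necessarily the exact scoring script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution / match
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("a b c d", "a x c"))  # 1 substitution + 1 deletion over 4 words -> 0.5
```

In practice, libraries such as `jiwer` or `evaluate` are typically used for WER scoring; the sketch above just makes the metric concrete.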

🚀 Usage Example

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

# Load model and processor
model_id = "Phonepadith/whisper-3-large-turbo-lao-finetuned-v10"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load an audio file and resample to 16 kHz
speech_array, sampling_rate = torchaudio.load("example.wav")
speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)
speech_array = speech_array.mean(dim=0)  # downmix stereo to mono if needed

# Preprocess and generate transcription
input_features = processor(
    speech_array.numpy(),
    sampling_rate=16000,
    return_tensors="pt",
).input_features

# Optionally pass language="lo" to model.generate() to force Lao decoding
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)
```

📈 Evaluation Results

Training Metrics

  • Global steps: 45,000
  • Final training loss: 0.0322
  • Training runtime: 57,148 s (≈15.9 h)
  • Samples per second: 6.30
  • Steps per second: 0.79
  • Total FLOPs: ≈6.14 × 10²⁰
  • Epochs completed: ≈12.5
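
A couple of readable figures can be derived from the reported metrics (this is just arithmetic on the numbers above):

```python
# Reported training metrics
train_runtime_s = 57148.3905
global_step = 45000
epochs = 12.49305941143809

hours = train_runtime_s / 3600            # total wall-clock training time
steps_per_epoch = global_step / epochs    # optimizer steps per epoch

print(f"Runtime: {hours:.1f} h")                  # ~15.9 h
print(f"Steps per epoch: {steps_per_epoch:.0f}")  # ~3602
```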

🧩 Intended Use

This model is designed for speech-to-text transcription in Lao, such as:

  • Voice command systems
  • Lao language learning apps
  • Accessibility tools (subtitles, transcripts)
  • Cultural and linguistic research

⚠️ Limitations

  • May struggle with code-switching (mix of Lao and English)
  • Background noise or strong dialectal accents may reduce accuracy
  • Whisper's built-in tokenizer may occasionally normalize Lao text (tone marks or spacing)

🪪 Citation

If you use this model in your research, please cite:

```bibtex
@misc{phonepadith2025whisperlaoturbo,
  title = {Whisper Large V3 Turbo Fine-tuned for Lao ASR},
  author = {Phonepadith Phoummavong},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Phonepadith/whisper-3-large-turbo-lao-finetuned-v10}},
}
```

💬 Contact

For questions, collaboration, or dataset contributions:


Note: This model is part of ongoing efforts to improve ASR capabilities for low-resource languages like Lao. Contributions and feedback are welcome!
