πŸŽ™οΈ Wav2Vec2 XLS-R 300M Karakalpak ASR

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Automatic Speech Recognition (ASR) in the Karakalpak language.


πŸ‘€ Author

Quyashbek Allanazarov


🀝 Acknowledgements

  • πŸŽ“ New Uzbekistan University β€” for knowledge, research environment, and academic direction
  • 🏦 Xalq Banki AI Lab β€” for providing GPU resources and supporting the technical direction of the project

πŸ“Š Model Performance

Evaluation was performed on a held-out test set.

| Metric | Score |
|---|---|
| WER (Word Error Rate) | 21.21% |
| CER (Character Error Rate) | 4.34% |

Test Details

  • Total samples: 504
  • Successful transcriptions: 504
  • Missing files: 0
  • Errors: 0
  • Sampling rate: 16 kHz

βš™οΈ Model Details

  • Base model: facebook/wav2vec2-xls-r-300m
  • Architecture: Wav2Vec2ForCTC
  • Language: Karakalpak
  • Task: Speech-to-Text (ASR)
  • Framework: PyTorch + Hugging Face Transformers

πŸ§ͺ Inference Example

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "Quyashbek/wav2vec2-xls-r-300m-karakalpak-asr"

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load the audio and resample to the 16 kHz rate the model expects
speech, sr = librosa.load("your_audio.wav", sr=16000)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token at each frame,
# then collapse repeats and blanks in batch_decode
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]

print(transcription)
```
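For recordings much longer than a typical utterance, a common approach is to split the waveform into fixed-length chunks, transcribe each chunk separately, and join the results. A sketch of the chunking step (the 20 s chunk length and 0.5 s overlap are illustrative assumptions, not values from this model card):

```python
def chunk_audio(samples, sr=16000, chunk_s=20.0, overlap_s=0.5):
    """Split a 1-D sample sequence into fixed-length chunks with overlap.

    Overlap reduces the chance of cutting a word exactly at a boundary;
    the final chunk may be shorter than chunk_s.
    """
    chunk = int(chunk_s * sr)          # samples per chunk
    step = chunk - int(overlap_s * sr)  # hop between chunk starts
    return [samples[i:i + chunk] for i in range(0, len(samples), step)]
```

Each chunk can then be passed through the processor and model exactly as in the example above.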