# Whisper Large v3 LoRA for Karakalpak ASR
This repository contains a LoRA adapter fine-tuned on top of `openai/whisper-large-v3` for automatic speech recognition (ASR) in Karakalpak. The model transcribes Karakalpak speech from audio into text. Note that only the PEFT/LoRA adapter weights are included here, not the full base model weights.
## Model Details
### Model Description
This is a parameter-efficient fine-tuned Whisper Large v3 model for Karakalpak speech transcription. It was trained using LoRA on top of the pretrained Whisper encoder-decoder model.
- Developed by: Quyashbek
- Funded by: Self-directed / internal project
- Shared by: Quyashbek
- Model type: Automatic Speech Recognition (ASR), Whisper LoRA adapter
- Language(s): Karakalpak
- License: Apache-2.0
- Finetuned from model: `openai/whisper-large-v3`
### Model Sources
- Repository: TODO
- Base model: `openai/whisper-large-v3`
## Uses
### Direct Use
This model is intended for:
- Karakalpak speech transcription
- research and experimentation in low-resource ASR
- evaluating Whisper adaptation to Karakalpak speech
Because this repository contains a LoRA adapter, it should be loaded together with the original Whisper base model.
### Downstream Use
Possible downstream uses include:
- subtitle generation
- speech-to-text preprocessing
- transcription pipelines for Karakalpak audio archives
- ASR benchmarking for Karakalpak speech datasets
### Out-of-Scope Use
This model is not intended for:
- high-stakes transcription where errors may cause harm
- speaker identification
- emotion recognition
- reliable multilingual transcription outside Karakalpak-focused usage
- heavily noisy, far-field, overlapping-speaker audio without further adaptation
## Bias, Risks, and Limitations
This model may perform unevenly depending on:
- speaker accent and dialect variation
- recording quality
- background noise
- speaking speed
- domain mismatch between training and test audio
Since Karakalpak is a relatively low-resource language, the model may underperform on speech styles or vocabulary not well represented in the training data.
The model may also hallucinate, truncate long audio, or repeat text if used without proper long-form chunking.
### Recommendations
Users should:
- validate transcriptions before production use
- use chunked inference for long audio
- test on their own domain data before deployment
- avoid relying on this model alone for sensitive or high-risk applications
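The chunked-inference recommendation above can be sketched with a simple fixed-window splitter. This is a minimal illustration, not part of this repository: the 30-second window matches Whisper's input length, while the 1-second overlap is an arbitrary illustrative choice to reduce the chance of cutting words at chunk boundaries.

```python
import numpy as np

def chunk_audio(audio, sr=16000, window_s=30.0, overlap_s=1.0):
    """Split a 1-D audio array into overlapping fixed-length windows.

    Whisper processes up to 30 s of audio at a time, so longer
    recordings should be transcribed window by window and the
    resulting texts concatenated.
    """
    window = int(window_s * sr)
    step = window - int(overlap_s * sr)
    if len(audio) <= window:
        return [audio]
    return [audio[start:start + window] for start in range(0, len(audio), step)]

# Example: a 65-second recording at 16 kHz yields three windows
audio = np.zeros(65 * 16000, dtype=np.float32)
chunks = chunk_audio(audio)
print(len(chunks))  # 3
```

Each chunk can then be passed through the transcription example below; transformers also ships built-in long-form handling (e.g. the ASR pipeline's `chunk_length_s` argument), which may be preferable in production.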
## How to Get Started with the Model
This repository contains a LoRA adapter. Load it with the Whisper Large v3 base model.
```python
import os

# Limit BLAS/OpenMP thread pools (avoids oversubscription on shared machines)
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

import torch
import librosa
from peft import PeftModel
from transformers import WhisperProcessor, WhisperForConditionalGeneration

BASE_MODEL = "openai/whisper-large-v3"
ADAPTER_MODEL = "Quyashbek/whisper-large-v3-lora-karakalpak"  # change if needed
AUDIO_PATH = "sample.wav"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model, then attach the LoRA adapter on top
processor = WhisperProcessor.from_pretrained(BASE_MODEL, task="transcribe")
base_model = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL).to(device)
model.eval()

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load(AUDIO_PATH, sr=16000, mono=True)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device)

with torch.no_grad():
    predicted_ids = model.generate(
        input_features,
        max_new_tokens=225,
    )

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```