# Whisper Large v3 LoRA for Karakalpak ASR
This repository contains a LoRA adapter fine-tuned on top of `openai/whisper-large-v3` for automatic speech recognition (ASR) in Karakalpak. The model transcribes Karakalpak speech from audio into text. Note that only the PEFT/LoRA adapter weights are included here, not the full base model weights.
## Model Details
### Model Description
This is a parameter-efficient fine-tuned Whisper Large v3 model for Karakalpak speech transcription. It was trained using LoRA on top of the pretrained Whisper encoder-decoder model.
- Developed by: Quyashbek
- Funded by: Self-directed / internal project
- Shared by: Quyashbek
- Model type: Automatic Speech Recognition (ASR), Whisper LoRA adapter
- Language(s): Karakalpak
- License: Apache-2.0
- Finetuned from model: `openai/whisper-large-v3`
### Model Sources
- Repository: TODO
- Base model: `openai/whisper-large-v3`
## Uses
### Direct Use
This model is intended for:
- Karakalpak speech transcription
- research and experimentation in low-resource ASR
- evaluating Whisper adaptation to Karakalpak speech
Because this repository contains a LoRA adapter, it should be loaded together with the original Whisper base model.
### Downstream Use
Possible downstream uses include:
- subtitle generation
- speech-to-text preprocessing
- transcription pipelines for Karakalpak audio archives
- ASR benchmarking for Karakalpak speech datasets
### Out-of-Scope Use
This model is not intended for:
- high-stakes transcription where errors may cause harm
- speaker identification
- emotion recognition
- reliable multilingual transcription outside Karakalpak-focused usage
- heavily noisy, far-field, overlapping-speaker audio without further adaptation
## Bias, Risks, and Limitations
This model may perform unevenly depending on:
- speaker accent and dialect variation
- recording quality
- background noise
- speaking speed
- domain mismatch between training and test audio
Since Karakalpak is a relatively low-resource language, the model may underperform on speech styles or vocabulary not well represented in the training data.
The model may also hallucinate, truncate long audio, or repeat text if used without proper long-form chunking.
### Recommendations
Users should:
- validate transcriptions before production use
- use chunked inference for long audio
- test on their own domain data before deployment
- avoid relying on this model alone for sensitive or high-risk applications
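The chunked-inference recommendation above can be sketched with a simple fixed-window splitter. This is a minimal illustration, not part of this repository: the 30-second window matches Whisper's input length, while the 1-second overlap is an arbitrary illustrative choice to reduce the chance of cutting words at chunk boundaries.

```python
import numpy as np

def chunk_audio(audio, sr=16000, window_s=30.0, overlap_s=1.0):
    """Split a 1-D audio array into overlapping fixed-length windows.

    Whisper processes up to 30 s of audio at a time, so longer
    recordings should be transcribed window by window and the
    resulting texts concatenated.
    """
    window = int(window_s * sr)
    step = window - int(overlap_s * sr)
    if len(audio) <= window:
        return [audio]
    return [audio[start:start + window] for start in range(0, len(audio), step)]

# Example: a 65-second recording at 16 kHz yields three windows
audio = np.zeros(65 * 16000, dtype=np.float32)
chunks = chunk_audio(audio)
print(len(chunks))  # 3
```

Each chunk can then be passed through the transcription example below; transformers also ships built-in long-form handling (e.g. the ASR pipeline's `chunk_length_s` argument), which may be preferable in production.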
## How to Get Started with the Model
This repository contains a LoRA adapter. Load it with the Whisper Large v3 base model.
```python
import os

# Limit BLAS/OpenMP thread pools (avoids oversubscription on shared machines)
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

import torch
import librosa
from peft import PeftModel
from transformers import WhisperProcessor, WhisperForConditionalGeneration

BASE_MODEL = "openai/whisper-large-v3"
ADAPTER_MODEL = "Quyashbek/whisper-large-v3-lora-karakalpak"  # change if needed
AUDIO_PATH = "sample.wav"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model, then attach the LoRA adapter on top
processor = WhisperProcessor.from_pretrained(BASE_MODEL, task="transcribe")
base_model = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL).to(device)
model.eval()

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load(AUDIO_PATH, sr=16000, mono=True)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device)

with torch.no_grad():
    predicted_ids = model.generate(
        input_features,
        max_new_tokens=225,
    )

text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```