CDLI Qwen3-ASR 1.7B English Fine-Tune
This repository contains a Qwen3-ASR model fine-tuned from
Qwen/Qwen3-ASR-1.7B on the gated
cdli/ugandan_english_nonstandard_speech_v1.0 dataset.
The task is English automatic speech recognition for atypical or non-standard speech from Ugandan speakers, including dysarthric speech.
Model Details
- Base model: Qwen/Qwen3-ASR-1.7B
- Fine-tuning framework: Qwen3-ASR SFT with PEFT/LoRA
- Language: English
- Checkpoint reported: checkpoint-3882
Dataset
- Dataset: cdli/ugandan_english_nonstandard_speech_v1.0
- License: cc-by-sa-4.0 (dataset card)
- Split sizes used by the source dataset card:
  - train: 5176
  - validation: 638
  - test: 1017
Evaluation artifacts in this run contain 1013 scored rows and 1013 valid normalized references.
Training Configuration
- Work root: /jupyter_kernel/qwen3_asr_cdli_en
- Base checkpoint: Qwen/Qwen3-ASR-1.7B
- Max manifest audio length: 30.0 s
- Max training audio length: 30.0 s
- Min audio length: 0.2 s
- Train batch size: 4
- Gradient accumulation steps: 2
- Effective train batch size: 8
- Learning rate: 2e-5
- Scheduler: cosine
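The effective batch size and the audio length limits listed above can be sanity-checked with a few lines of Python. This is a minimal sketch, not the training script used for this run: the function names, the `num_devices` parameter (assumed to be 1), and the choice of inclusive length bounds are all assumptions made for illustration.

```python
# Values copied from the configuration list above.
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
MIN_AUDIO_S = 0.2
MAX_AUDIO_S = 30.0


def effective_batch_size(per_step_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update.

    num_devices=1 is an assumption; it is consistent with the reported
    effective batch size of 8 but is not stated in the card.
    """
    return per_step_batch * grad_accum_steps * num_devices


def keep_clip(duration_s: float) -> bool:
    """Hypothetical length filter; inclusive bounds are an assumption."""
    return MIN_AUDIO_S <= duration_s <= MAX_AUDIO_S


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print([keep_clip(d) for d in (0.1, 12.5, 30.5)])
```

With gradient accumulation, gradients from two batches of 4 are summed before each optimizer step, which is why the effective batch size is 8 rather than 4.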
Evaluation
Evaluation was run on the held-out test split using both raw transcript
comparison and normalized transcript comparison.
Corpus Metrics
- Raw WER: 29.55%
- Raw CER: 16.15%
- Normalized WER: 24.06%
- Normalized CER: 14.93%
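Corpus-level WER and CER are conventionally computed by summing edit-distance errors over all utterances and dividing by the total reference length. The sketch below illustrates that definition in plain Python. It is not the evaluation code used for this run; the function names are hypothetical, and counting spaces as characters in CER is an assumption.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total


def corpus_cer(refs: List[str], hyps: List[str]) -> float:
    """Total character errors divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


refs = ["the cat sat", "hello world"]
hyps = ["the cat sat", "hello word"]
print(f"WER: {corpus_wer(refs, hyps):.2%}")
print(f"CER: {corpus_cer(refs, hyps):.2%}")
```

Because errors are pooled before dividing, long utterances carry more weight in the corpus figure than short ones.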
Average Utterance Metrics
- Average normalized utterance WER (capped at 1.0): 22.72%
- Average normalized utterance CER (capped at 1.0): 14.35%
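The averaged figures weight every utterance equally, and the cap keeps a single utterance with many inserted words from scoring above 100%. A minimal sketch of that calculation follows, assuming utterances with an empty reference are skipped. The skipping rule and all function names are assumptions, not taken from the evaluation code for this run.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def capped_utterance_wer(ref: str, hyp: str, cap: float = 1.0) -> float:
    """WER for one utterance, clipped to `cap`."""
    ref_words = ref.split()
    return min(edit_distance(ref_words, hyp.split()) / len(ref_words), cap)


def average_utterance_wer(refs: List[str], hyps: List[str]) -> float:
    """Mean of capped per-utterance WER; empty references are skipped (assumption)."""
    scores = [capped_utterance_wer(r, h) for r, h in zip(refs, hyps) if r.split()]
    return sum(scores) / len(scores)


refs = ["yes", "the cat sat on the mat"]
hyps = ["yes it is so", "the cat sat on the mat"]
# First utterance: 3 insertions against 1 reference word, so 3.0 before the cap.
print(average_utterance_wer(refs, hyps))
```

This explains why the averaged values can differ from the corpus metrics: the corpus figure pools errors over all words, while the average treats a one-word utterance and a twenty-word utterance as equally important.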
Usage
from qwen_asr import Qwen3ASRModel
model = Qwen3ASRModel.from_pretrained('KasuleTrevor/cdli-qwen3-asr-en-finetune-1')
predictions = model.transcribe(['path/to/audio.wav'])
print(predictions[0].text if hasattr(predictions[0], 'text') else predictions[0])
Files
- model.safetensors: fine-tuned model weights
- tokenizer.json, tokenizer_config.json, preprocessor_config.json, chat_template.json
- results/checkpoint-3882/test_predictions.csv
- results/checkpoint-3882/test_predictions.jsonl
- results/checkpoint-3882/test_predictions_scored.csv
- results/checkpoint-3882/test_predictions_scored.jsonl
- results/checkpoint-3882/test_predictions_grouped_analysis.csv
Notes
- Results are stored in a checkpoint-scoped folder under results/.
- Normalized metrics use transcript normalization to reduce punctuation, casing, and formatting noise during evaluation.
- Access to the source dataset is gated. Review dataset terms before requesting access.
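The transcript normalization mentioned in the notes above is not specified in detail in this card. As an illustration only, a simple normalizer that lowercases text, strips ASCII punctuation, and collapses whitespace could look like the sketch below. The function name and the exact rules are assumptions; the normalizer used for the reported metrics may differ (for example in how it handles apostrophes or numerals).

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_transcript(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("  Hello,   World! "))
```

Applying the same function to both references and predictions before scoring removes differences that are purely orthographic, which is consistent with the normalized WER and CER being lower than the raw values.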
Evaluation results
- Test WER (raw) on CDLI Ugandan English Non-Standard Speech v1.0 test set (self-reported): 29.547
- Test CER (raw) on CDLI Ugandan English Non-Standard Speech v1.0 test set (self-reported): 16.150
- Test WER (normalized) on CDLI Ugandan English Non-Standard Speech v1.0 test set (self-reported): 24.060
- Test CER (normalized) on CDLI Ugandan English Non-Standard Speech v1.0 test set (self-reported): 14.934