CDLI SLAM-ASR English Atypical Speech MEUSLI v1 Projector-Only Checkpoint (Epoch 3 Step 208)

Projector-only atypical-speech adaptation checkpoint for SLAM-ASR on the CDLI Ugandan English atypical speech dataset. This run starts from the SpeechTek multilingual speech LLM linear projector v1 stack and fine-tunes only the linear projector while keeping the Whisper encoder and EuroLLM decoder frozen.

What this repository contains

This Hub repository stores a partial SLAM-ASR checkpoint for use with the SLAM-LLM codebase. It is not a standalone transformers checkpoint.

  • Checkpoint type: projector_only
  • Architecture: Whisper-large-v3-turbo encoder + linear projector (mEUltilingual_speechllm_linear_projector_v1 initialization) + EuroLLM-1.7B decoder; encoder frozen; LLM frozen; no prompt; no PEFT adapters during training or decode.
  • Base encoder: openai/whisper-large-v3-turbo
  • Base LLM: utter-project/EuroLLM-1.7B
  • Exported files: model.pt
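
The training setup above (frozen encoder and LLM, trainable linear projector, projector-only export) can be sketched with small stand-in modules. The module sizes and names here are illustrative assumptions, not the actual SLAM-LLM classes:

```python
# Minimal sketch of projector-only fine-tuning; shapes/names are placeholders.
import torch
import torch.nn as nn

encoder = nn.Linear(128, 64)   # stand-in for the frozen Whisper encoder
projector = nn.Linear(64, 32)  # the only trainable component
llm = nn.Linear(32, 16)        # stand-in for the frozen EuroLLM decoder

encoder.requires_grad_(False)
llm.requires_grad_(False)

stack = nn.ModuleDict({"encoder": encoder, "projector": projector, "llm": llm})
trainable = [n for n, p in stack.named_parameters() if p.requires_grad]

# Export only the projector weights, matching the model.pt artifact in this repo.
torch.save(projector.state_dict(), "model.pt")
```

Only the projector parameters receive gradients, so the exported checkpoint contains just the projector's `weight` and `bias` tensors, which is why this repository is not a standalone Transformers model.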

Training / evaluation context

  • Dataset: cdli/ugandan_english_nonstandard_speech_v1.0
  • Evaluation split: test
  • Training speakers: 36
  • Validation speakers: 5
  • Speaker overlap: none — the train, validation, and test speaker sets are disjoint

Reported metrics

  • Normalized WER (JiWER scorer): 29.38%
  • Normalized CER (JiWER scorer): 19.79%
  • Atypical overall normalized WER: 29.67%
  • Atypical overall normalized CER: 19.81%
  • Atypical averaged utterance WER: 28.05%
  • Atypical averaged utterance CER: 18.80%
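
The normalized WER and CER above are word- and character-level edit distances divided by reference length, computed after text normalization. The reported numbers come from the JiWER scorer; a minimal pure-Python equivalent for illustration (JiWER applies additional normalization transforms):

```python
def edit_distance(a, b):
    # Classic Levenshtein distance over two token sequences, rolling-row DP.
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[-1]

def wer(reference, hypothesis):
    # Word error rate: word-level edits / reference word count.
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference, hypothesis):
    # Character error rate: character-level edits / reference length.
    return edit_distance(list(reference.lower()), list(hypothesis.lower())) / len(reference)
```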

Decode settings used for the reported metrics

Decode used the English MEUSLI no-prompt configuration with MAX_NEW_TOKENS=200, NUM_BEAMS=4, REPETITION_PENALTY=2.0, NO_REPEAT_NGRAM_SIZE=2, and USE_LLM_PEFT=false.
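
These settings correspond to standard Hugging Face `generate()`-style keyword arguments. A sketch of the mapping (the argument names follow the transformers generation API; how SLAM-LLM wires them through is an assumption):

```python
# Decode settings reported above, expressed as generate()-style kwargs.
decode_kwargs = {
    "max_new_tokens": 200,      # MAX_NEW_TOKENS
    "num_beams": 4,             # NUM_BEAMS
    "repetition_penalty": 2.0,  # REPETITION_PENALTY
    "no_repeat_ngram_size": 2,  # NO_REPEAT_NGRAM_SIZE
}
# USE_LLM_PEFT=false: no adapters are attached to the LLM at decode time.
```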

Additional results notes

This checkpoint improved the English zero-shot MEUSLI v1 baseline from 41.59% to 28.05% averaged utterance WER, a relative reduction of about 32.6%. Best speakers were UG014 (14.36%), UG021 (20.06%), and UG042 (24.16%). Hardest groups remained voice disorder (39.17%) and acquired hearing impairment (39.17%).

Loading notes

Load through SLAM-LLM with the exact English MEUSLI/OpenAI stack: openai/whisper-large-v3-turbo encoder, utter-project/EuroLLM-1.7B decoder, and no prompt template. This repository stores a partial SLAM-ASR checkpoint, not a standalone Transformers model.

Typical decode flow in this project uses project-specific wrappers such as:

  • examples/asr_luganda/scripts/decode_luganda_sunflower.sh for the Sunflower/Luganda stack
  • examples/asr_luganda/scripts/decode_english_meusli_openai.sh for the English MEUSLI/OpenAI stack
  • matching PEFT settings at decode time when adapters are part of the checkpoint (not the case for this projector-only export)

Caveats

  • This repository stores SLAM-ASR training artifacts intended for research use.
  • The checkpoint must be used with the matching SLAM-LLM model code and base components.
  • Results can be sensitive to decode settings and evaluation protocol.
Model repository: KasuleTrevor/cdli-slam-asr-english-atypical-meusli-v1-projector-only-e3s208