NbAiLab / nb-asr-beta-qwen06b-310326-forced-aligner
Norwegian forced alignment for the NB-ASR beta program
This repository hosts a Norwegian forced-alignment checkpoint for the NB-ASR beta group, built on top of the Qwen3 ForcedAligner stack and adapted for the project's Norwegian speech workflows.
Internal reference: 310326-forced-aligner
Uploaded: 31.03.2026
This model is intended for beta evaluation and pipeline integration. It is designed to be used either:
- as a standalone aligner when you already have reference text, or
- alongside an NB-ASR beta transcription model when you want timestamps in the same workflow.
Beta notice: this checkpoint is for controlled evaluation and integration work. APIs, naming, packaging advice, and companion repo IDs may still change during the beta period.
Provenance
This HF repo was prepared from the local training artifact:
Qwen3-0.6B-Forced-Aligner-310326
The packaged checkpoint uses:
The packaged checkpoint uses:
- model.safetensors and generation_config.json from the new local training output
- config.json from assets/TEMPLATE_FORCED_ALIGNER_config.json, which replaces the default forced-aligner config
- tokenizer and processor support files copied from the known-working repo hfrepos/nb-asr-beta1-qwen-forced-aligner
This follows the project rule that the default forced-aligner config should not be used as-is for future releases.
Overview
The aligner predicts time-aligned spans for reference text given an audio input. In practical terms, it is useful when you want word- or segment-level timing for known text, or when you want to augment an ASR system with alignment output.
This repository follows the Qwen forced-alignment usage pattern and is intended to work with the broader NB-ASR evaluation setup. For upstream architecture and package behavior, use the public Qwen references as the implementation baseline:
- Base model: Qwen/Qwen3-ForcedAligner-0.6B
- Technical report: Qwen3-ASR Technical Report
Recommended Installation
The recommended route is the official qwen-asr package, which provides the compatible classes and loading behavior for Qwen ASR and forced-aligner checkpoints.
pip install -U "qwen-asr"
Optional packages for supported GPU setups:
pip install -U flash-attn --no-build-isolation
If you want the broader serving stack as well:
pip install -U "qwen-asr[vllm]"
Quickstart
Load the aligner directly and call align with audio, text, and language.
import torch
from qwen_asr import Qwen3ForcedAligner
model = Qwen3ForcedAligner.from_pretrained(
    "NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
)

results = model.align(
    audio="audio.wav",
    text="Hun er oversatt til en rekke språk, men ikke norsk.",
    language="Norwegian",
)

print(results[0])

first = results[0][0]
print(first.text, first.start_time, first.end_time)
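The aligned spans can be post-processed into subtitle-style output. The sketch below converts spans into SubRip (SRT) cues; it uses a local stand-in dataclass instead of the real aligner output, and assumes the field names from the quickstart above (text, start_time, end_time) with times in seconds. Adapt the field access to whatever model.align actually returns in your installed qwen-asr version.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Stand-in for the span objects returned by model.align();
    # field names follow the quickstart above. Seconds-based
    # timestamps are an assumption.
    text: str
    start_time: float
    end_time: float


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def spans_to_srt(spans) -> str:
    """Render one SRT cue per aligned span."""
    cues = []
    for i, sp in enumerate(spans, start=1):
        cues.append(
            f"{i}\n{to_srt_time(sp.start_time)} --> {to_srt_time(sp.end_time)}\n{sp.text}\n"
        )
    return "\n".join(cues)


spans = [Span("Hun", 0.12, 0.34), Span("er", 0.34, 0.48)]
print(spans_to_srt(spans))
```

In a real workflow you would feed results[0] from model.align directly into spans_to_srt instead of the hand-made list.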
Use Together With an NB-ASR Beta ASR Model
For transcription plus timestamps, load the ASR model and point forced_aligner to this repo.
import torch
from qwen_asr import Qwen3ASRModel
ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner"
model = Qwen3ASRModel.from_pretrained(
    ASR_MODEL,
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=8,
    max_new_tokens=1024,
    forced_aligner=ALIGNER_MODEL,
    forced_aligner_kwargs=dict(
        dtype=torch.bfloat16,
        device_map="cuda:0",
    ),
)

results = model.transcribe(
    audio=["/path/to/utt1.wav", "/path/to/utt2.wav"],
    language=["Norwegian", "Norwegian"],
    return_time_stamps=True,
)

for r in results:
    print(r.language, r.text, r.time_stamps[0] if r.time_stamps else None)
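Word-level timestamps are often easier to consume when merged into pause-delimited segments. A minimal sketch, assuming each timestamp entry can be unpacked as a (text, start, end) tuple in seconds (the actual entry layout of time_stamps may differ; adapt the unpacking accordingly):

```python
def group_by_pauses(words, max_gap=0.5):
    """Merge consecutive (text, start, end) word tuples into segments,
    starting a new segment whenever the silence between two words
    exceeds max_gap seconds."""
    segments = []
    current = None
    for text, start, end in words:
        if current is None or start - current[2] > max_gap:
            if current is not None:
                segments.append(tuple(current))
            current = [text, start, end]
        else:
            # Extend the current segment with this word.
            current[0] += " " + text
            current[2] = end
    if current is not None:
        segments.append(tuple(current))
    return segments


words = [("Hun", 0.0, 0.2), ("er", 0.25, 0.4), ("oversatt", 1.5, 2.0)]
print(group_by_pauses(words))
# → [('Hun er', 0.0, 0.4), ('oversatt', 1.5, 2.0)]
```

Tuning max_gap trades shorter, more granular segments against longer, sentence-like ones.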
vLLM Pattern
When the ASR side runs with vLLM, keep the aligner as a dedicated companion checkpoint.
import torch
from qwen_asr import Qwen3ASRModel
if __name__ == "__main__":
    ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
    ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner"

    model = Qwen3ASRModel.LLM(
        model=ASR_MODEL,
        gpu_memory_utilization=0.7,
        max_inference_batch_size=32,
        max_new_tokens=4096,
        forced_aligner=ALIGNER_MODEL,
        forced_aligner_kwargs=dict(
            dtype=torch.bfloat16,
            device_map="cuda:0",
        ),
    )

    results = model.transcribe(
        audio=["/path/to/audio.wav"],
        language=["Norwegian"],
        return_time_stamps=True,
    )

    for r in results:
        print(r.language, r.text, r.time_stamps)
Demo and Local Download
For Gradio-style testing with the Qwen demo tooling, pass this repo as the aligner checkpoint:
--aligner-checkpoint NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner
To download locally:
pip install -U "huggingface_hub[cli]"
hf download NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner --local-dir ./nb-asr-beta-qwen06b-310326-forced-aligner
Included Files
This staged HF repository includes:
- model.safetensors
- generation_config.json
- config.json
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- vocab.json
- merges.txt
- added_tokens.json
- chat_template.jinja
- preprocessor_config.json
- audio.wav
Training-state files such as optimizer state, scheduler state, RNG snapshots, and trainer metadata were intentionally left out of this HF package.
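A quick way to sanity-check a local download against the file list above. This is an illustrative sketch, not part of the official tooling; the filenames are taken from the Included Files list, and the directory name assumes the hf download command shown earlier.

```python
import os

# Filenames from the "Included Files" list above (audio.wav omitted,
# since it is a sample clip rather than a model asset).
REQUIRED_FILES = [
    "model.safetensors",
    "generation_config.json",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "added_tokens.json",
    "chat_template.jinja",
    "preprocessor_config.json",
]


def missing_files(local_dir):
    """Return the required files that are absent from local_dir."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(local_dir, f))]


missing = missing_files("./nb-asr-beta-qwen06b-310326-forced-aligner")
print("missing:", missing or "none")
```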
Intended Use
This model is intended for:
- beta evaluation,
- alignment experiments,
- timestamp generation in NB-ASR workflows,
- and integration into internal or semi-controlled ASR pipelines.
It should not be presented as a final public production release without additional validation, packaging review, and naming review.
Acknowledgements
This model is based on the open Qwen3-ASR framework and was adapted by the NB-ASR project at the National Library.
The following people contributed to dataset creation and training:
- Freddy Wetjen
- Thea Tollersrud
- Phoebe Parsons
- Per Egil Kummervold