
NbAiLab / nb-asr-beta-qwen06b-310326-forced-aligner

Norwegian forced alignment for the NB-ASR beta program

This repository hosts a Norwegian forced-alignment checkpoint for the NB-ASR beta group, built on top of the Qwen3 ForcedAligner stack and adapted for the project's Norwegian speech workflows.

Internal reference: 310326-forced-aligner

Uploaded: 31.03.2026

This model is intended for beta evaluation and pipeline integration. It is designed to be used either:

  • as a standalone aligner when you already have reference text, or
  • alongside an NB-ASR beta transcription model when you want timestamps in the same workflow.

Beta notice: this checkpoint is for controlled evaluation and integration work. APIs, naming, packaging advice, and companion repo IDs may still change during the beta period.

Provenance

This HF repo was prepared from the local training artifact:

Qwen3-0.6B-Forced-Aligner-310326

The packaged checkpoint uses:

  • model.safetensors and generation_config.json from the new local training output
  • config.json from assets/TEMPLATE_FORCED_ALIGNER_config.json, which replaces the default forced-aligner config
  • tokenizer and processor support files copied from the known-working repo hfrepos/nb-asr-beta1-qwen-forced-aligner

This follows the project rule that the default forced-aligner config should not be used as-is for future releases.
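The packaging rule above can be sketched as a small shell script. The paths below are placeholders created under a temp directory for demonstration; in the real workflow they would be the actual training output, the assets directory, and the known-working tokenizer repo named above.

```shell
set -e
WORK=$(mktemp -d)
SRC_TRAIN="$WORK/Qwen3-0.6B-Forced-Aligner-310326"
SRC_TOK="$WORK/hfrepos/nb-asr-beta1-qwen-forced-aligner"
DST="$WORK/nb-asr-beta-qwen06b-310326-forced-aligner"

# Placeholder inputs standing in for the real artifacts.
mkdir -p "$SRC_TRAIN" "$SRC_TOK" "$WORK/assets"
touch "$SRC_TRAIN/model.safetensors" "$SRC_TRAIN/generation_config.json"
touch "$WORK/assets/TEMPLATE_FORCED_ALIGNER_config.json"
touch "$SRC_TOK/tokenizer.json" "$SRC_TOK/tokenizer_config.json" \
      "$SRC_TOK/special_tokens_map.json"

# The packaging rule: weights and generation config from the new training run,
# config.json from the template (never the default forced-aligner config),
# tokenizer/processor support files from the known-working repo.
mkdir -p "$DST"
cp "$SRC_TRAIN/model.safetensors" "$SRC_TRAIN/generation_config.json" "$DST/"
cp "$WORK/assets/TEMPLATE_FORCED_ALIGNER_config.json" "$DST/config.json"
cp "$SRC_TOK/"* "$DST/"
echo "packaged: $DST"
```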

Overview

The aligner predicts time-aligned spans for reference text given an audio input. In practical terms, it is useful when you want word- or segment-level timing for known text, or when you want to augment an ASR system with alignment output.

This repository follows the Qwen forced-alignment usage pattern and is intended to work with the broader NB-ASR evaluation setup. For upstream architecture and package behavior, use the public Qwen references as the implementation baseline.

Recommended Installation

The recommended route is the official qwen-asr package, which provides the compatible classes and loading behavior for Qwen ASR and forced-aligner checkpoints.

pip install -U "qwen-asr"

Optional packages for supported GPU setups:

pip install -U flash-attn --no-build-isolation

If you want the broader serving stack as well:

pip install -U "qwen-asr[vllm]"

Quickstart

Load the aligner directly and call align with audio, text, and language.

import torch
from qwen_asr import Qwen3ForcedAligner

model = Qwen3ForcedAligner.from_pretrained(
    "NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
)

results = model.align(
    audio="audio.wav",
    text="Hun er oversatt til en rekke språk, men ikke norsk.",
    language="Norwegian",
)

print(results[0])
first = results[0][0]
print(first.text, first.start_time, first.end_time)

Use Together With an NB-ASR Beta ASR Model

For transcription plus timestamps, load the ASR model and point forced_aligner to this repo.

import torch
from qwen_asr import Qwen3ASRModel

ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner"

model = Qwen3ASRModel.from_pretrained(
    ASR_MODEL,
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=8,
    max_new_tokens=1024,
    forced_aligner=ALIGNER_MODEL,
    forced_aligner_kwargs=dict(
        dtype=torch.bfloat16,
        device_map="cuda:0",
    ),
)

results = model.transcribe(
    audio=["/path/to/utt1.wav", "/path/to/utt2.wav"],
    language=["Norwegian", "Norwegian"],
    return_time_stamps=True,
)

for r in results:
    print(r.language, r.text, r.time_stamps[0] if r.time_stamps else None)

vLLM Pattern

When the ASR side runs with vLLM, keep the aligner as a dedicated companion checkpoint.

import torch
from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    ASR_MODEL = "NbAiLab/nb-asr-beta-qwen06b-lunde03-reading"
    ALIGNER_MODEL = "NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner"

    model = Qwen3ASRModel.LLM(
        model=ASR_MODEL,
        gpu_memory_utilization=0.7,
        max_inference_batch_size=32,
        max_new_tokens=4096,
        forced_aligner=ALIGNER_MODEL,
        forced_aligner_kwargs=dict(
            dtype=torch.bfloat16,
            device_map="cuda:0",
        ),
    )

    results = model.transcribe(
        audio=["/path/to/audio.wav"],
        language=["Norwegian"],
        return_time_stamps=True,
    )

    for r in results:
        print(r.language, r.text, r.time_stamps)
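For pipeline integration, transcribe() results with timestamps can be serialized to a JSONL manifest. A minimal sketch follows; the Result dataclass and the (word, start, end) tuple shape are assumptions standing in for the package's actual return type.

```python
import json
from dataclasses import dataclass

@dataclass
class Result:
    # Stand-in for one transcribe() result with return_time_stamps=True;
    # the real object's timestamp layout may differ.
    language: str
    text: str
    time_stamps: list  # list of (word, start, end) tuples in this sketch

def to_jsonl(results, paths) -> str:
    # One JSON line per input file: audio path, language, text, word timings.
    lines = []
    for path, r in zip(paths, results):
        lines.append(json.dumps({
            "audio": path,
            "language": r.language,
            "text": r.text,
            "words": [{"w": w, "start": s, "end": e}
                      for (w, s, e) in (r.time_stamps or [])],
        }, ensure_ascii=False))
    return "\n".join(lines)

# Illustrative record only.
demo = [Result("Norwegian", "Hun er oversatt.", [("Hun", 0.1, 0.3)])]
print(to_jsonl(demo, ["/path/to/audio.wav"]))
```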

Demo and Local Download

For Gradio-style testing with the Qwen demo tooling, pass this repo as the aligner checkpoint:

--aligner-checkpoint NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner

To download locally:

pip install -U "huggingface_hub[cli]"
hf download NbAiLab/nb-asr-beta-qwen06b-310326-forced-aligner --local-dir ./nb-asr-beta-qwen06b-310326-forced-aligner

Included Files

This staged HF repository includes:

  • model.safetensors
  • generation_config.json
  • config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • vocab.json
  • merges.txt
  • added_tokens.json
  • chat_template.jinja
  • preprocessor_config.json
  • audio.wav

Training-state files such as optimizer state, scheduler state, RNG snapshots, and trainer metadata were intentionally left out of this HF package.

Intended Use

This model is intended for:

  • beta evaluation,
  • alignment experiments,
  • timestamp generation in NB-ASR workflows,
  • and integration into internal or semi-controlled ASR pipelines.

It should not be presented as a final public production release without additional validation, packaging review, and naming review.

Acknowledgements

This model is based on the open Qwen3-ASR framework and was adapted by the NB-ASR project at the National Library.

The following people contributed to dataset creation and training:

  • Freddy Wetjen
  • Thea Tollersrud
  • Phoebe Parsons
  • Per Egil Kummervold

Model size: 0.9B params (F32 safetensors)