Parakeet TDT 0.6B v3 — Basque (Euskara) · ONNX-ASR

ONNX export of itzune/parakeet-tdt-0.6b-v3-basque packaged for onnx-asr — a lightweight, pure-Python speech recognition library that runs entirely on ONNX Runtime, no PyTorch or NeMo required.

The encoder is INT8 dynamically quantised, reducing its size from ~2.3 GB to ~623 MB.
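As a quick sanity check on those figures, the size reduction can be worked out directly. This is a minimal sketch; it assumes decimal units (1 GB = 1000 MB), which the card does not state.

```python
# Sizes quoted in this card; decimal units are an ASSUMPTION.
FP32_ENCODER_MB = 2300.0
INT8_ENCODER_MB = 623.0


def size_reduction(before_mb: float, after_mb: float) -> tuple:
    """Return (compression factor, fraction of the original size saved)."""
    factor = before_mb / after_mb
    saved = 1.0 - after_mb / before_mb
    return factor, saved


factor, saved = size_reduction(FP32_ENCODER_MB, INT8_ENCODER_MB)
print(f"{factor:.1f}x smaller, {saved:.0%} saved")
```

Both numbers are approximate because the sizes in the card are themselves rounded.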

Model details

Property	Value
Architecture	FastConformer RNNT-TDT (Parakeet TDT 0.6B v3)
Language	Basque (`eu`)
Sample rate	16 kHz mono
Parameters	~600 M
Vocabulary size	1024 tokens (SentencePiece BPE)
Quantisation	INT8 dynamic encoder (`encoder-model.int8.onnx`)
onnx-asr model type	`nemo-conformer-tdt`
Features size	128 log-mel filterbanks
Subsampling factor	8
Max tokens per step	10
Base model	nvidia/parakeet-tdt-0.6b-v3
Fine-tuned model	itzune/parakeet-tdt-0.6b-v3-basque
Fine-tuning framework	NVIDIA NeMo
Hardware	NVIDIA L40 (48 GB)
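
The subsampling factor determines how much audio each encoder output frame covers. The sketch below works this out under one assumption that is not stated in this card: a 10 ms feature hop. The function names are illustrative only.

```python
SUBSAMPLING_FACTOR = 8   # from the table above
HOP_MS = 10              # ASSUMPTION: 10 ms feature hop, not stated in this card


def encoder_frame_ms(hop_ms: int = HOP_MS, subsampling: int = SUBSAMPLING_FACTOR) -> int:
    """Audio duration covered by one encoder output frame, in milliseconds."""
    return hop_ms * subsampling


def approx_encoder_frames(duration_ms: int) -> int:
    """Approximate number of encoder frames for a clip, ignoring padding effects."""
    frame_ms = encoder_frame_ms()
    return -(-duration_ms // frame_ms)  # ceiling division on integers


print(encoder_frame_ms())
print(approx_encoder_frames(10_000))
```

Under that assumption each decoding step advances over 80 ms of audio, and a 10-second clip yields about 125 encoder frames.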

Files

File	Size	Description
`encoder-model.int8.onnx`	623 MB	INT8 encoder (FastConformer, self-contained)
`decoder_joint-model.onnx`	70 MB	Decoder + joint network (FP32)
`vocab.txt`	92 KB	Vocabulary (one token per line)
`config.json`	—	onnx-asr model configuration

config.json contents:

{
  "model_type": "nemo-conformer-tdt",
  "features_size": 128,
  "subsampling_factor": 8,
  "max_tokens_per_step": 10
}
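
When repackaging the files by hand, it can be useful to confirm that config.json still has the four fields shown above. The following is a hypothetical check written for illustration; it is not part of onnx-asr and does not describe how the library validates its configuration.

```python
import json
import tempfile
from pathlib import Path

# Field names and types taken from the config.json shown above.
EXPECTED_FIELDS = {
    "model_type": str,
    "features_size": int,
    "subsampling_factor": int,
    "max_tokens_per_step": int,
}


def load_config(model_dir):
    """Read config.json from model_dir and check the expected fields (illustrative helper)."""
    config = json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in config:
            raise KeyError(f"config.json is missing '{key}'")
        if not isinstance(config[key], expected_type):
            raise TypeError(f"'{key}' should be of type {expected_type.__name__}")
    return config


# Demo: write the config from this card into a temporary folder and load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "model_type": "nemo-conformer-tdt",
        "features_size": 128,
        "subsampling_factor": 8,
        "max_tokens_per_step": 10,
    }
    (Path(tmp) / "config.json").write_text(json.dumps(sample), encoding="utf-8")
    config = load_config(tmp)

print(config["model_type"])
```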

Evaluation

WER measured on held-out test splits from asierhv/composite_corpus_eu_v2.1:

Split	Baseline (base model on Basque)	Fine-tuned
`test_cv` (Common Voice)	108.47%	6.92%
`test_parl` (Parliament)	107.61%	4.36%
`test_oslr` (OpenSLR)	108.52%	14.52%

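The base model does not target Basque, so a WER above 100% is expected for it. WER counts inserted words as errors, so it exceeds 100% whenever the total number of errors is larger than the number of words in the reference.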
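
To see how a WER above 100% can arise, here is a minimal sketch of the standard word-level edit-distance definition of WER. It is not the evaluation script used for the table above, and it applies no text normalisation; the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Two reference words, five unrelated hypothesis words:
# 2 substitutions + 3 insertions = 5 errors over 2 reference words.
wer = word_error_rate("kaixo mundua", "hello world how are you")
print(f"{wer:.0%}")
```

A model that outputs long strings in the wrong language therefore scores well above 100%.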

Quick start

Install onnx-asr

pip install onnx-asr

Transcribe from Hugging Face (automatic download)

import onnx_asr

# Load directly from this HF repo — files are downloaded automatically
model = onnx_asr.load_model("xezpeleta/parakeet-tdt-0.6b-v3-basque-onnx-asr")

# onnx-asr exposes recognition through recognize()
text = model.recognize("/path/to/audio.wav")
print(text)

Transcribe a local folder of files

import onnx_asr
from pathlib import Path

model = onnx_asr.load_model("xezpeleta/parakeet-tdt-0.6b-v3-basque-onnx-asr")

# Sort the files so the output order is deterministic
audio_files = sorted(Path("/path/to/wavs").glob("*.wav"))
for audio_path in audio_files:
    text = model.recognize(str(audio_path))
    print(f"{audio_path.name}: {text}")

Load from local directory

import onnx_asr

# Pass the model type first, then the folder containing the ONNX files
model = onnx_asr.load_model("nemo-conformer-tdt", "/path/to/parakeet-tdt-0.6b-v3-basque-onnx-asr")
text = model.recognize("/path/to/audio.wav")
print(text)

Batch transcription

import onnx_asr

model = onnx_asr.load_model("xezpeleta/parakeet-tdt-0.6b-v3-basque-onnx-asr")

audio_paths = [
    "/path/a.wav",
    "/path/b.wav",
    "/path/c.wav",
]

# recognize() also accepts a list of paths and returns one result per file
results = model.recognize(audio_paths)
for path, text in zip(audio_paths, results):
    print(f"{path}: {text}")
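
For long file lists it may be preferable to hand the batch call a few files at a time. Whether that actually helps memory use depends on the runtime and hardware, so treat it as an assumption. The helper below is a generic sketch, and `chunked` is a name introduced here for illustration.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


paths = ["/path/a.wav", "/path/b.wav", "/path/c.wav", "/path/d.wav", "/path/e.wav"]
batches = list(chunked(paths, 2))
print(batches)
```

Each inner list can then be passed to the batch call shown above in place of the full list.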

Use FP32 encoder instead of INT8

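The INT8 encoder is used by default because it is smaller and faster. To use a full-precision encoder instead, pass its file name explicitly. This requires an FP32 `encoder-model.onnx`, which is not among the files listed above: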

import onnx_asr

model = onnx_asr.load_model(
    "xezpeleta/parakeet-tdt-0.6b-v3-basque-onnx-asr",
    encoder="encoder-model.onnx",  # FP32 variant
)

Use GPU (CUDA) with ONNX Runtime

pip install onnxruntime-gpu

import onnx_asr

model = onnx_asr.load_model(
    "xezpeleta/parakeet-tdt-0.6b-v3-basque-onnx-asr",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
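
If the same script has to run on machines with and without a GPU, the provider list can be built from what is actually installed. This is a small sketch; `pick_providers` is a hypothetical helper, not part of onnx-asr or ONNX Runtime.

```python
PREFERRED_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED_PROVIDERS):
    """Keep the preferred providers that are available, preserving preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} found in {list(available)}")
    return chosen


# Example inputs; on a real machine the list would come from ONNX Runtime.
cpu_only = pick_providers(["CPUExecutionProvider"])
with_gpu = pick_providers(["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"])
print(cpu_only)
print(with_gpu)
```

In practice `available` can be obtained from `onnxruntime.get_available_providers()`, and the returned list passed as `providers=`.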

Audio requirements

Sample rate: 16 kHz
Channels: mono (single channel)
Format: PCM WAV files can be passed by path; for FLAC and other formats, decode the audio with soundfile or librosa first and pass the waveform array
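
The sample rate and channel count of a WAV file can be checked before transcription with Python's standard `wave` module, which reads PCM WAV only. This is a minimal sketch; `check_wav` and `write_silence` are illustrative helpers, and the demo files are generated silence.

```python
import tempfile
import wave
from pathlib import Path


def check_wav(path, expected_rate=16_000, expected_channels=1):
    """Return a list of problems found in a PCM WAV header; an empty list means it matches."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz")
        if wav.getnchannels() != expected_channels:
            problems.append(f"{wav.getnchannels()} channels, expected {expected_channels}")
    return problems


def write_silence(path, rate, channels, seconds=0.1):
    """Write a short silent 16-bit PCM WAV file (demo helper only)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.wav"
    bad = Path(tmp) / "bad.wav"
    write_silence(good, rate=16_000, channels=1)
    write_silence(bad, rate=44_100, channels=2)
    good_problems = check_wav(good)
    bad_problems = check_wav(bad)

print(good_problems)
print(bad_problems)
```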

When to choose this format vs. sherpa-onnx

	onnx-asr (this repo)	sherpa-onnx
Install	`pip install onnx-asr`	`pip install sherpa-onnx`
API	Simple `model.recognize()`	More control (streaming, chunk size)
Streaming	No	Yes
Mobile / embedded	No	Yes (Android, iOS, WASM)
C++ / native binary	No	Yes
Best for	Server-side Python batch transcription	Edge, real-time, multi-platform

For the sherpa-onnx version of this model, see: xezpeleta/parakeet-tdt-0.6b-v3-basque-sherpa-onnx

Export recipe

This model was exported from the .nemo checkpoint using a custom script:

Load fine-tuned NeMo model
Use NeMo's built-in asr_model.export("model.onnx") to produce split graphs
Consolidate external data tensors into self-contained ONNX files
Generate vocab.txt from the SentencePiece tokenizer
Write config.json with model_type: nemo-conformer-tdt
Apply INT8 dynamic quantisation to the encoder via onnxruntime.quantization.quantize_dynamic

The export and fine-tuning code is available at: xezpeleta/parakeet-tdt-0.6b-v3-basque.
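
The final quantisation step might look roughly like the sketch below. It is not the author's export script: the weight type (`QuantType.QInt8`) and the helper names are assumptions, and only the use of `onnxruntime.quantization.quantize_dynamic` comes from the recipe above.

```python
from pathlib import Path


def int8_output_path(fp32_path):
    """Derive the INT8 file name, e.g. encoder-model.onnx -> encoder-model.int8.onnx."""
    path = Path(fp32_path)
    return path.with_name(path.stem + ".int8.onnx")


def quantize_encoder(fp32_path):
    """Apply dynamic INT8 quantisation to an FP32 ONNX encoder (illustrative helper)."""
    # Imported lazily so the naming helper works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    output_path = int8_output_path(fp32_path)
    quantize_dynamic(
        str(fp32_path),
        str(output_path),
        weight_type=QuantType.QInt8,  # ASSUMPTION: the card does not state the weight type
    )
    return output_path


print(int8_output_path("encoder-model.onnx"))
```

Calling `quantize_encoder("encoder-model.onnx")` would need the FP32 encoder on disk and `onnxruntime` installed.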

Related models

Repo	Format	Use case
itzune/parakeet-tdt-0.6b-v3-basque	NeMo `.nemo`	Full NeMo / PyTorch inference & fine-tuning
xezpeleta/parakeet-tdt-0.6b-v3-basque-sherpa-onnx	sherpa-onnx INT8	On-device / real-time / cross-platform
This repo	ONNX-ASR	Simple Python batch inference

Citation and acknowledgements

If you use this model, please credit:

Base model: nvidia/parakeet-tdt-0.6b-v3
Fine-tuned model: itzune/parakeet-tdt-0.6b-v3-basque
Training dataset: asierhv/composite_corpus_eu_v2.1
onnx-asr: istupakov/onnx-asr

Underlying source collections in the training corpus:

Mozilla Common Voice (Basque)
Basque Parliament corpus
OpenSLR Basque resources

License

CC BY 4.0. License obligations inherited from the base model and the training dataset also apply.
