Part of the Sherpa Vietnamese ASR collection — ONNX models for github.com/welcomyou/sherpa-vietnamese-asr (offline Vietnamese ASR, CPU-only).
Vietnamese punctuation restoration + capitalization model — ONNX Runtime version of dragonSwing/vibert-capu. PyTorch dependency removed (~2 GB → ~50 MB onnxruntime).
| Variant | File | Size | Use case |
|---|---|---|---|
| FP32 | vibert-capu.onnx | 438 MB | Best accuracy; server / web service |
| INT8 | vibert-capu.int8.onnx | 110 MB | Desktop, embedded; dynamic-quantized weights, ~99% of FP32 accuracy |
Architecture: BERT (FPTAI/vibert-base-cased) fine-tuned by dragonSwing on 5.6M OSCAR-2109 samples for the Seq2Labels punctuation+capitalization task (15 GECToR-style edit actions).
| | PyTorch (original) | ONNX Runtime (this repo) |
|---|---|---|
| Cold start | ~6 s | ~0.8 s |
| Runtime deps | torch (~2 GB) | onnxruntime (~50 MB) |
| Portable build | very heavy | lightweight |
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

local = snapshot_download("welcomyou/vibert-capu-onnx")
tok = AutoTokenizer.from_pretrained(local)
sess = ort.InferenceSession(f"{local}/vibert-capu.int8.onnx",
                            providers=["CPUExecutionProvider"])

text = "hà nội là thủ đô việt nam tôi yêu nó"
enc = tok(text.split(), is_split_into_words=True, return_tensors="np")

# input_offsets: index of the first subword of each word
word_ids = enc.word_ids()
offsets = []
prev = None
for i, w in enumerate(word_ids):
    if w is not None and w != prev:
        offsets.append(i)
        prev = w
input_offsets = np.array([offsets], dtype=np.int64)

logits, detect_logits = sess.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
    "token_type_ids": enc["token_type_ids"].astype(np.int64),
    "input_offsets": input_offsets,
})
# logits:        (1, num_words, 15) -> 15 GECToR-style actions
# detect_logits: (1, num_words, 4)  -> error detection
```
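To turn the raw `logits` into one action label per word, take an argmax over the last axis. The sketch below uses a placeholder action list; the real label order is defined by `vocabulary/labels.txt` in this repo, so treat `ACTIONS` as an assumption for illustration only.

```python
import numpy as np

# Hypothetical subset of the label vocabulary; the authoritative order
# lives in vocabulary/labels.txt.
ACTIONS = ["$KEEP", "$TRANSFORM_CASE_CAPITAL", "$APPEND_,", "$APPEND_."]

def decode_actions(logits: np.ndarray) -> list:
    """Pick the highest-scoring action index for each word.

    logits: (1, num_words, num_actions) -> list of num_words indices.
    """
    return np.argmax(logits[0], axis=-1).tolist()

# Toy example: 2 words, 4 actions
toy = np.array([[[0.1, 2.0, 0.0, -1.0],
                 [3.0, 0.2, 0.1, 0.0]]])
print(decode_actions(toy))  # [1, 0]
```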
Inputs (all int64):

| Name | Shape | Description |
|---|---|---|
| input_ids | (batch, seq_len) | BPE token IDs from BertTokenizer |
| attention_mask | (batch, seq_len) | 1 = real token, 0 = padding |
| token_type_ids | (batch, seq_len) | Segment IDs (always 0) |
| input_offsets | (batch, num_words) | Index of the first subword of each whitespace-separated word |
Outputs (float32):

| Name | Shape | Description |
|---|---|---|
| logits | (batch, num_words, 15) | Action logits (15 GECToR-style edits) |
| detect_logits | (batch, num_words, 4) | Error-detection logits |
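The output names suggest raw (unnormalized) logits; if so, a softmax over the last axis converts them to per-class probabilities. A minimal numpy sketch:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the per-row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy detect_logits: 1 sentence, 2 words, 4 detection classes
detect = np.array([[[2.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0]]])
probs = softmax(detect)
print(probs.sum(axis=-1))  # each word's class probabilities sum to 1
```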
The 15 actions:

| Action | Meaning |
|---|---|
| $KEEP | Keep the word unchanged |
| $TRANSFORM_CASE_CAPITAL | Capitalize the first letter (hà nội → Hà Nội) |
| $APPEND_, | Append a comma |
| $APPEND_. | Append a period |
| $TRANSFORM_VERB_VB_VBN | (not used for Vietnamese) |
| $TRANSFORM_CASE_UPPER | Uppercase the whole word (who → WHO) |
| $APPEND_: | Append a colon |
| $APPEND_? | Append a question mark |
| $TRANSFORM_VERB_VB_VBC | (not used for Vietnamese) |
| $TRANSFORM_CASE_LOWER | Lowercase the word |
| $TRANSFORM_CASE_CAPITAL_1 | Capitalize starting from the second character |
| $TRANSFORM_CASE_UPPER_-1 | Uppercase all but the last character |
| $MERGE_SPACE | Merge with the next word (remove the space) |
| @@UNKNOWN@@ | Unknown action |
| @@PADDING@@ | Padding label |
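Applying these actions to the word list is straightforward string manipulation. The sketch below covers only a few common punctuation and case actions as an illustration; the repository's gec_model.py implements the complete GECToR post-processing.

```python
def apply_action(word: str, action: str) -> str:
    """Apply a single GECToR-style edit action to one word (partial sketch)."""
    if action == "$KEEP":
        return word
    if action == "$TRANSFORM_CASE_CAPITAL":
        return word.capitalize()
    if action == "$TRANSFORM_CASE_UPPER":
        return word.upper()
    if action == "$TRANSFORM_CASE_LOWER":
        return word.lower()
    if action.startswith("$APPEND_"):
        # e.g. "$APPEND_," appends "," after the word
        return word + action[len("$APPEND_"):]
    return word  # actions not handled here: leave the word unchanged

words = ["hà", "nội", "là", "thủ", "đô"]
actions = ["$TRANSFORM_CASE_CAPITAL", "$TRANSFORM_CASE_CAPITAL",
           "$KEEP", "$KEEP", "$APPEND_."]
print(" ".join(apply_action(w, a) for w, a in zip(words, actions)))
# -> Hà Nội là thủ đô.
```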
```shell
git clone https://huggingface.co/dragonSwing/vibert-capu
pip install torch transformers onnxruntime numpy

# Export FP32 + dynamic-quantize INT8 in one step:
python convert_onnx/export_vibert_onnx.py \
    --model_dir vibert-capu \
    --output vibert-capu.onnx \
    --opset 14 \
    --verify
```
Script: convert_onnx/export_vibert_onnx.py.
Repository contents:

| File | Description |
|---|---|
| config.json | BERT config (from dragonSwing) |
| vocab.txt | BERT vocabulary (from dragonSwing) |
| vocabulary/ | GECToR action labels (d_tags.txt, labels.txt, non_padded_namespaces.txt) |
| verb-form-vocab.txt | Verb form vocabulary |
| vibert-capu.onnx | FP32 ONNX (438 MB) |
| vibert-capu.int8.onnx | INT8 ONNX (110 MB) |
| configuration_seq2labels.py | Seq2Labels HF config class |
| modeling_seq2labels.py | Seq2Labels HF model class (PyTorch reference, not used at runtime) |
| gec_model.py | GECToR inference helpers |
| utils.py | Tokenization helpers |
| vocabulary.py | GECToR Vocabulary class |
License: CC-BY-SA-4.0 (inherited from dragonSwing/vibert-capu — derivative works must use the same license).
Base model: dragonSwing/vibert-capu