Part of the Sherpa Vietnamese ASR collection — ONNX models for github.com/welcomyou/sherpa-vietnamese-asr (offline Vietnamese ASR, CPU-only).
Vietnamese punctuation restoration + capitalization model — ONNX Runtime version of dragonSwing/vibert-capu. PyTorch dependency removed (~2 GB → ~50 MB onnxruntime).
| Variant | File | Size | Use case |
|---|---|---|---|
| FP32 | vibert-capu.onnx | 438 MB | Best accuracy; server / web service |
| INT8 | vibert-capu.int8.onnx | 110 MB | Desktop, embedded; dynamic-quantized weights, ~99% of FP32 accuracy |
Architecture: BERT (FPTAI/vibert-base-cased) fine-tuned by dragonSwing on 5.6M OSCAR-2109 samples for the Seq2Labels punctuation+capitalization task (15 GECToR-style edit actions).
| | PyTorch (original) | ONNX Runtime (this repo) |
|---|---|---|
| Cold start | ~6 s | ~0.8 s |
| Runtime deps | torch (~2 GB) | onnxruntime (~50 MB) |
| Portable build | very heavy | lightweight |
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

local = snapshot_download("welcomyou/vibert-capu-onnx")
tok = AutoTokenizer.from_pretrained(local)
sess = ort.InferenceSession(f"{local}/vibert-capu.int8.onnx",
                            providers=["CPUExecutionProvider"])

text = "hà nội là thủ đô việt nam tôi yêu nó"
enc = tok(text.split(), is_split_into_words=True, return_tensors="np")

# input_offsets: index of the first subword of each word
word_ids = enc.word_ids()
offsets = []
prev = None
for i, w in enumerate(word_ids):
    if w is not None and w != prev:
        offsets.append(i)
        prev = w
input_offsets = np.array([offsets], dtype=np.int64)

logits, detect_logits = sess.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
    "token_type_ids": enc["token_type_ids"].astype(np.int64),
    "input_offsets": input_offsets,
})
# logits:        (1, num_words, 15) -> 15 GECToR-style actions
# detect_logits: (1, num_words, 4)  -> error detection
```
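To turn the raw `logits` into one action label per word, take an argmax over the last axis. The sketch below uses a placeholder action list; the real label order is defined by `vocabulary/labels.txt` in this repo, so treat `ACTIONS` as an assumption for illustration only.

```python
import numpy as np

# Hypothetical subset of the label vocabulary; the authoritative order
# lives in vocabulary/labels.txt.
ACTIONS = ["$KEEP", "$TRANSFORM_CASE_CAPITAL", "$APPEND_,", "$APPEND_."]

def decode_actions(logits: np.ndarray) -> list:
    """Pick the highest-scoring action index for each word.

    logits: (1, num_words, num_actions) -> list of num_words indices.
    """
    return np.argmax(logits[0], axis=-1).tolist()

# Toy example: 2 words, 4 actions
toy = np.array([[[0.1, 2.0, 0.0, -1.0],
                 [3.0, 0.2, 0.1, 0.0]]])
print(decode_actions(toy))  # [1, 0]
```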
Inputs (all int64):

| Name | Shape | Description |
|---|---|---|
| input_ids | (batch, seq_len) | BPE token IDs from BertTokenizer |
| attention_mask | (batch, seq_len) | 1 = real token, 0 = padding |
| token_type_ids | (batch, seq_len) | Segment IDs (always 0) |
| input_offsets | (batch, num_words) | Index of the first subword of each whitespace-separated word |
Outputs (float32):

| Name | Shape | Description |
|---|---|---|
| logits | (batch, num_words, 15) | Action logits (15 GECToR-style edits) |
| detect_logits | (batch, num_words, 4) | Error-detection logits |
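The output names suggest raw (unnormalized) logits; if so, a softmax over the last axis converts them to per-class probabilities. A minimal numpy sketch:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the per-row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy detect_logits: 1 sentence, 2 words, 4 detection classes
detect = np.array([[[2.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0]]])
probs = softmax(detect)
print(probs.sum(axis=-1))  # each word's class probabilities sum to 1
```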
The 15 actions:

| Action | Meaning |
|---|---|
| $KEEP | Keep the word unchanged |
| $TRANSFORM_CASE_CAPITAL | Capitalize the first letter (hà nội → Hà Nội) |
| $APPEND_, | Append a comma |
| $APPEND_. | Append a period |
| $TRANSFORM_VERB_VB_VBN | (not used for Vietnamese) |
| $TRANSFORM_CASE_UPPER | Uppercase the whole word (who → WHO) |
| $APPEND_: | Append a colon |
| $APPEND_? | Append a question mark |
| $TRANSFORM_VERB_VB_VBC | (not used for Vietnamese) |
| $TRANSFORM_CASE_LOWER | Lowercase the word |
| $TRANSFORM_CASE_CAPITAL_1 | Capitalize starting from the second character |
| $TRANSFORM_CASE_UPPER_-1 | Uppercase all but the last character |
| $MERGE_SPACE | Merge with the next word (remove the space) |
| @@UNKNOWN@@ | Unknown action |
| @@PADDING@@ | Padding label |
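Applying these actions to the word list is straightforward string manipulation. The sketch below covers only a few common punctuation and case actions as an illustration; the repository's gec_model.py implements the complete GECToR post-processing.

```python
def apply_action(word: str, action: str) -> str:
    """Apply a single GECToR-style edit action to one word (partial sketch)."""
    if action == "$KEEP":
        return word
    if action == "$TRANSFORM_CASE_CAPITAL":
        return word.capitalize()
    if action == "$TRANSFORM_CASE_UPPER":
        return word.upper()
    if action == "$TRANSFORM_CASE_LOWER":
        return word.lower()
    if action.startswith("$APPEND_"):
        # e.g. "$APPEND_," appends "," after the word
        return word + action[len("$APPEND_"):]
    return word  # actions not handled here: leave the word unchanged

words = ["hà", "nội", "là", "thủ", "đô"]
actions = ["$TRANSFORM_CASE_CAPITAL", "$TRANSFORM_CASE_CAPITAL",
           "$KEEP", "$KEEP", "$APPEND_."]
print(" ".join(apply_action(w, a) for w, a in zip(words, actions)))
# -> Hà Nội là thủ đô.
```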
```shell
git clone https://huggingface.co/dragonSwing/vibert-capu
pip install torch transformers onnxruntime numpy

# Export FP32 + dynamic-quantize INT8 in one step:
python convert_onnx/export_vibert_onnx.py \
    --model_dir vibert-capu \
    --output vibert-capu.onnx \
    --opset 14 \
    --verify
```
Script: convert_onnx/export_vibert_onnx.py.
Repository contents:

| File | Description |
|---|---|
| config.json | BERT config (from dragonSwing) |
| vocab.txt | BERT vocabulary (from dragonSwing) |
| vocabulary/ | GECToR action labels (d_tags.txt, labels.txt, non_padded_namespaces.txt) |
| verb-form-vocab.txt | Verb form vocabulary |
| vibert-capu.onnx | FP32 ONNX (438 MB) |
| vibert-capu.int8.onnx | INT8 ONNX (110 MB) |
| configuration_seq2labels.py | Seq2Labels HF config class |
| modeling_seq2labels.py | Seq2Labels HF model class (PyTorch reference, not used at runtime) |
| gec_model.py | GECToR inference helpers |
| utils.py | Tokenization helpers |
| vocabulary.py | GECToR Vocabulary class |
License: CC-BY-SA-4.0 (inherited from dragonSwing/vibert-capu — derivative works must use the same license).
Base model: dragonSwing/vibert-capu