ViBERT-capu ONNX (FP32 + INT8)

Vietnamese punctuation restoration and capitalization model, exported to ONNX Runtime from dragonSwing/vibert-capu. The PyTorch dependency is gone: ~2 GB of torch is replaced by ~50 MB of onnxruntime.

| Variant | File | Size | Use case |
|---------|------|------|----------|
| FP32 | vibert-capu.onnx | 438 MB | Best accuracy; server / web service |
| INT8 | vibert-capu.int8.onnx | 110 MB | Desktop / embedded; dynamically quantized weights, ~99% of FP32 accuracy |

Architecture: BERT (FPTAI/vibert-base-cased) fine-tuned by dragonSwing on 5.6M OSCAR-2109 samples for the Seq2Labels punctuation+capitalization task (15 GECToR-style edit actions).

Why ONNX?

| | PyTorch (original) | ONNX Runtime (this repo) |
|---|---|---|
| Cold start | ~6 s | ~0.8 s |
| Runtime deps | torch (~2 GB) | onnxruntime (~50 MB) |
| Portable build | very heavy | lightweight |

Quick start

import numpy as np
import onnxruntime as ort
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

local = snapshot_download("welcomyou/vibert-capu-onnx")
tok = AutoTokenizer.from_pretrained(local)
sess = ort.InferenceSession(f"{local}/vibert-capu.int8.onnx",
                            providers=["CPUExecutionProvider"])

text = "hà nội là thủ đô việt nam tôi yêu nó"
enc = tok(text.split(), is_split_into_words=True, return_tensors="np")
# input_offsets: index of first subword for each word
word_ids = enc.word_ids()
offsets = []
prev = None
for i, w in enumerate(word_ids):
    if w is not None and w != prev:
        offsets.append(i); prev = w
input_offsets = np.array([offsets], dtype=np.int64)

logits, detect_logits = sess.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
    "token_type_ids": enc["token_type_ids"].astype(np.int64),
    "input_offsets": input_offsets,
})
# logits: (1, num_words, 15)  — 15 GECToR actions
# detect_logits: (1, num_words, 4) — error detection
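The offset-building loop above can be packaged as a reusable helper. A minimal sketch (the function name is illustrative, not part of this repo), using a synthetic word_ids sequence to show the convention of HF fast tokenizers: None for special tokens such as [CLS]/[SEP], and a repeated word index for continuation subwords:

```python
import numpy as np

def first_subword_offsets(word_ids):
    """Return the index of the first subword token of each word.

    word_ids: output of BatchEncoding.word_ids(), e.g.
    [None, 0, 0, 1, 2, 2, None] for [CLS] w0a w0b w1 w2a w2b [SEP].
    """
    offsets, prev = [], None
    for i, w in enumerate(word_ids):
        if w is not None and w != prev:
            offsets.append(i)  # first subword of a new word
        prev = w
    return np.array([offsets], dtype=np.int64)  # shape (1, num_words)

# three words, the first and third split into two subwords each
offsets = first_subword_offsets([None, 0, 0, 1, 2, 2, None])
```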

Model I/O

Inputs (all int64):

| Name | Shape | Description |
|------|-------|-------------|
| input_ids | (batch, seq_len) | WordPiece token IDs from BertTokenizer |
| attention_mask | (batch, seq_len) | 1 = real token, 0 = padding |
| token_type_ids | (batch, seq_len) | Segment IDs (always 0) |
| input_offsets | (batch, num_words) | Index of the first subword of each whitespace-separated word |

Outputs (float32):

| Name | Shape | Description |
|------|-------|-------------|
| logits | (batch, num_words, 15) | Logits over the 15 GECToR-style edit actions |
| detect_logits | (batch, num_words, 4) | Error-detection logits |
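Both outputs are raw logits; to get per-word predictions, apply a softmax and argmax over the last axis. A minimal numpy sketch (the helper name is illustrative), using a synthetic array since only the shapes matter here:

```python
import numpy as np

def decode_output(logits):
    """Softmax + argmax over the action axis.

    logits: (batch, num_words, num_actions)
    Returns (probs, indices) with indices of shape (batch, num_words).
    """
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)

# synthetic logits: 1 sentence, 2 words, 15 actions
fake = np.zeros((1, 2, 15), dtype=np.float32)
fake[0, 0, 1] = 8.0  # word 0 strongly prefers action index 1
probs, pred = decode_output(fake)
```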

15 actions:

$KEEP                      Keep the word unchanged
$TRANSFORM_CASE_CAPITAL    Capitalize the first letter (hà nội → Hà Nội)
$APPEND_,                  Append a comma
$APPEND_.                  Append a period
$TRANSFORM_VERB_VB_VBN     (unused for Vietnamese)
$TRANSFORM_CASE_UPPER      Uppercase the whole word (who → WHO)
$APPEND_:                  Append a colon
$APPEND_?                  Append a question mark
$TRANSFORM_VERB_VB_VBC     (unused for Vietnamese)
$TRANSFORM_CASE_LOWER      Lowercase the word
$TRANSFORM_CASE_CAPITAL_1  Capitalize the second character
$TRANSFORM_CASE_UPPER_-1   Uppercase all but the last character
$MERGE_SPACE               Join words (remove the space)
@@UNKNOWN@@
@@PADDING@@
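To give a feel for how these actions are applied, here is a simplified per-word sketch; the full logic, including $MERGE_SPACE handling across word boundaries, lives in gec_model.py, and the function name below is illustrative:

```python
def apply_action(word, action):
    """Apply one GECToR-style edit action to a single word (simplified sketch)."""
    if action == "$TRANSFORM_CASE_CAPITAL":
        return word.capitalize()
    if action == "$TRANSFORM_CASE_UPPER":
        return word.upper()
    if action == "$TRANSFORM_CASE_LOWER":
        return word.lower()
    if action == "$TRANSFORM_CASE_CAPITAL_1":
        return word if len(word) < 2 else word[0] + word[1:].capitalize()
    if action == "$TRANSFORM_CASE_UPPER_-1":
        return word if len(word) < 2 else word[:-1].upper() + word[-1]
    if action.startswith("$APPEND_"):
        return word + action[len("$APPEND_"):]  # punctuation goes after the word
    return word  # $KEEP, verb transforms, @@UNKNOWN@@, @@PADDING@@: no-op here

words = ["hà", "nội", "là", "thủ", "đô", "việt", "nam"]
actions = ["$TRANSFORM_CASE_CAPITAL", "$TRANSFORM_CASE_CAPITAL", "$KEEP",
           "$KEEP", "$KEEP", "$TRANSFORM_CASE_CAPITAL", "$TRANSFORM_CASE_CAPITAL"]
restored = " ".join(apply_action(w, a) for w, a in zip(words, actions))
```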

Reproducing the export

git clone https://huggingface.co/dragonSwing/vibert-capu
pip install torch transformers onnxruntime numpy

# Export FP32 + dynamic-quantize INT8 in one step:
python convert_onnx/export_vibert_onnx.py \
    --model_dir vibert-capu \
    --output    vibert-capu.onnx \
    --opset     14 \
    --verify

Script: convert_onnx/export_vibert_onnx.py.

Files

config.json                    BERT config (from dragonSwing)
vocab.txt                      BERT vocabulary (from dragonSwing)
vocabulary/                    GECToR action labels
  d_tags.txt
  labels.txt
  non_padded_namespaces.txt
verb-form-vocab.txt            Verb form vocabulary
vibert-capu.onnx              FP32 ONNX (438 MB)
vibert-capu.int8.onnx         INT8 ONNX (110 MB)
configuration_seq2labels.py   Seq2Labels HF config class
modeling_seq2labels.py        Seq2Labels HF model class (PyTorch reference, not used at runtime)
gec_model.py                  GECToR inference helpers
utils.py                      Tokenization helpers
vocabulary.py                 GECToR Vocabulary class

Credits & License

License: CC-BY-SA-4.0, inherited from dragonSwing/vibert-capu. Derivative works must be released under the same license.
