LayoutLMv3 — Vietnamese administrative document KIE

Fine-tuned microsoft/layoutlmv3-base for key-information extraction on Vietnamese administrative documents (Quyết định, Công văn, Tờ trình, Báo cáo, ...).

This is the fontgray-norm variant: token_type_ids carry one of three style buckets, derived from each word's font size, foreground gray level, and word height (see style_emphasis_ids in the training pipeline).
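To make the idea of style buckets concrete, here is a minimal sketch of one way three buckets could be derived from the per-word features named above. It is not the rule used by style_emphasis_ids: the Word type, the style_buckets function, the thresholds, and the meaning given to each bucket id are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    text: str
    font_size: float  # estimated font size
    gray: float       # foreground gray level, 0.0 = black, 1.0 = white
    height: float     # word box height


def style_buckets(words, size_ratio=1.15, light_margin=0.3):
    """Return one bucket id per word (hypothetical rule, not the trained one).

    2 = larger than the page's typical text, 0 = noticeably lighter, 1 = body text.
    Features are normalised against page medians, so the rule does not depend
    on scan resolution.
    """
    if not words:
        return []
    med_size = median(w.font_size for w in words)
    med_height = median(w.height for w in words)
    med_gray = median(w.gray for w in words)
    ids = []
    for w in words:
        if w.font_size >= size_ratio * med_size or w.height >= size_ratio * med_height:
            ids.append(2)
        elif w.gray - med_gray >= light_margin:
            ids.append(0)
        else:
            ids.append(1)
    return ids
```

Ids produced this way are per word; they would still have to be expanded to one id per sub-word token before being passed as token_type_ids.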

Files

layoutlmv3_fontgray_norm_final_epoch25/layoutlmv3_fontgray_norm_final_epoch25.int8.onnx — quantized INT8 ONNX model
layoutlmv3_fontgray_norm_final_epoch25/label_list.json — label vocabulary (BIO tags)
layoutlmv3_fontgray_norm_final_epoch25/layoutlmv3_fontgray_config.json — runtime config (style buckets, line position buckets)
Tokenizer files (tokenizer.json, tokenizer_config.json, special_tokens_map.json, vocab.txt, …)
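
Since the label vocabulary uses BIO tags, per-word predictions have to be grouped into field spans before use. The sketch below shows one common way to do that. It assumes label_list.json is a plain JSON array of tag strings, which this card does not confirm, and the helper names load_labels and bio_to_spans are illustrative only.

```python
import json
from pathlib import Path


def load_labels(path):
    """Read label_list.json, ASSUMED to be a JSON array of BIO tag strings."""
    labels = json.loads(Path(path).read_text(encoding="utf-8"))
    return {i: tag for i, tag in enumerate(labels)}


def bio_to_spans(tags):
    """Group a per-word BIO tag sequence into (field, start, end) spans, end exclusive.

    A stray I- tag with no matching open span starts a new span instead of
    being dropped.
    """
    spans, field, start = [], None, 0
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == field
        if field is not None and not continues:
            spans.append((field, start, i))
            field = None
        if prefix == "B" or (prefix == "I" and field is None):
            field, start = name, i
    return spans
```

For a hypothetical tag sequence such as B-NUMBER, I-NUMBER, O, B-DATE, the function returns the spans (NUMBER, 0, 2) and (DATE, 3, 4); the real field names come from label_list.json.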

Intended use

This model is designed for the ScanIndex pipeline. It expects the canonical OCR JSON profile produced by ScreenAI and the project's preprocessing (layoutlmv3_runtime_v1). Using it standalone requires reproducing that input format.

Loading

from huggingface_hub import snapshot_download

# Download the repository into ./models; the return value is the local path.
local = snapshot_download("welcomyou/layoutlmv3-vn-admin-kie", local_dir="models")
# Then point ScanIndex at <repo>/models/layoutlmv3_fontgray_norm_final_epoch25/
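
Outside ScanIndex, it can help to check that the download contains the files listed above and to inspect which inputs the ONNX graph declares. The following is a minimal sketch under two assumptions: the directory layout matches the Files section, and onnxruntime is installed. The helper names model_files, missing_files, and open_session are illustrative, and this is not how ScanIndex itself loads the model.

```python
from pathlib import Path

VARIANT = "layoutlmv3_fontgray_norm_final_epoch25"


def model_files(root):
    """Resolve the files listed in this card under a local download directory."""
    base = Path(root) / VARIANT
    return {
        "onnx": base / f"{VARIANT}.int8.onnx",
        "labels": base / "label_list.json",
        "config": base / "layoutlmv3_fontgray_config.json",
    }


def missing_files(root):
    """Names of the expected files that are not present under root."""
    return [name for name, path in model_files(root).items() if not path.is_file()]


def open_session(root):
    """Open the INT8 model on CPU and return (session, declared input names)."""
    import onnxruntime as ort  # imported lazily; assumed to be installed

    session = ort.InferenceSession(
        str(model_files(root)["onnx"]), providers=["CPUExecutionProvider"]
    )
    return session, [inp.name for inp in session.get_inputs()]
```

Calling missing_files("models") after the download should return an empty list. The input names reported by open_session show which tensors the preprocessing has to supply; their exact contents are defined by the layoutlmv3_runtime_v1 profile, not by this sketch.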

Training & data

See train-convert/kie/train_kie/layoutlmv3_fontgray_norm/ for the training scripts and decision records.

Trained on internal annotated Vietnamese admin documents (not redistributed).

License

Inherits the license of the LayoutLMv3 base model: CC-BY-NC-SA-4.0 (research / non-commercial use only). Commercial use requires a separate agreement with Microsoft for the base model.

Part of the ScanIndex collection: models loaded by https://github.com/welcomyou/scanindex (OCR, KIE, layout, tables, embedder for Vietnamese admin docs).