metadata
library_name: onnxruntime
pipeline_tag: object-detection
license: mit
base_model: ds4sd/docling-models
tags:
  - table-structure-recognition
  - tableformer
  - docling
  - onnx
  - stepcache
  - kv-cache

Docling TableFormer v1 — ONNX stepcache export

ONNX export of Docling's TableFormer v1 structure recognizer, split into three sub-graphs (encoder, step-cached decoder, and bbox head) so that the autoregressive decoder can be run one step at a time with a KV cache from Python, without pulling in the Docling runtime.

Why stepcache?

Docling's stock decoder runs the full sequence on every call. For desktop CPU inference, caching the attention keys and values (K/V) across decoder steps amortizes that cost, since each step then only has to process the newest token. This export builds that pattern into the ONNX graphs themselves, so onnxruntime (or any other ONNX runtime) can run it without custom Docling code.
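To give a rough sense of what caching saves, here is a small back-of-the-envelope sketch. It uses a deliberately simplified cost model that is an assumption of this example, not a measurement of this export: it only counts how many token positions are pushed through the decoder layers over a whole decode, and ignores the attention over cached keys and all constant factors. The function name `decoder_positions_processed` is made up for illustration.

```python
def decoder_positions_processed(num_steps: int, use_cache: bool) -> int:
    """Count token positions fed through the decoder over one full decode.

    Simplified cost model (assumption): without a cache, step t re-processes
    all t tokens generated so far; with a KV cache, each step processes only
    the single newest token.
    """
    if use_cache:
        return num_steps  # one new token per step
    return sum(range(1, num_steps + 1))  # 1 + 2 + ... + num_steps


for n in (16, 128, 512):
    full = decoder_positions_processed(n, use_cache=False)
    cached = decoder_positions_processed(n, use_cache=True)
    print(f"steps={n}: full={full} cached={cached} ratio={full / cached:.1f}")
```

Under this model, a 512-step decode touches 131,328 positions without a cache against 512 with one. Real speedups depend on the model and hardware, so treat the ratio as an upper-bound intuition rather than a benchmark.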

Files (docling_tableformer_v1_stepcache_onnx/)

| File | Role |
| --- | --- |
| docling_v1_encoder.onnx | Encodes the cropped table image once |
| docling_v1_decoder_step.onnx | One decoder step; consumes encoder features + previous KV |
| docling_v1_bbox_head.onnx | Maps decoder hidden states to per-cell bboxes |
| vocab.json, tableformer_config.json | Tokenizer + model config |
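After downloading, it can be handy to confirm that all of the files listed above are present before creating any sessions. The following is a minimal sketch using only the standard library; the helper name `missing_files` and the `EXPECTED_FILES` constant are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
EXPECTED_FILES = (
    "docling_v1_encoder.onnx",
    "docling_v1_decoder_step.onnx",
    "docling_v1_bbox_head.onnx",
    "vocab.json",
    "tableformer_config.json",
)


def missing_files(export_dir: str) -> List[str]:
    """Return the expected file names that are not present in export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_files("models/docling_tableformer_v1_stepcache_onnx")` would indicate that the download is complete, assuming the snapshot was placed under `models` as in the loading snippet.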

Loading

import onnxruntime as ort
from huggingface_hub import snapshot_download

# Download the repository snapshot into ./models and get its local path.
local = snapshot_download("welcomyou/docling-tableformer-v1-onnx-stepcache", local_dir="models")
sub = f"{local}/docling_tableformer_v1_stepcache_onnx"

# One inference session per sub-graph.
encoder = ort.InferenceSession(f"{sub}/docling_v1_encoder.onnx")
decoder = ort.InferenceSession(f"{sub}/docling_v1_decoder_step.onnx")
bbox    = ort.InferenceSession(f"{sub}/docling_v1_bbox_head.onnx")
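This card does not list the tensor names or shapes each sub-graph expects, so one way to discover them is to ask the sessions directly. The sketch below relies on onnxruntime's `get_inputs()` and `get_outputs()` methods, whose entries expose `name`, `shape`, and `type`; the helper name `describe_io` is made up for this example.

```python
from typing import Any, List, Tuple


def describe_io(session: Any) -> List[Tuple[str, str, Any, str]]:
    """List (kind, name, shape, type) for every input and output of a session.

    Works with any object that offers get_inputs() / get_outputs() in the
    style of onnxruntime.InferenceSession.
    """
    rows = []
    for kind, args in (("input", session.get_inputs()),
                       ("output", session.get_outputs())):
        for arg in args:
            rows.append((kind, arg.name, arg.shape, arg.type))
    return rows
```

For instance, printing each row of `describe_io(decoder)` should reveal which inputs carry the encoder features and the previous K/V tensors, and which outputs return the updated cache.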

A reference Python loop that ties these three sessions into a stepcache decoder lives at train-convert/docling-tableformer-v1/convert/onnx_stepcache_runner_reference.py.
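For readers who want the general shape of such a loop before opening the reference runner, here is a generic greedy decoding sketch. It is not the reference implementation: the `step_fn` callback, its signature, and the opaque `cache` object are all assumptions made for illustration, and the real tensor names, cache layout, and stopping rule are defined by the exported graphs and the runner script.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical step signature: (last_token, cache) -> (scores, new_cache).
# In practice step_fn would wrap a call to the decoder-step session.
StepFn = Callable[[int, Any], Tuple[List[float], Any]]


def greedy_decode(step_fn: StepFn, start_token: int, end_token: int,
                  max_steps: int, initial_cache: Optional[Any] = None) -> List[int]:
    """Run a step-cached decoder greedily until end_token or max_steps."""
    tokens = [start_token]
    cache = initial_cache
    for _ in range(max_steps):
        # Feed only the newest token; earlier context lives in the cache.
        scores, cache = step_fn(tokens[-1], cache)
        # Greedy choice: index of the highest score.
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)
        if next_token == end_token:
            break
    return tokens
```

The point of the pattern is that the encoder runs once outside the loop, while each iteration passes in a single token plus the cache returned by the previous iteration.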

Re-export reproduction

See train-convert/docling-tableformer-v1/convert/export_docling_v1_tableformer_stepcache_onnx.py.

License

MIT, inherited from Docling.