SLANet-1M (ONNX export)

ONNX-converted weights of dimtri009/SLANet-1M for table structure recognition. Repackaged for direct onnxruntime inference without the PaddlePaddle runtime dependency.

What this is

Source weights: trained PaddlePaddle inference model (~7.6 MB of parameters) from dimtri009/SLANet-1M
Conversion: paddle2onnx 1.3.1 with opset_version=16
License: MIT (inherited from the source model)
Architecture: SLANet variant with depthwise separable convolutions, ~9.2M params, transformer-free
Input: BGR image, resized so the longest side is at most 488 px, ImageNet-normalized, then zero-padded to 488×488
Output: per-step scores over a 30-token structure vocabulary, plus a 4-coordinate bbox per step

Files

File	Size	Purpose
`slanet_1m.onnx`	7.6 MB	ONNX-exported model weights
`inference.yml`	1.3 KB	Preprocessing config + 28-token character dictionary

Why this repo exists

The upstream repo ships PaddlePaddle artifacts (.pdmodel + .pdiparams), which require installing the full paddlepaddle package (~300 MB) for inference. This repo provides the same trained weights in ONNX format so they can be loaded with onnxruntime alone.

The weights are not modified — paddle2onnx performs a graph-to-graph translation, not retraining or quantization. SHA-256 of the produced slanet_1m.onnx:

8a8aa31bf964c1c05039f02814e5f425a354a37552eaca8e6d5dc513048759f5
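
A downloaded copy can be checked against the digest above before loading it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify` are illustrative and not part of this repo.

```python
import hashlib
from pathlib import Path

# Digest published in this README for slanet_1m.onnx
EXPECTED_SHA256 = "8a8aa31bf964c1c05039f02814e5f425a354a37552eaca8e6d5dc513048759f5"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("slanet_1m.onnx")` should return `True` for an unmodified download of the file listed above.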

Reproducing the conversion

pip install paddlepaddle paddle2onnx==1.3.1 huggingface_hub
huggingface-cli download dimtri009/SLANet-1M --local-dir ./slanet-1m-paddle
paddle2onnx \
  --model_dir ./slanet-1m-paddle \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --save_file slanet_1m.onnx \
  --opset_version 16

Usage

import cv2
import numpy as np
import onnxruntime as ort
import yaml

# Load model and char dict
session = ort.InferenceSession("slanet_1m.onnx", providers=["CPUExecutionProvider"])
with open("inference.yml", encoding="utf-8") as f:
    config = yaml.safe_load(f)
character_dict = config["PostProcess"]["character_dict"]

# Preprocess: BGR → resize longest side ≤488 → ImageNet normalize → pad to 488×488 → CHW
def preprocess(img_bgr, max_len=488):
    h, w = img_bgr.shape[:2]
    ratio = max_len / max(h, w)
    rh, rw = int(h * ratio), int(w * ratio)
    resized = cv2.resize(img_bgr, (rw, rh))
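    # ImageNet mean/std, applied per channel to the BGR array as loaded
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    norm = (resized.astype(np.float32) / 255.0 - mean) / std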
    padded = np.zeros((max_len, max_len, 3), dtype=np.float32)
    padded[:rh, :rw] = norm
    chw = padded.transpose(2, 0, 1)
    # Second value: [orig_h, orig_w, ratio_h, ratio_w, pad_h, pad_w]
    return chw[None].astype(np.float32), [h, w, ratio, ratio, max_len, max_len]

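img = cv2.imread("table.png")
if img is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("table.png could not be read")
batch, shape_info = preprocess(img)
bbox_preds, struct_probs = session.run(None, {"x": batch})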

# Decode: argmax over struct_probs → token indices → HTML; index bbox_preds for <td> tokens
# Full decoder: see PaddleOCR's TableLabelDecode or rapid_table's pp_structure post_process
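
The decode step described in the comments above can be sketched as follows. This is a simplified greedy decoder, not the reference implementation: it assumes the 30-entry vocabulary is laid out as `sos`, then the 28 dictionary tokens, then `eos` (the ordering used by PaddleOCR's TableLabelDecode), and that cell-opening tokens are `<td>`, `<td`, and `<td></td>`. The names `build_vocab`, `decode_structure`, and `CELL_TOKENS` are hypothetical. Boxes are returned exactly as the model emits them; rescaling to the original image is left to the full decoders named above.

```python
# Assumption: tokens that open a table cell, following PaddleOCR's TableLabelDecode.
CELL_TOKENS = {"<td>", "<td", "<td></td>"}


def build_vocab(character_dict):
    """Assumed layout: 'sos' first, then the dictionary tokens, then 'eos'."""
    return ["sos", *character_dict, "eos"]


def decode_structure(scores, boxes, vocab):
    """Greedy-decode one sample.

    scores: T rows of len(vocab) class scores (lists or a NumPy array).
    boxes:  T rows of 4 coordinates, aligned with scores.
    Returns (tokens, cell_boxes); cell_boxes are unscaled model outputs.
    """
    tokens, cell_boxes = [], []
    for step, (row, box) in enumerate(zip(scores, boxes)):
        idx = max(range(len(row)), key=row.__getitem__)  # argmax over the vocabulary
        token = vocab[idx]
        if token == "eos" and step > 0:
            break  # end of sequence
        if token in ("sos", "eos"):
            continue  # special tokens carry no structure
        tokens.append(token)
        if token in CELL_TOKENS:
            cell_boxes.append([float(v) for v in box])
    return tokens, cell_boxes
```

With the variables from the usage snippet, `decode_structure(struct_probs[0], bbox_preds[0], build_vocab(character_dict))` yields the token list for the first image, and `"".join(tokens)` gives the structure markup.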

Model details

Property	Value
Parameters	~9.2M
Input size	488×488 BGR (variable-size input, resized and zero-padded to square)
Output 1	`(N, T, 4)` xyxy bbox per structure token
Output 2	`(N, T, 30)` structure-token logits
Vocab	28 PaddleOCR table tokens + `sos` + `eos`
S-TEDS (PubTabNet, author-reported)	97.36
S-TEDS (SynthTabNet, author-reported)	99.36
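
The input name and output shapes in the table can be checked against the exported graph itself. The sketch below relies only on the `get_inputs()` and `get_outputs()` methods of an onnxruntime `InferenceSession`; the helper name `describe_io` is illustrative.

```python
def describe_io(session):
    """Return (kind, name, shape) for every graph input and output.

    `session` is expected to be an onnxruntime.InferenceSession, e.g.
    ort.InferenceSession("slanet_1m.onnx", providers=["CPUExecutionProvider"]).
    """
    rows = [("input", arg.name, arg.shape) for arg in session.get_inputs()]
    rows += [("output", arg.name, arg.shape) for arg in session.get_outputs()]
    return rows
```

Printing the rows for a loaded session shows whether the input is named `x`, as the usage snippet assumes, and in which order the bbox and structure outputs are returned.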

Citation

Original SLANet-1M model — Master's thesis at the University of Florence and Swiss AI Center (iCoSys, Fribourg), presented at SwissText 2025:

SLANet-1M paper

License

MIT — inherited from the source model. See dimtri009/SLANet-1M for upstream attribution.
