SLANet-1M (ONNX export)

ONNX-converted weights of dimtri009/SLANet-1M for table structure recognition. Repackaged for direct onnxruntime inference without the PaddlePaddle runtime dependency.

What this is

  • Source weights: trained PaddlePaddle inference model (~7.6 MB of weights) from dimtri009/SLANet-1M
  • Conversion: paddle2onnx 1.3.1 with opset_version=16
  • License: MIT (inherited from the source model)
  • Architecture: SLANet variant with depthwise separable convolutions, ~9.2M params, transformer-free
  • Input: BGR image, resized so the longest side is ≤ 488, then padded to 488×488 and ImageNet-normalized
  • Output: per-step structure-token logits over a 30-class vocabulary, plus a 4-coordinate bbox for each decoded token

Files

File            Size    Purpose
slanet_1m.onnx  7.6 MB  ONNX-exported model weights
inference.yml   1.3 KB  Preprocessing config + 28-token character dictionary
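The 28-entry dictionary in inference.yml covers only the structure tokens; PaddleOCR's TableLabelDecode pads it with start/end markers, which is why the model's output dimension is 30. A minimal sketch of that expansion (the exact marker strings follow PaddleOCR's convention and are an assumption here):

```python
def build_vocab(character_dict):
    """Expand the 28 structure tokens with start/end markers.

    PaddleOCR's TableLabelDecode prepends "sos" and appends "eos",
    which yields the model's 30-class output dimension.
    """
    return ["sos"] + list(character_dict) + ["eos"]
```

The markers never appear in decoded HTML; they only delimit the token sequence.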

Why this repo exists

The upstream repo ships PaddlePaddle artifacts (.pdmodel + .pdiparams), which require installing the full paddlepaddle package (~300 MB) for inference. This repo provides the same trained weights in ONNX format so they can be loaded with onnxruntime alone.

The weights are not modified: paddle2onnx performs a graph-to-graph translation, not retraining or quantization. SHA-256 of the produced slanet_1m.onnx:

8a8aa31bf964c1c05039f02814e5f425a354a37552eaca8e6d5dc513048759f5
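To confirm a download matches this checksum, a short Python check (streaming read, so it works for large files too):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("slanet_1m.onnx") should match the digest above
```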

Reproducing the conversion

pip install paddlepaddle paddle2onnx==1.3.1
huggingface-cli download dimtri009/SLANet-1M --local-dir ./slanet-1m-paddle
paddle2onnx \
  --model_dir ./slanet-1m-paddle \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --save_file slanet_1m.onnx \
  --opset_version 16

Usage

import cv2
import numpy as np
import onnxruntime as ort
import yaml

# Load model and char dict
session = ort.InferenceSession("slanet_1m.onnx", providers=["CPUExecutionProvider"])
with open("inference.yml") as f:
    config = yaml.safe_load(f)
character_dict = config["PostProcess"]["character_dict"]

# Preprocess: BGR → resize longest side ≤488 → ImageNet normalize → pad to 488×488 → CHW
def preprocess(img_bgr, max_len=488):
    h, w = img_bgr.shape[:2]
    ratio = max_len / max(h, w)
    rh, rw = int(h * ratio), int(w * ratio)
    resized = cv2.resize(img_bgr, (rw, rh))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    norm = (resized.astype(np.float32) / 255.0 - mean) / std
    padded = np.zeros((max_len, max_len, 3), dtype=np.float32)
    padded[:rh, :rw] = norm
    chw = padded.transpose(2, 0, 1)
    return chw[None].astype(np.float32), [h, w, ratio, ratio, max_len, max_len]

img = cv2.imread("table.png")
batch, shape_info = preprocess(img)
bbox_preds, struct_probs = session.run(None, {"x": batch})

# Decode: argmax over struct_probs → token indices → HTML; index bbox_preds for <td> tokens
# Full decoder: see PaddleOCR's TableLabelDecode or rapid_table's pp_structure post_process
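The decode step referenced above can be sketched with a plain greedy argmax. This is a simplified stand-in for PaddleOCR's TableLabelDecode, and the vocabulary layout (sos first, eos last) is an assumption based on that decoder:

```python
import numpy as np

def greedy_decode(struct_probs, bbox_preds, vocab):
    """Greedy decode one sample: argmax each step, stop at eos.

    struct_probs: (T, V) per-step token probabilities or logits
    bbox_preds:   (T, 4) one xyxy box per step
    vocab:        V strings; assumed to start with "sos" and end with "eos"
    """
    tokens, boxes = [], []
    for step, probs in enumerate(struct_probs):
        tok = vocab[int(np.argmax(probs))]
        if tok == "eos":
            break
        if tok == "sos":
            continue
        tokens.append(tok)
        if tok in ("<td>", "<td"):  # cell tokens carry a box prediction
            boxes.append(bbox_preds[step])
    return tokens, boxes
```

Boxes come out in 488×488 padded-image coordinates; divide by the resize ratio returned by preprocess() to map them back to the original image.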

Model details

Property                               Value
Parameters                             ~9.2M
Input size                             488×488 BGR (variable, padded to square)
Output 1                               (N, T, 4) xyxy bbox per structure token
Output 2                               (N, T, 30) structure-token logits
Vocab                                  28 PaddleOCR table tokens + sos + eos
S-TEDS (PubTabNet, author-reported)    97.36
S-TEDS (SynthTabNet, author-reported)  99.36

Citation

Original SLANet-1M model: a Master's thesis at the University of Florence and the Swiss AI Center (iCoSys, Fribourg), presented at SwissText 2025:

SLANet-1M paper

License

MIT, inherited from the source model. See dimtri009/SLANet-1M for upstream attribution.
