SLANet-1M (ONNX export)
ONNX-converted weights of dimtri009/SLANet-1M for table structure recognition. Repackaged for direct onnxruntime inference without the PaddlePaddle runtime dependency.
What this is
- Source weights: trained PaddlePaddle inference model (~7.6 MB of parameters) from dimtri009/SLANet-1M
- Conversion: paddle2onnx 1.3.1 with opset_version=16
- License: MIT (inherited from the source model)
- Architecture: SLANet variant with depthwise separable convolutions, ~9.2M params, transformer-free
- Input: BGR image, resized so the longest side is ≤ 488, ImageNet-normalized, then zero-padded to 488×488
- Output: a structure-token sequence (30-way classification per step: 28 table tokens + sos + eos) plus a 4-coordinate bbox per token
Files
| File | Size | Purpose |
|---|---|---|
| slanet_1m.onnx | 7.6 MB | ONNX-exported model weights |
| inference.yml | 1.3 KB | Preprocessing config + 28-token character dictionary |
Why this repo exists
The upstream repo ships PaddlePaddle artifacts (.pdmodel + .pdiparams) which require installing the full paddlepaddle package (~300 MB) for inference. This repo provides the same trained weights in ONNX so they can be loaded with just onnxruntime.
The weights are not modified: paddle2onnx performs a graph-to-graph translation, not retraining or quantization. SHA-256 of the produced slanet_1m.onnx:
8a8aa31bf964c1c05039f02814e5f425a354a37552eaca8e6d5dc513048759f5
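To confirm that a downloaded or re-converted file matches the digest above, a small check along these lines may help. This is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_expected` are illustrative and not part of this repo.

```python
import hashlib

# Digest published above for slanet_1m.onnx
EXPECTED_SHA256 = "8a8aa31bf964c1c05039f02814e5f425a354a37552eaca8e6d5dc513048759f5"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


# Usage (assumes the file sits in the working directory):
# print(matches_expected("slanet_1m.onnx"))
```

A re-run of the conversion with a different paddle2onnx or paddlepaddle version may produce a different digest even when the weights are numerically identical, so a mismatch there is not necessarily an error.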
Reproducing the conversion
pip install paddlepaddle paddle2onnx==1.3.1
huggingface-cli download dimtri009/SLANet-1M --local-dir ./slanet-1m-paddle
paddle2onnx \
--model_dir ./slanet-1m-paddle \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--save_file slanet_1m.onnx \
--opset_version 16
Usage
import cv2
import numpy as np
import onnxruntime as ort
import yaml
# Load model and char dict
session = ort.InferenceSession("slanet_1m.onnx", providers=["CPUExecutionProvider"])
config = yaml.safe_load(open("inference.yml"))
character_dict = config["PostProcess"]["character_dict"]
# Preprocess: BGR → resize longest side ≤ 488 → ImageNet normalize → pad to 488×488 → CHW
def preprocess(img_bgr, max_len=488):
    h, w = img_bgr.shape[:2]
    ratio = max_len / max(h, w)
    rh, rw = int(h * ratio), int(w * ratio)
    resized = cv2.resize(img_bgr, (rw, rh))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    norm = (resized.astype(np.float32) / 255.0 - mean) / std
    padded = np.zeros((max_len, max_len, 3), dtype=np.float32)
    padded[:rh, :rw] = norm
    chw = padded.transpose(2, 0, 1)
    return chw[None].astype(np.float32), [h, w, ratio, ratio, max_len, max_len]
img = cv2.imread("table.png")
batch, shape_info = preprocess(img)
bbox_preds, struct_probs = session.run(None, {"x": batch})
# Decode: argmax over struct_probs → token indices → HTML; index bbox_preds for <td> tokens
# Full decoder: see PaddleOCR's TableLabelDecode or rapid_table's pp_structure post_process
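The decode step described in the comments above can be sketched as follows. This is a simplified illustration, not the reference decoder. It assumes the vocabulary is laid out as `sos`, then the dictionary tokens, then `eos`, and that box coordinates are normalized to the original image size, as in PaddleOCR's TableLabelDecode. Both points should be checked against the upstream implementation. The function name `decode_structure` is hypothetical.

```python
import numpy as np


def decode_structure(struct_probs, bbox_preds, character_dict, orig_hw):
    """Greedy-decode one sample.

    struct_probs: (T, V) per-step token scores
    bbox_preds:   (T, 4) per-step boxes, assumed normalized to [0, 1]
    orig_hw:      (height, width) of the original image
    """
    # Assumption: index 0 is "sos" and the last index is "eos".
    vocab = ["sos"] + list(character_dict) + ["eos"]
    eos_idx = len(vocab) - 1
    cell_tokens = {"<td>", "<td", "<td></td>"}  # tokens that open a cell
    h, w = orig_hw

    token_ids = np.asarray(struct_probs).argmax(axis=-1)
    tokens, boxes = [], []
    for step, idx in enumerate(token_ids):
        idx = int(idx)
        if step > 0 and idx == eos_idx:
            break  # end of sequence
        if idx in (0, eos_idx):
            continue  # skip special tokens
        token = vocab[idx]
        tokens.append(token)
        if token in cell_tokens:
            box = np.asarray(bbox_preds[step], dtype=np.float32).copy()
            box[0::2] *= w  # x coordinates
            box[1::2] *= h  # y coordinates
            boxes.append(box)
    return tokens, np.array(boxes, dtype=np.float32).reshape(-1, 4)


# Usage with the variables from the snippet above:
# tokens, boxes = decode_structure(
#     struct_probs[0], bbox_preds[0], character_dict, img.shape[:2]
# )
# html = "".join(tokens)
```

Joining the tokens yields only the table skeleton. Filling cells with recognized text and wrapping the result in `<html>`/`<table>` tags is left to the full decoders referenced above.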
Model details
| Property | Value |
|---|---|
| Parameters | ~9.2M |
| Input size | 488×488 BGR (variable-size input, resized and padded to square) |
| Output 1 | (N, T, 4) xyxy bbox per structure token |
| Output 2 | (N, T, 30) structure-token logits |
| Vocab | 28 PaddleOCR table tokens + sos + eos |
| S-TEDS (PubTabNet, author-reported) | 97.36 |
| S-TEDS (SynthTabNet, author-reported) | 99.36 |
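
The input name and output shapes listed above can be checked against the exported graph instead of being taken on trust. The helper below is a small sketch built on onnxruntime's `get_inputs()` and `get_outputs()`; the name `describe_io` is illustrative.

```python
def describe_io(session):
    """Return names, shapes and element types of a session's inputs and outputs."""
    def as_rows(node_args):
        return [(arg.name, list(arg.shape), arg.type) for arg in node_args]

    return {
        "inputs": as_rows(session.get_inputs()),
        "outputs": as_rows(session.get_outputs()),
    }


# Usage (requires onnxruntime and the exported file):
# import onnxruntime as ort
# session = ort.InferenceSession("slanet_1m.onnx", providers=["CPUExecutionProvider"])
# for kind, rows in describe_io(session).items():
#     print(kind, rows)
```

If the reported input name differs from `"x"`, or the outputs come back in a different order, the `session.run` call in the usage example needs adjusting accordingly.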
Citation
Original SLANet-1M model: Master's thesis at the University of Florence and Swiss AI Center (iCoSys, Fribourg), presented at SwissText 2025.
License
MIT, inherited from the source model. See dimtri009/SLANet-1M for upstream attribution.