Symbolic Capsule Network (SCN)

What if a detector could tell you not just what it found, but why it is confident?

SCN is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a capsule-based neck and head. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures part-whole relationships — the structural agreements between object parts and the wholes they compose. Every detection is backed by a symbolic routing path: a traceable chain of capsule agreements that exposes which parts voted for which object, turning each prediction into an auditable reasoning trace.

Live Demo

Try the interactive demo ↗

Example Results

(Example detection and segmentation outputs: bus, people, scene, scene2; images not reproduced here.)

Key Ideas

Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:

1. **Part-Whole Relation Modelling.** CapsRoute layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is only reached when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.
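To make the mechanism concrete, here is a minimal NumPy sketch of dynamic routing-by-agreement in the style of Sabour et al.'s capsule routing. The function names, iteration count, and capsule dimensions are illustrative assumptions, not SCN's actual implementation:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule non-linearity: preserves direction, maps vector length into [0, 1).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def routing_by_agreement(votes, n_iters=3):
    """votes: (n_parts, n_objects, dim) pose votes cast by part capsules.
    Returns object poses (n_objects, dim) and couplings (n_parts, n_objects)."""
    n_parts, n_objects, _ = votes.shape
    logits = np.zeros((n_parts, n_objects))
    for _ in range(n_iters):
        # Each part distributes its evidence over candidate objects (softmax).
        c = np.exp(logits)
        c /= c.sum(axis=1, keepdims=True)
        # Objects pool the coupling-weighted votes and squash the result.
        obj = squash((c[:, :, None] * votes).sum(axis=0))
        # Agreement (dot product) reinforces consistent part-to-object routes.
        logits = logits + (votes * obj[None, :, :]).sum(axis=-1)
    return obj, c

rng = np.random.default_rng(0)
votes = rng.normal(size=(6, 3, 8))   # 6 parts voting for 3 candidate objects
obj_poses, couplings = routing_by_agreement(votes)
```

Parts whose votes point the same way in pose space end up coupled to the same object capsule, which is exactly the spatial-coherence bias described above.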

2. **Symbolic Routing Paths.** The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.
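A sketch of how such an evidence graph could be read off a coupling matrix. The part/object names, the 0.3 threshold, and the helper function are hypothetical; only the idea — thresholded couplings become directed part-to-object edges — comes from the text above:

```python
import numpy as np

def routing_path_edges(couplings, part_names, object_names, min_weight=0.3):
    """Turn a coupling matrix (n_parts, n_objects) into a directed evidence
    graph: one edge part -> object per routing weight above min_weight."""
    edges = []
    for i, part in enumerate(part_names):
        for j, obj in enumerate(object_names):
            w = float(couplings[i, j])
            if w >= min_weight:
                edges.append((part, obj, round(w, 3)))
    return edges

# Toy coupling matrix: rows = part capsules, columns = object capsules.
couplings = np.array([
    [0.85, 0.10, 0.05],   # wheel routes strongly to "car"
    [0.70, 0.20, 0.10],   # windshield mostly to "car"
    [0.10, 0.80, 0.10],   # torso routes to "person"
])
edges = routing_path_edges(couplings,
                           ["wheel", "windshield", "torso"],
                           ["car", "person", "background"])
for part, obj, w in edges:
    print(f"{part} -> {obj}: {w}")
```

Because the couplings are produced during the forward pass, this graph costs nothing extra to extract — it is a view on numbers the model already computed.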

3. **Concept-Based Detection Auditing.** Routing paths enable structured inspection that scalar networks cannot support:

  • Verify that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
  • Diagnose which part capsule collapsed when the model misses an object under occlusion or viewpoint change.
  • Detect bias by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.
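The third bullet — dataset-level bias detection — can be sketched as a simple aggregation over per-image coupling matrices. The statistic chosen here (mean winning-route weight per part capsule) is one plausible choice, not SCN's documented auditing metric:

```python
import numpy as np

def part_reliance(coupling_matrices, part_names):
    """Aggregate routing couplings across a dataset. For each part capsule,
    average the weight of its strongest route per image; high values flag
    parts the model leans on most."""
    stacked = np.stack(coupling_matrices)   # (n_images, n_parts, n_objects)
    top_weight = stacked.max(axis=-1)       # winning route per part, per image
    return dict(zip(part_names, top_weight.mean(axis=0).round(3)))

rng = np.random.default_rng(1)
# Toy "dataset": 100 coupling matrices over 3 part capsules and 4 objects.
dataset = [rng.dirichlet(np.ones(4), size=3) for _ in range(100)]
stats = part_reliance(dataset, ["wheel", "body", "windshield"])
print(stats)
```

A part whose reliance score is consistently near 1.0 across a dataset dominates the model's decisions and is a candidate source of shortcut bias.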

Architecture

Architecture Overview

The pipeline flows through four capsule-specific modules:

| Module | Role |
|---|---|
| CapsProj | Projects multi-scale CNN feature maps into capsule space |
| CapsAlign | Aligns capsule resolutions across FPN levels |
| CapsRoute / CapsRouteV2-4 | Dynamic routing-by-agreement across part-to-whole levels |
| CapsDecode | Decodes final capsule activations into boxes and masks |
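A shape walk-through of the four modules may help. Every dimension, weight, and operation below is an illustrative assumption (the real modules live in `modules/` and are configured by the YAML in `configs/seg_model/`); the sketch only shows how tensors could flow from CNN features to per-capsule predictions:

```python
import numpy as np

B, C, H, W = 1, 256, 20, 20          # one FPN level of CNN features (assumed sizes)
NUM_CAPS, CAP_DIM = 16, 8            # assumed capsule count and pose dimension

rng = np.random.default_rng(0)
feat = rng.random((B, C, H, W))

# CapsProj: 1x1-style projection of channels into (capsule, pose-dim) space.
W_proj = rng.random((C, NUM_CAPS * CAP_DIM)) * 0.01
caps = np.einsum("bchw,cd->bhwd", feat, W_proj).reshape(B, H, W, NUM_CAPS, CAP_DIM)

# CapsAlign: bring a finer level (40x40) to this resolution via 2x2 average
# pooling so capsules from different FPN levels can be routed together.
caps_hi = rng.random((B, 2 * H, 2 * W, NUM_CAPS, CAP_DIM))
caps_aligned = caps_hi.reshape(B, H, 2, W, 2, NUM_CAPS, CAP_DIM).mean(axis=(2, 4))

# CapsRoute: routing-by-agreement would run here; a plain sum stands in.
fused = caps + caps_aligned

# CapsDecode: a linear head mapping each capsule pose to box + score (sketch).
W_dec = rng.random((CAP_DIM, 5)) * 0.01   # (cx, cy, w, h, score)
preds = fused @ W_dec
print(preds.shape)   # (1, 20, 20, 16, 5)
```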

Performance

Detection — COCO 2017 val

SCN sets a new state of the art among nano-scale detectors, surpassing every YOLO variant at comparable FLOPs.

| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|---|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| SCN-n (Ours) | 57.1% | 41.6% | 56.1% | 40.4% | 29.6 | 3.3 | 6.5 |

SCN-n achieves +0.3% mAP50 and +0.8% mAP50:95 over the previous best (YOLO26n) at a comparable compute budget (6.5B vs 6.1B FLOPs) — accuracy gains that stem from structural routing rather than a larger FLOPs budget.

Accuracy–Efficiency Frontier

COCO mAP50:95 vs FLOPs

SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.

Instance Segmentation — COCO 2017 val

| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---|---|---|
| SCN Segmentation | 640 | 53.3% | 34.1% |

Quick Start

```shell
pip install ultralytics huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained segmentation checkpoint from the Hugging Face Hub.
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register the custom capsule layers with Ultralytics before loading the model.
register_ultralytics_modules()
model = YOLO(weights)

results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```

Command-line:

```shell
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```

Repository Structure

| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: CapsProj, CapsAlign, CapsRoute, CapsRouteV2-4, CapsDecode |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |
