Symbolic Capsule Network (SCN)

What if a detector could tell you not just what it found, but why it is confident?

SCN is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a capsule-based neck and head. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures part-whole relationships — the structural agreements between object parts and the wholes they compose. Every detection is backed by a symbolic routing path: a traceable chain of capsule agreements that exposes which parts voted for which object, turning each prediction into an auditable reasoning trace.

Live Demo

Try the interactive demo ↗

Example Results

(Example detection and segmentation outputs: bus, people, scene, scene2; images not reproduced here.)

Key Ideas

Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:

1. **Part-Whole Relation Modelling.** CapsRoute layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is only reached when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.
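To make the mechanism concrete, here is a minimal NumPy sketch of dynamic routing-by-agreement in the style of Sabour et al.'s capsule routing. The function names, iteration count, and capsule dimensions are illustrative assumptions, not SCN's actual implementation:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Capsule non-linearity: preserves direction, maps vector length into [0, 1).
    n2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def routing_by_agreement(votes, n_iters=3):
    """votes: (n_parts, n_objects, dim) pose votes cast by part capsules.
    Returns object poses (n_objects, dim) and couplings (n_parts, n_objects)."""
    n_parts, n_objects, _ = votes.shape
    logits = np.zeros((n_parts, n_objects))
    for _ in range(n_iters):
        # Each part distributes its evidence over candidate objects (softmax).
        c = np.exp(logits)
        c /= c.sum(axis=1, keepdims=True)
        # Objects pool the coupling-weighted votes and squash the result.
        obj = squash((c[:, :, None] * votes).sum(axis=0))
        # Agreement (dot product) reinforces consistent part-to-object routes.
        logits = logits + (votes * obj[None, :, :]).sum(axis=-1)
    return obj, c

rng = np.random.default_rng(0)
votes = rng.normal(size=(6, 3, 8))   # 6 parts voting for 3 candidate objects
obj_poses, couplings = routing_by_agreement(votes)
```

Parts whose votes point the same way in pose space end up coupled to the same object capsule, which is exactly the spatial-coherence bias described above.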

2. **Symbolic Routing Paths.** The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.
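A sketch of how such an evidence graph could be read off a coupling matrix. The part/object names, the 0.3 threshold, and the helper function are hypothetical; only the idea — thresholded couplings become directed part-to-object edges — comes from the text above:

```python
import numpy as np

def routing_path_edges(couplings, part_names, object_names, min_weight=0.3):
    """Turn a coupling matrix (n_parts, n_objects) into a directed evidence
    graph: one edge part -> object per routing weight above min_weight."""
    edges = []
    for i, part in enumerate(part_names):
        for j, obj in enumerate(object_names):
            w = float(couplings[i, j])
            if w >= min_weight:
                edges.append((part, obj, round(w, 3)))
    return edges

# Toy coupling matrix: rows = part capsules, columns = object capsules.
couplings = np.array([
    [0.85, 0.10, 0.05],   # wheel routes strongly to "car"
    [0.70, 0.20, 0.10],   # windshield mostly to "car"
    [0.10, 0.80, 0.10],   # torso routes to "person"
])
edges = routing_path_edges(couplings,
                           ["wheel", "windshield", "torso"],
                           ["car", "person", "background"])
for part, obj, w in edges:
    print(f"{part} -> {obj}: {w}")
```

Because the couplings are produced during the forward pass, this graph costs nothing extra to extract — it is a view on numbers the model already computed.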

3. **Concept-Based Detection Auditing.** Routing paths enable structured inspection that scalar networks cannot support:

  • Verify that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
  • Diagnose which part capsule collapsed when the model misses an object under occlusion or viewpoint change.
  • Detect bias by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.
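The third bullet — dataset-level bias detection — can be sketched as a simple aggregation over per-image coupling matrices. The statistic chosen here (mean winning-route weight per part capsule) is one plausible choice, not SCN's documented auditing metric:

```python
import numpy as np

def part_reliance(coupling_matrices, part_names):
    """Aggregate routing couplings across a dataset. For each part capsule,
    average the weight of its strongest route per image; high values flag
    parts the model leans on most."""
    stacked = np.stack(coupling_matrices)   # (n_images, n_parts, n_objects)
    top_weight = stacked.max(axis=-1)       # winning route per part, per image
    return dict(zip(part_names, top_weight.mean(axis=0).round(3)))

rng = np.random.default_rng(1)
# Toy "dataset": 100 coupling matrices over 3 part capsules and 4 objects.
dataset = [rng.dirichlet(np.ones(4), size=3) for _ in range(100)]
stats = part_reliance(dataset, ["wheel", "body", "windshield"])
print(stats)
```

A part whose reliance score is consistently near 1.0 across a dataset dominates the model's decisions and is a candidate source of shortcut bias.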

Architecture

Architecture Overview

The pipeline flows through four capsule-specific modules:

| Module | Role |
|---|---|
| CapsProj | Projects multi-scale CNN feature maps into capsule space |
| CapsAlign | Aligns capsule resolutions across FPN levels |
| CapsRoute / CapsRouteV2-4 | Dynamic routing-by-agreement across part-to-whole levels |
| CapsDecode | Decodes final capsule activations into boxes and masks |
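A shape walk-through of the four modules may help. Every dimension, weight, and operation below is an illustrative assumption (the real modules live in `modules/` and are configured by the YAML in `configs/seg_model/`); the sketch only shows how tensors could flow from CNN features to per-capsule predictions:

```python
import numpy as np

B, C, H, W = 1, 256, 20, 20          # one FPN level of CNN features (assumed sizes)
NUM_CAPS, CAP_DIM = 16, 8            # assumed capsule count and pose dimension

rng = np.random.default_rng(0)
feat = rng.random((B, C, H, W))

# CapsProj: 1x1-style projection of channels into (capsule, pose-dim) space.
W_proj = rng.random((C, NUM_CAPS * CAP_DIM)) * 0.01
caps = np.einsum("bchw,cd->bhwd", feat, W_proj).reshape(B, H, W, NUM_CAPS, CAP_DIM)

# CapsAlign: bring a finer level (40x40) to this resolution via 2x2 average
# pooling so capsules from different FPN levels can be routed together.
caps_hi = rng.random((B, 2 * H, 2 * W, NUM_CAPS, CAP_DIM))
caps_aligned = caps_hi.reshape(B, H, 2, W, 2, NUM_CAPS, CAP_DIM).mean(axis=(2, 4))

# CapsRoute: routing-by-agreement would run here; a plain sum stands in.
fused = caps + caps_aligned

# CapsDecode: a linear head mapping each capsule pose to box + score (sketch).
W_dec = rng.random((CAP_DIM, 5)) * 0.01   # (cx, cy, w, h, score)
preds = fused @ W_dec
print(preds.shape)   # (1, 20, 20, 16, 5)
```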

Performance

Detection — COCO 2017 val

SCN sets a new state of the art among nano-scale detectors, surpassing every YOLO variant at comparable FLOPs.

| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|---|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| SCN-n (Ours) | 57.1% | 41.6% | 56.1% | 40.4% | 29.6 | 3.3 | 6.5 |

SCN-n achieves +0.3% mAP50 and +0.8% mAP50:95 over the previous best (YOLO26n) at a comparable compute budget (6.5B vs 6.1B FLOPs) — accuracy gains that stem from structural routing rather than a larger FLOPs budget.

Accuracy–Efficiency Frontier

COCO mAP50:95 vs FLOPs

SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.

Instance Segmentation — COCO 2017 val

| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---|---|---|
| SCN Segmentation | 640 | 53.3% | 34.1% |

Quick Start

```shell
pip install ultralytics huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained segmentation checkpoint from the Hugging Face Hub.
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register the custom capsule layers with Ultralytics before loading the model.
register_ultralytics_modules()
model = YOLO(weights)

results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```

Command-line:

```shell
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```

Repository Structure

| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: CapsProj, CapsAlign, CapsRoute, CapsRouteV2-4, CapsDecode |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |
