# Symbolic Capsule Network (SCN)
What if a detector could tell you not just what it found, but why it is confident?
SCN is a real-time object detection and instance segmentation model that replaces the conventional convolutional head with a capsule-based neck and head. By encoding visual entities as pose-aware vectors rather than scalar activations, SCN explicitly captures part-whole relationships — the structural agreements between object parts and the wholes they compose. Every detection is backed by a symbolic routing path: a traceable chain of capsule agreements that exposes which parts voted for which object, turning each prediction into an auditable reasoning trace.
## Key Ideas
Standard convolutional detectors reduce every visual entity to a scalar confidence score, discarding the compositional structure that makes objects recognisable. SCN addresses this with three tightly integrated contributions:
### 1. Part-Whole Relation Modelling
CapsRoute layers propagate evidence upward from low-level part capsules — encoding local features such as wheels, windows, and body panels — to high-level object capsules through dynamic routing-by-agreement. Agreement is only reached when the geometric votes from multiple parts are mutually consistent, giving the model an inductive bias toward spatially coherent detections.
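The routing step can be illustrated with a minimal NumPy sketch of classic dynamic routing-by-agreement (in the style of Sabour et al.); the shapes, iteration count, and softmax formulation here are illustrative assumptions, not SCN's exact `CapsRoute` implementation:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Non-linearity that keeps a capsule's orientation but maps
    # its norm into [0, 1), so norm can act as an existence score.
    sq = np.sum(v * v, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def routing_by_agreement(u_hat, iters=3):
    """u_hat: (num_parts, num_wholes, dim) pose votes from part capsules.
    Returns whole-capsule poses (num_wholes, dim) and routing
    coefficients c (num_parts, num_wholes) -- the 'evidence graph'."""
    b = np.zeros(u_hat.shape[:2])  # routing logits, start uniform
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over wholes
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted vote sum
        v = squash(s)                                         # whole-capsule poses
        b = b + (u_hat * v[None]).sum(axis=-1)                # reward agreeing votes
    return v, c
```

When several parts cast mutually consistent votes for one whole, the agreement term strengthens their routing coefficients toward it; inconsistent votes cancel in the weighted sum and the corresponding whole capsule stays near zero.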
### 2. Symbolic Routing Paths

The routing coefficients produced at each capsule layer form an explicit, directed evidence graph. Unlike Grad-CAM or SHAP, which reconstruct explanations after the fact, SCN's routing weights are native model outputs — first-class signals that describe the model's reasoning as it happens, without any additional computation.
### 3. Concept-Based Detection Auditing

Routing paths enable structured inspection that scalar networks cannot support:
- Verify that a predicted "car" is grounded in consistent wheel, body, and windshield part activations.
- Diagnose which part capsule collapsed when the model misses an object under occlusion or viewpoint change.
- Detect bias by aggregating routing statistics across a dataset to reveal which visual parts the model over-relies on.
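The bias-detection idea in the last bullet can be sketched in a few lines. Everything here is hypothetical scaffolding: the part names and the `(num_detections, num_parts)` routing-weight array are illustrative stand-ins for whatever SCN actually exports, not a documented API:

```python
import numpy as np

# Illustrative part vocabulary for a "car" object capsule.
PART_NAMES = ["wheel", "body", "windshield", "background"]

def part_reliance(routing):
    """routing: (num_detections, num_parts) part-to-class routing weights,
    one row per detection. Returns a normalised reliance profile: the
    fraction of total routing mass each part contributes on average."""
    mean_w = routing.mean(axis=0)
    return mean_w / mean_w.sum()
```

A profile dominated by a single part (say, `wheel`) across a whole dataset would flag exactly the kind of over-reliance the bullet describes.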
## Architecture
The pipeline flows through four capsule-specific modules:
| Module | Role |
|---|---|
| `CapsProj` | Projects multi-scale CNN feature maps into capsule space |
| `CapsAlign` | Aligns capsule resolutions across FPN levels |
| `CapsRoute` / `CapsRouteV2-4` | Dynamic routing-by-agreement across part-to-whole levels |
| `CapsDecode` | Decodes final capsule activations into boxes and masks |
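The first stage, projecting CNN features into capsule space, can be sketched as a reshape plus squash. This is a simplified assumption about what a `CapsProj`-style layer does; the real module would add a learned projection (e.g. a 1x1 convolution) before reshaping:

```python
import numpy as np

def caps_project(feat, num_caps, caps_dim):
    """Reshape a (C, H, W) feature map into (H*W*num_caps, caps_dim)
    capsule vectors, then squash each one so its norm lies in [0, 1).
    Assumes C == num_caps * caps_dim for simplicity."""
    c, h, w = feat.shape
    assert c == num_caps * caps_dim, "channels must factor into capsules"
    # Group channels into capsules, one set per spatial location.
    caps = feat.reshape(num_caps, caps_dim, h * w).transpose(2, 0, 1)
    caps = caps.reshape(-1, caps_dim)
    # Squash: preserve direction, bound the norm below 1.
    sq = (caps ** 2).sum(axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * caps / np.sqrt(sq + 1e-8)
```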
## Performance
### Detection — COCO 2017 val
SCN sets a new state of the art among nano-scale detectors, surpassing every YOLO variant at comparable FLOPs.
| Model | mAP50 | mAP50:95 | mAP50 (E2E) | mAP50:95 (E2E) | Speed (ms) | Params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|---|
| YOLOv6n | 53.1% | 37.5% | 52.1% | 36.9% | 20.8 | 4.7 | 11.4 |
| YOLOv7-tiny | 56.7% | 38.7% | 55.7% | 38.1% | 20.9 | 6.2 | 13.8 |
| YOLOv8n | 52.5% | 37.3% | 51.5% | 36.6% | 18.3 | 3.2 | 8.7 |
| YOLOv9t | 53.1% | 38.3% | 52.1% | 37.6% | 20.1 | 2.0 | 7.7 |
| YOLOv10n | 53.8% | 38.5% | 52.8% | 37.8% | 16.7 | 2.3 | 6.7 |
| YOLOv11n | 55.1% | 39.5% | 54.1% | 38.8% | 19.3 | 2.6 | 6.5 |
| YOLOv12n | 56.7% | 40.4% | 55.7% | 39.7% | 19.4 | 2.5 | 6.0 |
| YOLO26n | 56.8% | 40.8% | 55.7% | 40.0% | 14.4 | 2.6 | 6.1 |
| SCN-n (Ours) | 57.1% | 41.6% | 56.1% | 40.4% | 29.6 | 3.3 | 6.5 |
SCN-n achieves +0.3% mAP50 and +0.8% mAP50:95 over the previous best (YOLO26n) at a near-identical FLOPs budget (6.5B vs 6.1B), though at higher latency (29.6 ms vs 14.4 ms): accuracy gains driven by structural reasoning rather than extra compute.
### Accuracy–Efficiency Frontier
SCN occupies the top of the accuracy–efficiency frontier across all model scales (n / s / m / l / x). At every FLOPs level, SCN variants outperform their YOLO counterparts, demonstrating that part-whole routing is a principled and scalable improvement.
### Instance Segmentation — COCO 2017 val
| Model | Input | Mask mAP50 | Mask mAP50:95 |
|---|---|---|---|
| SCN Segmentation | 640 | 53.3% | 34.1% |
## Quick Start
```shell
pip install ultralytics huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

from models import register_ultralytics_modules

# Download the pretrained segmentation checkpoint from the Hub
weights = hf_hub_download(
    repo_id="zpyuan/SymbolicCapsuleNetwork",
    filename="weights/symbolic_capsule_network_segmentation.pt",
)

# Register the custom capsule layers with Ultralytics before loading
register_ultralytics_modules()
model = YOLO(weights)

results = model.predict("image.jpg", imgsz=640, conf=0.25)
results[0].show()
```
Command-line:

```shell
python predict.py path/to/image.jpg
python predict.py path/to/image.jpg --conf 0.3 --imgsz 1280
```
## Repository Structure
| Path | Description |
|---|---|
| `weights/symbolic_capsule_network_segmentation.pt` | Pretrained segmentation checkpoint |
| `modules/` | Capsule modules: `CapsProj`, `CapsAlign`, `CapsRoute`, `CapsRouteV2-4`, `CapsDecode` |
| `models/custom_yolo.py` | Ultralytics hook that registers capsule layers before model load |
| `configs/seg_model/` | YAML defining the capsule neck and head architecture |
| `predict.py` | Minimal inference entry point |





