Cofiber Detection Circuit

A depth-3 threshold gate network for multi-scale object detection on frozen vision transformer features. 61,520 INT8 learned parameters. 2,184,000 gates. The multi-scale decomposition is analytic (zero parameters); only the classification layer is learned.

The reference backbone is EUPE-ViT-B (86M parameters, frozen), the same encoder used by Argus. The circuit consumes the backbone's stride-16 patch features and produces per-class detections without modifying any backbone weights.

Circuit

Input: feature grid [768, 40, 40] from frozen ViT at stride 16

Layer 0 β€” Pool (fixed, 0 learned params)
  pool(x)_{i,j} = 0.25 * (x_{2i,2j} + x_{2i+1,2j} + x_{2i,2j+1} + x_{2i+1,2j+1})

Layer 1 β€” Cofiber (fixed, 0 learned params)
  cofib(x)_{i,j} = x_{i,j} - upsample(pool(x))_{i,j}

Layer 2 β€” Classify (61,520 INT8 params)
  detect(i,j,c) = H( Ξ£_d w_{c,d} Β· cofib(x)_{i,j,d} + b_c )

Output: per-location, per-class binary detection decisions

The cofiber x - upsample(pool(x)) isolates information present at a given spatial scale but absent from the next coarser scale. Applied iteratively, it decomposes the feature grid into three scale bands (strides 16, 32, 64) with zero learned parameters. The classification layer operates on each band independently.
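Layers 0–1 can be sketched in a few lines of numpy. This is a minimal sketch: nearest-neighbor upsampling stands in for the circuit's bilinear upsample, which makes the reconstruction check below exact.

```python
import numpy as np

def pool(x):
    """Layer 0: 2x average pool over the trailing spatial axes."""
    return 0.25 * (x[..., ::2, ::2] + x[..., 1::2, ::2]
                   + x[..., ::2, 1::2] + x[..., 1::2, 1::2])

def upsample(x):
    """Nearest-neighbor 2x upsample (stand-in for the bilinear map)."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def cofiber_bands(x, levels=3):
    """Layer 1 applied iteratively: high-frequency cofibers at each
    scale plus the coarsest residual. Zero learned parameters."""
    bands = []
    for _ in range(levels - 1):
        p = pool(x)
        bands.append(x - upsample(p))   # detail absent at the next coarser scale
        x = p
    bands.append(x)                     # coarsest band
    return bands

x = np.random.default_rng(0).standard_normal((768, 40, 40)).astype(np.float32)
bands = cofiber_bands(x)   # shapes: (768,40,40), (768,20,20), (768,10,10)
```

The three bands land at strides 16, 32, and 64, and summing each band with the upsampled coarser levels recovers the input.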

Equations

The decomposition satisfies three properties, proven in CofiberDecomposition.v:

  1. Block diagonal: The low-frequency block of any morphism between decomposed features equals the functorial low-frequency component. Classification on cofibers is equivalent to multi-scale classification on the original features.

  2. Cross-term vanishing (low→high): Low-frequency input produces zero high-frequency output. A large object detected at stride 32 creates no signal in the stride-16 cofiber.

  3. Cross-term vanishing (high→low): High-frequency input produces zero low-frequency output. Scale bands do not interfere.

These properties follow from the counit of the adjunction (bilinear upsample ⊣ average pool) in a semi-additive category. The cofiber of the counit decomposes objects along an exact sequence, guaranteeing lossless scale separation.
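Both vanishing properties can be checked numerically. A sketch with nearest-neighbor upsampling, under which pool is an exact retraction of upsample (the bilinear case holds up to interpolation error):

```python
import numpy as np

def pool(x):
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def upsample(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((40, 40)).astype(np.float32)

# pool o upsample is (numerically) the identity
y = rng.standard_normal((20, 20)).astype(np.float32)
retract_err = float(np.abs(pool(upsample(y)) - y).max())

# low -> high vanishes: a purely low-frequency input has (near-)zero cofiber
low = upsample(pool(x))
leak_low_to_high = float(np.abs(low - upsample(pool(low))).max())

# high -> low vanishes: any cofiber has (near-)zero low-frequency content
cofib = x - upsample(pool(x))
leak_high_to_low = float(np.abs(pool(cofib)).max())
```

All three quantities sit at float32 rounding noise, mirroring the reconstruction-error bound stated for the proof's concrete instantiation.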

Parameters

| Layer | Operation | Weights | Learned |
|---|---|---|---|
| 0 | Average pool 2x | {0.25, 0.25, 0.25, 0.25} | No |
| 1 | Subtract: x - upsample(pool(x)) | {1, -1} | No |
| 2 | Classify: H(w · cofib + b) | 80 × 768 + 80 | Yes (INT8) |
| Total | | 61,520 | |

Gates

| Scale | Stride | Spatial | Pool gates | Subtract gates | Classify gates |
|---|---|---|---|---|---|
| 0 | 16 | 40 × 40 | 307,200 | 1,228,800 | 128,000 |
| 1 | 32 | 20 × 20 | 76,800 | 307,200 | 32,000 |
| 2 | 64 | 10 × 10 | 19,200 | 76,800 | 8,000 |
| Total | | | | | 2,184,000 |

All layer 0–1 gates use integer weights from {-1, 0, 1}. Layer 2 gates use INT8 quantized weights. INT8 quantization produces 99.7% detection agreement with FP32.
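The card does not spell out the quantizer, so as an illustration, a plausible symmetric per-class INT8 scheme and a threshold-agreement check look like this (all names and shapes here are stand-ins, not the repo's code):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-class INT8 quantization (hypothetical scheme; the
    card does not specify the repo's exact quantizer)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((80, 768)).astype(np.float32)        # classify weights
b = rng.standard_normal(80).astype(np.float32)               # biases
feats = rng.standard_normal((768, 1600)).astype(np.float32)  # cofiber features

q, scale = quantize_int8(w)
logits_fp32 = w @ feats + b[:, None]
logits_int8 = (q.astype(np.float32) * scale) @ feats + b[:, None]

# fraction of per-location, per-class threshold decisions that agree
agreement = float(np.mean((logits_fp32 > 0) == (logits_int8 > 0)))
```

Because the Heaviside decision only flips when a logit sits within the quantization error of zero, sign agreement stays high even at 8 bits.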

COCO val2017 Results

Variants were trained on COCO 2017 train (117,266 images) for 8 epochs with a frozen EUPE-ViT-B backbone (per-variant batch sizes and initializations are listed under Training) and evaluated with pycocotools on the full 5000-image val set.

| Variant | Architecture | Params | Nonzero | mAP@[0.5:0.95] | mAP@0.50 | mAP@0.75 |
|---|---|---|---|---|---|---|
| linear_70k | 768→4 box regression | 69,976 | 69,976 | 4.0 | 15.8 | 0.8 |
| box32_92k | 768→32→4 box regression | 91,640 | 91,640 | 5.7 | 20.6 | 1.3 |
| box32 pruned R1 | 768→32→4, 15K weights zeroed | 91,640 | ~76,640 | 5.7 | 20.7 | 1.3 |
| box32 pruned R2 | 768→32→4, 30K weights zeroed | 91,640 | ~62,000 | 5.9 | 20.4 | 1.5 |
| box32 pruned R3 | 768→32→4, 45K weights zeroed | 91,640 | ~47,000 | 5.1 | 17.1 | 1.4 |
| dim20 | 768→20→80 bottleneck, SVD-init | 22,076 | 22,076 | 3.9 | 14.8 | 0.9 |
| dim20 R1 (project 25.7% sparse) | dim20 with 3,955 project weights zeroed | 22,076 | 18,121 | 3.9 | 14.6 | 0.8 |
| dim20 R2 (project 26.6% sparse) | dim20 with 4,088 project weights zeroed | 22,076 | 17,988 | 3.8 | 14.5 | 0.7 |
| dim20 cls_weight pruned (37%) | 596 of 1600 cls weights zeroed | 22,076 | 21,480 | 3.8 | 14.4 | 0.7 |
| dim20 reg_hidden pruned (17%) | 55 of 320 reg_hidden weights zeroed | 22,076 | 22,021 | 3.8 | 14.5 | 0.7 |
| dim20 reg_out pruned (12%) | 8 of 64 reg_out weights zeroed | 22,076 | 22,068 | 3.8 | 14.5 | 0.7 |
| dim20 ctr_weight pruned (90%) | 18 of 20 centerness weights zeroed | 22,076 | 22,058 | 3.7 | 14.2 | 0.7 |
| dim20 R1 + cls greedy | project 25.7% + cls_weight 45% sparse | 22,076 | 17,406 | 3.5 | 13.4 | 0.6 |
| dim20 joint (from R1) | whole-head magnitude pruning from R1 | 22,076 | 17,129 | 3.6 | 13.7 | 0.6 |
| dim15 | 768→15→80 bottleneck, SVD-init | 17,751 | 17,751 | 3.0 | 11.5 | 0.7 |
| dim10 | 768→10→80 bottleneck, SVD-init | 13,426 | 13,426 | 1.5 | 5.6 | 0.4 |
| dim5 | 768→5→80 bottleneck, SVD-init | 9,101 | 9,101 | 0.3 | 1.3 | 0.1 |

Pruning improved mAP from 5.7 to 5.9 by removing noisy prototype weights (box32 R2); further pruning degraded performance (box32 R3). The dim20/dim15/dim10/dim5 variants project features to 20, 15, 10, and 5 dimensions before classifying, with each projection initialized from the top-K right singular vectors of the pruned box32 R2 prototype matrix. Dim20 retains 72% of the SVD energy and produces 3.9 mAP; dim15 retains 67% and produces 3.0 mAP; dim10 retains 61% and produces 1.5 mAP, the smallest 80-class COCO detector to clear the 1.0 mAP threshold. Dim5 retains 53% and drops to 0.3 mAP. The mAP scaling across dim20 → dim15 → dim10 is roughly geometric (3.9 → 3.0 → 1.5), but the curve falls off a cliff between 10 and 5 dimensions: five directions sit below the intrinsic capacity needed for 80-class separation, so the floor lies between 5 and 10 bottleneck dimensions.
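The SVD initialization can be sketched as follows. The random matrix here is a hypothetical stand-in for the pruned box32 R2 prototype matrix (the real one lives in the pruned_62k checkpoint):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the pruned R2 prototype matrix: 80 class prototypes x 768 dims.
W = rng.standard_normal((80, 768)).astype(np.float32)

def svd_bottleneck_init(W, k):
    """Project-layer init from the top-k right singular vectors of the
    prototype matrix, plus the fraction of SVD energy retained."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    proj = Vt[:k]                                  # (k, 768) projection
    energy = float((S[:k] ** 2).sum() / (S ** 2).sum())
    return proj, energy

proj20, e20 = svd_bottleneck_init(W, 20)
proj5, e5 = svd_bottleneck_init(W, 5)
```

Retained energy is monotone in the bottleneck width, which is the quantity the 72% / 67% / 61% / 53% figures above track.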

The dim20 head was then itself pruned. The mAP-driven pruner bisects over the magnitude-sorted weight list of a target parameter, uses full pycocotools mAP@[0.5:0.95] as the retention metric (1000 val images), and rolls back any pass that fails the 95% retention floor on full verification. It was run separately on each learned parameter of dim20 plus a joint-magnitude variant that ranks every weight in the head against every other.
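The bisection loop can be sketched as below. The `evaluate` callback is a hypothetical stand-in for the 1000-image pycocotools mAP proxy; a simple weight-norm metric keeps the sketch self-contained and runnable:

```python
import numpy as np

def bisect_prune(w, evaluate, floor=0.95):
    """Bisect for the largest count of smallest-magnitude weights that
    can be zeroed while `evaluate` stays above `floor` of baseline."""
    base = evaluate(w)
    order = np.argsort(np.abs(w))        # smallest magnitudes first
    lo, hi = 0, w.size                   # invariant: zeroing `lo` weights passes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        trial = w.copy()
        trial[order[:mid]] = 0.0
        if evaluate(trial) >= floor * base:
            lo = mid                     # cut is safe; push deeper
        else:
            hi = mid - 1                 # roll back past the failing cut
    pruned = w.copy()
    pruned[order[:lo]] = 0.0
    return pruned, lo

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
# toy retention metric: weight norm (the real pruner uses full mAP)
pruned, n_zeroed = bisect_prune(w, evaluate=lambda v: float(np.linalg.norm(v)))
```

The bisection assumes the retention metric degrades monotonically as more small-magnitude weights are zeroed; where that assumption bends (as with the optimistic 1000-image proxy discussed below), the returned point needs full-eval verification.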

The leading point is R1 (project layer 25.7% sparse, 18,121 nonzero, 3.9 mAP): the same mAP as unpruned dim20 with 18% fewer effective parameters, and the highest mAP-per-10K-parameter ratio in the table at 2.15. R2 pushes to 26.6% project sparsity (17,988 nonzero) at a small mAP cost (3.8). Per-parameter slack measurements ran independently against the unpruned dim20 baseline: project 26.6%, cls_weight 37%, reg_hidden 17%, reg_out 12%, ctr_weight 90% (only 2 of 20 centerness weights are load-bearing). Greedy stacking of cls_weight pruning on top of R1 reaches 17,406 nonzero but drops to 3.5 mAP, revealing an interaction between parameters: cls_weight slack measured on unpruned dim20 partly compensates for the surviving project subspace, so removing it after pruning project costs more mAP than the per-parameter measurement suggested. Joint magnitude pruning across all 22K head weights (starting from R1) finds 17,129 nonzero at 3.6 mAP, the smallest dim20 found, but it does not Pareto-dominate R1: the bisection's 1000-image mAP proxy was systematically optimistic relative to the full 5000-image eval, so the 95% retention floor measured during pruning accepted more aggressive cuts than the full eval would have. R1 remains the leading point of the dim20 pruning Pareto front.

To the author's knowledge, these are the smallest detection heads to produce standard COCO mAP numbers on the 80-class benchmark.

Training

| Variant | Epochs | Batch | Optimizer | LR | Schedule | Initialization |
|---|---|---|---|---|---|---|
| linear_70k | 8 | 64 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | random |
| box32_92k | 8 | 64 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | random |
| box32 pruned R1/R2/R3 | — | — | — | — | — | from box32_92k checkpoint |
| dim20 | 8 | 64 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes |
| dim15 | 8 | 128 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes + analytical least-squares cls init |
| dim10 | 8 | 128 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes + analytical least-squares cls init |
| dim5 | 8 | 128 | AdamW (wd 1e-4) | 1e-3 | cosine, 3% warmup | SVD of pruned R2 prototypes + analytical least-squares cls init |
| dim20 pruned R1/R2 | — | — | — | — | — | from dim20 checkpoint, mAP-driven bisection on project layer |

All trained variants use the same FCOS-style loss: focal classification (alpha 0.25, gamma 2.0), GIoU box regression, and BCE centerness, summed over three cofiber scales at strides 16, 32, 64. The backbone is frozen throughout β€” gradients flow only through the head. Gradient clipping is set to 5.0.
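The classification and box terms can be sketched with their standard definitions (written fresh here, not taken from the repository's training code):

```python
import numpy as np

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss (the classification term). targets in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-logits))
    pt = np.where(targets == 1, p, 1.0 - p)       # prob of the true label
    a = np.where(targets == 1, alpha, 1.0 - alpha)
    return -a * (1.0 - pt) ** gamma * np.log(np.clip(pt, 1e-8, 1.0))

def giou_loss(a, b):
    """1 - GIoU for a pair of xyxy boxes (the regression term)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # smallest enclosing (hull) box penalizes disjoint predictions
    hull = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = inter / union - (hull - union) / hull
    return 1.0 - giou
```

The focal term down-weights easy negatives (the vast majority of grid locations), while the GIoU term keeps a gradient alive even when predicted and target boxes do not overlap.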

Pruning is iterative magnitude reduction with a 95% TP-retention threshold on 1000 COCO val images. Each pass tests up to 5000 weights for zeroing (or halving as a fallback) and verifies on the full 1000-image set. R1, R2, R3 are successive passes from the same starting checkpoint.

The training and pruning scripts that produced these checkpoints live in the phanerozoic/detection-heads repository under heads/cofiber_threshold/<variant>/train.py and prune.py.

Usage

```python
from model import CofiberDetector

detector = CofiberDetector.from_safetensors("model.safetensors")

# features: [768, 40, 40] numpy array from any frozen ViT at stride 16
detections = detector.detect(features, score_thresh=0.3)

for d in detections:
    print(f"class {d['label']} at {d['box']} score {d['score']:.3f} scale {d['scale']}")
```

Proof

CofiberDecomposition.v contains a machine-checked proof (Coq/HoTT) of the three cross-term vanishing theorems. The proof establishes that the block structure of the decomposition is exact in any semi-additive category with a suspension-loop adjunction. The concrete instantiation (average pool, bilinear upsample, float32 tensors) satisfies the hypotheses up to machine precision (reconstruction error < 3e-7).

Files

threshold-cofiber-detection/
β”œβ”€β”€ model.safetensors                                        # 241 KB trained INT8 circuit (linear_70k)
β”œβ”€β”€ model_untrained.safetensors                              # 241 KB untrained INT8 circuit
β”œβ”€β”€ model.py                                                 # standalone circuit inference
β”œβ”€β”€ model_box32.py                                           # box32 variant architecture
β”œβ”€β”€ cofiber_threshold_coco_8ep_70k.pth                       # trained PyTorch weights (linear_70k, 4.0 mAP)
β”œβ”€β”€ cofiber_threshold_untrained_70k.pth                      # untrained PyTorch weights
β”œβ”€β”€ cofiber_threshold_coco_8ep_70k_eval.json                 # pycocotools eval (linear_70k)
β”œβ”€β”€ cofiber_threshold_box32_coco_8ep_92k.pth                 # trained (box32, 5.7 mAP)
β”œβ”€β”€ cofiber_threshold_box32_coco_8ep_92k_eval.json           # pycocotools eval (box32)
β”œβ”€β”€ cofiber_threshold_box32_coco_8ep_92k_pruned76k.pth       # pruned R1 (5.7 mAP, ~76K nonzero)
β”œβ”€β”€ cofiber_threshold_box32_coco_8ep_92k_pruned76k_eval.json # pycocotools eval (pruned R1)
β”œβ”€β”€ cofiber_threshold_box32_coco_8ep_pruned_62k.pth          # pruned R2 (5.9 mAP, ~62K nonzero, best)
β”œβ”€β”€ cofiber_threshold_box32_coco_8ep_pruned_62k_eval.json    # pycocotools eval (pruned R2)
β”œβ”€β”€ cofiber_threshold_box32_pruned_46k.pth                   # pruned R3 (5.1 mAP, ~46K nonzero)
β”œβ”€β”€ cofiber_threshold_box32_pruned_46k_eval.json             # pycocotools eval (pruned R3)
β”œβ”€β”€ cofiber_threshold_dim20_coco_8ep_22k.pth                 # dim20 (3.9 mAP, 22K params)
β”œβ”€β”€ cofiber_threshold_dim20_coco_8ep_22k_eval.json           # pycocotools eval (dim20)
β”œβ”€β”€ cofiber_threshold_dim20_pruned_R1_18k.pth                # dim20 pruned R1 (3.9 mAP, 18K nonzero)
β”œβ”€β”€ cofiber_threshold_dim20_pruned_R1_18k_eval.json          # pycocotools eval (dim20 R1)
β”œβ”€β”€ cofiber_threshold_dim20_pruned_R2_18k.pth                # dim20 pruned R2 (3.8 mAP, 18K nonzero)
β”œβ”€β”€ cofiber_threshold_dim20_pruned_R2_18k_eval.json          # pycocotools eval (dim20 R2)
β”œβ”€β”€ cofiber_threshold_dim20_pruning_pareto.json              # full Pareto front + pruner config
β”œβ”€β”€ cofiber_threshold_dim15_coco_8ep_17k.pth                 # dim15 (3.0 mAP, 17K params)
β”œβ”€β”€ cofiber_threshold_dim15_coco_8ep_17k_eval.json           # pycocotools eval (dim15)
β”œβ”€β”€ cofiber_threshold_dim10_coco_8ep_13k.pth                 # dim10 (1.5 mAP, 13K params)
β”œβ”€β”€ cofiber_threshold_dim10_coco_8ep_13k_eval.json           # pycocotools eval (dim10)
β”œβ”€β”€ cofiber_threshold_dim5_coco_8ep_9k.pth                   # dim5 (0.3 mAP, 9K params)
β”œβ”€β”€ cofiber_threshold_dim5_coco_8ep_9k_eval.json             # pycocotools eval (dim5)
β”œβ”€β”€ config.json                                              # architecture metadata
β”œβ”€β”€ CofiberDecomposition.v                                   # machine-checked proof
β”œβ”€β”€ TODO.md                                                  # research directions
└── README.md

License

Apache 2.0
