NeuroGolf Solver — Roadmap

Current: v5.2 · 51 Kaggle validated · LB 594.84 · Target: 3000+
Philosophy: Research → Design → Experiment → Analyze → Research loop until confirmed score increase.
Rule: NEVER claim a feature works without full arc-gen validation on representative tasks.
Updated: 2026-04-27. LB 594.84 confirmed. Phase 3 redesigned from expert review + literature.
All 400 tasks count. There are NO excluded tasks. Unsolved = 1.0 pt (Kaggle adds automatically).

Current Solver Breakdown (51/400 solved, LB 594.84)

Category	Tasks	Solvers
Conv (lstsq)	25	conv_fixed, conv_var, conv_diff, conv_var_diff
Analytical	24	identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc.
Gravity	1	gravity_unrolled (Task 78)
Mode fill	1	mode_fill (Task 129)
Unsolved	349	—

Phase 1: Score Optimization on Existing Tasks

1a: Opset 17 Slice-Based Analytical Solvers ⬜

Convert Gather-based solvers to Slice(step=-1) + Transpose for ~0 MACs.
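To illustrate why the conversion is safe, here is a minimal pure-Python sketch (not the solver code itself) showing that an index-list flip and a negative-step slice produce the same grid, and that a 90-degree rotation decomposes into a reversed slice plus a transpose. The function names are hypothetical, and grids are assumed to be lists of lists of color integers.

```python
def flip_lr_gather(grid):
    """Gather-style flip: read each output column through an explicit index list."""
    width = len(grid[0])
    index = [width - 1 - c for c in range(width)]  # precomputed gather indices
    return [[row[i] for i in index] for row in grid]


def flip_lr_slice(grid):
    """Slice-style flip: a negative-step slice, analogous to ONNX Slice with step -1."""
    return [row[::-1] for row in grid]


def rot90_cw(grid):
    """Rotate 90 degrees clockwise as a top-bottom flip followed by a transpose."""
    flipped = grid[::-1]                         # reversed slice along the row axis
    return [list(col) for col in zip(*flipped)]  # transpose


grid = [[1, 2, 3],
        [4, 5, 6]]
```

Both flip variants return `[[3, 2, 1], [6, 5, 4]]` for this grid; the slice form needs no index tensor at all, which is the motivation for the conversion.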

1b: ONNX Optimizer Pass ⬜

onnxoptimizer.optimize() for dead-code elimination.

Phase 2: Regularization — EXHAUSTED

Experiments 0-3 tested. The failure mode is architecture mismatch, not overfitting, so more regularization will not help. The conv solvers top out at ~25 tasks.

Phase 3: New Solver Types

Organized by architecture type. Each solver is a separate .py file. Build rule: Scan for matches FIRST, build only what has hits, validate on arc-gen.
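The build rule can be sketched as a small scan harness. This is an illustrative sketch under assumptions: a task is taken to be a list of `(input, output)` grid pairs, and each detector is a hypothetical function that returns `True` when every pair fits its pattern. None of these names come from the actual codebase.

```python
def is_identity(pairs):
    """Match when every output equals its input."""
    return all(inp == out for inp, out in pairs)


def is_flip_lr(pairs):
    """Match when every output is the left-right mirror of its input."""
    return all([row[::-1] for row in inp] == out for inp, out in pairs)


def is_transpose(pairs):
    """Match when every output is the transpose of its input."""
    return all([list(col) for col in zip(*inp)] == out for inp, out in pairs)


def scan(tasks, detectors):
    """Return {solver_name: [task_ids whose pairs all match]}."""
    hits = {name: [] for name in detectors}
    for task_id, pairs in tasks.items():
        for name, detect in detectors.items():
            if detect(pairs):
                hits[name].append(task_id)
    return hits


def build_list(hits):
    """Only solvers with at least one hit are worth building."""
    return [name for name, ids in hits.items() if ids]


# Toy tasks for demonstration only.
tasks = {
    "t1": [([[1, 2]], [[1, 2]])],
    "t2": [([[1, 2]], [[2, 1]])],
    "t3": [([[1, 2]], [[9, 9]])],
}
detectors = {"identity": is_identity, "flip_lr": is_flip_lr, "transpose": is_transpose}
hits = scan(tasks, detectors)
```

Here `build_list(hits)` keeps `identity` and `flip_lr` and drops `transpose`, which has no hits. A scan hit is only a candidate; the arc-gen validation step still decides whether the solver counts.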

Category A: Static Spatial Remapping (Gather/Slice/Pad)

These are cheap, zero/low-MAC solvers that use precomputed index mappings. Highest score per task. Build these first.

#	Solver	Pattern	Key Ops	Status
A1	`extract_inner`	Remove N-pixel border frame → smaller output	Gather	⬜
A2	`add_border`	Add constant-color border → larger output	Gather+const	⬜
A3	`pad_align`	Input pasted into larger canvas at fixed offset	Gather+const	⬜
A4	`downsample_stride`	`out[r,c] = inp[r*sH, c*sW]`	Gather	⬜
A5	`extract_and_tile`	Find smallest repeating unit, tile to fill output	Gather	⬜
A6	`sparse_fill`	Each non-zero pixel becomes NxN block	Gather	⬜
A7	`symmetry_complete`	Mirror sparse data to complete L-R or T-B symmetry	Gather	⬜
A8	`multi_stamp`	Union of shifted copies of input at fixed offsets	Gather+Add	⬜
A9	`affine_remap`	General integer coordinate remap: stride+offset, axis swap	Gather	⬜
A10	`crop_paste`	Crop from input, paste at different position in output	Gather+const	⬜
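The "precomputed index mapping" idea behind Category A can be sketched as follows: compute a flat list of source indices once, then produce the output with a single gather. This is a pure-Python reference sketch, not the ONNX graph; the helper names and the stride symbols `s_h`, `s_w` are assumptions for illustration.

```python
def downsample_index_map(out_h, out_w, in_w, s_h, s_w):
    """Flat source index per output cell for out[r, c] = inp[r * s_h, c * s_w]."""
    return [(r * s_h) * in_w + (c * s_w)
            for r in range(out_h) for c in range(out_w)]


def extract_inner_index_map(in_h, in_w, n):
    """Flat source index per output cell after removing an n-pixel border."""
    return [r * in_w + c
            for r in range(n, in_h - n) for c in range(n, in_w - n)]


def apply_gather(grid, index_map, out_w):
    """Flatten the grid, pick the mapped cells, and reshape to rows of out_w."""
    flat = [v for row in grid for v in row]
    picked = [flat[i] for i in index_map]
    return [picked[i:i + out_w] for i in range(0, len(picked), out_w)]


# 4x4 demo grid holding the values 0..15 in row-major order.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
downsampled = apply_gather(grid, downsample_index_map(2, 2, 4, 2, 2), 2)
inner = apply_gather(grid, extract_inner_index_map(4, 4, 1), 2)
```

Because the index map depends only on shapes and offsets, it can be baked into the model as a constant, which is why these solvers cost close to zero MACs.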

Category B: Channel/Color Operations

Color-level transforms that work in the 10-channel one-hot space.

#	Solver	Pattern	Key Ops	Status
B1	`channel_filter`	Keep only certain colors, rest → background	Mul(mask [1,10,1,1])	⬜
B2	`overlay_constant`	Input + fixed pixel pattern overlaid	Add or Where + constant tensor	⬜
B3	`fill_bg_with_mode`	Background pixels filled with dominant color, non-bg unchanged	ReduceSum→ArgMax→Where	⬜
B4	`row_mode_fill`	Each row filled with its dominant color	ReduceSum(width)→ArgMax→Tile(width)	⬜
B5	`col_mode_fill`	Each column filled with its dominant color	ReduceSum(height)→ArgMax→Tile(height)	⬜
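As a reference for what B1 and B4 should compute, here is a pure-Python sketch that mirrors the listed ops on a 10-channel one-hot encoding. It is a behavioral reference under assumptions (grids as lists of lists, background color 0, ties resolved to the lowest color index as a first-occurrence ArgMax would), not the ONNX implementation.

```python
NUM_COLORS = 10


def one_hot(grid):
    """Encode a color grid as NUM_COLORS binary planes of shape [H][W]."""
    return [[[1 if v == k else 0 for v in row] for row in grid]
            for k in range(NUM_COLORS)]


def row_mode_fill(grid):
    """Fill each row with its most frequent color."""
    planes = one_hot(grid)
    width = len(grid[0])
    out = []
    for r in range(len(grid)):
        counts = [sum(planes[k][r]) for k in range(NUM_COLORS)]  # ReduceSum over width
        mode = counts.index(max(counts))  # ArgMax, first maximum wins on ties
        out.append([mode] * width)        # Tile across the row
    return out


def channel_filter(grid, keep, background=0):
    """Keep only the colors in `keep`; everything else becomes background."""
    return [[v if v in keep else background for v in row] for row in grid]


demo = [[1, 1, 2],
        [3, 0, 3],
        [5, 5, 5]]
```

`col_mode_fill` would follow the same shape with the reduction taken over height instead of width.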

Category C: Composition / Chaining

Chain two existing solvers. If transform(input) → intermediate, and color_map(intermediate) → output, emit one combined graph.

#	Solver	Pattern	Key Ops	Status
C1	`transform_then_recolor`	rotate/flip/transpose + color_map	Chain existing	⬜
C2	`crop_then_transform`	fixed_crop + rotate/flip	Chain existing	⬜
C3	`recolor_then_tile`	color_map + tile/upscale	Chain existing	⬜
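A possible detector for C1 tries each spatial transform and then checks whether a single consistent color map explains the remaining difference. This is a sketch under assumptions: the transform table, function names, and pair format are hypothetical, and `identity` is included so that pure recolor tasks also match.

```python
TRANSFORMS = {
    "identity": lambda g: [row[:] for row in g],
    "flip_lr": lambda g: [row[::-1] for row in g],
    "flip_ud": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
    "rot90_cw": lambda g: [list(col) for col in zip(*g[::-1])],
}


def infer_color_map(src, dst, mapping):
    """Extend `mapping` so mapping[src[r][c]] == dst[r][c]; False on any conflict."""
    if len(src) != len(dst) or any(len(a) != len(b) for a, b in zip(src, dst)):
        return False  # shapes differ, so this transform cannot be the right one
    for row_s, row_d in zip(src, dst):
        for a, b in zip(row_s, row_d):
            if mapping.setdefault(a, b) != b:
                return False
    return True


def find_transform_then_recolor(pairs):
    """Return (transform_name, color_map) shared by all pairs, or None."""
    for name, fn in TRANSFORMS.items():
        mapping = {}  # one map must hold across every pair of the task
        if all(infer_color_map(fn(inp), out, mapping) for inp, out in pairs):
            return name, mapping
    return None


# Demo pair: flip left-right, then recolor 1 -> 3 and 2 -> 4.
pairs = [([[1, 1, 2], [2, 2, 2]], [[4, 3, 3], [4, 4, 4]])]
```

On the demo pair the search rejects `identity` (color 1 would need to map to both 4 and 3) and settles on `flip_lr` with the map `{1: 3, 2: 4}`. Grids whose colors are all distinct can match several transforms, so real tasks need more than one pair to disambiguate.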

Category D: Unrolled Propagation (Conv+Where loops)

Dynamic solvers that need N unrolled steps. Higher MAC cost, so a lower score per task (~8-12 points).

#	Solver	Pattern	Key Ops	Status
D1	`gravity_unrolled`	Directional compaction, 4 dirs × 10 bg colors	Conv+Where ×N steps	✅ Task 78
D2	`flood_fill`	BFS: seed spreads through passable cells	Conv+Clip+Mul ×N steps	⬜
D3	`edge_detect`	Laplacian/Sobel boundary detection	Conv(3×3)+Abs+Greater	✅ built, 0 matches
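The unrolled-propagation pattern for D2 can be sketched in pure Python: each step dilates the filled mask with a 3×3 cross kernel, clips to {0, 1}, and multiplies by the passable mask. This is a reference sketch of the Conv+Clip+Mul loop, assuming 4-connectivity and binary masks; the function names are hypothetical.

```python
def dilate_cross(mask):
    """One cross-kernel convolution step followed by clipping to {0, 1}."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = mask[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:  # zero padding outside the grid
                    total += mask[rr][cc]
            out[r][c] = min(total, 1)  # Clip
    return out


def flood_fill_unrolled(seed, passable, steps):
    """Spread `seed` through `passable` cells for a fixed number of steps."""
    filled = [[s * p for s, p in zip(rs, rp)] for rs, rp in zip(seed, passable)]
    for _ in range(steps):
        grown = dilate_cross(filled)
        # Mul: growth is kept only where the cell is passable.
        filled = [[g * p for g, p in zip(rg, rp)] for rg, rp in zip(grown, passable)]
    return filled


passable = [[1, 1, 0, 1],
            [0, 1, 0, 1],
            [0, 1, 1, 1]]
seed = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
```

The step count is the design trade-off: the farthest cell in this demo is 7 moves from the seed, so fewer steps leave the fill incomplete, while extra steps are harmless but add MACs.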

Category E: Global Aggregation

Solvers that compute a global statistic and broadcast it.

#	Solver	Pattern	Key Ops	Status
E1	`mode_fill`	Output = solid fill of most common input color	ReduceSum→ArgMax→Expand	✅ Task 129
E2	`cumsum_fill`	Running sums for object extent, directional filling	CumSum	⬜
E3	`bbox_crop_pad`	Find bounding box via ReduceSum+ArgMax, crop+pad	ReduceSum→ArgMax→Slice→Pad	⬜
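For E3, the bounding box can be found from row and column sums alone, which is what makes a ReduceSum+ArgMax formulation plausible. The sketch below is a pure-Python reference under assumptions (background color 0, a single box around all non-background pixels); the names are hypothetical and the padding step is omitted.

```python
def bbox(grid, background=0):
    """Return (top, bottom, left, right) of non-background pixels, or None."""
    row_counts = [sum(1 for v in row if v != background) for row in grid]
    col_counts = [sum(1 for v in col if v != background) for col in zip(*grid)]
    row_hit = [1 if n else 0 for n in row_counts]
    col_hit = [1 if n else 0 for n in col_counts]
    if not any(row_hit):
        return None  # grid is entirely background
    top = row_hit.index(1)                                # first occupied row
    bottom = len(row_hit) - 1 - row_hit[::-1].index(1)    # last occupied row
    left = col_hit.index(1)
    right = len(col_hit) - 1 - col_hit[::-1].index(1)
    return top, bottom, left, right


def bbox_crop(grid, background=0):
    """Crop the grid to its bounding box; empty list if there is nothing to crop."""
    box = bbox(grid, background)
    if box is None:
        return []
    top, bottom, left, right = box
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


demo = [[0, 0, 0, 0],
        [0, 3, 0, 0],
        [0, 3, 4, 0],
        [0, 0, 0, 0]]
```

The careful-design concern is that the crop size here depends on the input, whereas a static graph needs fixed output shapes; that constraint is not addressed by this sketch.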

Build Order (highest expected ROI first)

Wave 1: Static remapping (Category A). Cheapest to build, highest score per task, most likely to have matches. ~1 day.

1. A1 extract_inner + A2 add_border (border ops)
2. A5 extract_and_tile + A6 sparse_fill (pattern ops)
3. A3 pad_align + A4 downsample_stride (placement ops)
4. A7 symmetry_complete (symmetry)

Wave 2: Color/channel ops (Category B). Builds on mode_fill. ~0.5 day.

5. B1 channel_filter + B3 fill_bg_with_mode
6. B4 row_mode_fill + B5 col_mode_fill

Wave 3: Composition (Category C). Chains existing solvers, no new ONNX ops. ~0.5 day.

7. C1 transform_then_recolor

Wave 4: Propagation (Category D). More complex, lower score. ~1 day.

8. D2 flood_fill

Wave 5: Global aggregation (Category E). Needs careful design. ~1 day.

9. E2 cumsum_fill + E3 bbox_crop_pad

Honest Projections

I will NOT repeat the Phase 2 mistake of projecting fantasy numbers. Here's what I know:

51 tasks solved today. LB 594.84.
Each Wave: Might add 2-10 tasks. Might add 0. We don't know until we scan and test.
The only reliable estimate: Gravity added 1 task. Mode fill added 1 task. Edge detect added 0. Hit rate so far: 2 tasks from 3 solvers, i.e. roughly 0.7 to 1 new task per solver built.
If hit rate holds: 20 new solvers × ~1 task each = ~20 new tasks → ~70 solved → LB ~800-900.
If some solvers hit 5+ tasks: Could reach 100-120 solved → LB ~1200-1500.
3000+ requires a fundamentally different approach (test-time training, learned architectures) that we're not doing.
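The arithmetic behind the hit-rate projection can be written out explicitly. The sketch below uses only the numbers stated above (51 solved, three solvers built with 1, 1, and 0 new tasks, ~20 planned solvers); the function names are hypothetical, and no LB score is computed because the score-per-task mapping is not modeled here.

```python
def observed_hit_rate(tasks_added):
    """Average number of new tasks per solver built so far."""
    return sum(tasks_added) / len(tasks_added)


def projected_solved(current, n_solvers, rate):
    """Expected solved count if every planned solver hits at `rate`."""
    return current + n_solvers * rate


history = {"gravity_unrolled": 1, "mode_fill": 1, "edge_detect": 0}
rate = observed_hit_rate(list(history.values()))  # 2 tasks / 3 solvers

optimistic = projected_solved(51, 20, 1.0)   # rounded-up rate of ~1 per solver
observed = projected_solved(51, 20, rate)    # rate taken literally from history
```

With the rounded rate the projection is 71 solved; with the literal 2/3 rate it is closer to 64. Three data points are too few to separate the two, which is the reason for the wide ranges in the scenario table.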

Scenario	Solved	Est LB	Confidence
Wave 1 only	55-65	650-800	60%
Wave 1+2	60-75	750-950	50%
Wave 1+2+3	65-85	850-1100	40%
All waves	70-120	900-1500	30%

Phase 4: Score Optimization

4a: Best-of-N Model Selection ⬜

4b: Official Scoring Alignment (onnx_tool) ⬜

BLENDING — EXPLICITLY EXCLUDED

Experiment Log

Date	Experiment	Result	Decision
2026-04-24	v4.2 baseline	50 arc-gen, LB ~501	Baseline
2026-04-26	v5.0 refactor	49 solved, ~604 score	New baseline
2026-04-26	Exp 1-3 (regularization)	0 improvement	EXHAUSTED
2026-04-26	v5.2 gravity+mode	+2 tasks (78, 129)	✅ Kept
2026-04-27	v5.2 Kaggle submission	51 solved, LB 594.84	Current best

Research Queue

✅ CompressARC — CumMax/ReduceSum architecture
✅ TRM — recursive reasoning
✅ ARC Prize 2025 Tech Report
✅ Expert review #1 — Phase 3 solver list (pad_align, crop_paste, downsample, etc.)
✅ Expert review #2 — 6 concrete solvers with code (extract_inner, add_border, etc.)
⬜ Task taxonomy scan: for each Wave 1 solver, count matching unsolved tasks before building