NeuroGolf Solver — Roadmap
Current: v5.2 · 51 Kaggle validated · LB 594.84 · Target: 3000+

Philosophy: Research → Design → Experiment → Analyze → Research loop until confirmed score increase.

Rule: NEVER claim a feature works without full arc-gen validation on representative tasks.

Updated: 2026-04-27. LB 594.84 confirmed. Phase 3 redesigned from expert review + literature.

All 400 tasks count. There are NO excluded tasks. Unsolved = 1.0 pt (Kaggle adds automatically).
Current Solver Breakdown (51/400 solved, LB 594.84)
| Category | Tasks | Solvers |
|---|---|---|
| Conv (lstsq) | 25 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Analytical | 24 | identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc. |
| Gravity | 1 | gravity_unrolled (Task 78) |
| Mode fill | 1 | mode_fill (Task 129) |
| Unsolved | 349 | — |
Phase 1: Score Optimization on Existing Tasks
1a: Opset 17 Slice-Based Analytical Solvers ⬜
Convert Gather-based solvers to Slice(step=-1) + Transpose for ~0 MACs.
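
As a reference for what the Slice-based graphs would have to reproduce, the sketch below expresses flips and rotations using only reversed slices and a transpose, in plain Python on nested lists. The function names are illustrative and are not taken from the solver code; a reversed Python slice here stands in for an ONNX Slice with a step of -1 on the matching axis.

```python
def transpose(grid):
    """Swap rows and columns (stands in for Transpose on the H and W axes)."""
    return [list(col) for col in zip(*grid)]

def flip_lr(grid):
    """Reverse every row (stands in for Slice with step -1 along width)."""
    return [row[::-1] for row in grid]

def flip_ud(grid):
    """Reverse the row order (stands in for Slice with step -1 along height)."""
    return [row[:] for row in grid[::-1]]

def rot90_cw(grid):
    """Clockwise quarter turn = transpose, then reverse each row."""
    return flip_lr(transpose(grid))

def rot90_ccw(grid):
    """Counter-clockwise quarter turn = transpose, then reverse the row order."""
    return flip_ud(transpose(grid))

def rot180(grid):
    """Half turn = reverse both axes; no transpose needed."""
    return flip_ud(flip_lr(grid))

grid = [[1, 2, 3],
        [4, 5, 6]]
print(rot90_cw(grid))   # [[4, 1], [5, 2], [6, 3]]
print(rot180(grid))     # [[6, 5, 4], [3, 2, 1]]
```

None of these compositions needs a multiply-accumulate, which is the property 1a relies on.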
1b: ONNX Optimizer Pass ⬜
Run onnxoptimizer.optimize() for dead-code elimination.
Phase 2: Regularization — EXHAUSTED
Exps 0-3 tested. The limit is architecture mismatch, not overfitting. Conv ceiling = ~25 tasks.
Phase 3: New Solver Types
Organized by architecture type. Each solver is a separate .py file. Build rule: Scan for matches FIRST, build only what has hits, validate on arc-gen.
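
A minimal sketch of the "scan first" step, assuming tasks are available as a dict of task id → list of (input, output) grid pairs. That layout, the `count_matches` helper and the two toy matchers are all hypothetical. A solver only earns a build slot if its matcher accepts every training pair of at least one unsolved task.

```python
def count_matches(tasks, matchers):
    """Return {solver_name: [task_id, ...]} for tasks whose training
    pairs are ALL accepted by that solver's matcher."""
    hits = {name: [] for name in matchers}
    for task_id, pairs in tasks.items():
        for name, matches in matchers.items():
            if pairs and all(matches(inp, out) for inp, out in pairs):
                hits[name].append(task_id)
    return hits

def is_extract_inner(inp, out, border=1):
    """True if `out` equals `inp` with a `border`-pixel frame removed."""
    h, w = len(inp), len(inp[0])
    if h <= 2 * border or w <= 2 * border:
        return False
    inner = [row[border:w - border] for row in inp[border:h - border]]
    return inner == out

def is_flip_lr(inp, out):
    """True if `out` is `inp` mirrored left-to-right."""
    return [row[::-1] for row in inp] == out

# Toy data only; real task ids and grids come from the dataset.
tasks = {
    "toy_a": [([[0, 0, 0], [0, 5, 0], [0, 0, 0]], [[5]])],
    "toy_b": [([[1, 2], [3, 4]], [[2, 1], [4, 3]])],
}
matchers = {"extract_inner": is_extract_inner, "flip_lr": is_flip_lr}
hits = count_matches(tasks, matchers)
print(hits)  # {'extract_inner': ['toy_a'], 'flip_lr': ['toy_b']}
```

A matcher hit on the training pairs is only a candidate; the rule above still requires arc-gen validation before the task counts as solved.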
Category A: Static Spatial Remapping (Gather/Slice/Pad)
These are cheap, zero/low-MAC solvers that use precomputed index mappings. Highest score per task. Build these first.
| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| A1 | extract_inner | Remove N-pixel border frame → smaller output | Gather | ⬜ |
| A2 | add_border | Add constant-color border → larger output | Gather+const | ⬜ |
| A3 | pad_align | Input pasted into larger canvas at fixed offset | Gather+const | ⬜ |
| A4 | downsample_stride | out[r,c] = inp[r*sH, c*sW] | Gather | ⬜ |
| A5 | extract_and_tile | Find smallest repeating unit, tile to fill output | Gather | ⬜ |
| A6 | sparse_fill | Each non-zero pixel becomes NxN block | Gather | ⬜ |
| A7 | symmetry_complete | Mirror sparse data to complete L-R or T-B symmetry | Gather | ⬜ |
| A8 | multi_stamp | Union of shifted copies of input at fixed offsets | Gather+Add | ⬜ |
| A9 | affine_remap | General integer coordinate remap: stride+offset, axis swap | Gather | ⬜ |
| A10 | crop_paste | Crop from input, paste at different position in output | Gather+const | ⬜ |
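
To make "precomputed index mappings" concrete, here is a small sketch of how one flat index table can cover A1, A4 and the stride+offset part of A9. It is plain Python rather than an ONNX graph, and `affine_gather_indices` and `gather` are hypothetical helper names. The index list is what would be baked into the model as a constant for a Gather over the flattened grid.

```python
def affine_gather_indices(in_w, out_h, out_w, stride=(1, 1), offset=(0, 0)):
    """Flat source indices such that out[r, c] = inp[r*sH + oH, c*sW + oW]."""
    s_h, s_w = stride
    o_h, o_w = offset
    return [(r * s_h + o_h) * in_w + (c * s_w + o_w)
            for r in range(out_h) for c in range(out_w)]

def gather(grid, indices, out_w):
    """Apply a flat index table to a grid and reshape to rows of out_w."""
    flat = [v for row in grid for v in row]
    picked = [flat[i] for i in indices]
    return [picked[i:i + out_w] for i in range(0, len(picked), out_w)]

# Cell values equal their flat index so the mapping is easy to read
# (real grids hold colors 0-9).
grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]

# A1 extract_inner with a 1-pixel border: offset (1, 1), 2x2 output.
inner = gather(grid, affine_gather_indices(4, 2, 2, offset=(1, 1)), 2)
print(inner)  # [[5, 6], [9, 10]]

# A4 downsample_stride with sH = sW = 2.
down = gather(grid, affine_gather_indices(4, 2, 2, stride=(2, 2)), 2)
print(down)   # [[0, 2], [8, 10]]
```

Solvers marked Gather+const (A2, A3, A10) would additionally need a constant value for output cells that have no source pixel; that part is not shown here.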
Category B: Channel/Color Operations
Color-level transforms that work in the 10-channel one-hot space.
| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| B1 | channel_filter | Keep only certain colors, rest → background | Mul(mask [1,10,1,1]) | ⬜ |
| B2 | overlay_constant | Input + fixed pixel pattern overlaid | Add or Where + constant tensor | ⬜ |
| B3 | fill_bg_with_mode | Background pixels filled with dominant color, non-bg unchanged | ReduceSum→ArgMax→Where | ⬜ |
| B4 | row_mode_fill | Each row filled with its dominant color | ReduceSum(width)→ArgMax→Tile(width) | ⬜ |
| B5 | col_mode_fill | Each column filled with its dominant color | ReduceSum(height)→ArgMax→Tile(height) | ⬜ |
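
The B4/B5 pipeline (count per color, take the argmax, tile) can be written as a reference implementation to check candidate tasks against. This is a plain-Python sketch, not the ONNX graph; the function names mirror the table but the code is illustrative. Ties go to the lowest color index here, which matches ArgMax picking the first maximum by default; whether that tie rule suits any real task is untested.

```python
NUM_COLORS = 10  # the 10-channel one-hot space

def row_mode_fill(grid):
    """Fill each row with its most frequent color."""
    out = []
    for row in grid:
        counts = [0] * NUM_COLORS          # ReduceSum over width, per color
        for v in row:
            counts[v] += 1
        # ArgMax over colors; max() returns the first maximum on ties.
        mode = max(range(NUM_COLORS), key=counts.__getitem__)
        out.append([mode] * len(row))      # Tile across width
    return out

def col_mode_fill(grid):
    """Fill each column with its most frequent color (row version on the transpose)."""
    transposed = [list(col) for col in zip(*grid)]
    filled = row_mode_fill(transposed)
    return [list(row) for row in zip(*filled)]

grid = [[1, 1, 2],
        [3, 2, 3],
        [1, 2, 2]]
print(row_mode_fill(grid))  # [[1, 1, 1], [3, 3, 3], [2, 2, 2]]
print(col_mode_fill(grid))  # [[1, 2, 2], [1, 2, 2], [1, 2, 2]]
```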
Category C: Composition / Chaining
Chain two existing solvers. If transform(input) → intermediate, and color_map(intermediate) → output, emit one combined graph.
| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| C1 | transform_then_recolor | rotate/flip/transpose + color_map | Chain existing | ⬜ |
| C2 | crop_then_transform | fixed_crop + rotate/flip | Chain existing | ⬜ |
| C3 | recolor_then_tile | color_map + tile/upscale | Chain existing | ⬜ |
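
A sketch of how C1 detection could work, under the assumption that a geometric transform is tried first and a single color map is then inferred across all training pairs. The `TRANSFORMS` table and both helpers are hypothetical and cover only three transforms; the real solver set is larger.

```python
def infer_color_map(src, dst):
    """Return a consistent {src_color: dst_color} map, or None if the
    shapes differ or one source color maps to two targets."""
    if len(src) != len(dst) or any(len(a) != len(b) for a, b in zip(src, dst)):
        return None
    mapping = {}
    for row_s, row_d in zip(src, dst):
        for s, d in zip(row_s, row_d):
            if mapping.setdefault(s, d) != d:
                return None
    return mapping

TRANSFORMS = {
    "flip_lr": lambda g: [row[::-1] for row in g],
    "flip_ud": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def find_transform_then_recolor(pairs):
    """Find (transform_name, color_map) with color_map(transform(inp)) == out
    for every pair, or None if no chain fits."""
    for name, fn in TRANSFORMS.items():
        merged, ok = {}, True
        for inp, out in pairs:
            m = infer_color_map(fn(inp), out)
            # The per-pair map must also agree with maps from earlier pairs.
            if m is None or any(merged.setdefault(k, v) != v for k, v in m.items()):
                ok = False
                break
        if ok:
            return name, merged
    return None

# Toy pair: mirror left-right, then recolor 1 → 3.
pairs = [([[1, 1, 2], [0, 0, 0]], [[2, 3, 3], [0, 0, 0]])]
result = find_transform_then_recolor(pairs)
print(result)  # ('flip_lr', {2: 2, 1: 3, 0: 0})
```

Once a chain is found, the two existing graphs can be emitted back to back as one model, which is why this wave needs no new ONNX ops.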
Category D: Unrolled Propagation (Conv+Where loops)
Dynamic solvers that need N unrolled steps. Higher MAC cost, so a lower score per task (~8-12).
| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| D1 | gravity_unrolled | Directional compaction, 4 dirs × 10 bg colors | Conv+Where ×N steps | ✅ Task 78 |
| D2 | flood_fill | BFS: seed spreads through passable cells | Conv+Clip+Mul ×N steps | ⬜ |
| D3 | edge_detect | Laplacian/Sobel boundary detection | Conv(3×3)+Abs+Greater | ✅ built, 0 matches |
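
The D2 row can be illustrated with a plain-Python sketch of one Conv+Clip+Mul step repeated a fixed number of times. The plus-shaped neighbourhood (cell plus its 4 neighbours, zero padding) is an assumption about the kernel, and the function names are illustrative.

```python
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def flood_step(reached, passable):
    """One unrolled step: sum over the plus-shaped neighbourhood (Conv),
    cap at 1 (Clip), then zero out blocked cells (Mul)."""
    h, w = len(reached), len(reached[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = reached[r][c]
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:   # zero padding at the border
                    total += reached[rr][cc]
            out[r][c] = min(1, total) * passable[r][c]
    return out

def flood_fill_unrolled(seed, passable, steps):
    """Apply the same step a fixed number of times, as an unrolled graph would."""
    reached = [row[:] for row in seed]
    for _ in range(steps):
        reached = flood_step(reached, passable)
    return reached

passable = [[1, 1, 0],
            [0, 1, 0],
            [1, 0, 1]]
seed = [[1, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]
filled = flood_fill_unrolled(seed, passable, steps=4)
print(filled)  # [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
```

The step count has to be at least the longest shortest path from a seed to any reachable cell, so N, and with it the MAC cost, grows with grid size and maze depth.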
Category E: Global Aggregation
Solvers that compute a global statistic and broadcast it.
| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| E1 | mode_fill | Output = solid fill of most common input color | ReduceSum→ArgMax→Expand | ✅ Task 129 |
| E2 | cumsum_fill | Running sums for object extent, directional filling | CumSum | ⬜ |
| E3 | bbox_crop_pad | Find bounding box via ReduceSum+ArgMax, crop+pad | ReduceSum→ArgMax→Slice→Pad | ⬜ |
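
For E3, the bounding box can be found from per-row and per-column occupancy flags alone. The sketch below is a plain-Python reference with hypothetical helper names: `any(...)` plays the role of "ReduceSum > 0", and the trailing edge is found by running the same first-index search on the reversed vector. The final pad step is left out.

```python
def first_true(flags):
    """Index of the first 1 in a 0/1 vector (ArgMax returns the first maximum)."""
    return max(range(len(flags)), key=lambda i: flags[i])

def bbox(grid, bg=0):
    """Return (top, bottom, left, right) of non-background cells, inclusive,
    or None if the grid is entirely background."""
    row_has = [int(any(v != bg for v in row)) for row in grid]
    col_has = [int(any(v != bg for v in col)) for col in zip(*grid)]
    if not any(row_has):
        return None
    top = first_true(row_has)
    bottom = len(row_has) - 1 - first_true(row_has[::-1])
    left = first_true(col_has)
    right = len(col_has) - 1 - first_true(col_has[::-1])
    return top, bottom, left, right

def bbox_crop(grid, bg=0):
    """Crop the grid to its bounding box (the Slice step)."""
    box = bbox(grid, bg)
    if box is None:
        return []
    top, bottom, left, right = box
    return [row[left:right + 1] for row in grid[top:bottom + 1]]

grid = [[0, 0, 0, 0],
        [0, 3, 0, 0],
        [0, 3, 4, 0],
        [0, 0, 0, 0]]
print(bbox(grid))       # (1, 2, 1, 2)
print(bbox_crop(grid))  # [[3, 0], [3, 4]]
```

Unlike Category A, the slice bounds here depend on the input, so the crop cannot be a precomputed constant index table.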
Build Order (highest expected ROI first)
Wave 1 — Static remapping (Category A): Cheapest to build, highest score per task, most likely to have matches. ~1 day.
1. A1 extract_inner + A2 add_border (border ops)
2. A5 extract_and_tile + A6 sparse_fill (pattern ops)
3. A3 pad_align + A4 downsample_stride (placement ops)
4. A7 symmetry_complete (symmetry)
Wave 2 — Color/channel ops (Category B): Builds on mode_fill. ~0.5 day.
5. B1 channel_filter + B3 fill_bg_with_mode
6. B4 row_mode_fill + B5 col_mode_fill
Wave 3 — Composition (Category C): Chains existing solvers, no new ONNX ops. ~0.5 day.
7. C1 transform_then_recolor
Wave 4 — Propagation (Category D): More complex, lower score. ~1 day.
8. D2 flood_fill
Wave 5 — Global aggregation (Category E): Needs careful design. ~1 day.
9. E2 cumsum_fill + E3 bbox_crop_pad
Honest Projections
I will NOT repeat the Phase 2 mistake of projecting fantasy numbers. Here's what I know:
- 51 tasks solved today. LB 594.84.
- Each Wave: Might add 2-10 tasks. Might add 0. We don't know until we scan and test.
- The only reliable estimate: Gravity added 1 task. Mode fill added 1 task. Edge detect added 0. Hit rate so far: 2 new tasks from 3 solvers built, so ~1 new task per solver at best.
- If hit rate holds: 20 new solvers × ~1 task each = ~20 new tasks → ~70 solved → LB ~800-900.
- If some solvers hit 5+ tasks: Could reach 100-120 solved → LB ~1200-1500.
- 3000+ requires a fundamentally different approach (test-time training, learned architectures) that we're not doing.
| Scenario | Solved | Est LB | Confidence |
|---|---|---|---|
| Wave 1 only | 55-65 | 650-800 | 60% |
| Wave 1+2 | 60-75 | 750-950 | 50% |
| Wave 1+2+3 | 65-85 | 850-1100 | 40% |
| All waves | 70-120 | 900-1500 | 30% |
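
The arithmetic behind these ranges can be sketched as follows, assuming the leaderboard value is the plain sum of per-task points and that a newly solved task replaces its 1.0 unsolved point. The per-task values of 10 and 15 points are illustrative assumptions, not measurements.

```python
TOTAL_TASKS = 400
UNSOLVED_POINTS = 1.0   # from the roadmap header: unsolved = 1.0 pt

def avg_points_per_solved(lb, solved):
    """Mean points per solved task after removing the unsolved baseline."""
    unsolved = TOTAL_TASKS - solved
    return (lb - unsolved * UNSOLVED_POINTS) / solved

def project_lb(current_lb, new_tasks, points_per_new_task):
    """Each newly solved task swaps its 1.0 baseline for its real score."""
    return current_lb + new_tasks * (points_per_new_task - UNSOLVED_POINTS)

current_lb, solved = 594.84, 51
avg_now = avg_points_per_solved(current_lb, solved)
print(round(avg_now, 2))                     # 4.82

# Hypothetical per-task scores for 20 new tasks.
low = project_lb(current_lb, 20, 10.0)
high = project_lb(current_lb, 20, 15.0)
print(round(low, 2), round(high, 2))         # 774.84 874.84
```

Under those assumptions the current 51 tasks average about 4.8 points each, so the table's estimates only hold if new solvers score well above today's average, as the cheap Category A solvers are expected to.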
Phase 4: Score Optimization
4a: Best-of-N Model Selection ⬜
4b: Official Scoring Alignment (onnx_tool) ⬜
BLENDING — EXPLICITLY EXCLUDED
Experiment Log
| Date | Experiment | Result | Decision |
|---|---|---|---|
| 2026-04-24 | v4.2 baseline | 50 arc-gen, LB ~501 | Baseline |
| 2026-04-26 | v5.0 refactor | 49 solved, ~604 score | New baseline |
| 2026-04-26 | Exp 1-3 (regularization) | 0 improvement | EXHAUSTED |
| 2026-04-26 | v5.2 gravity+mode | +2 tasks (78, 129) | ✅ Kept |
| 2026-04-27 | v5.2 Kaggle submission | 51 solved, LB 594.84 | Current best |
Research Queue
- ✅ CompressARC — CumMax/ReduceSum architecture
- ✅ TRM — recursive reasoning
- ✅ ARC Prize 2025 Tech Report
- ✅ Expert review #1 — Phase 3 solver list (pad_align, crop_paste, downsample, etc.)
- ✅ Expert review #2 — 6 concrete solvers with code (extract_inner, add_border, etc.)
- Task taxonomy scan — for each Wave 1 solver, count matching unsolved tasks before building