
NeuroGolf Solver — Roadmap

Current: v5.2 · 51 Kaggle validated · LB 594.84 · Target: 3000+

Philosophy: Research → Design → Experiment → Analyze → Research loop until confirmed score increase.

Rule: NEVER claim a feature works without full arc-gen validation on representative tasks.

Updated: 2026-04-27. LB 594.84 confirmed. Phase 3 redesigned from expert review + literature.

All 400 tasks count. There are NO excluded tasks. Unsolved = 1.0 pt (Kaggle adds automatically).


Current Solver Breakdown (51/400 solved, LB 594.84)

| Category | Tasks | Solvers |
|---|---|---|
| Conv (lstsq) | 25 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Analytical | 24 | identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc. |
| Gravity | 1 | gravity_unrolled (Task 78) |
| Mode fill | 1 | mode_fill (Task 129) |
| Unsolved | 349 | |

Phase 1: Score Optimization on Existing Tasks

1a: Opset 17 Slice-Based Analytical Solvers ⬜

Convert Gather-based solvers to Slice(step=-1) + Transpose for ~0 MACs.
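
The identities behind this conversion can be checked in plain Python before touching any ONNX graph. The sketch below is only a reference for the semantics, assuming grids are lists of rows; the function names are illustrative and are not the solver's own.

```python
def flip_lr(grid):
    # Reverse each row: the analogue of Slice(step=-1) along the width axis.
    return [row[::-1] for row in grid]

def flip_ud(grid):
    # Reverse the row order: the analogue of Slice(step=-1) along the height axis.
    return [list(row) for row in grid[::-1]]

def transpose(grid):
    # Swap the height and width axes.
    return [list(col) for col in zip(*grid)]

def rot90_cw(grid):
    # A clockwise quarter turn is a vertical flip followed by a transpose.
    return transpose(flip_ud(grid))

def rot180(grid):
    # A half turn is both flips, with no transpose needed.
    return flip_lr(flip_ud(grid))

grid = [[1, 2, 3],
        [4, 5, 6]]
rotated = rot90_cw(grid)
```

Since every flip and rotation reduces to reversed slices plus an axis swap, none of them needs a precomputed index tensor.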

1b: ONNX Optimizer Pass ⬜

onnxoptimizer.optimize() for dead-code elimination.


Phase 2: Regularization — EXHAUSTED

Exps 0-3 tested. Architecture mismatch, not overfitting. Conv ceiling = ~25 tasks.


Phase 3: New Solver Types

Organized by architecture type. Each solver is a separate .py file. Build rule: Scan for matches FIRST, build only what has hits, validate on arc-gen.
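
A minimal sketch of the scan-first rule, assuming each task is a list of (input, output) training pairs and each candidate solver exposes a cheap detector function. The task dictionary layout, the detector names, and the `scan` helper are hypothetical and only illustrate the workflow.

```python
def scan(tasks, detectors):
    """Map each detector name to the task ids where it fits every training pair."""
    hits = {name: [] for name in detectors}
    for task_id, pairs in tasks.items():
        for name, fits in detectors.items():
            if pairs and all(fits(inp, out) for inp, out in pairs):
                hits[name].append(task_id)
    return hits

def is_extract_inner(inp, out, n=1):
    # Output equals the input with an n-pixel frame removed.
    h, w = len(inp), len(inp[0])
    if h <= 2 * n or w <= 2 * n:
        return False
    return out == [row[n:w - n] for row in inp[n:h - n]]

def is_flip_lr(inp, out):
    return out == [row[::-1] for row in inp]

def is_transpose(inp, out):
    return out == [list(col) for col in zip(*inp)]

# Toy stand-ins for unsolved tasks (hypothetical data).
tasks = {
    "t1": [([[0, 0, 0], [0, 5, 0], [0, 0, 0]], [[5]])],
    "t2": [([[1, 2]], [[2, 1]])],
}
detectors = {
    "extract_inner": is_extract_inner,
    "flip_lr": is_flip_lr,
    "transpose": is_transpose,
}
hits = scan(tasks, detectors)
# Build only the solvers that matched at least one task.
to_build = [name for name, ids in hits.items() if ids]
```

A detector hit is only a reason to build the solver; it still has to pass arc-gen validation before it counts.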


Category A: Static Spatial Remapping (Gather/Slice/Pad)

These are cheap, zero/low-MAC solvers that use precomputed index mappings. Highest score per task. Build these first.

| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| A1 | extract_inner | Remove N-pixel border frame → smaller output | Gather | |
| A2 | add_border | Add constant-color border → larger output | Gather+const | |
| A3 | pad_align | Input pasted into larger canvas at fixed offset | Gather+const | |
| A4 | downsample_stride | out[r,c] = inp[r*sH, c*sW] | Gather | |
| A5 | extract_and_tile | Find smallest repeating unit, tile to fill output | Gather | |
| A6 | sparse_fill | Each non-zero pixel becomes NxN block | Gather | |
| A7 | symmetry_complete | Mirror sparse data to complete L-R or T-B symmetry | Gather | |
| A8 | multi_stamp | Union of shifted copies of input at fixed offsets | Gather+Add | |
| A9 | affine_remap | General integer coordinate remap: stride+offset, axis swap | Gather | |
| A10 | crop_paste | Crop from input, paste at different position in output | Gather+const | |
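
Most rows in this table fit one pattern: precompute, for every output cell, which input cell it copies from, then apply that mapping as a single gather. The sketch below shows the idea in plain Python for A4, A1 and A2. The helper names and the `-1` sentinel for constant cells are assumptions made for illustration; in an actual ONNX graph the constant cells would need separate handling.

```python
def make_index(in_h, in_w, out_h, out_w, src):
    """Flat input index for each output cell; -1 marks a constant-fill cell.

    src(r, c) returns the (row, col) to copy from, or None for a constant cell.
    """
    index = []
    for r in range(out_h):
        for c in range(out_w):
            s = src(r, c)
            if s is None:
                index.append(-1)
                continue
            ir, ic = s
            if not (0 <= ir < in_h and 0 <= ic < in_w):
                raise ValueError(f"source {s} is outside the input grid")
            index.append(ir * in_w + ic)
    return index

def apply_gather(grid, index, out_h, out_w, fill=0):
    """Apply a precomputed index map; constant cells take the fill color."""
    flat = [v for row in grid for v in row]
    out = [flat[i] if i >= 0 else fill for i in index]
    return [out[r * out_w:(r + 1) * out_w] for r in range(out_h)]

grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 0, 1, 2],
        [3, 4, 5, 6]]

# A4 downsample_stride with sH = sW = 2: out[r,c] = inp[2r, 2c]
down = apply_gather(grid, make_index(4, 4, 2, 2, lambda r, c: (2 * r, 2 * c)), 2, 2)

# A1 extract_inner with a 1-pixel frame removed
inner = apply_gather(grid, make_index(4, 4, 2, 2, lambda r, c: (r + 1, c + 1)), 2, 2)

# A2 add_border: 1-pixel border of color 5 around the 2x2 grid
def border_src(r, c):
    return (r - 1, c - 1) if 1 <= r <= 2 and 1 <= c <= 2 else None

bordered = apply_gather(inner, make_index(2, 2, 4, 4, border_src), 4, 4, fill=5)
```

Because the index map depends only on the grid shapes, it is computed once at build time and costs nothing at inference.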

Category B: Channel/Color Operations

Color-level transforms that work in the 10-channel one-hot space.

| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| B1 | channel_filter | Keep only certain colors, rest → background | Mul(mask [1,10,1,1]) | |
| B2 | overlay_constant | Input + fixed pixel pattern overlaid | Add or Where + constant tensor | |
| B3 | fill_bg_with_mode | Background pixels filled with dominant color, non-bg unchanged | ReduceSum→ArgMax→Where | |
| B4 | row_mode_fill | Each row filled with its dominant color | ReduceSum(width)→ArgMax→Tile(width) | |
| B5 | col_mode_fill | Each column filled with its dominant color | ReduceSum(height)→ArgMax→Tile(height) | |
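
As a reference for what B4 and B5 should compute, here is a plain-Python sketch of the ReduceSum → ArgMax → Tile chain on integer grids with colors 0-9. It assumes ties go to the lowest color index, which is how an ArgMax that keeps the first maximum would behave; the function names mirror the table but the code is illustrative only.

```python
NUM_COLORS = 10

def row_mode_fill(grid):
    out = []
    for row in grid:
        counts = [0] * NUM_COLORS
        for v in row:
            counts[v] += 1                  # per-color count across the width (ReduceSum)
        mode = counts.index(max(counts))    # first maximum wins on ties (ArgMax)
        out.append([mode] * len(row))       # repeat across the width (Tile)
    return out

def col_mode_fill(grid):
    # Same operation along the other axis: transpose, fill rows, transpose back.
    transposed = [list(col) for col in zip(*grid)]
    filled = row_mode_fill(transposed)
    return [list(col) for col in zip(*filled)]

rows = row_mode_fill([[1, 1, 2],
                      [3, 4, 4]])
cols = col_mode_fill([[1, 2],
                      [1, 3],
                      [5, 3]])
```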

Category C: Composition / Chaining

Chain two existing solvers. If transform(input) → intermediate, and color_map(intermediate) → output, emit one combined graph.

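| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| C1 | transform_then_recolor | rotate/flip/transpose + color_map | Chain existing | |
| C2 | crop_then_transform | fixed_crop + rotate/flip | Chain existing | |
| C3 | recolor_then_tile | color_map + tile/upscale | Chain existing | |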
C3 recolor_then_tile color_map + tile/upscale Chain existing

Category D: Unrolled Propagation (Conv+Where loops)

Dynamic solvers that need N unrolled steps. Higher MAC cost, so a lower score per task (~8-12 points).

| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| D1 | gravity_unrolled | Directional compaction, 4 dirs × 10 bg colors | Conv+Where ×N steps | ✅ Task 78 |
| D2 | flood_fill | BFS: seed spreads through passable cells | Conv+Clip+Mul ×N steps | |
| D3 | edge_detect | Laplacian/Sobel boundary detection | Conv(3×3)+Abs+Greater | ✅ built, 0 matches |
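
The D2 row can be read as a fixed number of identical steps: spread the current fill to its 4-neighbours (the Conv), saturate at 1 (the Clip), and zero out impassable cells (the Mul). A plain-Python sketch of that loop, assuming 0/1 masks for the seed and the passable cells; the function name and the toy masks are illustrative.

```python
def flood_fill_unrolled(seed, passable, steps):
    """Spread a 0/1 seed mask through passable cells for a fixed number of steps."""
    h, w = len(seed), len(seed[0])
    cur = [[seed[r][c] * passable[r][c] for c in range(w)] for r in range(h)]
    for _ in range(steps):
        nxt = []
        for r in range(h):
            row = []
            for c in range(w):
                # Conv with a plus-shaped kernel: the cell plus its 4 neighbours.
                total = cur[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += cur[rr][cc]
                # Clip to [0, 1], then Mul by the passable mask.
                row.append(min(total, 1) * passable[r][c])
            nxt.append(row)
        cur = nxt
    return cur

passable = [[1, 1, 0, 1],
            [0, 1, 0, 1],
            [1, 1, 0, 1]]
seed = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
one_step = flood_fill_unrolled(seed, passable, 1)
filled = flood_fill_unrolled(seed, passable, 4)
```

The step count has to cover the longest path the fill must travel, which is why cost grows with N; extra steps beyond that leave the result unchanged.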

Category E: Global Aggregation

Solvers that compute a global statistic and broadcast it.

| # | Solver | Pattern | Key Ops | Status |
|---|---|---|---|---|
| E1 | mode_fill | Output = solid fill of most common input color | ReduceSum→ArgMax→Expand | ✅ Task 129 |
| E2 | cumsum_fill | Running sums for object extent, directional filling | CumSum | |
| E3 | bbox_crop_pad | Find bounding box via ReduceSum+ArgMax, crop+pad | ReduceSum→ArgMax→Slice→Pad | |
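
For E3, the bounding box falls out of per-row and per-column occupancy: the first and last occupied row and column give the crop window. The sketch below is a plain-Python reference, assuming color 0 is the background and that the crop is padded into the top-left corner of a canvas at least as large as the crop; both are illustrative assumptions, not confirmed design choices.

```python
def bbox_crop(grid, background=0):
    """Crop to the bounding box of non-background cells; None if the grid is empty."""
    row_has = [any(v != background for v in row) for row in grid]
    col_has = [any(row[c] != background for row in grid) for c in range(len(grid[0]))]
    if not any(row_has):
        return None
    top = row_has.index(True)
    bottom = len(row_has) - 1 - row_has[::-1].index(True)
    left = col_has.index(True)
    right = len(col_has) - 1 - col_has[::-1].index(True)
    return [row[left:right + 1] for row in grid[top:bottom + 1]]

def pad_to(grid, out_h, out_w, fill=0):
    """Place the grid at the top-left of an out_h x out_w canvas (assumed placement)."""
    rows = [row + [fill] * (out_w - len(row)) for row in grid]
    rows += [[fill] * out_w for _ in range(out_h - len(grid))]
    return rows

grid = [[0, 0, 0, 0],
        [0, 3, 0, 0],
        [0, 3, 4, 0],
        [0, 0, 0, 0]]
cropped = bbox_crop(grid)
padded = pad_to(cropped, 3, 3)
```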

Build Order (highest expected ROI first)

Wave 1: Static remapping (Category A). Cheapest to build, highest score per task, most likely to have matches. ~1 day.

  1. A1 extract_inner + A2 add_border (border ops)
  2. A5 extract_and_tile + A6 sparse_fill (pattern ops)
  3. A3 pad_align + A4 downsample_stride (placement ops)
  4. A7 symmetry_complete (symmetry)

Wave 2: Color/channel ops (Category B). Builds on mode_fill. ~0.5 day.

  5. B1 channel_filter + B3 fill_bg_with_mode
  6. B4 row_mode_fill + B5 col_mode_fill

Wave 3: Composition (Category C). Chains existing solvers, no new ONNX ops. ~0.5 day.

  7. C1 transform_then_recolor

Wave 4: Propagation (Category D). More complex, lower score. ~1 day.

  8. D2 flood_fill

Wave 5: Global aggregation (Category E). Needs careful design. ~1 day.

  9. E2 cumsum_fill + E3 bbox_crop_pad


Honest Projections

I will NOT repeat the Phase 2 mistake of projecting fantasy numbers. Here's what I know:

  • 51 tasks solved today. LB 594.84.
  • Each Wave: Might add 2-10 tasks. Might add 0. We don't know until we scan and test.
  • The only reliable estimate: Gravity added 1 task. Mode fill added 1 task. Edge detect added 0. Hit rate so far: 2 new tasks from 3 solvers built, i.e. just under 1 new task per solver.
  • If hit rate holds: 20 new solvers × ~1 task each = ~20 new tasks → ~70 solved → LB ~800-900.
  • If some solvers hit 5+ tasks: Could reach 100-120 solved → LB ~1200-1500.
  • 3000+ requires a fundamentally different approach (test-time training, learned architectures) that we're not doing.

| Scenario | Solved | Est. LB | Confidence |
|---|---|---|---|
| Wave 1 only | 55-65 | 650-800 | 60% |
| Wave 1+2 | 60-75 | 750-950 | 50% |
| Wave 1+2+3 | 65-85 | 850-1100 | 40% |
| All waves | 70-120 | 900-1500 | 30% |
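
The hit-rate projection is simple enough to recompute whenever a solver lands. The sketch below uses only the figures listed above (51 solved, three solvers built so far, roughly 20 planned); the variable names are illustrative, and it says nothing about leaderboard points, which depend on per-task cost.

```python
solved_now = 51
# New tasks contributed by each solver built so far.
new_tasks_by_solver = {"gravity_unrolled": 1, "mode_fill": 1, "edge_detect": 0}
planned_solvers = 20

observed_rate = sum(new_tasks_by_solver.values()) / len(new_tasks_by_solver)

# Projection at the strictly observed rate versus the rounded "~1 per solver".
projected_observed = solved_now + planned_solvers * observed_rate
projected_rounded = solved_now + planned_solvers * 1

print(f"observed hit rate: {observed_rate:.2f} tasks per solver")
print(f"projected solved: {projected_observed:.0f} to {projected_rounded}")
```

With only three data points the rate is very noisy, so the range should be treated as a sanity check rather than a forecast.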

Phase 4: Score Optimization

4a: Best-of-N Model Selection ⬜

4b: Official Scoring Alignment (onnx_tool) ⬜


BLENDING — EXPLICITLY EXCLUDED


Experiment Log

| Date | Experiment | Result | Decision |
|---|---|---|---|
| 2026-04-24 | v4.2 baseline | 50 arc-gen, LB ~501 | Baseline |
| 2026-04-26 | v5.0 refactor | 49 solved, ~604 score | New baseline |
| 2026-04-26 | Exp 1-3 (regularization) | 0 improvement | EXHAUSTED |
| 2026-04-26 | v5.2 gravity+mode | +2 tasks (78, 129) | ✅ Kept |
| 2026-04-27 | v5.2 Kaggle submission | 51 solved, LB 594.84 | Current best |

Research Queue

  1. ✅ CompressARC — CumMax/ReduceSum architecture
  2. ✅ TRM — recursive reasoning
  3. ✅ ARC Prize 2025 Tech Report
  4. ✅ Expert review #1 — Phase 3 solver list (pad_align, crop_paste, downsample, etc.)
  5. ✅ Expert review #2 — 6 concrete solvers with code (extract_inner, add_border, etc.)
  6. Task taxonomy scan — for each Wave 1 solver, count matching unsolved tasks before building