# NeuroGolf Solver — Roadmap
> Current: v5.2 · 51 Kaggle validated · LB 594.84 · Target: 3000+
> Philosophy: **Research → Design → Experiment → Analyze → Research** loop until a score increase is confirmed.
> Rule: **NEVER claim a feature works without full arc-gen validation on representative tasks.**
> Updated: 2026-04-27 — LB 594.84 confirmed. Phase 3 redesigned from expert review + literature.
> **All 400 tasks count. There are NO excluded tasks. Unsolved = 1.0 pt (Kaggle adds automatically).**
---
## Current Solver Breakdown (51/400 solved, LB 594.84)
| Category | Tasks | Solvers |
|----------|-------|---------|
| Conv (lstsq) | 25 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Analytical | 24 | identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc. |
| Gravity | 1 | gravity_unrolled (Task 78) |
| Mode fill | 1 | mode_fill (Task 129) |
| **Unsolved** | **349** | — |
---
## Phase 1: Score Optimization on Existing Tasks
### 1a: Opset 17 Slice-Based Analytical Solvers ⬜
> Convert Gather-based solvers to Slice(step=-1) + Transpose for ~0 MACs.
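
This pure-Python sketch shows why 1a works. It assumes the solvers operate on a [1, 10, H, W] one-hot tensor, which is inferred from the [1,10,1,1] mask shape in B1. Every 90° rotation is one Transpose (perm=[0, 1, 3, 2]) plus one reversing Slice. A 180° rotation is two reversing Slices. None of these needs a Gather index table. In ONNX, reversing a whole axis means `steps=[-1]` with `starts=[-1]` and a large negative `ends` sentinel. The function names here are illustrative.

```python
from typing import List

# The spatial axes (H, W) of a [1, 10, H, W] tensor, modelled as a list of rows.
Grid = List[List[int]]


def slice_flip_rows(g: Grid) -> Grid:
    """Slice with steps=-1 on the H axis: reverses row order (vertical flip)."""
    return g[::-1]


def slice_flip_cols(g: Grid) -> Grid:
    """Slice with steps=-1 on the W axis: reverses each row (horizontal flip)."""
    return [row[::-1] for row in g]


def transpose_hw(g: Grid) -> Grid:
    """Transpose with perm=[0, 1, 3, 2]: swaps H and W."""
    return [list(col) for col in zip(*g)]


# Rotations need no Gather: each is a Transpose followed by one reversing Slice.
def rot90_cw(g: Grid) -> Grid:
    return slice_flip_cols(transpose_hw(g))


def rot90_ccw(g: Grid) -> Grid:
    return slice_flip_rows(transpose_hw(g))


def rot180(g: Grid) -> Grid:
    # Two Slices (one per spatial axis), no Transpose at all.
    return slice_flip_rows(slice_flip_cols(g))
```

For example, `rot90_cw([[1, 2, 3], [4, 5, 6]])` gives `[[4, 1], [5, 2], [6, 3]]`.
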
### 1b: ONNX Optimizer Pass ⬜
> `onnxoptimizer.optimize()` for dead-code elimination.
---
## Phase 2: Regularization — EXHAUSTED
> Exps 0-3 tested. Architecture mismatch, not overfitting. Conv ceiling = ~25 tasks.
---
## Phase 3: New Solver Types
> Organized by architecture type. Each solver is a separate .py file.
> **Build rule:** Scan for matches FIRST, build only what has hits, validate on arc-gen.
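
Here is a minimal sketch of the scan step. It assumes each unsolved task is loaded as an id mapped to its train (input, output) pairs; the loader is not shown. It also assumes each planned solver gets a cheap boolean detector. `scan`, `report` and `is_transpose` are hypothetical names. A hit only nominates a task. Per the rule above, nothing counts until it passes full arc-gen validation.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]                 # one (input, output) train example
Detector = Callable[[List[Pair]], bool]  # hypothetical per-solver match test


def scan(tasks: Dict[str, List[Pair]], detectors: Dict[str, Detector]) -> Dict[str, List[str]]:
    """For each detector, list the task ids whose train pairs it fully explains."""
    hits: Dict[str, List[str]] = {name: [] for name in detectors}
    for task_id, pairs in tasks.items():
        for name, detect in detectors.items():
            if pairs and detect(pairs):
                hits[name].append(task_id)
    return hits


def report(hits: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Detectors ranked by hit count: build from the top, skip the zeros."""
    return sorted(((name, len(ids)) for name, ids in hits.items()), key=lambda t: -t[1])


# Hypothetical toy detector: the output is the transposed input in every train pair.
def is_transpose(pairs: List[Pair]) -> bool:
    return all([list(col) for col in zip(*inp)] == out for inp, out in pairs)
```
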
---
### Category A: Static Spatial Remapping (Gather/Slice/Pad)
These are cheap, zero/low-MAC solvers that use precomputed index mappings. Highest score per task. Build these first.
| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| A1 | `extract_inner` | Remove N-pixel border frame → smaller output | Gather | ⬜ |
| A2 | `add_border` | Add constant-color border → larger output | Gather+const | ⬜ |
| A3 | `pad_align` | Input pasted into larger canvas at fixed offset | Gather+const | ⬜ |
| A4 | `downsample_stride` | `out[r,c] = inp[r*sH, c*sW]` | Gather | ⬜ |
| A5 | `extract_and_tile` | Find smallest repeating unit, tile to fill output | Gather | ⬜ |
| A6 | `sparse_fill` | Each non-zero pixel becomes NxN block | Gather | ⬜ |
| A7 | `symmetry_complete` | Mirror sparse data to complete L-R or T-B symmetry | Gather | ⬜ |
| A8 | `multi_stamp` | Union of shifted copies of input at fixed offsets | Gather+Add | ⬜ |
| A9 | `affine_remap` | General integer coordinate remap: stride+offset, axis swap | Gather | ⬜ |
| A10 | `crop_paste` | Crop from input, paste at different position in output | Gather+const | ⬜ |
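
This sketch covers the machinery the Category A solvers share. It assumes every solver here reduces to a fixed table saying which input cell, or a constant, feeds each output cell. `apply_map` is the reference behaviour that the exported Gather graph must reproduce. `to_flat_indices` shows one possible lowering to a single Gather over the flattened H·W axis. That lowering assumes the graph appends one constant fill pixel at index H·W. Only the A1, A2 and A4 builders and an A2 detector are shown, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[int]]
# Output cell -> source (row, col) in the input, or None for "constant fill color".
IndexMap = List[List[Optional[Tuple[int, int]]]]


def apply_map(g: Grid, m: IndexMap, fill: int = 0) -> Grid:
    """Reference behaviour of the precomputed Gather (+ constant) graph."""
    return [[fill if src is None else g[src[0]][src[1]] for src in row] for row in m]


def extract_inner_map(h: int, w: int, n: int) -> IndexMap:  # A1: drop an n-pixel frame
    return [[(r + n, c + n) for c in range(w - 2 * n)] for r in range(h - 2 * n)]


def add_border_map(h: int, w: int, n: int) -> IndexMap:  # A2: add an n-pixel frame
    return [[(r - n, c - n) if n <= r < h + n and n <= c < w + n else None
             for c in range(w + 2 * n)] for r in range(h + 2 * n)]


def downsample_map(h: int, w: int, sh: int, sw: int) -> IndexMap:  # A4: out[r,c] = inp[r*sH, c*sW]
    return [[(r * sh, c * sw) for c in range((w + sw - 1) // sw)]
            for r in range((h + sh - 1) // sh)]


def to_flat_indices(m: IndexMap, h: int, w: int) -> List[int]:
    """Row-major r*W + c indices for one Gather over the flattened H*W axis.
    Constant cells point at index H*W (assumed appended constant pixel)."""
    return [h * w if src is None else src[0] * w + src[1] for row in m for src in row]


def detect_add_border(pairs: Sequence[Tuple[Grid, Grid]]) -> Optional[Tuple[int, int]]:
    """(n, color) if every pair is 'input framed by an n-wide one-color border', else None."""
    params = None
    for inp, out in pairs:
        h, w = len(inp), len(inp[0])
        dh, dw = len(out) - h, len(out[0]) - w
        if dh <= 0 or dh % 2 or dw != dh:
            return None
        cand = (dh // 2, out[0][0])  # border width and its color
        if params not in (None, cand):
            return None              # parameters must agree across all pairs
        if apply_map(inp, add_border_map(h, w, cand[0]), fill=cand[1]) != out:
            return None
        params = cand
    return params
```
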
---
### Category B: Channel/Color Operations
Color-level transforms that work in the 10-channel one-hot space.
| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| B1 | `channel_filter` | Keep only certain colors, rest → background | Mul(mask [1,10,1,1]) | ⬜ |
| B2 | `overlay_constant` | Input + fixed pixel pattern overlaid | Add or Where + constant tensor | ⬜ |
| B3 | `fill_bg_with_mode` | Background pixels filled with dominant color, non-bg unchanged | ReduceSum→ArgMax→Where | ⬜ |
| B4 | `row_mode_fill` | Each row filled with its dominant color | ReduceSum(width)→ArgMax→Tile(width) | ⬜ |
| B5 | `col_mode_fill` | Each column filled with its dominant color | ReduceSum(height)→ArgMax→Tile(height) | ⬜ |
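
This pure-Python sketch shows B1 and B4 in the 10-channel one-hot space. A [10][H][W] nested list stands in for the [1, 10, H, W] tensor. The sketch assumes a filtered-out pixel must become background (channel 0) rather than an all-zero vector. For that reason, `channel_filter` routes the masked-out mass back into channel 0 after the [1,10,1,1] Mul. Ties in `row_mode_fill` go to the lowest color index, which matches ONNX `ArgMax` with its default `select_last_index=0`. B5 `col_mode_fill` is the same code with the ReduceSum and Tile axes swapped. All names are hypothetical.

```python
from typing import List, Set

Grid = List[List[int]]
OneHot = List[List[List[int]]]  # [10][H][W]: the [1, 10, H, W] tensor without the batch dim


def one_hot(g: Grid) -> OneHot:
    return [[[1 if v == k else 0 for v in row] for row in g] for k in range(10)]


def argmax_channels(x: OneHot) -> Grid:
    """ArgMax over the color axis; like ONNX ArgMax (select_last_index=0), ties go to the lowest index."""
    h, w = len(x[0]), len(x[0][0])
    return [[max(range(10), key=lambda k: x[k][r][c]) for c in range(w)] for r in range(h)]


def channel_filter(x: OneHot, keep: Set[int]) -> OneHot:  # B1
    mask = [1 if k in keep else 0 for k in range(10)]  # the [1, 10, 1, 1] Mul constant
    h, w = len(x[0]), len(x[0][0])
    out = [[[x[k][r][c] * mask[k] for c in range(w)] for r in range(h)] for k in range(10)]
    # Assumption: a dropped pixel must become background (channel 0), not an
    # all-zero vector, so add the masked-out mass back into channel 0.
    for r in range(h):
        for c in range(w):
            out[0][r][c] += sum(x[k][r][c] for k in range(10) if not mask[k])
    return out


def row_mode_fill(x: OneHot) -> Grid:  # B4
    h, w = len(x[0]), len(x[0][0])
    counts = [[sum(x[k][r]) for r in range(h)] for k in range(10)]           # ReduceSum over W
    modes = [max(range(10), key=lambda k: counts[k][r]) for r in range(h)]  # ArgMax over colors
    return [[modes[r]] * w for r in range(h)]                                # Tile/Expand along W
```
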
---
### Category C: Composition / Chaining
Chain two existing solvers: if `transform(input)` yields an intermediate grid and `color_map(intermediate)` yields the output, emit a single combined graph.
| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| C1 | `transform_then_recolor` | rotate/flip/transpose + color_map | Chain existing | ⬜ |
| C2 | `crop_then_transform` | fixed_crop + rotate/flip | Chain existing | ⬜ |
| C3 | `recolor_then_tile` | color_map + tile/upscale | Chain existing | ⬜ |
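
This sketches a C1 detector. It assumes train pairs stored as lists of rows and the geometric transforms named below; the names are illustrative, not the existing solver ids. For each transform, it tries to learn one color map that is consistent across all pairs. It rejects trivial maps, since a bare transform is already covered. A color map acts per pixel, while these transforms only move pixels, so the two commute. The combined graph can therefore apply the channel remap before or after the Slice/Transpose. How to handle colors never seen in train (for example, leaving them unchanged) is an open assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]

# Pure spatial permutations (Slice/Transpose-expressible). Identity is omitted:
# a bare color_map is already its own solver.
GEOMETRIC: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
    "rot90_cw": lambda g: [list(col)[::-1] for col in zip(*g)],
    "rot90_ccw": lambda g: [list(col) for col in zip(*g)][::-1],
    "rot180": lambda g: [row[::-1] for row in g[::-1]],
}


def fit_color_map(pairs: List[Pair]) -> Optional[Dict[int, int]]:
    """One color->color dict consistent with every pair, or None (shape mismatch or conflict)."""
    cmap: Dict[int, int] = {}
    for a, b in pairs:
        if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
            return None
        for ra, rb in zip(a, b):
            for u, v in zip(ra, rb):
                if cmap.setdefault(u, v) != v:
                    return None
    return cmap


def detect_transform_then_recolor(pairs: List[Pair]) -> Optional[Tuple[str, Dict[int, int]]]:
    for name, f in GEOMETRIC.items():
        cmap = fit_color_map([(f(inp), out) for inp, out in pairs])
        # Skip trivial maps: a pure transform is already covered by existing solvers.
        if cmap is not None and any(u != v for u, v in cmap.items()):
            return name, cmap
    return None
```
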
---
### Category D: Unrolled Propagation (Conv+Where loops)
Dynamic solvers that need N unrolled steps. Higher MAC cost, so a lower score per task (~8-12).
| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| D1 | `gravity_unrolled` | Directional compaction, 4 dirs × 10 bg colors | Conv+Where ×N steps | ✅ Task 78 |
| D2 | `flood_fill` | BFS: seed spreads through passable cells | Conv+Clip+Mul ×N steps | ⬜ |
| D3 | `edge_detect` | Laplacian/Sobel boundary detection | Conv(3×3)+Abs+Greater | ✅ built, 0 matches |
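
This sketches D2 as the graph would compute it, with plain lists standing in for tensors. Each unrolled step is three ops:

- a 3×3 cross-shaped Conv (centre plus 4 neighbours, zero padding),
- a Clip to [0, 1],
- a Mul by the passable mask.

The step count is fixed at export time. The reached mask only grows, and further steps leave a converged fill unchanged. H·W − 1 steps is therefore always enough, since that bounds the longest shortest path inside a region. Fewer steps suffice when paths are known to be short. Each step adds another Conv, so MACs grow linearly with N, which is why Category D scores lower. `flood_step` and `flood_fill` are hypothetical names.

```python
from typing import List

Grid = List[List[int]]


def flood_step(reached: Grid, passable: Grid) -> Grid:
    """One unrolled step: Conv with a 3x3 cross kernel (centre + 4 neighbours,
    zero padding), Clip to [0, 1], then Mul by the passable mask."""
    h, w = len(reached), len(reached[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            s = reached[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:  # out-of-bounds = zero padding
                    s += reached[rr][cc]
            out[r][c] = min(s, 1) * passable[r][c]
    return out


def flood_fill(seed: Grid, passable: Grid, n_steps: int) -> Grid:
    """Fixed-depth BFS: n_steps copies of flood_step, as in an exported graph."""
    reached = [[s * p for s, p in zip(rs, rp)] for rs, rp in zip(seed, passable)]
    for _ in range(n_steps):
        reached = flood_step(reached, passable)
    return reached
```
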
---
### Category E: Global Aggregation
Solvers that compute a global statistic and broadcast it.
| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| E1 | `mode_fill` | Output = solid fill of most common input color | ReduceSum→ArgMax→Expand | ✅ Task 129 |
| E2 | `cumsum_fill` | Running sums for object extent, directional filling | CumSum | ⬜ |
| E3 | `bbox_crop_pad` | Find bounding box via ReduceSum+ArgMax, crop+pad | ReduceSum→ArgMax→Slice→Pad | ⬜ |
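
This sketches E3 and E2 in plain Python, mirroring the listed ops. E3 works in three steps:

- Row and column ReduceSum of the non-background mask.
- ArgMax to find the first occupied index. ONNX `ArgMax` returns the first maximum by default. Setting `select_last_index=1`, or running ArgMax on the reversed vector, gives the last.
- A data-dependent Slice.

For E2, a CumSum of the object mask along W is positive exactly from each row's first object pixel onward. That is one way to express directional filling. The specific fill rule here (paint background to the right of the first object pixel) is an illustrative assumption, as are all names.

```python
from typing import List, Tuple

Grid = List[List[int]]


def argmax_first(v: List[int]) -> int:
    """ONNX ArgMax with select_last_index=0: index of the first maximum."""
    return max(range(len(v)), key=lambda i: v[i])


def bbox(g: Grid, bg: int = 0) -> Tuple[int, int, int, int]:  # E3
    """Inclusive (r0, r1, c0, c1) of all non-background pixels. Undefined for an all-bg grid."""
    rows = [int(sum(v != bg for v in row) > 0) for row in g]                    # ReduceSum over W, > 0
    cols = [int(sum(row[c] != bg for row in g) > 0) for c in range(len(g[0]))]  # ReduceSum over H, > 0
    r0, c0 = argmax_first(rows), argmax_first(cols)
    # Last occupied index: ArgMax on the reversed vector (or select_last_index=1).
    r1 = len(rows) - 1 - argmax_first(rows[::-1])
    c1 = len(cols) - 1 - argmax_first(cols[::-1])
    return r0, r1, c0, c1


def crop(g: Grid, box: Tuple[int, int, int, int]) -> Grid:
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in g[r0:r1 + 1]]  # data-dependent Slice


def fill_right_of_first(g: Grid, color: int, bg: int = 0) -> Grid:  # E2
    """Paint background cells from each row's first object pixel to the right edge.
    CumSum of the object mask along W is > 0 exactly from that pixel onward."""
    out = []
    for row in g:
        running, new_row = 0, []
        for v in row:
            running += int(v != bg)  # running CumSum of the object mask
            new_row.append(color if v == bg and running > 0 else v)
        out.append(new_row)
    return out
```
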
---
### Build Order (highest expected ROI first)
**Wave 1 — Static remapping (Category A):** Cheapest to build, highest score per task, most likely to have matches. ~1 day.
1. A1 `extract_inner` + A2 `add_border` (border ops)
2. A5 `extract_and_tile` + A6 `sparse_fill` (pattern ops)
3. A3 `pad_align` + A4 `downsample_stride` (placement ops)
4. A7 `symmetry_complete` (symmetry)
**Wave 2 — Color/channel ops (Category B):** Builds on mode_fill. ~0.5 day.
5. B1 `channel_filter` + B3 `fill_bg_with_mode`
6. B4 `row_mode_fill` + B5 `col_mode_fill`
**Wave 3 — Composition (Category C):** Chains existing solvers, no new ONNX ops. ~0.5 day.
7. C1 `transform_then_recolor`
**Wave 4 — Propagation (Category D):** More complex, lower score. ~1 day.
8. D2 `flood_fill`
**Wave 5 — Global aggregation (Category E):** Needs careful design. ~1 day.
9. E2 `cumsum_fill` + E3 `bbox_crop_pad`
---
### Honest Projections
I will NOT repeat the Phase 2 mistake of projecting fantasy numbers. Here's what I know:
- **51 tasks solved today.** LB 594.84.
- **Each Wave:** Might add 2-10 tasks. Might add 0. We don't know until we scan and test.
- **The only reliable estimate:** Gravity added 1 task. Mode fill added 1 task. Edge detect added 0. Hit rate so far: 2 new tasks from 3 solvers built (~0.7 per solver).
- **If the hit rate is ~1 per solver:** 20 new solvers × ~1 task each = ~20 new tasks → ~70 solved → LB ~800-900.
- **If some solvers hit 5+ tasks:** Could reach 100-120 solved → LB ~1200-1500.
- **3000+ requires a fundamentally different approach** (test-time training, learned architectures) that we're not doing.
| Scenario | Solved | Est LB | Confidence |
|----------|--------|--------|------------|
| Wave 1 only | 55-65 | 650-800 | 60% |
| Wave 1+2 | 60-75 | 750-950 | 50% |
| Wave 1+2+3 | 65-85 | 850-1100 | 40% |
| All waves | 70-120 | 900-1500 | 30% |
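
Here is a quick arithmetic check for the table. It assumes the LB is the plain sum of per-task scores over all 400 tasks, with unsolved tasks at 1.0 (as stated in the header). With today's figures (51 solved, LB 594.84), the implied mean is about 4.8 points per solved task. Moving from 51 to 70 solved means 19 new tasks. That reaches LB 800 only if the new tasks average about 11.8 points each, or about 17.1 for LB 900. The function names are hypothetical.

```python
N_TASKS = 400
UNSOLVED_PTS = 1.0  # header: every unsolved task still scores 1.0


def mean_solved_score(lb: float, n_solved: int) -> float:
    """Average per-task score of the solved tasks implied by a leaderboard total."""
    return (lb - (N_TASKS - n_solved) * UNSOLVED_PTS) / n_solved


def needed_per_new_task(lb_now: float, n_new: int, lb_target: float) -> float:
    """Average score each newly solved task must reach; each replaces a 1.0-pt unsolved task."""
    return UNSOLVED_PTS + (lb_target - lb_now) / n_new
```
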
---
## Phase 4: Score Optimization
### 4a: Best-of-N Model Selection ⬜
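
Here is a minimal sketch of 4a. It assumes each task may have several candidate models from different solvers. It also assumes a lower official cost means a higher task score. The `cost` field is a hypothetical placeholder for whatever metric 4b aligns to. Only candidates that pass full arc-gen validation are eligible. Among those, the cheapest wins, with the solver name as a deterministic tie-break.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    solver: str
    passes_arc_gen: bool  # full arc-gen validation result, per the header rule
    cost: float           # hypothetical stand-in for the official cost metric


def best_of_n(cands: List[Candidate]) -> Optional[Candidate]:
    """Cheapest fully validated model for one task; ties broken by solver name."""
    valid = [c for c in cands if c.passes_arc_gen]
    return min(valid, key=lambda c: (c.cost, c.solver)) if valid else None
```

For instance, if both a conv solver and a flip solver pass on a task, the flip model is kept because its cost is lower.
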
### 4b: Official Scoring Alignment (onnx_tool) ⬜
---
## BLENDING — EXPLICITLY EXCLUDED
---
## Experiment Log
| Date | Experiment | Result | Decision |
|------|-----------|--------|----------|
| 2026-04-24 | v4.2 baseline | 50 arc-gen, LB ~501 | Baseline |
| 2026-04-26 | v5.0 refactor | 49 solved, ~604 score | New baseline |
| 2026-04-26 | Exp 1-3 (regularization) | 0 improvement | **EXHAUSTED** |
| 2026-04-26 | v5.2 gravity+mode | +2 tasks (78, 129) | ✅ Kept |
| 2026-04-27 | **v5.2 Kaggle submission** | **51 solved, LB 594.84** | **Current best** |
---
## Research Queue
1. ✅ CompressARC — CumMax/ReduceSum architecture
2. ✅ TRM — recursive reasoning
3. ✅ ARC Prize 2025 Tech Report
4. ✅ Expert review #1 — Phase 3 solver list (pad_align, crop_paste, downsample, etc.)
5. ✅ Expert review #2 — 6 concrete solvers with code (extract_inner, add_border, etc.)
6. [ ] **Task taxonomy scan** — for each Wave 1 solver, count matching unsolved tasks before building