rogermt committed 987c46d (verified) · 1 parent: 022a14c

Move own-solver/TODO.md to own-solver/

Files changed (1): own-solver/TODO.md (added, +188 -0)

# NeuroGolf Solver Roadmap

> Current: v5.2 · 51 tasks validated on Kaggle · LB 594.84 · Target: 3000+
> Philosophy: **Research → Design → Experiment → Analyze → Research** loop until a score increase is confirmed.
> Rule: **NEVER claim a feature works without full arc-gen validation on representative tasks.**
> Updated: 2026-04-27. LB 594.84 confirmed. Phase 3 redesigned based on expert review and the literature.
> **All 400 tasks count. There are NO excluded tasks. Each unsolved task scores 1.0 pt (Kaggle adds this automatically).**

---

## Current Solver Breakdown (51/400 solved, LB 594.84)

| Category | Tasks | Solvers |
|----------|-------|---------|
| Conv (lstsq) | 25 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Analytical | 24 | identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc. |
| Gravity | 1 | gravity_unrolled (Task 78) |
| Mode fill | 1 | mode_fill (Task 129) |
| **Unsolved** | **349** | n/a |

---

## Phase 1: Score Optimization on Existing Tasks

### 1a: Opset 17 Slice-Based Analytical Solvers ⬜
> Convert Gather-based solvers to Slice(step=-1) + Transpose for ~0 MACs.
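
To make the Slice(step=-1) idea concrete, here is a NumPy sketch (not an ONNX graph) that emulates the Slice index-clamping rules for one axis, as I read the Slice-13 spec that applies at opset 17. It uses them for a flip and a clockwise rotation on the [1, 10, H, W] one-hot layout used elsewhere in this roadmap. `onnx_slice_indices`, `slice_reverse` and `rotate90_cw` are hypothetical helper names. Passing an `INT64_MIN` end value is the usual idiom for slicing backwards through index 0.

```python
import numpy as np

INT64_MIN = -(2 ** 63)  # conventional "ends" value for slicing backwards past index 0


def onnx_slice_indices(dim, start, end, step):
    """Indices picked along one axis by ONNX Slice (negative indices + clamping)."""
    if start < 0:
        start += dim
    if end < 0:
        end += dim
    if step > 0:
        start = min(max(start, 0), dim)
        end = min(max(end, 0), dim)
    else:
        start = min(max(start, 0), dim - 1)
        end = min(max(end, -1), dim - 1)
    return list(range(start, end, step))


def slice_reverse(x, axis):
    """Emulate Slice(starts=[-1], ends=[INT64_MIN], axes=[axis], steps=[-1])."""
    idx = onnx_slice_indices(x.shape[axis], -1, INT64_MIN, -1)
    return np.take(x, idx, axis=axis)


def rotate90_cw(x):
    """[1, 10, H, W] -> [1, 10, W, H]: Transpose(perm=[0, 1, 3, 2]), then reverse the last axis."""
    return slice_reverse(np.transpose(x, (0, 1, 3, 2)), axis=3)


# Usage: build a one-hot [1, 10, H, W] tensor from a small colour grid.
grid = np.array([[1, 2, 3], [4, 5, 6]])
x = np.eye(10, dtype=np.float32)[grid].transpose(2, 0, 1)[None]
flipped = slice_reverse(x, axis=3)   # horizontal flip
rotated = rotate90_cw(x)             # clockwise rotation, shape [1, 10, 3, 2]
```

In a real graph, starts/ends/axes/steps are int64 tensor inputs to Slice (since opset 10), not attributes. This sketch does not check MAC accounting.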

### 1b: ONNX Optimizer Pass ⬜
> Run `onnxoptimizer.optimize()` for dead-code elimination.

---

## Phase 2: Regularization (EXHAUSTED)

> Exps 0-3 tested. The failures come from architecture mismatch, not overfitting. Conv ceiling: ~25 tasks.

---

## Phase 3: New Solver Types

> Organized by architecture type. Each solver is a separate .py file.
> **Build rule:** Scan for matches FIRST, build only what has hits, validate on arc-gen.

---

### Category A: Static Spatial Remapping (Gather/Slice/Pad)

These are cheap zero- or low-MAC solvers that use precomputed index mappings. They score highest per task, so build them first.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| A1 | `extract_inner` | Remove N-pixel border frame → smaller output | Gather | ⬜ |
| A2 | `add_border` | Add constant-color border → larger output | Gather+const | ⬜ |
| A3 | `pad_align` | Input pasted into larger canvas at fixed offset | Gather+const | ⬜ |
| A4 | `downsample_stride` | `out[r,c] = inp[r*sH, c*sW]` | Gather | ⬜ |
| A5 | `extract_and_tile` | Find smallest repeating unit, tile to fill output | Gather | ⬜ |
| A6 | `sparse_fill` | Each non-zero pixel becomes NxN block | Gather | ⬜ |
| A7 | `symmetry_complete` | Mirror sparse data to complete L-R or T-B symmetry | Gather | ⬜ |
| A8 | `multi_stamp` | Union of shifted copies of input at fixed offsets | Gather+Add | ⬜ |
| A9 | `affine_remap` | General integer coordinate remap: stride+offset, axis swap | Gather | ⬜ |
| A10 | `crop_paste` | Crop from input, paste at different position in output | Gather+const | ⬜ |
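
Every Category A pattern reduces to one question: for each output pixel, which input pixel (if any) does it copy? The NumPy sketch below shows that idea for A1-A4 on a plain 2-D colour grid, with -1 marking output pixels that take a constant instead. The helper names (`gather_remap`, `map_extract_inner`, etc.) are hypothetical. The flat `r * W + c` index is one possible way to feed a single Gather over a flattened spatial axis, not necessarily how the solvers are built.

```python
import numpy as np


def gather_remap(grid, src_rows, src_cols, fill=0):
    """out[r, c] = grid[src_rows[r, c], src_cols[r, c]]; -1 means "use `fill`"."""
    valid = (src_rows >= 0) & (src_cols >= 0)
    out = np.full(src_rows.shape, fill, dtype=grid.dtype)
    out[valid] = grid[src_rows[valid], src_cols[valid]]
    return out


def _outer(rows, cols):
    """Broadcast 1-D row/column source vectors into two full 2-D index maps."""
    return np.meshgrid(rows, cols, indexing="ij")


def _mask_outside(idx, size):
    """Mark indices outside [0, size) as -1 (no source pixel)."""
    return np.where((idx >= 0) & (idx < size), idx, -1)


def map_extract_inner(h, w, n):
    """A1: drop an n-pixel frame."""
    return _outer(np.arange(n, h - n), np.arange(n, w - n))


def map_add_border(h, w, n):
    """A2: grow by n pixels on every side; new pixels take the constant."""
    rows = _mask_outside(np.arange(h + 2 * n) - n, h)
    cols = _mask_outside(np.arange(w + 2 * n) - n, w)
    return _outer(rows, cols)


def map_pad_align(h, w, out_h, out_w, dr, dc):
    """A3: paste the input into an out_h x out_w canvas at offset (dr, dc)."""
    rows = _mask_outside(np.arange(out_h) - dr, h)
    cols = _mask_outside(np.arange(out_w) - dc, w)
    return _outer(rows, cols)


def map_downsample(h, w, sh, sw):
    """A4: out[r, c] = inp[r * sh, c * sw]."""
    return _outer(np.arange(0, h, sh), np.arange(0, w, sw))


def flat_gather_index(src_rows, src_cols, w):
    """Flat r * w + c indices for one Gather over a flattened H*W axis (-1 kept)."""
    valid = (src_rows >= 0) & (src_cols >= 0)
    return np.where(valid, src_rows * w + src_cols, -1)


# Usage on a 4x4 grid of distinct values
grid = np.arange(16).reshape(4, 4)
inner = gather_remap(grid, *map_extract_inner(4, 4, 1))         # centre 2x2
framed = gather_remap(grid, *map_add_border(4, 4, 1), fill=5)   # 6x6 with a colour-5 frame
```

Because the maps depend only on the grid shapes, they can be computed once at build time and baked into the graph as constants.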

---

### Category B: Channel/Color Operations

Color-level transforms that work in the 10-channel one-hot space.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| B1 | `channel_filter` | Keep only certain colors, rest → background | Mul(mask [1,10,1,1]) | ⬜ |
| B2 | `overlay_constant` | Input + fixed pixel pattern overlaid | Add or Where + constant tensor | ⬜ |
| B3 | `fill_bg_with_mode` | Background pixels filled with dominant color, non-bg unchanged | ReduceSum→ArgMax→Where | ⬜ |
| B4 | `row_mode_fill` | Each row filled with its dominant color | ReduceSum(width)→ArgMax→Tile(width) | ⬜ |
| B5 | `col_mode_fill` | Each column filled with its dominant color | ReduceSum(height)→ArgMax→Tile(height) | ⬜ |
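
Here is a NumPy sketch of B1, B3 and B4/B5 on the [1, 10, H, W] one-hot layout. It follows the Key Ops column: a mask multiply, then ReduceSum → ArgMax → Where or broadcast. The assumptions are all mine:

- The background is colour 0.
- B1 writes removed pixels back onto the background channel explicitly.
- B3 takes the mode over non-background colours only.
- Ties go to the lowest colour index. NumPy `argmax` and ONNX ArgMax with the default `select_last_index=0` both behave this way.

All function names are hypothetical.

```python
import numpy as np


def one_hot(grid):
    """[H, W] colour grid -> [1, 10, H, W] float one-hot tensor."""
    return np.eye(10, dtype=np.float32)[grid].transpose(2, 0, 1)[None]


def decode(x):
    """[1, 10, H, W] one-hot -> [H, W] colour grid."""
    return x.argmax(axis=1)[0]


def channel_filter(x, keep, bg=0):
    """B1: multiply by a [1, 10, 1, 1] mask; dropped pixels become background."""
    mask = np.zeros((1, 10, 1, 1), dtype=x.dtype)
    mask[0, list(keep), 0, 0] = 1.0
    kept = x * mask
    kept[:, bg:bg + 1] += 1.0 - kept.sum(axis=1, keepdims=True)  # refill emptied pixels
    return kept


def fill_bg_with_mode(x, bg=0):
    """B3: ReduceSum -> ArgMax over non-background colours -> Where(bg pixel, mode, x)."""
    counts = x.sum(axis=(2, 3))                      # [1, 10] pixels per colour
    counts[:, bg] = -1.0                             # exclude background from the vote
    mode_px = np.eye(10, dtype=x.dtype)[counts.argmax(axis=1)][:, :, None, None]
    return np.where(x[:, bg:bg + 1] > 0.5, mode_px, x)


def line_mode_fill(x, reduce_axis):
    """B4 (reduce_axis=3, per row) / B5 (reduce_axis=2, per column)."""
    counts = x.sum(axis=reduce_axis, keepdims=True)  # [1, 10, H, 1] or [1, 10, 1, W]
    winner = counts.argmax(axis=1)                   # [1, H, 1] or [1, 1, W]
    one_hot_winner = np.eye(10, dtype=x.dtype)[winner].transpose(0, 3, 1, 2)
    return np.broadcast_to(one_hot_winner, x.shape).copy()  # Tile/Expand to [1, 10, H, W]
```

Using a single `line_mode_fill` with a reduction-axis switch keeps B4 and B5 as one code path. They differ only in which spatial axis is summed and broadcast back.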

---

### Category C: Composition / Chaining

Chain two existing solvers: if transform(input) → intermediate and color_map(intermediate) → output, emit a single combined graph.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| C1 | `transform_then_recolor` | rotate/flip/transpose + color_map | Chain existing | ⬜ |
| C2 | `crop_then_transform` | fixed_crop + rotate/flip | Chain existing | ⬜ |
| C3 | `recolor_then_tile` | color_map + tile/upscale | Chain existing | ⬜ |
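
Composition stays cheap because static stages fold together before export. Two index maps compose into one precomputed map, which needs a single Gather. A colour map is a fixed 10×10 channel mixing, which is a 1×1 Conv/MatMul on the one-hot axis. The NumPy sketch below checks both ideas. `fold_remaps`, `rot90_cw_map`, `flip_lr_map` and `recolor_weight` are hypothetical names, and the maps are assumed to have no constant-fill (-1) entries.

```python
import numpy as np


def rot90_cw_map(h, w):
    """Index map for a clockwise rotation: out[i, j] = inp[h - 1 - j, i], output [w, h]."""
    i, j = np.meshgrid(np.arange(w), np.arange(h), indexing="ij")
    return h - 1 - j, i


def flip_lr_map(h, w):
    """Index map for a left-right flip: out[i, j] = inp[i, w - 1 - j]."""
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return i, w - 1 - j


def fold_remaps(first, second):
    """Stage 1: out1[p] = inp[first[p]]; stage 2: out2[q] = out1[second[q]].
    Folded: out2[q] = inp[first[second[q]]], i.e. one precomputed map."""
    (r1, c1), (r2, c2) = first, second
    return r1[r2, c2], c1[r2, c2]


def recolor_weight(color_map):
    """[10, 10] weight with W[k, j] = 1 when colour j maps to colour k (1x1 conv)."""
    w = np.zeros((10, 10), dtype=np.float32)
    w[np.asarray(color_map), np.arange(10)] = 1.0
    return w


# C1 sketch: rotate clockwise, then flip left-right, then recolour, on a 2x3 grid.
grid = np.array([[1, 2, 3], [4, 5, 6]])
rows, cols = fold_remaps(rot90_cw_map(2, 3), flip_lr_map(3, 2))
moved = grid[rows, cols]                                            # single gather
color_map = [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]                          # hypothetical recolouring
x = np.eye(10, dtype=np.float32)[moved].transpose(2, 0, 1)[None]    # [1, 10, 3, 2]
y = np.einsum("kj,njhw->nkhw", recolor_weight(color_map), x)        # channel mixing
```

Folding rotate-then-flip yields exactly the transpose map. This makes a handy self-check for the folding logic.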

---

### Category D: Unrolled Propagation (Conv+Where loops)

Dynamic solvers that need N unrolled steps. Their higher MAC cost means a lower per-task score (~8-12).

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| D1 | `gravity_unrolled` | Directional compaction, 4 dirs × 10 bg colors | Conv+Where ×N steps | ✅ Task 78 |
| D2 | `flood_fill` | BFS: seed spreads through passable cells | Conv+Clip+Mul ×N steps | ⬜ |
| D3 | `edge_detect` | Laplacian/Sobel boundary detection | Conv(3×3)+Abs+Greater | ✅ built, 0 matches |
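
Below is a NumPy sketch of the D2 recurrence as listed in the table, with Conv → Clip → Mul per unrolled step, on a single 2-D mask. It assumes:

- 4-connectivity via a "plus"-shaped 3×3 kernel with zero padding.
- A binary `passable` mask.
- A step count fixed at build time.

`flood_fill_unrolled` and `plus_conv` are hypothetical names. The fill is only complete if `steps` is at least the longest shortest-path distance from the seed inside the region. That requirement is what drives the MAC cost of this category.

```python
import numpy as np


def plus_conv(mask):
    """3x3 'plus' kernel, zero padding: each cell sums itself and its 4 neighbours."""
    p = np.pad(mask, 1)
    return p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]


def flood_fill_unrolled(seed, passable, steps):
    """Unrolled BFS: each step is Conv(plus) -> Clip(0, 1) -> Mul(passable)."""
    reached = seed * passable
    for _ in range(steps):  # in an ONNX graph this loop would be `steps` copies of the three ops
        reached = np.clip(plus_conv(reached), 0.0, 1.0) * passable
    return reached


# Usage: a wall in column 2 stops the fill from reaching columns 2-4.
passable = np.ones((5, 5))
passable[:, 2] = 0.0
seed = np.zeros((5, 5))
seed[0, 0] = 1.0
filled = flood_fill_unrolled(seed, passable, steps=10)
```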

---

### Category E: Global Aggregation

Solvers that compute a global statistic and broadcast it.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| E1 | `mode_fill` | Output = solid fill of most common input color | ReduceSum→ArgMax→Expand | ✅ Task 129 |
| E2 | `cumsum_fill` | Running sums for object extent, directional filling | CumSum | ⬜ |
| E3 | `bbox_crop_pad` | Find bounding box via ReduceSum+ArgMax, crop+pad | ReduceSum→ArgMax→Slice→Pad | ⬜ |
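
This NumPy sketch covers E1 and the bounding-box half of E3 on a plain 2-D grid. It mirrors the listed ops:

- ReduceSum for counts or occupancy.
- ArgMax to pick the winner or the first occupied row/column.
- ArgMax on the reversed vector for the last occupied row/column.

Assumptions: colour 0 is background, ties go to the first index, and the pad step of E3 is left out. The function names are hypothetical. On an all-background grid every ArgMax returns 0, so the box silently becomes the whole grid. A real solver would need to guard that case.

```python
import numpy as np


def mode_fill(grid):
    """E1: per-colour pixel counts (ReduceSum) -> ArgMax -> broadcast to the grid shape."""
    counts = np.bincount(grid.ravel(), minlength=10)
    return np.full_like(grid, counts.argmax())


def bbox(grid, bg=0):
    """E3 (box part): inclusive (r0, r1, c0, c1) of all non-background pixels."""
    fg = (grid != bg).astype(np.int64)
    rows = fg.sum(axis=1) > 0                          # row occupancy (ReduceSum over width)
    cols = fg.sum(axis=0) > 0                          # column occupancy (ReduceSum over height)
    r0, c0 = int(rows.argmax()), int(cols.argmax())    # first occupied index
    r1 = len(rows) - 1 - int(rows[::-1].argmax())      # last occupied index
    c1 = len(cols) - 1 - int(cols[::-1].argmax())
    return r0, r1, c0, c1


def bbox_crop(grid, bg=0):
    """Crop to the inclusive bounding box returned by bbox()."""
    r0, r1, c0, c1 = bbox(grid, bg)
    return grid[r0:r1 + 1, c0:c1 + 1]


# Usage
grid = np.zeros((5, 6), dtype=np.int64)
grid[1:3, 2:5] = 4
grid[2, 3] = 7
box = bbox(grid)          # (1, 2, 2, 4)
crop = bbox_crop(grid)    # 2x3 patch
```

The crop size here depends on the input. In a static graph that means Slice with computed starts/ends, which is part of why this wave needs careful design.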

---

### Build Order (highest expected ROI first)

**Wave 1 (Category A, static remapping):** Cheapest to build, highest score per task, most likely to have matches. ~1 day.

1. A1 `extract_inner` + A2 `add_border` (border ops)
2. A5 `extract_and_tile` + A6 `sparse_fill` (pattern ops)
3. A3 `pad_align` + A4 `downsample_stride` (placement ops)
4. A7 `symmetry_complete` (symmetry)

**Wave 2 (Category B, color/channel ops):** Builds on mode_fill. ~0.5 day.

5. B1 `channel_filter` + B3 `fill_bg_with_mode`
6. B4 `row_mode_fill` + B5 `col_mode_fill`

**Wave 3 (Category C, composition):** Chains existing solvers; no new ONNX ops. ~0.5 day.

7. C1 `transform_then_recolor`

**Wave 4 (Category D, propagation):** More complex, lower score. ~1 day.

8. D2 `flood_fill`

**Wave 5 (Category E, global aggregation):** Needs careful design. ~1 day.

9. E2 `cumsum_fill` + E3 `bbox_crop_pad`

---

### Honest Projections

I will NOT repeat the Phase 2 mistake of projecting fantasy numbers. Here is what I know:

- **51 tasks solved today.** LB 594.84.
- **Each wave:** might add 2-10 tasks, or might add 0. We won't know until we scan and test.
- **The only reliable estimate:** Gravity added 1 task, mode fill added 1 task, edge detect added 0. Hit rate so far: 2 new tasks from 3 solvers built (~0.7 per solver, roughly 1).
- **If a ~1-task-per-solver hit rate holds:** 20 new solvers → ~20 new tasks → ~70 solved → LB ~800-900.
- **If some solvers hit 5+ tasks:** could reach 100-120 solved → LB ~1200-1500.
- **3000+ requires a fundamentally different approach** (test-time training, learned architectures) that we are not pursuing.

| Scenario | Solved | Est. LB | Confidence |
|----------|--------|---------|------------|
| Wave 1 only | 55-65 | 650-800 | 60% |
| Wave 1+2 | 60-75 | 750-950 | 50% |
| Wave 1+2+3 | 65-85 | 850-1100 | 40% |
| All waves | 70-120 | 900-1500 | 30% |
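
The scenario table can be sanity-checked with simple arithmetic. Assume the LB is a plain sum over all 400 tasks, with 1.0 pt per unsolved task, as stated in the header. The sketch below then computes two things: how many points the solved tasks currently contribute, and what average score each newly solved task would need to reach a target LB. The function names are hypothetical. The inputs are the numbers quoted in this document.

```python
def solved_points(lb, solved, total=400, unsolved_pts=1.0):
    """Points contributed by solved tasks, assuming LB = sum of all per-task scores."""
    return lb - (total - solved) * unsolved_pts


def required_avg_new(current_lb, target_lb, new_tasks, unsolved_pts=1.0):
    """Average score each newly solved task needs; it replaces a 1.0-pt unsolved task."""
    return (target_lb - current_lb) / new_tasks + unsolved_pts


current = solved_points(594.84, 51)            # points from today's 51 solved tasks
avg_now = current / 51                         # current average per solved task
need_800 = required_avg_new(594.84, 800, 19)   # ~70 solved (51 + 19) reaching LB ~800
```

With these inputs, the 51 solved tasks contribute about 245.8 points (about 4.8 each on average). Reaching ~800 with 19 more tasks would require those new tasks to average roughly 11.8 points each.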

---

## Phase 4: Score Optimization

### 4a: Best-of-N Model Selection ⬜
### 4b: Official Scoring Alignment (onnx_tool) ⬜

---

## BLENDING: EXPLICITLY EXCLUDED

---

## Experiment Log

| Date | Experiment | Result | Decision |
|------|-----------|--------|----------|
| 2026-04-24 | v4.2 baseline | 50 arc-gen, LB ~501 | Baseline |
| 2026-04-26 | v5.0 refactor | 49 solved, ~604 score | New baseline |
| 2026-04-26 | Exp 1-3 (regularization) | 0 improvement | **EXHAUSTED** |
| 2026-04-26 | v5.2 gravity+mode | +2 tasks (78, 129) | ✅ Kept |
| 2026-04-27 | **v5.2 Kaggle submission** | **51 solved, LB 594.84** | **Current best** |

---

## Research Queue

1. ✅ CompressARC: CumMax/ReduceSum architecture
2. ✅ TRM: recursive reasoning
3. ✅ ARC Prize 2025 Tech Report
4. ✅ Expert review #1: Phase 3 solver list (pad_align, crop_paste, downsample, etc.)
5. ✅ Expert review #2: 6 concrete solvers with code (extract_inner, add_border, etc.)
6. [ ] **Task taxonomy scan:** for each Wave 1 solver, count matching unsolved tasks before building