# NeuroGolf Solver — Roadmap

> Current: v5.2 · 51 Kaggle validated · LB 594.84 · Target: 3000+
> Philosophy: **Research → Design → Experiment → Analyze → Research** loop until confirmed score increase.
> Rule: **NEVER claim a feature works without full arc-gen validation on representative tasks.**
> Updated: 2026-04-27 — LB 594.84 confirmed. Phase 3 redesigned from expert review + literature.
> **All 400 tasks count. There are NO excluded tasks. Unsolved = 1.0 pt (Kaggle adds automatically).**

---

## Current Solver Breakdown (51/400 solved, LB 594.84)

| Category | Tasks | Solvers |
|----------|-------|---------|
| Conv (lstsq) | 25 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Analytical | 24 | identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc. |
| Gravity | 1 | gravity_unrolled (Task 78) |
| Mode fill | 1 | mode_fill (Task 129) |
| **Unsolved** | **349** | — |

---

## Phase 1: Score Optimization on Existing Tasks

### 1a: Opset 17 Slice-Based Analytical Solvers ⬜
> Convert Gather-based solvers to Slice(step=-1) + Transpose for ~0 MACs.
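
To illustrate the idea behind 1a, here is a minimal NumPy sketch, not the solver code itself. It assumes a one-hot NCHW tensor of shape `(1, 10, H, W)` and shows that an index-based flip (the Gather pattern) produces the same result as a reversed slice, and that a clockwise rotation can be written as a transpose followed by a reversed slice. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical one-hot input in NCHW layout: 1 grid, 10 colour channels, 3x4 cells.
rng = np.random.default_rng(0)
grid = rng.integers(0, 10, size=(3, 4))
x = np.eye(10, dtype=np.float32)[grid].transpose(2, 0, 1)[None]  # (1, 10, 3, 4)

# Gather-style horizontal flip: look up columns through an explicit index tensor.
idx = np.arange(x.shape[3])[::-1]
flip_gather = np.take(x, idx, axis=3)

# Slice-style flip: a negative step walks the same axis backwards.
flip_slice = x[:, :, :, ::-1]

# Rotate 90 degrees clockwise = swap H and W, then reverse the new W axis.
rot_slice = x.transpose(0, 1, 3, 2)[:, :, :, ::-1]

assert np.array_equal(flip_gather, flip_slice)
assert np.array_equal(rot_slice[0].argmax(axis=0), np.rot90(grid, k=-1))
```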

### 1b: ONNX Optimizer Pass ⬜
> `onnxoptimizer.optimize()` for dead-code elimination.

---

## Phase 2: Regularization — EXHAUSTED

> Exps 0-3 tested. Architecture mismatch, not overfitting. Conv ceiling = ~25 tasks.

---

## Phase 3: New Solver Types

> Organized by architecture type. Each solver is a separate .py file.
> **Build rule:** Scan for matches FIRST, build only what has hits, validate on arc-gen.
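
The "scan first" rule can be sketched as a cheap matcher run over every unsolved task before any ONNX graph is written. The example below is an assumption-laden sketch: it assumes tasks are dicts with a `"train"` list of `{"input", "output"}` grids (the usual ARC JSON layout), and `matches_extract_inner` and `count_hits` are hypothetical helper names, not existing project functions.

```python
import numpy as np

def matches_extract_inner(inp, out, n):
    """True if `out` equals `inp` with an n-cell frame removed on every side."""
    inp, out = np.asarray(inp), np.asarray(out)
    h, w = inp.shape
    if h <= 2 * n or w <= 2 * n:
        return False
    return np.array_equal(inp[n:h - n, n:w - n], out)

def count_hits(tasks, matcher):
    """Return ids of tasks whose training pairs all satisfy `matcher`."""
    hits = []
    for task_id, task in tasks.items():
        pairs = task["train"]
        if pairs and all(matcher(p["input"], p["output"]) for p in pairs):
            hits.append(task_id)
    return hits

# Toy stand-in for the unsolved task set (real tasks would be loaded from JSON).
tasks = {
    "toy_a": {"train": [{"input": [[5, 5, 5], [5, 2, 5], [5, 5, 5]], "output": [[2]]}]},
    "toy_b": {"train": [{"input": [[1, 2], [3, 4]], "output": [[4, 3], [2, 1]]}]},
}
hits = count_hits(tasks, lambda i, o: matches_extract_inner(i, o, n=1))
print(hits)  # ['toy_a']
```

A solver would only be built if the scan returns at least one hit, and a hit still needs arc-gen validation before it counts.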

---

### Category A: Static Spatial Remapping (Gather/Slice/Pad)

These are cheap, zero/low-MAC solvers that use precomputed index mappings. Highest score per task. Build these first.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| A1 | `extract_inner` | Remove N-pixel border frame → smaller output | Gather | ⬜ |
| A2 | `add_border` | Add constant-color border → larger output | Gather+const | ⬜ |
| A3 | `pad_align` | Input pasted into larger canvas at fixed offset | Gather+const | ⬜ |
| A4 | `downsample_stride` | `out[r,c] = inp[r*sH, c*sW]` | Gather | ⬜ |
| A5 | `extract_and_tile` | Find smallest repeating unit, tile to fill output | Gather | ⬜ |
| A6 | `sparse_fill` | Each non-zero pixel becomes NxN block | Gather | ⬜ |
| A7 | `symmetry_complete` | Mirror sparse data to complete L-R or T-B symmetry | Gather | ⬜ |
| A8 | `multi_stamp` | Union of shifted copies of input at fixed offsets | Gather+Add | ⬜ |
| A9 | `affine_remap` | General integer coordinate remap: stride+offset, axis swap | Gather | ⬜ |
| A10 | `crop_paste` | Crop from input, paste at different position in output | Gather+const | ⬜ |
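
Most rows in this table reduce to the same mechanism: precompute, for every output cell, which input cell it reads, then apply a single gather. The sketch below shows that mechanism in NumPy for fixed grid sizes. `build_index` and `apply_index` are hypothetical names, and the 6x6 input is an arbitrary example.

```python
import numpy as np

def build_index(out_shape, in_shape, src_of):
    """Precompute flat source indices; src_of maps (r, c) -> (sr, sc) in the input."""
    oh, ow = out_shape
    _, iw = in_shape
    idx = np.empty(oh * ow, dtype=np.int64)
    for r in range(oh):
        for c in range(ow):
            sr, sc = src_of(r, c)
            idx[r * ow + c] = sr * iw + sc
    return idx

def apply_index(grid, idx, out_shape):
    """Single gather over the flattened grid, then reshape to the output size."""
    return np.take(np.asarray(grid).reshape(-1), idx).reshape(out_shape)

grid = np.arange(36).reshape(6, 6)

# A1 extract_inner with a 1-cell frame: out[r, c] = inp[r + 1, c + 1]
inner = apply_index(grid, build_index((4, 4), grid.shape, lambda r, c: (r + 1, c + 1)), (4, 4))

# A4 downsample_stride with sH = sW = 2: out[r, c] = inp[2r, 2c]
down = apply_index(grid, build_index((3, 3), grid.shape, lambda r, c: (2 * r, 2 * c)), (3, 3))

# A6-style block upscale with N = 2: out[r, c] = inp[r // 2, c // 2]
up = apply_index(grid, build_index((12, 12), grid.shape, lambda r, c: (r // 2, c // 2)), (12, 12))

assert np.array_equal(inner, grid[1:5, 1:5])
assert np.array_equal(down, grid[::2, ::2])
```

Because the index tensor is a constant, the graph does no arithmetic at inference time, which is why these solvers are expected to be low-MAC.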

---

### Category B: Channel/Color Operations

Color-level transforms that work in the 10-channel one-hot space.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| B1 | `channel_filter` | Keep only certain colors, rest → background | Mul(mask [1,10,1,1]) | ⬜ |
| B2 | `overlay_constant` | Input + fixed pixel pattern overlaid | Add or Where + constant tensor | ⬜ |
| B3 | `fill_bg_with_mode` | Background pixels filled with dominant color, non-bg unchanged | ReduceSum→ArgMax→Where | ⬜ |
| B4 | `row_mode_fill` | Each row filled with its dominant color | ReduceSum(width)→ArgMax→Tile(width) | ⬜ |
| B5 | `col_mode_fill` | Each column filled with its dominant color | ReduceSum(height)→ArgMax→Tile(height) | ⬜ |
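
As a rough illustration of B1 and B4, the NumPy sketch below works on a one-hot tensor of shape `(1, 10, H, W)`. It rests on two assumptions: colour 0 is the background, and removed cells are routed to the background channel by adding back whatever mass the mask removed. The function names mirror the table but are hypothetical implementations.

```python
import numpy as np

def one_hot(grid, colors=10):
    """Encode an (H, W) integer grid as a (1, colors, H, W) float tensor."""
    return np.eye(colors, dtype=np.float32)[np.asarray(grid)].transpose(2, 0, 1)[None]

def channel_filter(x, keep, background=0):
    """B1 sketch: zero the dropped channels, then mark the emptied cells as background."""
    mask = np.zeros((1, x.shape[1], 1, 1), dtype=x.dtype)
    mask[0, list(keep), 0, 0] = 1.0
    kept = x * mask                                   # Mul(mask [1,10,1,1])
    emptied = 1.0 - kept.sum(axis=1, keepdims=True)   # 1 where the colour was removed
    kept[:, background:background + 1] += emptied
    return kept

def row_mode_fill(x):
    """B4 sketch: ReduceSum over width -> ArgMax over channels -> Tile across width."""
    counts = x.sum(axis=3)                 # (1, colors, H)
    mode = counts.argmax(axis=1)           # (1, H); ties go to the lowest colour id
    return np.tile(mode[:, :, None], (1, 1, x.shape[3]))  # (1, H, W) colour ids

grid = [[3, 3, 1, 0],
        [2, 0, 2, 2],
        [0, 5, 5, 5]]
x = one_hot(grid)

filtered = channel_filter(x, keep={2, 5}).argmax(axis=1)[0]
filled = row_mode_fill(x)[0]
print(filtered)
print(filled)
```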

---

### Category C: Composition / Chaining

Chain two existing solvers. If transform(input) → intermediate, and color_map(intermediate) → output, emit one combined graph.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| C1 | `transform_then_recolor` | rotate/flip/transpose + color_map | Chain existing | ⬜ |
| C2 | `crop_then_transform` | fixed_crop + rotate/flip | Chain existing | ⬜ |
| C3 | `recolor_then_tile` | color_map + tile/upscale | Chain existing | ⬜ |
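
The chaining idea can be sketched in NumPy as plain function composition over one-hot tensors. This is a sketch under assumptions: `rotate_cw`, `color_map`, and `chain` are hypothetical stand-ins for the existing solvers, and the recolour step is modelled as a 10x10 matrix on the channel axis (what a 1x1 Conv would compute).

```python
import numpy as np

def one_hot(grid, colors=10):
    """Encode an (H, W) integer grid as a (1, colors, H, W) float tensor."""
    return np.eye(colors, dtype=np.float32)[np.asarray(grid)].transpose(2, 0, 1)[None]

def rotate_cw(x):
    """Rotate 90 degrees clockwise: swap H and W, then reverse the new W axis."""
    return x.transpose(0, 1, 3, 2)[:, :, :, ::-1]

def color_map(mapping, colors=10):
    """Build a recolour stage from a {source_colour: target_colour} dict."""
    m = np.eye(colors, dtype=np.float32)
    for src, dst in mapping.items():
        m[:, src] = 0.0
        m[dst, src] = 1.0
    return lambda x: np.einsum("oc,nchw->nohw", m, x)

def chain(*stages):
    """Compose stages left to right into one callable (one combined graph)."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

# C1 transform_then_recolor: rotate clockwise, then map 1 -> 4 and 2 -> 7.
solver = chain(rotate_cw, color_map({1: 4, 2: 7}))
result = solver(one_hot([[1, 2], [0, 1]])).argmax(axis=1)[0]
print(result)
```

Since the spatial stage only moves cells and the colour stage only mixes channels, the two stages commute, so the search over pairs can try either order.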

---

### Category D: Unrolled Propagation (Conv+Where loops)

Dynamic solvers that need N unrolled steps. The extra MACs lower the expected score to roughly 8-12 points per task.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| D1 | `gravity_unrolled` | Directional compaction, 4 dirs × 10 bg colors | Conv+Where ×N steps | ✅ Task 78 |
| D2 | `flood_fill` | BFS: seed spreads through passable cells | Conv+Clip+Mul ×N steps | ⬜ |
| D3 | `edge_detect` | Laplacian/Sobel boundary detection | Conv(3×3)+Abs+Greater | ✅ built, 0 matches |
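
The D2 row (Conv+Clip+Mul repeated N times) can be sketched in NumPy as follows. The assumptions are 4-connectivity, a single-channel seed mask, and a single-channel passable mask. The cross-shaped neighbour sum stands in for a 3x3 Conv with zero padding, and `dilate4` and `flood_fill` are hypothetical names.

```python
import numpy as np

def dilate4(mask):
    """Sum of a cell and its 4 neighbours: a cross-shaped 3x3 Conv with zero padding."""
    p = np.pad(mask, 1)
    return p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]

def flood_fill(seed, passable, steps):
    """Unrolled BFS: each step is Conv -> Clip(0, 1) -> Mul(passable)."""
    reached = seed * passable
    for _ in range(steps):
        reached = np.clip(dilate4(reached), 0.0, 1.0) * passable
    return reached

passable = np.array([[1, 1, 0, 1],
                     [0, 1, 0, 1],
                     [1, 1, 0, 1]], dtype=np.float32)
seed = np.zeros_like(passable)
seed[0, 0] = 1.0

filled = flood_fill(seed, passable, steps=4)
print(filled)
```

The step count has to cover the longest shortest path in any matching task, which is what drives the MAC cost of this category.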

---

### Category E: Global Aggregation

Solvers that compute a global statistic and broadcast it.

| # | Solver | Pattern | Key Ops | Status |
|---|--------|---------|---------|--------|
| E1 | `mode_fill` | Output = solid fill of most common input color | ReduceSum→ArgMax→Expand | ✅ Task 129 |
| E2 | `cumsum_fill` | Running sums for object extent, directional filling | CumSum | ⬜ |
| E3 | `bbox_crop_pad` | Find bounding box via ReduceSum+ArgMax, crop+pad | ReduceSum→ArgMax→Slice→Pad | ⬜ |
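
For E3, the bounding box can be located with reductions and ArgMax alone: reduce the foreground mask along each axis, take the first occupied index with ArgMax, and take the last one with ArgMax on the reversed vector. The NumPy sketch below assumes colour 0 is the background and leaves out the final pad step. `bbox_crop` is a hypothetical name.

```python
import numpy as np

def bbox_crop(grid, background=0):
    """Crop to the bounding box of non-background cells using reductions and ArgMax."""
    grid = np.asarray(grid)
    fg = (grid != background).astype(np.int64)
    rows = (fg.sum(axis=1) > 0).astype(np.int64)    # ReduceSum over width
    cols = (fg.sum(axis=0) > 0).astype(np.int64)    # ReduceSum over height
    if rows.sum() == 0:
        return grid[:0, :0]                         # no foreground at all
    top = int(rows.argmax())                        # first occupied row
    bottom = len(rows) - int(rows[::-1].argmax())   # one past the last occupied row
    left = int(cols.argmax())
    right = len(cols) - int(cols[::-1].argmax())
    return grid[top:bottom, left:right]

grid = [[0, 0, 0, 0, 0],
        [0, 0, 3, 3, 0],
        [0, 4, 3, 0, 0],
        [0, 0, 0, 0, 0]]
crop = bbox_crop(grid)
print(crop)
```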

---

### Build Order (highest expected ROI first)

**Wave 1 — Static remapping (Category A):** Cheapest to build, highest score per task, most likely to have matches. ~1 day.
1. A1 `extract_inner` + A2 `add_border` (border ops)
2. A5 `extract_and_tile` + A6 `sparse_fill` (pattern ops)
3. A3 `pad_align` + A4 `downsample_stride` (placement ops)
4. A7 `symmetry_complete` (symmetry)

**Wave 2 — Color/channel ops (Category B):** Builds on mode_fill. ~0.5 day.

5. B1 `channel_filter` + B3 `fill_bg_with_mode`
6. B4 `row_mode_fill` + B5 `col_mode_fill`

**Wave 3 — Composition (Category C):** Chains existing solvers, no new ONNX ops. ~0.5 day.

7. C1 `transform_then_recolor`

**Wave 4 — Propagation (Category D):** More complex, lower score. ~1 day.

8. D2 `flood_fill`

**Wave 5 — Global aggregation (Category E):** Needs careful design. ~1 day.

9. E2 `cumsum_fill` + E3 `bbox_crop_pad`

---

### Honest Projections

I will NOT repeat the Phase 2 mistake of projecting fantasy numbers. Here's what I know:

- **51 tasks solved today.** LB 594.84.
- **Each Wave:** Might add 2-10 tasks. Might add 0. We don't know until we scan and test.
- **The only reliable estimate:** Gravity added 1 task. Mode fill added 1 task. Edge detect added 0. Hit rate so far: 2 new tasks from 3 solvers built, i.e. at most ~1 new task per solver.
- **If hit rate holds:** 20 new solvers × ~1 task each = ~20 new tasks → ~70 solved → LB ~800-900.
- **If some solvers hit 5+ tasks:** Could reach 100-120 solved → LB ~1200-1500.
- **3000+ requires a fundamentally different approach** (test-time training, learned architectures) that we're not doing.
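
The hit-rate arithmetic in the bullets above can be reproduced in a few lines. This sketch uses only the counts stated there (51 solved, three new solver types built so far, 20 planned). The `project` helper is hypothetical, and the leaderboard score is deliberately left out because it depends on per-task cost.

```python
def project(solved_now, planned_solvers, tasks_per_solver):
    """Project the solved count if every planned solver adds `tasks_per_solver` tasks."""
    return solved_now + planned_solvers * tasks_per_solver

# New tasks contributed by each solver built so far.
history = {"gravity_unrolled": 1, "mode_fill": 1, "edge_detect": 0}
observed_rate = sum(history.values()) / len(history)   # 2 tasks / 3 solvers

for label, rate in [("observed", observed_rate), ("rounded up", 1.0)]:
    print(f"{label}: {rate:.2f} tasks/solver -> ~{project(51, 20, rate):.0f} solved")
```

With only three data points, both figures are rough. The gap between them is one reason the task taxonomy scan should come before any build work.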

| Scenario | Solved | Est LB | Confidence |
|----------|--------|--------|------------|
| Wave 1 only | 55-65 | 650-800 | 60% |
| Wave 1+2 | 60-75 | 750-950 | 50% |
| Wave 1+2+3 | 65-85 | 850-1100 | 40% |
| All waves | 70-120 | 900-1500 | 30% |

---

## Phase 4: Score Optimization

### 4a: Best-of-N Model Selection ⬜
### 4b: Official Scoring Alignment (onnx_tool) ⬜

---

## BLENDING — EXPLICITLY EXCLUDED

---

## Experiment Log

| Date | Experiment | Result | Decision |
|------|-----------|--------|----------|
| 2026-04-24 | v4.2 baseline | 50 arc-gen, LB ~501 | Baseline |
| 2026-04-26 | v5.0 refactor | 49 solved, ~604 score | New baseline |
| 2026-04-26 | Exp 1-3 (regularization) | 0 improvement | **EXHAUSTED** |
| 2026-04-26 | v5.2 gravity+mode | +2 tasks (78, 129) | ✅ Kept |
| 2026-04-27 | **v5.2 Kaggle submission** | **51 solved, LB 594.84** | **Current best** |

---

## Research Queue

1. ✅ CompressARC — CumMax/ReduceSum architecture
2. ✅ TRM — recursive reasoning
3. ✅ ARC Prize 2025 Tech Report
4. ✅ Expert review #1 — Phase 3 solver list (pad_align, crop_paste, downsample, etc.)
5. ✅ Expert review #2 — 6 concrete solvers with code (extract_inner, add_border, etc.)
6. [ ] **Task taxonomy scan** — for each Wave 1 solver, count matching unsolved tasks before building