Rewrite Phase 3 with research-backed blueprints + honest projections
Based on CompressARC (2512.06104), TRM (2510.04871), NCA (2506.15746),
and ONNX opset 17 operator audit. Realistic score estimates per solver type.
Removed fake excluded tasks references throughout.
TODO.md
CHANGED
|
@@ -1,172 +1,238 @@
|
|
| 1 |
# NeuroGolf Solver - Roadmap
|
| 2 |
|
| 3 |
-
> Current: v5.1 · 49 arc-gen validated (budget=5s) · ~
|
| 4 |
> Philosophy: **Research → Design → Experiment → Analyze → Research** loop until confirmed score increase.
|
| 5 |
> Rule: **NEVER claim a feature works without full arc-gen validation on representative tasks.**
|
| 6 |
-
> Updated: 2026-04-26 -
|
|
|
|
| 7 |
|
| 8 |
---
|
| 9 |
|
| 10 |
-
##
|
| 11 |
|
| 12 |
-
|
| 13 |
-
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
- Affected solvers: s_tile, s_upscale, s_concat, s_concat_enhanced, s_kronecker, s_diagonal_tile, s_shift, s_mirror_h, s_mirror_v, s_quad_mirror, s_fixed_crop, s_spatial_gather, s_varshape_spatial_gather
|
| 19 |
-
- [ ] **Validate**: Full 400 arc-gen run. Compare analytical task count vs v4.
|
| 20 |
-
- Target: ~25 analytical tasks scoring ~25 pts each (was ~15)
|
| 21 |
-
- Accept only if >10% improvement in analytical category total score.
|
| 22 |
|
| 23 |
-
|
| 24 |
-
- [ ] **Identify actual tasks** that are rotation+recolor, flip+recolor, transpose+recolor
|
| 25 |
-
- Scan 400 tasks: apply rotate → check if color_map solves, etc.
|
| 26 |
-
- Only implement solvers for combinations that exist in dataset
|
| 27 |
-
- [ ] **Build composition solver** - chain analytical + color_map as single ONNX graph
|
| 28 |
-
- [ ] **Validate**: Full 400 arc-gen. Count new tasks solved. Accept only if >0 new tasks.
|
| 29 |
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
|
| 37 |
---
|
| 38 |
|
| 39 |
-
## Phase 2:
|
| 40 |
|
| 41 |
-
>
|
| 42 |
-
>
|
| 43 |
|
| 44 |
-
|
| 45 |
|
| 46 |
-
|
| 47 |
-
v5.1 refactored to composable primitives: `_build_patch_matrix` + `_solve_weights` + `_extract_weights`.
|
| 48 |
-
PCR (`_solve_weights_pcr`) added as deferred 2nd-pass fallback.
|
| 49 |
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
| **ks=5** | **250** | **196** | **1.27** | **← INTERPOLATION THRESHOLD** |
|
| 55 |
-
| **ks=7** | **490** | **196** | **2.50** | **← PAST THRESHOLD** |
|
| 56 |
-
| ks=11 | 1210 | 196 | 6.17 | Overparameterized |
|
| 57 |
-
| ks=29 | 8410 | 196 | 42.9 | Heavily overparameterized |
|
| 58 |
|
| 59 |
-
###
|
| 60 |
|
| 61 |
-
|
|
| 62 |
-
|-------|-------|--------------------|
|
| 63 |
-
|
|
| 64 |
-
|
|
| 65 |
-
|
|
| 66 |
-
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
###
|
| 71 |
-
|
| 72 |
-
|
| 73 |
|
| 74 |
-
|
| 75 |
-
- HURTS 2 solved tasks (322@ks5, 299@ks9), helps 0 new
|
| 76 |
|
| 77 |
-
###
|
| 78 |
-
|
| 79 |
|
| 80 |
-
|
| 81 |
|
| 82 |
-
|
|
|
|
| 83 |
|
| 84 |
-
**
|
| 85 |
-
| p/n regime | Tasks | PCR at 0.99 | Arc-gen impact |
|------------|-------|-------------|----------------|
| p/n < 0.5 (safe) | 17 | Mostly fits train | Already 100% ag - no improvement possible |
| p/n > 1.0 (danger) | 8 | 4 fail to fit train at ANY threshold | PCR removes dimensions that carry signal |
|
| 89 |
|
| 90 |
-
|
| 91 |
-
-
|
| 92 |
-
-
|
| 93 |
-
|
| 94 |
-
- Task 389: lstsq 87.2% → PCR 95.7% (still fails)
- Task 129: lstsq 59.6% → PCR 63.0% (still fails)
- Task 229: lstsq 57.0% → PCR 60.0% (still fails)
|
| 97 |
|
| 98 |
-
|
| 99 |
-
- 50 solved (vs 49 baseline) - the +1 is Task 61, a **timing artifact** (took 11.8s, not a PCR solve)
|
| 100 |
-
- **0 tasks solved via PCR path**
|
| 101 |
-
- **0 regressions** on existing 25 conv tasks
|
| 102 |
-
- Code kept: composable primitives useful for future Lasso/Ridge experiments
|
| 103 |
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
2. For tasks with p/n > 1.0: the training signal requires ALL patch dimensions to interpolate. PCA truncation removes exactly the dimensions that encode the (noisy) signal, causing train_fail.
|
| 107 |
-
3. For unsolved tasks: most (~335/345) can't be fit by ANY ks - architecture mismatch (conv can't represent the required operation). The 10 that fit have wrong arc-gen behavior because the task requires global reasoning, not local patches.
|
| 108 |
|
| 109 |
-
|
| 110 |
-
|
| 111 |
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
> But given that only 10/345 unsolved tasks even have lstsq fits, the ceiling is very low.
|
| 115 |
|
| 116 |
-
|
|
|
|
|
|
|
| 117 |
|
| 118 |
---
|
| 119 |
|
| 120 |
-
###
|
|
|
|
| 121 |
|
| 122 |
-
|
| 123 |
-
| Scenario | Projected | Actual |
|----------|-----------|--------|
| Exp 1 alone | 60-80 tasks | **HURT** 2 tasks |
| Exp 1+2+3 | 90-130 tasks | **49 tasks** (no change) |
|
| 127 |
|
| 128 |
-
**
|
| 129 |
|
| 130 |
-
|
|
|
|
|
|
|
|
|
|
| 131 |
|
| 132 |
---
|
| 133 |
|
| 134 |
-
##
|
|
|
|
| 135 |
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
- Accept if it solves ≥2 tasks that are currently unsolved.
|
| 143 |
|
| 144 |
-
|
| 145 |
-
- [ ] **Depthwise conv to detect runs of N, gap patterns** - like task096 in public notebooks
|
| 146 |
-
- Template for "count and classify" tasks
|
| 147 |
-
- [ ] **Validate**: Find tasks with run-length structure. Test detector.
|
| 148 |
-
- Accept if it solves ≥2 new tasks.
|
| 149 |
|
| 150 |
-
|
| 151 |
-
- [ ]
|
| 152 |
-
|
| 153 |
-
- [ ] **Validate**: Build 5 rescue models. Arc-gen validate. Accept if ≥3 pass.
|
| 154 |
|
| 155 |
---
|
| 156 |
|
| 157 |
-
## Phase 4: Score Optimization (est +
|
| 158 |
|
| 159 |
-
### 4a:
|
| 160 |
-
|
| 161 |
-
- Top notebooks do this; can shrink models 5-20%
|
| 162 |
-
- [ ] **Validate**: Run on all 400 models. Compare total score before/after.
|
| 163 |
-
- Accept if total score improves by >2%.
|
| 164 |
|
| 165 |
-
|
| 166 |
-
- [ ]
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
|
| 171 |
---
|
| 172 |
|
|
@@ -174,12 +240,6 @@ PCR (`_solve_weights_pcr`) added as deferred 2nd-pass fallback.
|
|
| 174 |
|
| 175 |
> **User's competitive philosophy**: "I am writing my own models no blending. This is major flaw in the competition loophole."
|
| 176 |
|
| 177 |
-
- ~~Blend pipeline~~ - **NOT DONE. Not our strategy.**
- ~~Upload submission.zip as Kaggle dataset~~ - **NOT DONE.**
- ~~Attach public datasets (24 sources)~~ - **NOT DONE.**
|
| 180 |
-
|
| 181 |
-
Competitive intelligence on blending stays in LEARNING.md "What Others Do" section only.
|
| 182 |
-
|
| 183 |
---
|
| 184 |
|
| 185 |
## Experiment Log
|
|
@@ -188,52 +248,44 @@ Competitive intelligence on blending stays in LEARNING.md "What Others Do" secti
|
|
| 188 |
|------|-----------|-------------|--------|----------|
|
| 189 |
| 2026-04-24 | v4.2 baseline | 400 | 50 arc-gen, ~670 LB | Keep as baseline |
|
| 190 |
| 2026-04-25 | v5 untested code | 10 | 3/10 FAILED arc-gen | **REVERTED** |
|
| 191 |
-
| 2026-04-26 | v5.0 refactor |
|
| 192 |
-
| 2026-04-26 | Exp
|
| 193 |
-
| 2026-04-26 | Exp
|
| 194 |
-
| 2026-04-26 | Exp
|
| 195 |
-
| 2026-04-26 | Exp 3:
|
| 196 |
-
| 2026-04-26 | Exp 3: PCA/trunc-SVD (partial) | Task 129 | **0 pass** | **[-] REJECTED for lstsq** |
|
| 197 |
-
| 2026-04-26 | **Exp 3: Full PCA/SVD** | **400 tasks** | **0 PCR solves, 0 regressions, code refactored** | **[-] REJECTED (code kept)** |
|
| 198 |
|
| 199 |
-
### CRITICAL FINDING (2026-04-26)
|
| 200 |
|
| 201 |
-
The
|
| 202 |
|
| 203 |
-
|
| 204 |
-
1. Only **10 of 345** unsolved same-shape tasks pass train-fit at any ks≤9.
2. Ridge (L2) on 4 victim tasks × 5 alphas: **zero arc-gen passes**.
3. PCA/truncated-SVD on 400 tasks with thresholds {0.999, 0.99, 0.95}: **zero arc-gen validates**.
4. PCR improves arc-gen accuracy by 3-9% on 4 unsolved tasks, but 95.7% is the ceiling. 100% is required.
5. For tasks where conv IS the right solver (25 tasks), lstsq already generalizes perfectly (100% arc-gen at p/n < 0.5).
|
| 209 |
|
| 210 |
-
|
| 211 |
|
| 212 |
-
|
| 213 |
-
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
|
| 217 |
-
-
|
| 218 |
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
| Symbol | Meaning |
|--------|---------|
| `⬜` / `[ ]` | Not started - designed, ready to implement |
| `[~]` | In progress - experiment running |
| `[x]` | Done - validated with arc-gen on ≥20 tasks, confirmed score increase |
| `[!]` | Blocked - needs prerequisite or resource (e.g., GPU) |
| `[-]` | Rejected - tested, did not improve arc-gen survival or score |
|
| 228 |
|
| 229 |
-
## Research Queue
|
| 230 |
|
| 231 |
-
1. ✅
2. ✅
3. ✅
4. ✅
5. ✅
6. ✅
7. [ ] **
|
|
|
|
| 238 |
|
| 239 |
-
>
|
|
|
|
# NeuroGolf Solver - Roadmap

> Current: v5.1 · 49 arc-gen validated (budget=5s) · ~604 score · Target: 3000+
> Philosophy: **Research → Design → Experiment → Analyze → Research** loop until confirmed score increase.
> Rule: **NEVER claim a feature works without full arc-gen validation on representative tasks.**
> Updated: 2026-04-26 - Phase 2 (regularization) exhausted. Phase 3 redesigned from literature.
> **All 400 tasks count. There are NO excluded tasks.**

---

## Current Solver Breakdown (49/400 solved)

| Category | Tasks | Avg Score | Solver |
|----------|-------|-----------|--------|
| Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Analytical | 24 | ~15.5 | identity, constant, color_map, transpose, flip, rotate, shift, tile, upscale, mirror, concat, spatial_gather, etc. |
| **Unsolved** | **351** | **1.0** | - |
| **Total** | **400** | | **~604** |

The 351 unsolved tasks need fundamentally different solver architectures.

---

## Phase 1: Score Optimization on Existing Tasks (est +100-200 pts)

### 1a: Opset 17 Slice-Based Analytical Solvers (~0 cost) ⬜
> Reduce MACs on the 24 analytical tasks. Currently score ~15.5 avg, target ~20+.

- [ ] Convert Gather-based solvers to Slice(step=-1) + Transpose
  - Affected: s_tile, s_upscale, s_concat, s_concat_enhanced, s_kronecker, s_diagonal_tile, s_shift, s_mirror_h, s_mirror_v, s_quad_mirror, s_fixed_crop, s_spatial_gather, s_varshape_spatial_gather
- [ ] Validate: Full 400 arc-gen. Accept if >10% score increase on analytical tasks.
- **Estimate:** 24 tasks × (+5 pts avg) = **+120 pts**
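
The Gather-to-Slice conversion in 1a can be sanity-checked offline before touching any ONNX graph. The NumPy sketch below is only an analogy: strided slicing with step -1 plays the role of `Slice(step=-1)`, and `.T` plays the role of `Transpose`. The variable names are illustrative and are not taken from the solver code. In ONNX itself, a reversed slice is normally written with `starts=-1`, a very negative `ends`, and `steps=-1`.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
rows = np.arange(grid.shape[0])

# Gather-style flip: index rows with an explicit reversed index tensor.
flip_v_gather = grid[rows[::-1], :]

# Slice-style flips: a strided view with step -1, no index tensor needed.
flip_v_slice = grid[::-1, :]
flip_h_slice = grid[:, ::-1]

# Rotations compose from one transpose plus one reversed slice.
rot90_cw = grid.T[:, ::-1]    # transpose, then reverse columns
rot90_ccw = grid.T[::-1, :]   # transpose, then reverse rows
rot180 = grid[::-1, ::-1]     # reverse both axes, no transpose
```

If the Slice form matches the Gather form on every training and arc-gen grid, the index initializer can be dropped from the model, which is where the cost reduction is expected to come from.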

### 1b: ONNX Optimizer Pass ⬜
- [ ] `onnxoptimizer.optimize()` with dead-code elimination
- [ ] Validate: Compare scores before/after on all 49 solved tasks.
- **Estimate:** 49 tasks × (+1-2 pts avg) = **+50-100 pts**

---

## Phase 2: Regularization - EXHAUSTED

> Exps 0-3 tested. Root cause is architecture mismatch, not overfitting.
> Conv ceiling = ~25 tasks. See Experiment Log below for full data.

---

## Phase 3: New Solver Types (the actual path to 3000+)

> **Research basis:** CompressARC (`2512.06104`), TRM (`2510.04871`), NCA (`2506.15746`), ONNX opset 17 operator audit.
> **Key insight:** ARC tasks cluster into ~8 families. Each family needs a specialized ONNX architecture. Score = max(1, 25 - ln(MACs + mem + params)), so tiny models score highest.
>
> **Honest math:** Solving 50 more tasks at ~12 pts avg = +600. Solving 100 more = +1200. To hit 3000 we need ~200 new tasks at ~12 pts avg. That's ambitious but structurally possible.
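
The score formula and the "honest math" above can be checked with a few lines of arithmetic. This is a minimal sketch: it assumes `mem` is measured in bytes and clamps the cost at 1 so the logarithm is defined, neither of which the roadmap states. The helper names `score` and `new_tasks_needed` are made up for illustration.

```python
import math


def score(macs: float, mem_bytes: float, params: float) -> float:
    """Score = max(1, 25 - ln(MACs + mem + params)), as quoted in this roadmap."""
    cost = max(macs + mem_bytes + params, 1.0)  # assumption: clamp so ln() is defined
    return max(1.0, 25.0 - math.log(cost))


def new_tasks_needed(target: float, current: float, avg_pts: float) -> int:
    """How many newly solved tasks close the gap at a given average score."""
    return math.ceil((target - current) / avg_pts)


# Costs quoted in the solver blueprints of this roadmap (parameter counts where given).
gravity_score = score(240_000, 4_800, 0)
mode_score = score(10_000, 543, 13)
tasks_for_3000 = new_tasks_needed(3000, 604, 12)
```

With these inputs the gravity blueprint lands between 12 and 13 points, the mode-color blueprint just under 16, and the 3000 target needs 200 new tasks at 12 points each, which agrees with the figures quoted above.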

### Solver Priority Table (ordered by score × expected tasks)

| # | Solver | Expected Tasks | Score | Total Pts | Complexity | Key Ops |
|---|--------|---------------|-------|-----------|------------|---------|
| 1 | **Gravity (4-dir)** | 10-20 | ~12 | 120-240 | Medium | Conv(3×3 shift kernel) × 30 unrolled steps + Where |
| 2 | **Flood Fill (BFS)** | 10-20 | ~12 | 120-240 | Medium | Conv(3×3 cross kernel) + Clip × 30 steps |
| 3 | **Edge/Boundary Detect** | 10-20 | ~13 | 130-260 | Low | Conv(Laplacian/Sobel kernel) + threshold |
| 4 | **Composition (transform+recolor)** | 10-15 | ~14 | 140-210 | Low | Chain existing analytical + color_map |
| 5 | **Mode/Majority Color** | 5-10 | ~16 | 80-160 | Low | ReduceSum → ArgMax → Expand |
| 6 | **Color LUT (10×10 MatMul)** | 10-20 | ~13 | 130-260 | Low | OneHot → MatMul(W_lut) → ArgMax, lstsq-fit W_lut |
| 7 | **Object Copy/Offset** | 5-15 | ~12 | 60-180 | High | ScatterND + offset detection |
| 8 | **CumSum Analysis** | 5-10 | ~15 | 75-150 | Medium | CumSum for running totals, object extent |

**Conservative total (sum of the table rows): +65-130 tasks, +850-1700 pts → est LB ~1450-2300**
**Optimistic total: +150-200 tasks → est LB ~2400-3000**

---

### 3a: Gravity Solver ⬜ - Confidence: **70%**
> Directional pixel propagation. ~30 unrolled steps, 4 directions.

**ONNX Blueprint:**
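```python
import numpy as np
from onnx import helper, numpy_helper

# Per step: pull pixel from direction, fill if empty (source cells are not cleared)
shift_k = np.zeros((1, 1, 3, 3), dtype=np.float32)
shift_k[0, 0, 0, 1] = 1.0  # gravity down: pull from row above
inits = [numpy_helper.from_array(shift_k, "shift_k"), numpy_helper.from_array(np.zeros(1, dtype=np.float32), "zero")]
nodes, cur = [], "input"  # "input" is assumed to be a [1,1,H,W] float grid
for i in range(30):
    nodes += [
        helper.make_node("Conv", [cur, "shift_k"], [f"shifted_{i}"], pads=[1, 1, 1, 1]),  # shifted copy, same H×W
        helper.make_node("Equal", [cur, "zero"], [f"is_empty_{i}"]),                      # is cell empty?
        helper.make_node("Where", [f"is_empty_{i}", f"shifted_{i}", cur], [f"cur_{i}"]),  # fill empty cells
    ]
    cur = f"cur_{i}"
```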

**Fitting:** For each task, try all 4 directions. Detect "empty color" (usually 0). Validate against arc-gen.
**Cost:** ~240K MACs (30 steps × 8100 per Conv), ~4.8KB, score ~12.
**Implementation:** ~60 lines in `neurogolf_solver/solvers/gravity.py`

- [ ] Implement `s_gravity_unrolled(td)` for all 4 directions
- [ ] Detect empty color from training examples
- [ ] Validate on 400 tasks
- **Accept if:** ≥3 new tasks solved
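
Before building the ONNX graph it may help to have a NumPy reference to fit against. The sketch below is an assumption-laden illustration, not the planned `s_gravity_unrolled`: `propagate_down` mirrors the Conv + Where step (empty cells copy the cell above, sources are never cleared), while `gravity_down` is the stricter "objects fall and stack" reading. Both function names are hypothetical. Which one a task needs would have to be decided per task from its training pairs.

```python
import numpy as np


def propagate_down(grid: np.ndarray, steps: int = 30, empty: int = 0) -> np.ndarray:
    """Unrolled fill: each step, an empty cell takes the value of the cell above it."""
    cur = grid.copy()
    for _ in range(steps):
        shifted = np.full_like(cur, empty)
        shifted[1:, :] = cur[:-1, :]  # value from the row above, padded with `empty`
        cur = np.where(cur == empty, shifted, cur)
    return cur


def gravity_down(grid: np.ndarray, empty: int = 0) -> np.ndarray:
    """Falling gravity: non-empty cells of each column stack at the bottom, order kept."""
    height = grid.shape[0]
    out = np.full_like(grid, empty)
    for c in range(grid.shape[1]):
        solid = grid[:, c][grid[:, c] != empty]
        out[height - solid.size:, c] = solid
    return out


demo = np.array([[1, 0],
                 [0, 2],
                 [0, 0]])
smeared = propagate_down(demo)
fallen = gravity_down(demo)
```

On `demo` the two references disagree: the first leaves a trail behind each pixel, the second moves the pixel. Checking both against arc-gen examples is one way to tell which family a candidate task belongs to.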

---

### 3b: Flood Fill Solver ⬜ - Confidence: **60%**
> BFS via unrolled Conv. Seeds propagate through passable cells.

**ONNX Blueprint:**
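```python
import numpy as np
from onnx import helper, numpy_helper

# 30-step BFS. Seed starts at one color, spreads through another.
cross_k = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=np.float32).reshape(1, 1, 3, 3)
lo, hi = (numpy_helper.from_array(np.array(v, dtype=np.float32), n) for v, n in ((0, "lo"), (1, "hi")))
inits = [numpy_helper.from_array(cross_k, "cross_k"), lo, hi]  # Clip takes min/max as inputs in opset 17
nodes, cur = [], "seed"  # "seed" and "passable_mask" are assumed [1,1,H,W] float masks
for i in range(30):
    nodes += [
        helper.make_node("Conv", [cur, "cross_k"], [f"expanded_{i}"], pads=[1, 1, 1, 1]),  # expand frontier
        helper.make_node("Clip", [f"expanded_{i}", "lo", "hi"], [f"clipped_{i}"]),         # saturate
        helper.make_node("Mul", [f"clipped_{i}", "passable_mask"], [f"masked_{i}"]),       # block walls (mask is 1 where passable)
        helper.make_node("Add", [cur, f"masked_{i}"], [f"sum_{i}"]),                       # accumulate
        helper.make_node("Clip", [f"sum_{i}", "lo", "hi"], [f"cur_{i}"]),                  # final saturate
    ]
    cur = f"cur_{i}"
```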

**Fitting:** Learn seed_selector (10 weights: which input color is seed) + obstacle_selector (10 weights: which colors are passable). Fit via lstsq on training examples.
**Cost:** ~240K MACs, ~4.9KB, score ~12.
**Implementation:** ~80 lines in `neurogolf_solver/solvers/flood.py`

- [ ] Implement `s_flood_fill(td)` with parameterized seed/obstacle selection
- [ ] Fit selectors via lstsq
- [ ] Validate on 400 tasks
- **Accept if:** ≥2 new tasks solved
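
The unrolled BFS can be prototyped in NumPy before any ONNX work. This sketch assumes the seed and the passable region are already available as 0/1 masks (in the plan above they would come from the fitted selectors); `flood_fill` is a hypothetical name, and the four shifted sums stand in for the cross-kernel Conv.

```python
import numpy as np


def flood_fill(seed: np.ndarray, passable: np.ndarray, steps: int = 30) -> np.ndarray:
    """Spread a 0/1 seed mask through 4-connected passable cells for `steps` iterations."""
    cur = (seed > 0).astype(np.float32)
    mask = (passable > 0).astype(np.float32)
    for _ in range(steps):
        p = np.pad(cur, 1)  # zero padding, like pads=[1,1,1,1]
        # Sum of the up, down, left and right neighbours (the cross kernel).
        expanded = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        cur = np.clip(cur + np.clip(expanded, 0, 1) * mask, 0, 1)
    return cur


# A wall in the middle column with a gap in the bottom row.
passable = np.array([[1, 0, 1],
                     [1, 0, 1],
                     [1, 1, 1]])
seed = np.zeros((3, 3))
seed[0, 0] = 1

filled = flood_fill(seed, passable)
after_two_steps = flood_fill(seed, passable, steps=2)
```

The fill only reaches the right-hand column by going around the wall, so the number of unrolled steps must cover the longest path in the grid, not just its width or height. That is worth checking against arc-gen before fixing the step count at 30.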

---

### 3c: Edge/Boundary Detection ⬜ - Confidence: **75%**
> Laplacian/Sobel convolution to detect boundaries between colors.

**ONNX Blueprint:**
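```python
import numpy as np
from onnx import TensorProto, helper, numpy_helper

# Laplacian kernel detects any color boundary
lap_k = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=np.float32).reshape(1, 1, 3, 3)
# Summing one-hot channels gives 1 at every cell, so weight each channel by its color index instead.
idx_k = np.arange(10, dtype=np.float32).reshape(1, 10, 1, 1)
thr = np.array(0.5, dtype=np.float32)  # assumed threshold
inits = [numpy_helper.from_array(a, n) for a, n in ((lap_k, "lap_k"), (idx_k, "idx_k"), (thr, "threshold"))]
nodes = [
    helper.make_node("Conv", ["input", "idx_k"], ["intensity"]),                        # collapse channels to [1,1,H,W] intensity
    helper.make_node("Conv", ["intensity", "lap_k"], ["response"], pads=[1, 1, 1, 1]),  # edge response
    helper.make_node("Abs", ["response"], ["magnitude"]),                               # respond on both sides of a boundary
    helper.make_node("Greater", ["magnitude", "threshold"], ["binary"]),                # binary edge map
    helper.make_node("Cast", ["binary"], ["edges"], to=TensorProto.FLOAT),              # to float
    # Then: assign edge_color via Mul + Add
]
```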

**Fitting:** Detect edge_color and background_color from training pairs. Many ARC tasks ask "draw the outline of the shape."
**Cost:** ~16K MACs, ~1KB, score ~15.
**Implementation:** ~40 lines in `neurogolf_solver/solvers/edge.py`

- [ ] Implement `s_edge_detect(td)` with Laplacian + Sobel variants
- [ ] Fit edge/background colors from examples
- [ ] Validate on 400 tasks
- **Accept if:** ≥2 new tasks solved
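
For fitting, it helps to have an exact reference for what "the outline of the shape" means. The sketch below swaps the Laplacian threshold for a direct neighbour comparison, which is an assumption made for clarity and not the planned ONNX graph. `boundary_mask` and `outline` are hypothetical names, and the background color is assumed to be 0.

```python
import numpy as np


def boundary_mask(grid: np.ndarray) -> np.ndarray:
    """True where a cell differs from at least one of its 4 neighbours."""
    p = np.pad(grid, 1, mode="edge")  # replicate borders so the grid frame is not an edge
    center = p[1:-1, 1:-1]
    return (
        (p[:-2, 1:-1] != center) | (p[2:, 1:-1] != center)
        | (p[1:-1, :-2] != center) | (p[1:-1, 2:] != center)
    )


def outline(grid: np.ndarray, background: int = 0) -> np.ndarray:
    """Boundary cells that belong to a shape (non-background side only)."""
    return boundary_mask(grid) & (grid != background)


# A filled 3x3 square on a 5x5 background.
square = np.zeros((5, 5), dtype=int)
square[1:4, 1:4] = 1

ring = outline(square)
both_sides = boundary_mask(square)
```

`both_sides` marks the shape's rim and the background cells touching it, which is closer to what an absolute-value Laplacian response produces. `ring` keeps only the shape side. Comparing both against training outputs shows which variant a task expects.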

---

### 3d: Composition Detectors ⬜ - Confidence: **65%**
> Chain existing analytical solvers: rotate+recolor, flip+recolor, etc.

**Approach:** For each task, try all (transform × color_map) pairs. If the composition matches all train+arc-gen examples, emit combined ONNX graph.

- [ ] Scan 400 tasks: for each, apply all transforms, then check if color_map fixes remainder
- [ ] Build ONNX graph that chains transform + color_map nodes
- [ ] Validate on 400 tasks
- **Accept if:** ≥3 new tasks solved
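
The scan step can be sketched in NumPy as a brute-force search. The transform set, the function names, and the representation of a task as a list of `(input, output)` array pairs are all assumptions for illustration; the real solver would reuse the existing analytical detectors.

```python
import numpy as np

# Candidate geometric transforms, tried in this order (illustrative set).
TRANSFORMS = {
    "identity": lambda g: g,
    "rot90": lambda g: np.rot90(g, 1),
    "rot180": lambda g: np.rot90(g, 2),
    "rot270": lambda g: np.rot90(g, 3),
    "flip_h": np.fliplr,
    "flip_v": np.flipud,
    "transpose": lambda g: g.T,
}


def fit_color_map(src, dst):
    """Return {src_color: dst_color} if one consistent per-color mapping exists, else None."""
    if src.shape != dst.shape:
        return None
    mapping = {}
    for s, d in zip(src.ravel().tolist(), dst.ravel().tolist()):
        if mapping.setdefault(s, d) != d:
            return None
    return mapping


def find_composition(pairs):
    """First (transform name, color map) that explains every (input, output) pair."""
    for name, fn in TRANSFORMS.items():
        merged = {}
        for inp, out in pairs:
            mapping = fit_color_map(fn(inp), out)
            if mapping is None or any(merged.setdefault(k, v) != v for k, v in mapping.items()):
                break
        else:
            return name, merged
    return None


inp = np.array([[1, 1], [0, 2]])
out = np.array([[4, 5], [4, 3]])  # rot90 of inp, then recolor 0->3, 1->4, 2->5
found = find_composition([(inp, out)])
```

Passing the arc-gen examples in `pairs` alongside the training pairs keeps the search consistent with the validation rule at the top of this roadmap. A grid whose colors are all distinct will match almost any transform on a single pair, so more pairs make the result far more trustworthy.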

---

### 3e: Mode/Majority Color Solver ⬜ - Confidence: **80%**
> Output = most common color in input (or region).

**ONNX Blueprint:**
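```python
import numpy as np
from onnx import helper, numpy_helper

# ~543 bytes, 13 params, ~10K MACs, score ~16
axes = numpy_helper.from_array(np.array([2, 3], dtype=np.int64), "spatial_axes")  # axes is an input since opset 13
nodes = [
    helper.make_node("ReduceSum", ["input", "spatial_axes"], ["hist"], keepdims=0),  # sum over spatial → [1,10] histogram
    helper.make_node("ArgMax", ["hist"], ["mode_idx"], axis=1, keepdims=0),          # most common color index
    # Expand to full grid, one-hot encode
]
```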

**Fitting:** Check training pairs: does output = constant fill of mode color? Also try per-row/per-col mode.
**Implementation:** ~30 lines

- [ ] Implement `s_mode_color(td)` - global, per-row, per-col variants
- [ ] Validate on 400 tasks
- **Accept if:** ≥1 new task solved
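
A NumPy version of the global variant makes the fitting check concrete. The sketch assumes the `[1,10,H,W]` one-hot layout implied by the blueprint and that cells outside the grid are all-zero across channels, so they do not bias the histogram. `one_hot` and `mode_color_fill` are hypothetical helper names.

```python
import numpy as np


def one_hot(grid: np.ndarray, n_colors: int = 10) -> np.ndarray:
    """[H,W] integer grid -> [1,n_colors,H,W] float one-hot tensor."""
    return np.eye(n_colors, dtype=np.float32)[grid].transpose(2, 0, 1)[None]


def mode_color_fill(x: np.ndarray) -> np.ndarray:
    """Fill the whole grid with the most frequent color of a one-hot input."""
    hist = x.sum(axis=(2, 3))          # [1,n_colors] histogram (ReduceSum over H,W)
    mode = int(hist.argmax(axis=1)[0])  # ties resolve to the lowest color index
    out = np.zeros_like(x)
    out[:, mode] = 1.0
    return out


grid = np.array([[3, 3, 1],
                 [3, 0, 1]])
filled = mode_color_fill(one_hot(grid)).argmax(axis=1)[0]
```

The per-row and per-col variants follow by summing over a single spatial axis. Tie-breaking toward the lowest index is a property of `argmax` here and of ONNX `ArgMax` by default, so tasks with tied counts need a look before they are claimed as solved.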

---

### 3f: Color LUT (10×10 MatMul) ⬜ - Confidence: **70%**
> General color→color mapping via learned 10×10 weight matrix.

Already have `s_color_map` for permutations + Conv 1×1 for non-permutations. This extends to position-dependent color transforms by stacking spatial features.

**Fitting:** `W_lut = lstsq(OneHot(input_pixels), OneHot(output_pixels))`

- [ ] Implement `s_color_lut(td)` using OneHot → MatMul → ArgMax
- [ ] Compare with existing color_map solver - keep if it solves additional tasks
- [ ] Validate on 400 tasks
- **Accept if:** ≥2 new tasks beyond existing color_map
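
The fitting line above maps directly onto `np.linalg.lstsq`. The sketch below covers only the plain color-to-color case with pixel-aligned input and output grids; stacking spatial features, as the section proposes, would widen the one-hot matrix but is not shown. `fit_color_lut` and `apply_color_lut` are hypothetical names.

```python
import numpy as np


def fit_color_lut(inputs, outputs, n_colors: int = 10) -> np.ndarray:
    """Least-squares fit of W so that OneHot(input) @ W approximates OneHot(output)."""
    eye = np.eye(n_colors)
    X = eye[np.concatenate([g.ravel() for g in inputs])]   # [N, n_colors]
    Y = eye[np.concatenate([g.ravel() for g in outputs])]  # [N, n_colors]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def apply_color_lut(grid: np.ndarray, W: np.ndarray) -> np.ndarray:
    """OneHot -> MatMul -> ArgMax on an [H,W] integer grid."""
    return (np.eye(W.shape[0])[grid] @ W).argmax(axis=-1)


train_in = [np.array([[0, 1], [2, 1]])]
train_out = [np.array([[5, 3], [0, 3]])]  # 0->5, 1->3, 2->0

W_lut = fit_color_lut(train_in, train_out)
pred = apply_color_lut(np.array([[2, 0, 1]]), W_lut)
```

Colors that never appear in training get an all-zero row in `W_lut`, so `argmax` sends them to color 0. That silent default is a likely source of arc-gen failures and may deserve an explicit check during validation.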

---

### 3g: CumSum-Based Analysis ⬜ - Confidence: **50%**
> Running sums for object extent, counting, filling. Key op from CompressARC.

**ONNX Blueprint:**
```python
import numpy as np
from onnx import helper, numpy_helper

# CumSum along axis 2 (rows) → running sum down each column
axis_tensor = numpy_helper.from_array(np.array(2, dtype=np.int64), "axis")
nodes = [helper.make_node("CumSum", ["input_channel", "axis"], ["running_sum"])]
```

**Use cases:** "Fill everything below the topmost pixel of each color", "count pixels per row", object bounding boxes.

- [ ] Prototype CumSum-based solver for specific task families
- [ ] Validate on 400 tasks
- **Accept if:** ≥1 new task solved
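
Two of the listed use cases reduce to a single cumulative or plain sum on one color channel. The NumPy sketch below works on a single `[H,W]` 0/1 mask rather than the full one-hot tensor, and the function names are hypothetical.

```python
import numpy as np


def fill_below_topmost(mask: np.ndarray) -> np.ndarray:
    """1 at and below the topmost set cell of each column (CumSum down rows, then > 0)."""
    return (np.cumsum(mask, axis=0) > 0).astype(mask.dtype)


def count_per_row(mask: np.ndarray) -> np.ndarray:
    """Number of set cells in each row."""
    return mask.sum(axis=1)


mask = np.array([[0, 1, 0],
                 [1, 0, 0],
                 [0, 0, 0]])
filled = fill_below_topmost(mask)
counts = count_per_row(mask)
```

In ONNX the same fill would be a `CumSum` followed by `Greater` and `Cast`. Setting the `reverse` attribute of `CumSum` to 1 gives the "fill above" direction.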

---

## Phase 4: Score Optimization (est +50-100 pts)

### 4a: Best-of-N Model Selection ⬜
> For each task, try ALL ks values + ALL solver types, keep cheapest valid model.

- [ ] Refactor `solve_task` to collect all valid candidates, pick lowest cost
- [ ] Validate: Compare total score before/after
- **Accept if:** ≥3% total score improvement
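
The selection rule itself is small. The sketch below assumes each candidate carries a precomputed cost (MACs + memory + params) and a flag saying whether it passed train and arc-gen validation. The `Candidate` type and `pick_cheapest` are illustrative names, not the existing `solve_task` interface.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    solver: str   # e.g. "conv_ks5" or "edge" (labels are illustrative)
    cost: int     # MACs + memory + params, lower is better
    valid: bool   # passed train and arc-gen validation


def pick_cheapest(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the lowest-cost valid candidate, or None if nothing validated."""
    valid = [c for c in candidates if c.valid]
    return min(valid, key=lambda c: c.cost) if valid else None


candidates = [
    Candidate("conv_ks5", 250_000, True),
    Candidate("edge", 17_000, True),
    Candidate("mode", 10_556, False),  # cheapest, but failed validation
]
best = pick_cheapest(candidates)
```

Filtering on validity before comparing cost keeps a cheap but wrong model from ever winning, which matches the rule that nothing counts without arc-gen validation.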

### 4b: Official Scoring Alignment ⬜
> Use `onnx_tool` for exact cost matching with Kaggle scorer.

- [ ] Compare static profiler vs onnx_tool on all solved models
- [ ] Fix divergences
- **Accept if:** divergence <2% on all models

---

> **User's competitive philosophy**: "I am writing my own models no blending. This is major flaw in the competition loophole."

---

## Experiment Log
|------|-----------|-------------|--------|----------|
| 2026-04-24 | v4.2 baseline | 400 | 50 arc-gen, ~670 LB | Keep as baseline |
| 2026-04-25 | v5 untested code | 10 | 3/10 FAILED arc-gen | **REVERTED** |
| 2026-04-26 | v5.0 refactor | 400 | **49 solved, ~603.6 score, budget=5s** | New baseline |
| 2026-04-26 | Exp 1: Skip ks=5,7,9 | 55 | **HURTS 2 solved tasks** | **[-] REJECTED** |
| 2026-04-26 | Exp 2: Best-of-N | 55 | **No new solves** | **[~] NEUTRAL** |
| 2026-04-26 | Exp 3: Ridge reg | 4 victims | **0/4 pass arc-gen** | **[-] REJECTED** |
| 2026-04-26 | **Exp 3: Full PCA/SVD** | **400 tasks** | **0 PCR solves, 0 regressions** | **[-] REJECTED** |

### CRITICAL FINDING (2026-04-26)

The 351 unsolved tasks fail because **conv is the wrong architecture**, not because of bad regularization. Score improvement requires new solver types (Phase 3), not fixing conv.

---

## Realistic Projections

| Milestone | Solved | Score | How |
|-----------|--------|-------|-----|
| **Current** | **49** | **~604** | - |
| + Phase 1 (score opt) | 49 | ~750-800 | Opset 17 conversions + ONNX optimizer |
| + 3c edge detect | 55-65 | ~900-1000 | Laplacian/Sobel conv |
| + 3d composition | 60-75 | ~1000-1150 | Transform+recolor chains |
| + 3a gravity | 70-90 | ~1150-1400 | 4-dir unrolled Conv+Where |
| + 3b flood fill | 80-110 | ~1300-1700 | Unrolled BFS |
| + 3e-g (mode, LUT, cumsum) | 90-130 | ~1500-2000 | Various analytical |
| **Stretch: all Phase 3** | **130-200** | **~1800-2800** | Everything above working |

**3000+ requires ~200+ solved tasks.** Achievable only if most Phase 3 solvers work AND we find additional task families to target. Honest range: **1500-2500 LB.**

---

## Research Queue

1. ✅ Nakkiran 2019 - double descent (inapplicable)
2. ✅ Segert 2023 - PCA > Ridge (0/400 PCR solves)
3. ✅ CompressARC 2024 - MDL principle, CumMax/ReduceSum architecture
4. ✅ TRM 2025 - recursive reasoning, 45% ARC-AGI-1
5. ✅ NCA 2025 - cellular automata, fails at global coordination
6. ✅ ARC Prize 2025 Tech Report - competition landscape
7. [ ] **Task taxonomy:** Classify all 351 unsolved tasks by family → prioritize solvers
8. [ ] **Top Kaggle non-blending notebooks** - implementation details

> **Next action:** Classify the 351 unsolved tasks to validate the Phase 3 task count estimates before building anything.