rogermt committed on
Commit 260153f · verified · 1 Parent(s): ae0427a

Split SKILL.md (rules/quick-ref) + LEARNING.md (history/mistakes/analysis)

Files changed (1): SKILL.md +109 -292
SKILL.md CHANGED
@@ -3,310 +3,127 @@ name: neurogolf-solver
  description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
  ---
 
- # NeuroGolf Solver — Complete Knowledge Base
-
- ## 1. Competition Format
-
- ### What is NeuroGolf?
- IJCAI-ECAI 2026 NeuroGolf Challenge on Kaggle. You build 400 tiny ONNX neural networks, one per ARC-AGI task. Each network transforms a one-hot encoded grid to another grid. Scoring rewards small, efficient networks.
-
- ### ONNX Model Spec
- - **Input**: `"input"` float32 `[1, 10, 30, 30]` — one-hot encoded grid (10 color channels, 30×30 spatial)
- - **Output**: `"output"` float32 `[1, 10, 30, 30]` — same format
- - **Opset**: 10, IR version: 10 (but opset 17 ALSO works on Kaggle — see §3)
- - **Max file size**: 1.44 MB per model (floppy disk limit)
- - **Banned ops**: Loop, Scan, NonZero, Unique, Script, Function
-
- ### Scoring Formula
- ```
- score_per_task = max(1.0, 25.0 - ln(MACs + memory_bytes + params))
- total_score = sum(score_per_task for all 400 tasks)
- ```
- - Unsolved tasks score 1.0 (not 0!)
- - Max possible per task: 25.0 (cost=0, e.g. Identity)
- - **Excluded tasks**: {21, 55, 80, 184, 202, 366} — officially excluded, score 0 regardless
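The scoring formula above can be sketched in a few lines; this is a minimal illustration of the documented formula (the function name and the zero-cost guard are our own):

```python
import math

def score_per_task(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score from the documented formula:
    max(1.0, 25.0 - ln(MACs + memory_bytes + params))."""
    cost = macs + memory_bytes + params
    if cost <= 0:
        return 25.0  # zero-cost model (e.g. Identity) gets the maximum
    return max(1.0, 25.0 - math.log(cost))
```

For example, a typical analytical solver with cost ≈ 165K lands near 13 points, and any cost above e^24 bottoms out at the 1.0 floor.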
-
- ### Submission Format
- - `submission.zip` containing `task001.onnx` through `task400.onnx`
- - Models must pass validation against ALL examples: **train + test + arc-gen**
- - Optional: `submission.csv` with columns `task_id, total_cost`
-
- ### ARC-GEN Data (CRITICAL)
- On Kaggle, each task JSON at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contains:
- ```json
- {"train": [...], "test": [...], "arc-gen": [...]}
- ```
- The `arc-gen` key has **up to 262 additional examples per task** (100K total across 400 tasks) generated by Google's ARC-GEN system. **Models are validated against ALL splits including arc-gen.** A model that passes train+test but fails arc-gen scores ZERO on Kaggle.
-
- Locally, ARC-GEN data is in separate files at `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` dicts. Must be merged with the ARC-AGI task data.
-
- ## 2. Current State (v3 → v4 in progress)
-
- ### v3 Results: 307/400 solved locally, LB score ~501 (NOT ~3267)
- The massive gap (3267 local vs 501 LB) means **most of our conv models fail ARC-GEN validation on Kaggle**. The conv is fitted on ~6 train+test examples but must generalize to ~250 arc-gen examples of varying sizes. Many don't.
-
- ### Solver Breakdown (v3)
- ```
- conv_var: 125, conv_fixed: 107, conv_diff: 39, spatial_gather: 16,
- concat: 5, color_map: 4, concat_enhanced: 4, rotate: 3,
- transpose: 2, upscale: 1, varshape_spatial_gather: 1
- ```
-
- ### Repository
- - HF: `rogermt/neurogolf-solver`
- - Files: `neurogolf_solver.py`, `neurogolf_utils.py` (official Kaggle utils), `ARC-GEN-100K.zip`, `neurogolf-2026-solver-notebooks.zip`
-
- ## 3. Key Differences: Our Solver vs High-Scoring Notebooks
-
- ### The 4200-point notebook (`neurogolf-2026-tiny-onnx-solver`)
- This is a **BLEND notebook** — it does NOT solve tasks from scratch. It:
- 1. **Phase 1**: Loads 12+ other notebooks' `submission.zip` files as inputs
- 2. For each task, picks the cheapest valid model across all sources
- 3. **Phase 2**: Tries loose ONNX files from dataset inputs
- 4. **Phase 3**: Runs its own solver only on remaining unsolved tasks
- 5. Validates EVERYTHING against train+test+arc-gen before including
- 6. Result: 338/400 solved, est. score 4197.5
-
- **Critical insight**: The 4200 score comes from BLENDING many solutions, not from a single solver. The solver itself adds 0 new tasks in Phase 3. All 338 come from other notebooks' pre-built models.
-
- ### The championship notebook (`the-2026-neurogolf-championship`)
- Also a blend but with its own solver. Key differences from ours:
- - Uses **opset 17** (not 10!) — works fine on Kaggle
- - Has **shift detector**, **gravity detector**, **mirror detectors**, **fixed crop detector**, **outline detector**
- - Has **composition detectors**: rotation+color, transpose+color, flip+color
- - Has **channel reduction**: reduces 10→N channels for fewer colors → cheaper models
- - Uses **PyTorch learned conv**: multi-seed Adam training, ternary weight snapping
- - Uses **two-layer conv**: Conv→ReLU→Conv for complex patterns
- - Validates against `train + arc-gen[:30]` (capped at 30 arc-gen examples)
- - Result: 288 from own solver + more from blended inputs
-
- ### What they have that we don't
- | Feature | Them | Us |
- |---------|------|-----|
- | ARC-GEN validation | ✅ validate against arc-gen | ❌ v3 ignores arc-gen |
- | ARC-GEN in fitting | ✅ uses arc-gen[:3] in detectors | ❌ fits only train+test |
- | Opset 17 | ✅ uses freely | ❌ stuck on opset 10 |
- | Shift detector | ✅ | ❌ |
- | Gravity detector | ✅ | ❌ |
- | Mirror detectors | ✅ (h, v, quad) | ❌ |
- | Fixed crop detector | ✅ | ❌ |
- | Extract outline | ✅ | ❌ |
- | Composition (rot+color) | ✅ | ❌ |
- | Channel reduction | ✅ (fewer channels = cheaper) | ❌ |
- | PyTorch learned conv | ✅ (multi-seed, ternary snap) | ❌ (lstsq only) |
- | Two-layer conv | ✅ (Conv→ReLU→Conv) | ❌ |
- | Blend from other notebooks | ✅ (12+ sources) | ❌ |
-
- ## 4. The Submission Score Gap Problem
-
- ### Why LB = 501 when local = 3267
- Our 307 solved tasks generate ONNX models locally. But on Kaggle:
- 1. Models are validated against `train + test + arc-gen` (all splits)
- 2. Conv models fitted on 6 examples often fail on 250+ arc-gen examples
- 3. Failed models score 0 (not even the 1.0 minimum)
- 4. Likely only ~40-50 of our 307 models actually pass on Kaggle
-
- ### The fix priority
- 1. **Validate locally against arc-gen** before submitting — only include models that pass
- 2. **Include arc-gen examples in conv fitting** — more data = better generalization
- 3. **Add more analytical solvers** (shift, mirror, gravity, crop) — these always generalize
- 4. **Try opset 17** — unlocks more ops, may work fine on Kaggle
-
- ## 5. Architecture & Code Structure
-
- ### `neurogolf_solver.py` structure
- ```
- Constants: BATCH=1, CH=10, GH=GW=30
- EXCLUDED_TASKS = {21, 55, 80, 184, 202, 366}
-
- load_tasks_dir(data_dir, arcgen_dir)  # Load + merge ARC-GEN
- to_onehot(grid)                       # Grid → [1,10,30,30]
- validate(path, td)                    # Check model on ALL splits
- score_network(path)                   # MACs + memory + params
-
- Analytical Solvers (priority order):
-   identity → constant → color_map → transpose → flip → rotate →
-   tile → upscale → kronecker → concat → concat_enhanced →
-   diagonal_tile → spatial_gather → varshape_spatial_gather
-
- Conv Solvers:
-   solve_conv_fixed()     — Fixed same-shape: Slice→Conv→ArgMax→Equal+Cast→Pad
-   solve_conv_variable()  — Variable same-shape: Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
-   solve_conv_diffshape() — Fixed diff-shape (output≤input)
-   solve_conv_var_diff()  — Variable diff-shape (output≤input)
-
- Main: solve_task() → run_tasks() → generate submission.zip + submission.csv
- ```
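The `to_onehot` helper named above can be sketched as follows. This is an illustrative reimplementation, not the repo's code; in particular, the zero-padding of smaller grids into the fixed 30×30 canvas is an assumption about the encoding:

```python
import numpy as np

def to_onehot(grid, ch=10, gh=30, gw=30):
    """Sketch: place an HxW integer grid (colors 0-9) into the fixed
    30x30 canvas (zero-padded), then one-hot encode colors into the
    channel axis -> float32 [1, 10, 30, 30]."""
    g = np.asarray(grid, dtype=np.int64)
    canvas = np.zeros((gh, gw), dtype=np.int64)      # pad cells become color 0
    canvas[:g.shape[0], :g.shape[1]] = g
    onehot = (canvas[None, :, :] == np.arange(ch)[:, None, None])
    return onehot[None].astype(np.float32)           # [1, 10, 30, 30]
```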
-
- ### ONNX Building Patterns (opset 10)
- ```python
- # Model skeleton (DT = the float32 tensor element type, defined elsewhere in the file)
- def mk(nodes, inits=None):
-     x = helper.make_tensor_value_info("input", DT, [1,10,30,30])
-     y = helper.make_tensor_value_info("output", DT, [1,10,30,30])
-     g = helper.make_graph(nodes, "g", [x], [y], initializer=inits or [])
-     return helper.make_model(g, ir_version=10, opset_imports=[helper.make_opsetid("", 10)])
-
- # One-hot via Equal+Cast (NOT OneHot — has CUDA issues)
- classes = np.arange(10).reshape(1,10,1,1)
- Equal(argmax_output, classes) → Cast(to=FLOAT)
-
- # Spatial remap via Gather (NOT GatherElements — requires opset 11!)
- Reshape([1,10,30,30] → [1,10,900]) → Gather(axis=2, indices=[900]) → Reshape back
-
- # Conv pattern
- Conv(input, W, kernel_shape=[ks,ks], pads=[pad]*4) → ArgMax → Equal+Cast → Mul(mask)
-
- # Mask for variable-shape: ReduceSum(input, axes=[1], keepdims=1) gives 1 where content exists
- ```
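The ArgMax → Equal → Cast chain above is easiest to verify in numpy. This sketch models the tensor semantics of that ONNX subgraph (it is not ONNX code itself):

```python
import numpy as np

def argmax_equal_cast(x):
    """Numpy model of the ONNX post-processing chain:
    ArgMax over channels, Equal against class ids, Cast to float32.
    x: float32 [1, 10, 30, 30] -> one-hot float32 [1, 10, 30, 30]."""
    amax = np.argmax(x, axis=1, keepdims=True)    # [1, 1, 30, 30]
    classes = np.arange(10).reshape(1, 10, 1, 1)  # broadcast class ids
    return (amax == classes).astype(np.float32)   # exactly one 1 per cell
```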
 
- ### Critical Op Compatibility
- | Op | Opset Required | Notes |
- |----|---------------|-------|
- | Gather | 1 | ✅ Safe. Use axis=2 on flattened [1,10,900] |
- | GatherElements | 11 | ❌ DO NOT USE with opset 10. Will fail on ORT 1.25+ |
- | OneHot | 9 | ⚠️ No CUDA kernel. Use Equal+Cast instead |
- | Conv | 1 | ✅ Safe |
- | ArgMax | 1 | ✅ Safe |
- | ReduceSum | 1 | ✅ Safe |
- | Pad | 2 (opset 10 syntax) | ✅ Use `pads` attribute for opset 10 |
- | Slice | 10 | ✅ With starts/ends as inputs |
- | Tile | 6 | ✅ Safe |
- | ScatterElements | 11 | ⚠️ Requires opset 11+ |
 
- ## 6. Conv Fitting: lstsq vs PyTorch
-
- ### Current: lstsq (single-layer, closed-form)
- ```python
- patches = []  # [N, 10*ks*ks] feature vectors
- targets = []  # [N] integer class labels (T below)
- P, T_oh = build_from_examples(exs)            # T_oh: one-hot targets [N, 10]
- WT = np.linalg.lstsq(P, T_oh, rcond=None)[0]  # closed-form optimal weights
- if (np.argmax(P @ WT, axis=1) == T).all(): SUCCESS  # perfect-fit check (pseudocode)
  ```
- - Fast, deterministic, optimal for the linear case
- - FAILS when: the pattern is nonlinear, there are too few examples, or the kernel is too small
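A runnable version of the closed-form fit sketched above (the patch extraction and function name are illustrative, not the repo's exact code):

```python
import numpy as np

def fit_linear_conv(examples, ks=3, ch=10):
    """Fit a single linear conv by lstsq. examples: list of (x, y) with
    x a [ch, H, W] one-hot input and y an [H, W] integer target grid.
    Returns the weight matrix if it fits every pixel exactly, else None."""
    pad = ks // 2
    P, T = [], []
    for x, y in examples:
        xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # same-padding
        for i in range(y.shape[0]):
            for j in range(y.shape[1]):
                P.append(xp[:, i:i + ks, j:j + ks].ravel())  # [ch*ks*ks]
                T.append(y[i, j])
    P, T = np.array(P), np.array(T)
    T_oh = (T[:, None] == np.arange(ch)[None, :]).astype(np.float64)
    WT, *_ = np.linalg.lstsq(P, T_oh, rcond=None)   # closed-form weights
    if (np.argmax(P @ WT, axis=1) == T).all():      # perfect-fit check
        return WT  # [ch*ks*ks, ch]; reshape/transpose -> ONNX Conv weight
    return None
```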
-
- ### Needed: PyTorch gradient descent (multi-layer)
- ```python
- import torch
- import torch.nn as nn
-
- class TinyARC(nn.Module):
-     def __init__(self, hidden=32, ks=5):
-         super().__init__()
-         self.conv1 = nn.Conv2d(10, hidden, ks, padding=ks // 2)
-         self.conv2 = nn.Conv2d(hidden, 10, ks, padding=ks // 2)
-
-     def forward(self, x):
-         return self.conv2(torch.relu(self.conv1(x)))
-
- # Train with MSE or cross-entropy, export with torch.onnx.export(model, dummy, path, opset_version=10)
- # Then add argmax+equal+cast+mask post-processing in ONNX manually
  ```
- - Can fit nonlinear patterns lstsq can't
- - Multi-seed training (0, 7, 42) for robustness
- - Ternary weight snapping: round weights to {-1, 0, 1} for smaller models
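One way to realize the ternary snapping mentioned above. The thresholding-by-mean-magnitude scheme here is an assumption about how the championship notebook does it, not a confirmed detail; any weight below the threshold snaps to 0, the rest to ±1:

```python
import numpy as np

def ternary_snap(w, threshold=0.5):
    """Sketch (assumed scheme): snap weights to {-1, 0, 1} using a
    threshold relative to the mean magnitude. Returns (snapped, scale);
    the scale would need to be folded into an adjacent layer."""
    w = np.asarray(w, dtype=np.float64)
    scale = float(np.abs(w).mean()) or 1.0
    snapped = np.where(np.abs(w) < threshold * scale, 0.0, np.sign(w))
    return snapped.astype(np.float32), scale
```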
-
- ### ARC-GEN for conv fitting
- The conv MUST generalize to arc-gen examples. Two approaches:
- 1. **Include arc-gen in fitting data** — use `train + test + arc-gen[:20]` for lstsq
- 2. **Validate against arc-gen after fitting** — only accept if it passes all splits
-
- ## 7. Unsolved Tasks (94 in v3)
-
- ### Categories
- | Category | Count | Why Unsolved |
- |----------|-------|-------------|
- | Variable diff-shape (output smaller) | ~60 | Output shape depends on input content |
- | Variable diff-shape (output larger) | ~17 | Same problem |
- | Same-shape, complex pattern | ~10 | Need larger kernels or multi-layer |
- | Fixed diff-shape, output larger | ~7 | Input-content-dependent patterns |
-
- ### Fundamental Blocker
- Variable-shape tasks where the output size depends on input CONTENT cannot be solved with a static ONNX graph. The only workaround: the conv learns to put valid content in the right region, masked by an input-derived spatial mask.
-
- ## 8. Mistakes Log (DO NOT REPEAT)
-
- ### GatherElements (opset 11) — Fixed in v3
- `GatherElements` requires opset 11. Works on Kaggle's old ORT but fails on ORT 1.25+. Replaced with `Gather` (opset 1) using 1D indices on the flattened spatial dim.
-
- ### s_flip still used GatherElements — Fixed in v4
- The `s_flip` solver was still using `GatherElements`. Must use `_build_gather_model()` instead.
-
- ### ARC-GEN not loaded — The #1 score killer
- v3 had `if 'arc-gen' in td` in validate() but never loaded arc-gen data into `td`. So validation always passed (no arc-gen to check), but Kaggle validated against arc-gen and most conv models failed.
-
- ### Conv fitted on too few examples
- Fitting on 6 train+test examples overfits to the small sample. Must include arc-gen examples in the fitting data for better generalization.
-
- ### No submission.csv
- Kaggle may need submission.csv alongside submission.zip.
-
- ### Wrong score_network without onnx_tool
- Our fallback `score_network` returned `(0, 0, 0)` instead of real costs. Need a static profiler that matches Kaggle's calculation.
-
- ### Ignored EXCLUDED tasks
- Wasted time trying to solve tasks 21, 55, 80, 184, 202, 366, which are officially excluded.
-
- ## 9. Competitive Strategy
-
- ### Path to 4800+ LB score
- 1. **Fix ARC-GEN validation** — immediately recover ~200 points from models that actually work
- 2. **Add missing analytical solvers** (shift, mirror, gravity, crop, composition) — +20-30 tasks, ~13 points each
- 3. **PyTorch multi-layer conv** — solve 5-10 more complex same-shape tasks
- 4. **Channel reduction** — reduce the cost of existing solutions by 30-50%
- 5. **Blend with other notebooks** — the 4200 notebook proves this is the meta-strategy
-
- ### Quick wins
- - Transpose: score=25.0 (cost=0, just permute dims) — already have
- - Identity: score=25.0 — already have
- - Color map via channel Gather: cheaper than Conv 1×1 (params+nbytes only, no MACs)
- - Analytical solvers: ~13 points each (cost ≈ 165K)
- - Small conv (ks=1): ~11-13 points
- - Large conv (ks=29): ~7 points
-
- ## 10. Data & File Locations
-
- ### On Kaggle
- ```
- /kaggle/input/competitions/neurogolf-2026/
-   task001.json ... task400.json   (with train+test+arc-gen)
-   neurogolf_utils/neurogolf_utils.py
- ```
-
- ### Locally
- ```
- ARC-AGI/data/training/   # 400 hex-named .json files (train+test only)
- ARC-GEN-100K/            # 400 hex-named .json files (arc-gen examples)
- neurogolf-solver/
-   neurogolf_solver.py    # Main solver
-   neurogolf_utils.py     # Official Kaggle utils (needs onnx_tool, IPython)
- ```
-
- ### ARC-GEN file format
- ```python
- # ARC-GEN-100K/{hex_id}.json is a LIST of examples:
- [{"input": [[...]], "output": [[...]]}, ...]
- # Must be merged into task data as td['arc-gen'] = list_of_examples
- ```
-
- ### ARC-GEN GitHub generator
- https://github.com/google/ARC-GEN — can generate MORE examples per task if needed.
-
- ## 11. Reference Notebooks (in repo as neurogolf-2026-solver-notebooks.zip)
-
- | Notebook | LB Score | Tasks | Key Technique |
- |----------|----------|-------|---------------|
- | neurogolf-2026-tiny-onnx-solver | ~4200 | 338 | Mega-blend of 12+ notebooks |
- | 4200-v5-neurogolf-fix | ~5700 est | 341 | Same blend, manual LLM rescue tasks |
- | the-2026-neurogolf-championship | ~3200 est | 288 | Own solver + blend |
- | neurogolf-logic-driven-ensembling | — | 401 | Pure ensembling from zips |
-
- ## 12. Testing Checklist
 
- Before any Kaggle submission:
  - [ ] All models validated against train + test + arc-gen (locally)
  - [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
- - [ ] No GatherElements (opset 11) in any model
- - [ ] No banned ops (Loop, Scan, NonZero, Unique)
- - [ ] Each .onnx file < 1.44 MB
- - [ ] submission.zip < 1.44 MB total
  - [ ] submission.csv generated
- - [ ] Local estimated score calculated with a static profiler
- - [ ] Local score compared vs expected LB (should be close now)
+ # NeuroGolf Solver
+
+ ## Quick Reference
+
+ - **Repo**: `rogermt/neurogolf-solver`
+ - **Current version**: v4.1 — 50 arc-gen-validated tasks, est LB ~670
+ - **Kaggle runtime**: 12 hours for submission
+ - **Target**: 4800+ LB (first page)
+ - **Detailed history, mistakes, analysis**: see `LEARNING.md`
+
+ ## 1. Competition Rules
+
+ | Item | Value |
+ |------|-------|
+ | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
+ | Opset | 10 (IR 10). Opset 17 also works on Kaggle |
+ | Max file size | 1.44 MB per model |
+ | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
+ | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
+ | Excluded tasks | {21, 55, 80, 184, 202, 366} — skip these |
+ | Validation | Models checked against **train + test + arc-gen** (ALL splits) |
+ | Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
 
+ ## 2. ARC-GEN Data — THE Critical Factor
+
+ **A model that passes train+test but fails arc-gen scores ZERO on Kaggle.**
+
+ - Kaggle tasks at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contain `{"train": [...], "test": [...], "arc-gen": [...]}`
+ - Up to 262 arc-gen examples per task (100K total)
+ - Locally: ARC-GEN lives in `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` — merge it into the task data
+ - Conv fitting: include arc-gen examples **only when grid sizes match** train/test (otherwise lstsq fails)
+ - Validation: always check against `arc-gen[:30]` minimum
+
+ ## 3. Architecture
+
+ ### Solver Pipeline
  ```
+ 1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
+    identity → constant → color_map → transpose → flip → rotate →
+    shift → tile → upscale → kronecker → nonuniform_scale →
+    mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
+    diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather
+
+ 2. Conv solvers (lstsq fitted, validated against arc-gen):
+    conv_fixed     — Slice→Conv→ArgMax→Equal+Cast→Pad
+    conv_variable  — Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
+    conv_diffshape — Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
+    conv_var_diff  — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
  ```
 
+ ### ONNX Building Rules
+ - **Gather** (opset 1) for spatial remapping — NEVER use GatherElements (opset 11)
+ - **Equal+Cast** for one-hot — NEVER use OneHot (no CUDA kernel)
+ - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
+ - **Conv 1×1** for non-permutation color maps (has MACs but correct)
+ - **ReduceSum(input, axes=[1])** for variable-shape mask
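The channel-Gather color map rule above is easiest to see in numpy; this sketch models what the ONNX `Gather(axis=1)` node computes (the function name and the example permutation are ours):

```python
import numpy as np

def colormap_via_channel_gather(x, perm):
    """Numpy model of the channel-Gather color map: output channel c
    reads input channel perm[c], i.e. Gather on axis 1 with indices=perm.
    A pure channel permutation like this costs no MACs, unlike Conv 1x1."""
    return np.take(x, perm, axis=1)

# Example mapping "recolor 2 -> 1 and 1 -> 2":
# perm = [0, 2, 1, 3, 4, 5, 6, 7, 8, 9]
```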
+
+ ### Conv Fitting Strategy
+ - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
+ - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
+ - Try no-bias first, then bias
+ - **Validate against arc-gen BEFORE accepting** — reject if it fails
+ - Bottleneck is algorithmic (O(n³) SVD), NOT device — GPU/CuPy doesn't help, just crashes
+
+ ## 4. Performance Bottleneck
+
+ **The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples: a 7056×8410 matrix SVD. This is pure math cost — moving to GPU (CuPy) doesn't help because:
+ 1. Same O(n³) algorithmic cost
+ 2. GPU memory fills up (~1GB for large matrices) and crashes
+ 3. Falls back to CPU anyway after a CUDA error
+
+ **Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers, not faster conv.
+
+ ## 5. Score Accounting (v4.1)
+
+ | Category | Tasks | Avg Score | Total |
+ |----------|-------|-----------|-------|
+ | Analytical (gather, rotate, etc.) | 25 | ~16 | ~400 |
+ | Conv (arc-gen validated) | 25 | ~11 | ~275 |
+ | Unsolved | 344 | 1.0 | 344 |
+ | **Estimated LB** | | | **~670** |
+
+ ### Path to 4800+
+ 1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
+ 2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
+ 3. ✅ Color map Gather for permutations (+15 pts)
+ 4. 🔲 PyTorch multi-layer conv with ternary snap (est +20-50 tasks)
+ 5. 🔲 Channel reduction (fewer colors → cheaper models)
+ 6. 🔲 Composition detectors: rot+color, flip+color, transpose+color
+ 7. 🔲 Blend with other notebooks on Kaggle (the meta-strategy for 4000+)
+
+ ## 6. Submission Checklist
+
+ Before submitting to Kaggle:
  - [ ] All models validated against train + test + arc-gen (locally)
  - [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
+ - [ ] No GatherElements in any model
+ - [ ] No banned ops
+ - [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
  - [ ] submission.csv generated
+ - [ ] Local estimated score calculated and compared to expected LB
+
+ ## 7. Files & Locations
+
+ | Location | Path | Notes |
+ |----------|------|-------|
+ | HF Repo | `rogermt/neurogolf-solver` | All code + data |
+ | Solver | `neurogolf_solver.py` | v4.1, 1270 lines |
+ | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
+ | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
+ | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
+ | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
+ | Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
+
+ ## 8. LEARNING.md Maintenance Rules
+
+ `LEARNING.md` is the knowledge-accumulation file. Update it when:
+ - A bug is found and fixed — add it to the Mistakes Log with its root cause
+ - A new approach is tried — record what worked, what didn't, and why
+ - Competition analysis reveals new insights — add them to Competitive Intelligence
+ - A version milestone is reached — update the Version History table
+ - Performance is measured — add the concrete numbers
+
+ Structure: chronological within sections, newest entries first. Always include dates and version numbers. The goal is that a fresh agent with zero context can read LEARNING.md and understand every mistake to avoid and every technique that works.