rogermt committed · Commit 022a14c · verified · 1 Parent(s): 872fabe

Move own-solver/SKILL.md to own-solver/

Files changed (1)
  1. own-solver/SKILL.md +222 -0
own-solver/SKILL.md ADDED
@@ -0,0 +1,222 @@
+ ---
+ name: neurogolf-solver
+ description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 17, IR 8, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
+ ---
+
+ # NeuroGolf Solver
+
+ ## Development Methodology: The Closed Loop
+
+ ```
+ Research → Design → Experiment → Analyze → Research → ...
+ ```
+
+ **Rule: Loop until we have a CONFIRMED increase in arc-gen-validated score.**
+
+ | Phase | What | Exit Criteria |
+ |-------|------|---------------|
+ | **Research** | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
+ | **Design** | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
+ | **Experiment** | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
+ | **Analyze** | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
+ | **Research** | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |
+
+ **Critical rules:**
+ - NEVER write >200 lines without running them first
+ - NEVER claim a feature "works" until it is arc-gen validated on ≥20 tasks
+ - NEVER upload code to the repo that hasn't been validated
+ - Theory from papers is NOT proof for our data: always test
+ - If a feature shows no improvement after testing, DELETE it; don't leave dead code
+ - Make surgical edits to individual files; NEVER rewrite the entire codebase in one shot
+
+ ## Quick Reference
+
+ - **Repo**: `rogermt/neurogolf-solver`
+ - **Current version**: v5.2 (52 solved, ~710 score, est LB ~1058)
+ - **Previous best on Kaggle**: v4.3 (50 arc-gen-validated tasks, est LB ~670)
+ - **Kaggle runtime**: 12 hours for submission
+ - **Target**: 3000+ LB (our own solver, no blending)
+ - **Detailed history, mistakes, analysis**: see `LEARNING.md`
+ - **Roadmap & experiment queue**: see `TODO.md`
+
+ ## 1. Competition Rules
+
+ | Item | Value |
+ |------|-------|
+ | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
+ | Opset | 17 (IR 8). Opset 10 also accepted on Kaggle |
+ | **Max .onnx file size** | **1.44 MB per ONNX file** (not submission zip) |
+ | Static shapes | **All tensors and parameters must have statically-defined shapes** |
+ | Banned ops | **Loop, Scan, NonZero, Unique, Script, Function** |
+ | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
+ | Tasks | **All 400 count. There are NO excluded tasks. Unsolved = 1.0 pt.** |
+ | Validation | Models checked against **train + test + arc-gen** (ALL splits) |
+ | Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
+
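To make the scoring row concrete, here is a minimal sketch of the per-task formula. The function name `task_score` is hypothetical, and clamping the cost to at least 1 (so that `ln` is defined for a zero-cost model) is an assumption, not something the rules above state.

```python
import math


def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score: max(1.0, 25.0 - ln(MACs + memory + params))."""
    # Assumption: clamp the cost to 1 so ln() is defined for zero-cost models.
    cost = max(1, macs + memory_bytes + params)
    return max(1.0, 25.0 - math.log(cost))


# A model with a total cost of about 16M lands near 8.4 points.
print(round(task_score(16_000_000, 0, 0), 1))
```

Because the cost enters through a logarithm, each tenfold cost reduction is worth about 2.3 points (ln 10).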
+ ## 2. ARC-GEN Data: THE Critical Factor
+
+ **A model that passes train+test but fails arc-gen scores ZERO on Kaggle.**
+
+ - Kaggle tasks at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contain `{"train":[], "test":[], "arc-gen":[]}`
+ - Up to 262 arc-gen examples per task (100K total)
+ - Locally: ARC-GEN lives in `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}`; merge it into the task data
+ - Conv fitting: include arc-gen examples **only when grid sizes match** train/test (otherwise lstsq fails)
+ - Validation: always check against `arc-gen[:30]` at minimum
+
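The merge and filtering rules above could be sketched as follows. All function names here are hypothetical. "Grid sizes match" is interpreted as "the (input shape, output shape) pair already occurs in train/test", which is an assumption about the intended rule.

```python
def grid_shape(grid):
    """Return (rows, cols) of a grid stored as a list of lists."""
    return (len(grid), len(grid[0]) if grid else 0)


def merge_arc_gen(task, arc_gen_examples):
    """Return a copy of the task dict with the local ARC-GEN list attached."""
    merged = dict(task)
    merged["arc-gen"] = list(arc_gen_examples)
    return merged


def fitting_examples(task):
    """Train/test examples plus arc-gen examples whose grid sizes match."""
    base = task["train"] + task["test"]
    shapes = {(grid_shape(e["input"]), grid_shape(e["output"])) for e in base}
    extra = [
        e for e in task.get("arc-gen", [])
        if (grid_shape(e["input"]), grid_shape(e["output"])) in shapes
    ]
    return base + extra


def validation_examples(task, min_arc_gen=30):
    """All train/test examples plus at least the first 30 arc-gen examples."""
    return task["train"] + task["test"] + task.get("arc-gen", [])[:min_arc_gen]
```

Keeping fitting and validation sets separate lets the fit skip mismatched grids while validation still sees them.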
+ ## 3. Architecture
+
+ ### Package Structure (v5.2)
+ ```
+ neurogolf_solver/
+ ├── constants.py       # Grid dims, opset, limits (NO excluded tasks)
+ ├── config.py          # Runtime providers, opset factory
+ ├── data_loader.py     # Task loading, one-hot, example extraction
+ ├── validators.py      # Model validation against all splits
+ ├── profiler.py        # Static cost profiler (onnx_tool fallback)
+ ├── onnx_helpers.py    # Opset 17 builders: Slice, Pad, ReduceSum, mk()
+ ├── gather_helpers.py  # Gather-based spatial remapping models
+ ├── submission.py      # run_tasks (W&B logging), zip/csv generation
+ ├── main.py            # Entry point with argparse
+ └── solvers/
+     ├── analytical.py       # identity, constant, color_map, transpose
+     ├── geometric.py        # flip, rotate, shift, crop, gravity (detect only)
+     ├── tiling.py           # tile, upscale, mirror, concat, spatial_gather
+     ├── conv.py             # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
+     ├── gravity.py          # Unrolled bubble-sort gravity (Conv+Where, 4 dirs), Task 78
+     ├── edge.py             # Laplacian edge detection (0 matches currently)
+     ├── mode.py             # Mode fill (ReduceSum→ArgMax→Expand), Task 129
+     └── solver_registry.py  # ANALYTICAL_SOLVERS list + solve_task()
+ ```
+
+ Run with: `python -m neurogolf_solver.main [args]`
+
+ ### Solver Pipeline
+ ```
+ 1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
+    identity → constant → color_map → transpose → flip → rotate →
+    shift → tile → upscale → kronecker → nonuniform_scale →
+    mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
+    diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
+    gravity_unrolled → edge_detect → mode_fill
+
+ 2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
+    conv_fixed:     Slice→Conv→ArgMax→Equal+Cast→Pad
+    conv_variable:  Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
+    conv_diffshape: Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
+    conv_var_diff:  Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
+ ```
+
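The two-stage pipeline above amounts to a first-match search over ordered solver lists. A minimal sketch follows, assuming each solver is a callable that returns a model or `None` and that `validate` checks a model against all splits. The signature shown is hypothetical and may differ from the `solve_task()` in `solver_registry.py`.

```python
def solve_task(task, analytical_solvers, conv_solvers, validate):
    """Return (name, model) for the first solver whose model validates."""
    # Stage 1 runs before stage 2, so cheap analytical models win ties.
    for stage in (analytical_solvers, conv_solvers):
        for name, solver in stage:
            model = solver(task)  # hypothetical contract: a model or None
            if model is not None and validate(model, task):
                return name, model
    return None, None  # unsolved task


# Usage with stub solvers standing in for real ONNX builders.
analytical = [("identity", lambda t: None), ("flip", lambda t: "flip-model")]
conv = [("conv_fixed", lambda t: "conv-model")]
print(solve_task({}, analytical, conv, lambda model, t: True))
```

Ordering the lists from cheapest to most expensive means the first validated match is also the highest-scoring candidate the pipeline knows how to build.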
+ ### ONNX Building Rules (opset 17)
+ - **All shapes must be static**: no dynamic dimensions
+ - **Max 1.44 MB per .onnx file**: checked by Kaggle validator
+ - **Slice(step=-1)** for flip/rotate: zero MACs, replaces Gather for these transforms
+ - **Gather** (opset 1) for spatial remapping: used by concat, spatial_gather, mirrors, etc.
+ - **NEVER** use GatherElements (opset 11)
+ - **Equal+Cast** for one-hot: NEVER use OneHot (no CUDA kernel)
+ - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
+ - **Conv 1×1** for non-permutation color maps (has MACs but correct)
+ - **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
+ - **Pad** with tensor-based `pads` input (opset 11+ requirement)
+ - **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)`: SVD can fail to converge
+ - **ArgMax + Equal+Cast** before Pad to ensure clean one-hot in padded region (gravity solver lesson)
+
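The choice between channel Gather and Conv 1×1 for color maps depends on whether the map is a permutation of the 10 colors. The sketch below shows one way to make that decision and derive the Gather indices. The function names are hypothetical, and treating colors not mentioned in the map as mapping to themselves is an assumption.

```python
NUM_COLORS = 10


def full_color_map(partial):
    """Extend a partial {src: dst} map with identity for unmentioned colors."""
    return [partial.get(color, color) for color in range(NUM_COLORS)]


def gather_indices(color_map):
    """Return channel Gather indices if the map is a permutation, else None."""
    if sorted(color_map) != list(range(NUM_COLORS)):
        return None  # not a bijection: fall back to Conv 1x1
    # Gather on the channel axis computes out[:, j] = in[:, indices[j]],
    # so output channel dst must read input channel src (the inverse map).
    indices = [0] * NUM_COLORS
    for src, dst in enumerate(color_map):
        indices[dst] = src
    return indices


# The 3-cycle 1 -> 2 -> 3 -> 1 is a permutation, so Gather applies.
print(gather_indices(full_color_map({1: 2, 2: 3, 3: 1})))
# Mapping 1 -> 2 alone merges two colors, so it needs Conv 1x1.
print(gather_indices(full_color_map({1: 2})))
```

Note that the Gather indices are the inverse of the color map, which is easy to get backwards for anything other than a simple swap.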
+ ### Conv Fitting
+
+ **Conv ceiling: ~25 tasks.** Regularization variants (Ridge, PCA/SVD, skip-ks) were all tested and rejected.
+ Root cause: architecture mismatch. Most unsolved tasks need non-local ops, not local conv patches.
+
+ **Current fitting strategy (v5.1+):**
+ - Composable primitives: `_build_patch_matrix` + `_solve_weights` + `_extract_weights`
+ - PCR fallback via `_solve_weights_pcr` (deferred 2nd pass, 0 new solves but no regressions)
+ - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
+ - Try no-bias first, then bias
+ - lstsq wrapped in try/except for SVD non-convergence
+ - **Validate against arc-gen BEFORE accepting**; reject if validation fails
+
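The fitting strategy above can be read as a small search loop. The sketch below assumes the kernel size is the outer loop and the bias choice the inner one, which the list does not state explicitly. `fit` and `passes_arc_gen` are hypothetical callables standing in for the lstsq solve and the arc-gen validator.

```python
KERNEL_SIZES = list(range(1, 30, 2))  # 1, 3, ..., 29 as listed above


def search_conv(fit, passes_arc_gen):
    """Return (kernel_size, use_bias, weights) for the first accepted fit."""
    for kernel_size in KERNEL_SIZES:
        for use_bias in (False, True):  # no-bias first, then bias
            try:
                weights = fit(kernel_size, use_bias)
            except ValueError:
                # numpy.linalg.LinAlgError subclasses ValueError, so this
                # also covers SVD non-convergence raised by lstsq.
                continue
            # Accept only fits that survive arc-gen validation.
            if weights is not None and passes_arc_gen(weights):
                return kernel_size, use_bias, weights
    return None
```

Trying small kernels first means the first accepted fit is also the cheapest one in MACs and parameters.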
+ ### New Solver Architectures (v5.2)
+
+ **gravity.py**: Unrolled bubble-sort via Conv+Where
+ - 4 directions × 10 bg colors, max(IH,IW) steps
+ - Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
+ - Final: ArgMax + Equal+Cast + Pad (clean one-hot)
+ - Cost: ~16M (10×10 grid), score ~8.4
+ - **Validated: Task 78 (direction=up, bg=0)**
+
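For checking the unrolled gravity model against expected outputs, a plain-Python reference of the same idea can help. This is a sketch of the semantics only (direction up), not the ONNX graph: each step updates all cells simultaneously, the way a shift-and-select step would, and the function names are hypothetical.

```python
def gravity_step_up(grid, bg=0):
    """One simultaneous step: non-bg cells with bg directly above move up."""
    height, width = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(height):
        for c in range(width):
            cell = grid[r][c]
            above = grid[r - 1][c] if r > 0 else None
            below = grid[r + 1][c] if r + 1 < height else None
            if cell == bg and below is not None and below != bg:
                out[r][c] = below  # pull the cell below into this gap
            elif cell != bg and above == bg:
                out[r][c] = bg     # this cell moved up, leave bg behind
    return out


def gravity_up(grid, bg=0):
    """Apply max(H, W) unrolled steps, mirroring the fixed step count."""
    for _ in range(max(len(grid), len(grid[0]))):
        grid = gravity_step_up(grid, bg)
    return grid


print(gravity_up([[0, 0], [1, 0], [0, 2], [3, 0]]))
```

The step count is fixed in advance because the competition bans Loop and Scan, so the iteration has to be unrolled into the graph.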
+ **edge.py**: Laplacian conv boundary detection
+ - Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
+ - Cost: ~16K MACs, score ~15
+ - **0 matches currently**: edge definition may be too strict
+
+ **mode.py**: Global majority color fill
+ - Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
+ - Cost: ~2K, score ~19.5
+ - **Validated: Task 129**
+
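A reference implementation of the mode fill is useful for detecting candidate tasks before building the model. The sketch below assumes the output is the input grid filled entirely with its most frequent color, which is one reading of "global majority color fill". `mode_fill` is a hypothetical name.

```python
from collections import Counter


def mode_fill(grid):
    """Fill the grid with its most frequent color."""
    counts = Counter(cell for row in grid for cell in row)
    # Break ties toward the lowest color index, matching ArgMax's default
    # behavior of returning the first maximum.
    mode = min(counts, key=lambda color: (-counts[color], color))
    return [[mode] * len(row) for row in grid]


print(mode_fill([[1, 2], [2, 0]]))
```

The explicit tie-break matters: if the reference and the ONNX ArgMax disagree on ties, a task can pass detection and still fail validation.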
+ ## 4. Performance
+
+ **The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (5s locally, 60s on Kaggle).
+
+ **Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.
+
+ ## 5. Score Accounting (v5.2)
+
+ | Category | Tasks | Avg Score | Notes |
+ |----------|-------|-----------|-------|
+ | Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
+ | Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
+ | Gravity | 1 | 8.4 | Task 78 |
+ | Mode fill | 1 | 19.5 | Task 129 |
+ | Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
+ | **Unsolved** | **348** | **1.0** | Minimum score |
+ | **Total** | **52/400** | | **~710 solved + 348 = ~1058 est LB** |
+
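The total row follows from the rule that every unsolved task still earns the 1.0 minimum. A short sketch of that arithmetic, with a hypothetical helper name and the approximate ~710 solved score taken from the table:

```python
TOTAL_TASKS = 400
MIN_SCORE = 1.0  # every unsolved task still earns the minimum


def estimated_leaderboard(solved_score, solved_count):
    """Solved score plus 1.0 point for each remaining task."""
    unsolved = TOTAL_TASKS - solved_count
    return solved_score + unsolved * MIN_SCORE


# 52 solved tasks worth ~710 points, plus 348 unsolved tasks at 1.0 each.
print(estimated_leaderboard(710, 52))
```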
+ ### Path to 3000+
+ 1. ✅ ARC-GEN validation (v4)
+ 2. ✅ New analytical solvers (v4)
+ 3. ✅ Opset 17 Slice-based transforms (v5)
+ 4. ✅ lstsq crash fix + modular package (v5)
+ 5. ✅ PCR fallback in conv (v5.1: 0 new solves but clean code)
+ 6. ✅ Gravity solver (v5.2, Task 78)
+ 7. ✅ Mode fill solver (v5.2, Task 129)
+ 8. 🔲 **Phase 3 solvers**: flood fill, composition, color LUT, CumSum (see `TODO.md`)
+ 9. 🔲 **Phase 1a**: Opset 17 conversions for existing analytical tasks (score optimization)
+ 10. 🔲 **Phase 4**: ONNX optimizer, best-of-N selection
+
+ **Blending is EXPLICITLY excluded**: user's competitive philosophy.
+
+ ## 6. Submission Checklist
+
+ Before submitting to Kaggle:
+ - [ ] All models validated against train + test + arc-gen (locally)
+ - [ ] **All 400 tasks attempted** (no exclusions)
+ - [ ] No GatherElements in any model
+ - [ ] No banned ops (Loop, Scan, NonZero, Unique, Script, Function)
+ - [ ] All tensor shapes are static
+ - [ ] **Each .onnx file < 1.44 MB**
+ - [ ] Local estimated score calculated and compared to expected LB
+ - [ ] **A/B test**: ran both old and new solver on same tasks, new solver scores higher
+
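Several checklist items are mechanical and could be automated. The sketch below assumes the caller has already extracted each model's op types, file size, and tensor shapes. `checklist_failures` is a hypothetical helper, and reading 1.44 MB as 1.44 × 1024 × 1024 bytes is an assumption about how the validator measures size.

```python
BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}
DISALLOWED_OPS = BANNED_OPS | {"GatherElements"}  # project rule, see above
MAX_FILE_BYTES = int(1.44 * 1024 * 1024)  # assumption: MB = 1024 * 1024 bytes


def checklist_failures(op_types, file_size_bytes, shapes):
    """Return a list of human-readable failures for one model."""
    failures = []
    bad_ops = sorted(set(op_types) & DISALLOWED_OPS)
    if bad_ops:
        failures.append("disallowed ops: " + ", ".join(bad_ops))
    if file_size_bytes >= MAX_FILE_BYTES:
        failures.append(f"file too large: {file_size_bytes} bytes")
    for name, shape in shapes.items():
        # A static shape has only positive integer dimensions.
        if not all(isinstance(dim, int) and dim > 0 for dim in shape):
            failures.append(f"dynamic shape: {name}")
    return failures


print(checklist_failures(["Conv", "ArgMax"], 1000, {"input": (1, 10, 30, 30)}))
```

An empty list means the model passed these three checks; arc-gen validation and the A/B comparison still have to be run separately.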
+ ## 7. Files & Locations
+
+ | Location | Path | Notes |
+ |----------|------|-------|
+ | HF Repo | `rogermt/neurogolf-solver` | All code + data |
+ | **Solver package** | `neurogolf_solver/` | **v5.2, 19 files, modular** |
+ | Legacy monolith | `neurogolf_solver.py` | v4, kept for reference; do not edit |
+ | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
+ | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
+ | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
+ | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
+ | Roadmap | `TODO.md` | Experiment queue with status key |
+ | Learning | `LEARNING.md` | Knowledge accumulation; read before coding |
+
+ ## 8. LEARNING.md Maintenance Rules
+
+ `LEARNING.md` is the knowledge accumulation file. Update it when:
+ - A bug is found and fixed: add it to the Mistakes Log with its root cause
+ - A new approach is tried: record what worked, what didn't, and why
+ - Competition analysis reveals new insights: add them to Competitive Intelligence
+ - A version milestone is reached: update the Version History table
+ - Performance is measured: add concrete numbers
+
+ Structure: chronological within sections, newest entries first. Always include dates and version numbers.