rogermt committed on
Commit 72d0404 · verified · 1 Parent(s): bc5d5ee

Update SKILL.md for v5 refactored package

Files changed (1)
  1. SKILL.md +67 -40
SKILL.md CHANGED
@@ -1,6 +1,6 @@
 ---
 name: neurogolf-solver
-description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10/17, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
 ---

 # NeuroGolf Solver
@@ -25,14 +25,15 @@ Research → Design → Experiment → Analyze → Research → ...
 - NEVER write >200 lines without running them first
 - NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
 - NEVER upload code to repo that hasn't been validated
-- NEVER overwrite neurogolf_solver.py with unvalidated code
 - Theory from papers is NOT proof for our data — always test
 - If a feature shows no improvement after testing, DELETE it — don't leave dead code

 ## Quick Reference

 - **Repo**: `rogermt/neurogolf-solver`
-- **Current version**: v4.3 — 50 arc-gen-validated tasks, est LB ~670
 - **Kaggle runtime**: 12 hours for submission
 - **Target**: 3000+ LB (our own solver, no blending)
 - **Detailed history, mistakes, analysis**: see `LEARNING.md`
@@ -43,7 +44,7 @@ Research → Design → Experiment → Analyze → Research → ...
 | Item | Value |
 |------|-------|
 | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
-| Opset | 10 (IR 10). **Opset 17 also works on Kaggle** |
 | Max file size | 1.44 MB per model |
 | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
 | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
@@ -63,6 +64,28 @@ Research → Design → Experiment → Analyze → Research → ...

 ## 3. Architecture

 ### Solver Pipeline
 ```
 1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
@@ -78,67 +101,71 @@ Research → Design → Experiment → Analyze → Research → ...
    conv_var_diff — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
 ```

-### ONNX Building Rules
-- **Gather** (opset 1) for spatial remapping — NEVER use GatherElements (opset 11)
 - **Equal+Cast** for one-hot — NEVER use OneHot (no CUDA kernel)
 - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
 - **Conv 1×1** for non-permutation color maps (has MACs but correct)
-- **ReduceSum(input, axes=[1])** for variable-shape mask
-- **Pad** (opset 17): use tensor-based `pads` input, NOT attribute-based (opset 10 style)

 ### Conv Fitting — THE #1 BLOCKER

-**We solve 307 locally but only 50 survive arc-gen. This is CATASTROPHIC overfitting, not a hyperparameter problem.**

 - Patch matrix P has n rows (patches) and p columns (10×ks² features)
-- For ks=7 on 7×7 grid: n≈196, p=490 → underdetermined → min-norm among infinite fits → overfits
-- For ks=7 on 21×21 grid: n≈7056, p=490 → determined, but arc-gen still fails
-- **Root cause**: LOW effective rank of patch covariance (~10-40) due to few active colors → noise concentrates in low-rank directions
 - **Double descent**: ks=5,7,9 are at/near interpolation threshold where test error PEAKS

-**Current fitting strategy (v4.2):**
 - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
 - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
 - Try no-bias first, then bias
 - **Validate against arc-gen BEFORE accepting** — reject if fails

-**What does NOT help lstsq overfitting:**
-- ❌ Ridge/LOOCV λ tuning — theory predicts failure for low effective rank (Bartlett et al., arXiv:2306.13185)
 - ❌ More arc-gen examples in lstsq — adding constraints to underdetermined system doesn't fix wrong model
 - ❌ GPU/CuPy for lstsq — same O(n³) cost, crashes on memory

 **What MIGHT help (evidence-backed, needs testing):**
 - 🔲 Skip ks=5,7,9 — avoid interpolation threshold (double descent peak)
 - 🔲 PCA dimensionality reduction — project to top-20 components, ensure p_reduced << n
-- 🔲 Lasso (ℓ₁) instead of lstsq — matches sparse signal structure (arXiv:2302.00257)
 - 🔲 Gradient descent with early stopping — implicit regularization, don't interpolate
-- 🔲 PyTorch conv trained on arc-gen data — needs GPU, multi-seed, ternary snap

 ## 4. Performance

-**The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples: 7056×8410 matrix SVD. This is pure math cost — moving to GPU (CuPy) doesn't help.

-**Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers + fixing arc-gen survival, not faster conv.

-## 5. Score Accounting (v4.2)

-| Category | Tasks | Avg Score | Total |
-|----------|-------|-----------|-------|
-| Analytical (gather, rotate, etc.) | 25 | ~16 | ~400 |
-| Conv (arc-gen validated) | 25 | ~11 | ~275 |
-| Unsolved | 344 | 1.0 | 344 |
-| **Estimated LB** | | | **~670** |

 ### Path to 3000+
-1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
-2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
-3. ✅ Color map Gather for permutations (+15 pts)
-4. 🔲 **Phase 1: Cheap wins** — opset 17 transforms, channel reduction, composition detectors
-5. 🔲 **Phase 2: Fix arc-gen survival** — PCA, Lasso, skip bad ks, GD with early stopping
-6. 🔲 **Phase 3: Hard tasks** — hash matchers, run-length detectors, LLM rescue
-7. 🔲 **Phase 4: Score optimization** — ONNX optimizer, best-of-N selection
-
-**Blending with public datasets is EXPLICITLY excluded** — user's competitive philosophy. See LEARNING.md "What Others Do" for market intelligence only.

 ## 6. Submission Checklist
 
@@ -147,8 +174,8 @@ Before submitting to Kaggle:
 - [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
 - [ ] No GatherElements in any model
 - [ ] No banned ops
-- [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
-- [ ] submission.csv generated
 - [ ] Local estimated score calculated and compared to expected LB
 - [ ] **A/B test**: ran both old and new solver on same tasks, new solver scores higher
 
@@ -157,12 +184,12 @@ Before submitting to Kaggle:
 | Location | Path | Notes |
 |----------|------|-------|
 | HF Repo | `rogermt/neurogolf-solver` | All code + data |
-| Solver | `neurogolf_solver.py` | v4.2 (repo has unvalidated v5 code at 1919 lines — needs revert or validation) |
 | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
 | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
 | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
 | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
-| Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
 | Roadmap | `TODO.md` | Experiment queue with status key |
 | Learning | `LEARNING.md` | Knowledge accumulation — read before coding |
 
@@ -175,4 +202,4 @@ Before submitting to Kaggle:
 - Version milestones — update the Version History table
 - Performance measurements — add concrete numbers

-Structure: chronological within sections, newest entries first. Always include dates and version numbers. The goal is that a fresh agent with zero context can read LEARNING.md and understand every mistake to avoid and every technique that works.
 
@@ -1,6 +1,6 @@
 ---
 name: neurogolf-solver
+description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 17, IR 8, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
 ---

 # NeuroGolf Solver
 
@@ -25,14 +25,15 @@ Research → Design → Experiment → Analyze → Research → ...
 - NEVER write >200 lines without running them first
 - NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
 - NEVER upload code to repo that hasn't been validated
 - Theory from papers is NOT proof for our data — always test
 - If a feature shows no improvement after testing, DELETE it — don't leave dead code
+- Make surgical edits to individual files — NEVER rewrite the entire codebase in one shot

 ## Quick Reference

 - **Repo**: `rogermt/neurogolf-solver`
+- **Current version**: v5 — refactored package, opset 17, currently running on Kaggle
+- **Previous best**: v4.3 — 50 arc-gen-validated tasks, est LB ~670
 - **Kaggle runtime**: 12 hours for submission
 - **Target**: 3000+ LB (our own solver, no blending)
 - **Detailed history, mistakes, analysis**: see `LEARNING.md`
 
@@ -43,7 +44,7 @@ Research → Design → Experiment → Analyze → Research → ...
 | Item | Value |
 |------|-------|
 | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
+| Opset | 17 (IR 8). Opset 10 also accepted on Kaggle |
 | Max file size | 1.44 MB per model |
 | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
 | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
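The scoring rule in the table reduces to a one-liner; here is a minimal sketch (`task_score` is an illustrative name, the official implementation is the Kaggle scoring lib `neurogolf_utils.py`):

```python
import math

def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score: lower total model cost gives a higher score, floored at 1.0."""
    cost = macs + memory_bytes + params
    return max(1.0, 25.0 - math.log(cost))
```

The floor kicks in once ln(cost) exceeds 24 (cost above roughly 2.6e10), so an unsolved task still contributes 1.0; shaving MACs, memory, and params on solved tasks is where the leaderboard points are.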
 
@@ -63,6 +64,28 @@ Research → Design → Experiment → Analyze → Research → ...

 ## 3. Architecture

+### Package Structure (v5)
+```
+neurogolf_solver/
+├── constants.py          # Grid dims, opset, excluded tasks, limits
+├── config.py             # Runtime providers, opset factory
+├── data_loader.py        # Task loading, one-hot, example extraction
+├── validators.py         # Model validation against all splits
+├── profiler.py           # Static cost profiler (onnx_tool fallback)
+├── onnx_helpers.py       # Opset 17 builders: Slice, Pad, ReduceSum, mk()
+├── gather_helpers.py     # Gather-based spatial remapping models
+├── submission.py         # run_tasks (W&B logging), zip/csv generation
+├── main.py               # Entry point with argparse
+└── solvers/
+    ├── analytical.py        # identity, constant, color_map, transpose
+    ├── geometric.py         # flip, rotate, shift, crop, gravity
+    ├── tiling.py            # tile, upscale, mirror, concat, spatial_gather
+    ├── conv.py              # lstsq conv (fixed, variable, diffshape, var_diff)
+    └── solver_registry.py   # ANALYTICAL_SOLVERS list + solve_task()
+```
+
+Run with: `python -m neurogolf_solver.main [args]`
+
 ### Solver Pipeline
 ```
 1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
 
    conv_var_diff — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
 ```

+### ONNX Building Rules (opset 17)
+- **Slice(step=-1)** for flip/rotate — zero MACs, replaces Gather for these transforms
+- **Gather** (opset 1) for spatial remapping — used by concat, spatial_gather, mirrors, etc.
+- **NEVER** use GatherElements (opset 11)
 - **Equal+Cast** for one-hot — NEVER use OneHot (no CUDA kernel)
 - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
 - **Conv 1×1** for non-permutation color maps (has MACs but correct)
+- **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
+- **Pad** with tensor-based `pads` input (opset 11+ requirement)
+- **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)` — SVD can fail to converge
 
 ### Conv Fitting — THE #1 BLOCKER

+**We solve 307 locally but only ~50 survive arc-gen. This is CATASTROPHIC overfitting.**

 - Patch matrix P has n rows (patches) and p columns (10×ks² features)
+- **Root cause**: LOW effective rank of patch covariance (~10-40) due to few active colors
 - **Double descent**: ks=5,7,9 are at/near interpolation threshold where test error PEAKS

+**Current fitting strategy (v5):**
 - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
 - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
 - Try no-bias first, then bias
+- lstsq wrapped in try/except for SVD non-convergence
 - **Validate against arc-gen BEFORE accepting** — reject if fails

+**What does NOT help:**
+- ❌ Ridge/LOOCV λ tuning — theory predicts failure for low effective rank
 - ❌ More arc-gen examples in lstsq — adding constraints to underdetermined system doesn't fix wrong model
 - ❌ GPU/CuPy for lstsq — same O(n³) cost, crashes on memory

 **What MIGHT help (evidence-backed, needs testing):**
 - 🔲 Skip ks=5,7,9 — avoid interpolation threshold (double descent peak)
 - 🔲 PCA dimensionality reduction — project to top-20 components, ensure p_reduced << n
+- 🔲 Lasso (ℓ₁) instead of lstsq — matches sparse signal structure
 - 🔲 Gradient descent with early stopping — implicit regularization, don't interpolate
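The guarded lstsq from the fitting strategy above might look like the following sketch (`fit_kernel` is an illustrative name, not the actual `solvers/conv.py` API):

```python
import numpy as np

def fit_kernel(P: np.ndarray, Y: np.ndarray):
    """Least-squares fit of a linear patch-to-pixel map.

    P: (n, 10*ks*ks) one-hot patch features; Y: (n, 10) one-hot targets.
    Returns the weight matrix, or None when the SVD fails to converge,
    so the caller rejects that kernel size instead of crashing the run.
    """
    try:
        W, *_ = np.linalg.lstsq(P, Y, rcond=None)
        return W
    except (np.linalg.LinAlgError, ValueError):
        return None
```

The returned weights are what gets reshaped into a Conv kernel; a `None` result simply moves the search on to the next kernel size.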
 

 ## 4. Performance

+**The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (30s locally, 60s on Kaggle).

+**Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.
 
+## 5. Score Accounting

+| Category | Tasks (v4) | Avg Score | Notes |
+|----------|------------|-----------|-------|
+| Analytical (Slice/Gather) | ~25 | ~13-21 | v5 Slice-based should be ~20-25 |
+| Conv (arc-gen validated) | ~25 | ~11 | Unchanged in v5 |
+| Unsolved | ~350 | 1.0 | Minimum score |
+| **v4 Est LB** | | | **~670** |
+| **v5 Est LB** | | | **TBD (running)** |
157
  ### Path to 3000+
158
+ 1. βœ… ARC-GEN validation (v4: +155 pts)
159
+ 2. βœ… New analytical solvers: shift, mirror, crop, quad_mirror (v4: +8 tasks)
160
+ 3. βœ… Color map Gather for permutations (v4: +15 pts)
161
+ 4. βœ… Opset 17 Slice-based flip/rotate (v5: ~0 MACs for these transforms)
162
+ 5. βœ… Refactored to modular package (v5)
163
+ 6. βœ… lstsq crash fix β€” try/except for SVD non-convergence (v5)
164
+ 7. πŸ”² **Fix arc-gen survival** β€” PCA, Lasso, skip bad ks, GD with early stopping
165
+ 8. πŸ”² **Hard tasks** β€” hash matchers, run-length detectors, LLM rescue
166
+ 9. πŸ”² **Score optimization** β€” ONNX optimizer, best-of-N selection, channel reduction
167
+
168
+ **Blending is EXPLICITLY excluded** β€” user's competitive philosophy.
169
 
170
  ## 6. Submission Checklist

@@ -147,8 +174,8 @@ Before submitting to Kaggle:
 - [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
 - [ ] No GatherElements in any model
 - [ ] No banned ops
+- [ ] Each .onnx < 1.44 MB
+- [ ] submission.zip generated and < 1.44 MB
 - [ ] Local estimated score calculated and compared to expected LB
 - [ ] **A/B test**: ran both old and new solver on same tasks, new solver scores higher
 
 
@@ -157,12 +184,12 @@ Before submitting to Kaggle:
 | Location | Path | Notes |
 |----------|------|-------|
 | HF Repo | `rogermt/neurogolf-solver` | All code + data |
+| **Solver package** | `neurogolf_solver/` | **v5 — 16 files, modular** |
+| Legacy monolith | `neurogolf_solver.py` | v4, kept for reference — do not edit |
 | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
 | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
 | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
 | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
 | Roadmap | `TODO.md` | Experiment queue with status key |
 | Learning | `LEARNING.md` | Knowledge accumulation — read before coding |

@@ -175,4 +202,4 @@ Before submitting to Kaggle:
 - Version milestones — update the Version History table
 - Performance measurements — add concrete numbers

+Structure: chronological within sections, newest entries first. Always include dates and version numbers.