Commit 6941e70 by rogermt · verified · 1 Parent(s): 863483e

v4.3: Update SKILL.md with closed-loop methodology, development rules, updated status

Files changed (1)
  1. SKILL.md +68 -19
SKILL.md CHANGED
@@ -1,24 +1,49 @@
1
  ---
2
  name: neurogolf-solver
3
- description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
4
  ---
5
 
6
  # NeuroGolf Solver
7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  ## Quick Reference
9
 
10
  - **Repo**: `rogermt/neurogolf-solver`
11
- - **Current version**: v4.1, 50 arc-gen-validated tasks, est LB ~670
12
  - **Kaggle runtime**: 12 hours for submission
13
- - **Target**: 4800+ LB (first page)
14
  - **Detailed history, mistakes, analysis**: see `LEARNING.md`
 
15
 
16
  ## 1. Competition Rules
17
 
18
  | Item | Value |
19
  |------|-------|
20
  | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
21
- | Opset | 10 (IR 10). Opset 17 also works on Kaggle |
22
  | Max file size | 1.44 MB per model |
23
  | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
24
  | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
@@ -59,24 +84,43 @@ description: Build and improve an ONNX model generator for the NeuroGolf Champio
59
  - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
60
  - **Conv 1×1** for non-permutation color maps (has MACs but correct)
61
  - **ReduceSum(input, axes=[1])** for variable-shape mask
 
62
 
63
- ### Conv Fitting Strategy
 
 
 
 
 
 
 
 
 
 
64
  - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
65
  - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
66
  - Try no-bias first, then bias
67
  - **Validate against arc-gen BEFORE accepting**: reject if it fails
68
- - Bottleneck is algorithmic (O(n³) SVD), NOT device: GPU/CuPy doesn't help, just crashes
69
 
70
- ## 4. Performance Bottleneck
 
 
 
71
 
72
- **The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples: 7056×8410 matrix SVD. This is pure math cost; moving to GPU (CuPy) doesn't help because:
73
- 1. Same O(n³) algorithmic cost
74
- 2. GPU memory fills up (~1GB for large matrices) and crashes
75
- 3. Falls back to CPU anyway after CUDA error
 
 
76
 
77
- **Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers, not faster conv.
78
 
79
- ## 5. Score Accounting (v4.1)
 
 
 
 
80
 
81
  | Category | Tasks | Avg Score | Total |
82
  |----------|-------|-----------|-------|
@@ -85,14 +129,16 @@ description: Build and improve an ONNX model generator for the NeuroGolf Champio
85
  | Unsolved | 344 | 1.0 | 344 |
86
  | **Estimated LB** | | | **~670** |
87
 
88
- ### Path to 4800+
89
  1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
90
  2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
91
  3. ✅ Color map Gather for permutations (+15 pts)
92
- 4. 🔲 PyTorch multi-layer conv with ternary snap (est +20-50 tasks)
93
- 5. 🔲 Channel reduction (fewer colors → cheaper models)
94
- 6. 🔲 Composition detectors: rot+color, flip+color, transpose+color
95
- 7. 🔲 Blend with other notebooks on Kaggle (the meta-strategy for 4000+)
 
 
96
 
97
  ## 6. Submission Checklist
98
 
@@ -104,18 +150,21 @@ Before submitting to Kaggle:
104
  - [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
105
  - [ ] submission.csv generated
106
  - [ ] Local estimated score calculated and compared to expected LB
 
107
 
108
  ## 7. Files & Locations
109
 
110
  | Location | Path | Notes |
111
  |----------|------|-------|
112
  | HF Repo | `rogermt/neurogolf-solver` | All code + data |
113
- | Solver | `neurogolf_solver.py` | v4.1, 1270 lines |
114
  | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
115
  | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
116
  | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
117
  | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
118
  | Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
 
 
119
 
120
  ## 8. LEARNING.md Maintenance Rules
121
 
 
1
  ---
2
  name: neurogolf-solver
3
+ description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10/17, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
4
  ---
5
 
6
  # NeuroGolf Solver
7
 
8
+ ## Development Methodology: The Closed-Loop
9
+
10
+ ```
11
+ Research → Design → Experiment → Analyze → Research → ...
12
+ ```
13
+
14
+ **Rule: Loop until we have a CONFIRMED increase in arc-gen validated score.**
15
+
16
+ | Phase | What | Exit Criteria |
17
+ |-------|------|---------------|
18
+ | **Research** | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
19
+ | **Design** | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
20
+ | **Experiment** | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
21
+ | **Analyze** | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
22
+ | **Research** | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |
23
+
24
+ **Critical rules:**
25
+ - NEVER write >200 lines without running them first
26
+ - NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
27
+ - NEVER upload code to repo that hasn't been validated
28
+ - NEVER overwrite neurogolf_solver.py with unvalidated code
29
+ - Theory from papers is NOT proof for our data: always test
30
+ - If a feature shows no improvement after testing, DELETE it; don't leave dead code
31
+
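The Analyze exit criterion and the ≥20-task rule can be written down as a small acceptance check. This is only a sketch: it assumes ">10% improvement" means a relative gain over the baseline, and the function names and arguments are hypothetical, not taken from `neurogolf_solver.py`.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before` (assumed reading of '>10%')."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before


def accept_feature(tasks_tested: int,
                   survival_before: float, survival_after: float,
                   score_before: float, score_after: float,
                   min_tasks: int = 20, min_gain: float = 0.10) -> bool:
    """Accept only if enough tasks were arc-gen validated AND either
    the arc-gen survival rate OR the total score improved by more than min_gain."""
    if tasks_tested < min_tasks:
        return False  # not enough evidence to claim the feature "works"
    return (relative_gain(survival_before, survival_after) > min_gain
            or relative_gain(score_before, score_after) > min_gain)
```

A feature that fails this check would, per the rules above, be deleted rather than left in the solver.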
32
  ## Quick Reference
33
 
34
  - **Repo**: `rogermt/neurogolf-solver`
35
+ - **Current version**: v4.3, 50 arc-gen-validated tasks, est LB ~670
36
  - **Kaggle runtime**: 12 hours for submission
37
+ - **Target**: 3000+ LB (our own solver, no blending)
38
  - **Detailed history, mistakes, analysis**: see `LEARNING.md`
39
+ - **Roadmap & experiment queue**: see `TODO.md`
40
 
41
  ## 1. Competition Rules
42
 
43
  | Item | Value |
44
  |------|-------|
45
  | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
46
+ | Opset | 10 (IR 10). **Opset 17 also works on Kaggle** |
47
  | Max file size | 1.44 MB per model |
48
  | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
49
  | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
 
84
  - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
85
  - **Conv 1×1** for non-permutation color maps (has MACs but correct)
86
  - **ReduceSum(input, axes=[1])** for variable-shape mask
87
+ - **Pad** (opset 17): use tensor-based `pads` input, NOT attribute-based (opset 10 style)
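The choice between Channel Gather and Conv 1×1 depends on whether the color map is a permutation of the 10 colors. A minimal sketch of that decision follows; the helper names are hypothetical, and it assumes colors missing from the map stay unchanged. The index list is laid out for a Gather over the channel axis (axis=1) of the one-hot `[1,10,30,30]` tensor, where output channel `j` reads input channel `indices[j]`.

```python
def color_targets(color_map: dict, num_colors: int = 10) -> list:
    """Target color for every source color; unmapped colors stay fixed (assumption)."""
    return [color_map.get(c, c) for c in range(num_colors)]


def is_permutation(color_map: dict, num_colors: int = 10) -> bool:
    """True if the map sends colors 0..num_colors-1 onto themselves one-to-one."""
    return sorted(color_targets(color_map, num_colors)) == list(range(num_colors))


def gather_indices(color_map: dict, num_colors: int = 10) -> list:
    """Indices for Gather on axis=1: output channel dst must read input channel src."""
    if not is_permutation(color_map, num_colors):
        raise ValueError("not a permutation; a Conv 1x1 is needed instead")
    indices = [0] * num_colors
    for src, dst in enumerate(color_targets(color_map, num_colors)):
        indices[dst] = src  # invert the map
    return indices


def choose_color_op(color_map: dict) -> str:
    """Pick the cheaper op when it is valid."""
    return "Gather" if is_permutation(color_map) else "Conv1x1"
```

For example, swapping colors 1 and 2 is a permutation and qualifies for Gather, while mapping color 1 onto 2 without moving 2 merges two colors and falls back to Conv 1×1.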
88
 
89
+ ### Conv Fitting: THE #1 BLOCKER
90
+
91
+ **We solve 307 locally but only 50 survive arc-gen. This is CATASTROPHIC overfitting, not a hyperparameter problem.**
92
+
93
+ - Patch matrix P has n rows (patches) and p columns (10×ks² features)
94
+ - For ks=7 on 7×7 grid: n≈196, p=490 → underdetermined → lstsq returns the min-norm solution among infinitely many exact fits → overfits
95
+ - For ks=7 on 21×21 grid: n≈7056, p=490 → overdetermined, but arc-gen still fails
96
+ - **Root cause**: LOW effective rank of patch covariance (~10-40) due to few active colors → noise concentrates in low-rank directions
97
+ - **Double descent**: ks=5,7,9 are at/near interpolation threshold where test error PEAKS
98
+
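The n and p figures in the bullets above follow from simple counting. The sketch below reproduces them under two assumptions: one patch per grid cell (same-padding), and 4 examples for the 7×7 case (inferred from n≈196; the 16-example count for 21×21 grids is the one quoted in the Performance section). Function names are illustrative only.

```python
def patch_matrix_shape(num_examples: int, height: int, width: int,
                       ks: int, channels: int = 10) -> tuple:
    """Shape (n, p) of the patch matrix P.

    n: one row per grid cell per example (assumes same-padding).
    p: one column per (channel, kernel position) pair, i.e. channels * ks^2.
    """
    n = num_examples * height * width
    p = channels * ks * ks
    return n, p


def regime(n: int, p: int) -> str:
    """Classify the least-squares system by rows vs. columns."""
    if n < p:
        return "underdetermined"
    if n == p:
        return "interpolation threshold"
    return "overdetermined"


for examples, size, ks in [(4, 7, 7), (16, 21, 7), (16, 21, 29)]:
    n, p = patch_matrix_shape(examples, size, size, ks)
    print(f"ks={ks} on {size}x{size}, {examples} examples: n={n}, p={p}, {regime(n, p)}")
```

Note that this count only compares rows with columns; it says nothing about the effective rank of P, which is the root cause named above.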
99
+ **Current fitting strategy (v4.2):**
100
  - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
101
  - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
102
  - Try no-bias first, then bias
103
  - **Validate against arc-gen BEFORE accepting**: reject if it fails
 
104
 
105
+ **What does NOT help lstsq overfitting:**
106
+ - ❌ Ridge/LOOCV λ tuning: theory predicts failure for low effective rank (Bartlett et al., arXiv:2306.13185)
107
+ - ❌ More arc-gen examples in lstsq: adding constraints to an underdetermined system doesn't fix the wrong model
108
+ - ❌ GPU/CuPy for lstsq: same O(n³) cost, crashes on memory
109
 
110
+ **What MIGHT help (evidence-backed, needs testing):**
111
+ - 🔲 Skip ks=5,7,9: avoid interpolation threshold (double descent peak)
111
+ - 🔲 PCA dimensionality reduction: project to top-20 components, ensure p_reduced << n
112
+ - 🔲 Lasso (ℓ₁) instead of lstsq: matches sparse signal structure (arXiv:2302.00257)
113
+ - 🔲 Gradient descent with early stopping: implicit regularization, don't interpolate
114
+ - 🔲 PyTorch conv trained on arc-gen data: needs GPU, multi-seed, ternary snap
116
 
117
+ ## 4. Performance
118
 
119
+ **The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples: 7056×8410 matrix SVD. This is pure math cost; moving to GPU (CuPy) doesn't help.
120
+
121
+ **Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers + fixing arc-gen survival, not faster conv.
122
+
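As a sanity check on the 60 s Kaggle budget, the worst-case time spent in conv fitting can be estimated. The sketch assumes all 400 tasks hit the full budget and run sequentially, and that `--conv_budget` bounds only the conv step; the function name is hypothetical.

```python
def worst_case_conv_hours(num_tasks: int, budget_seconds: float) -> float:
    """Upper bound on conv-fitting time if every task uses its whole budget."""
    return num_tasks * budget_seconds / 3600.0


KAGGLE_RUNTIME_HOURS = 12  # submission runtime limit quoted in Quick Reference

hours = worst_case_conv_hours(num_tasks=400, budget_seconds=60)
print(f"worst case: {hours:.2f} h of {KAGGLE_RUNTIME_HOURS} h")
```

Under these assumptions the conv step stays well inside the runtime limit, leaving the remainder for analytical solvers and arc-gen validation.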
123
+ ## 5. Score Accounting (v4.2)
124
 
125
  | Category | Tasks | Avg Score | Total |
126
  |----------|-------|-----------|-------|
 
129
  | Unsolved | 344 | 1.0 | 344 |
130
  | **Estimated LB** | | | **~670** |
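The per-task totals in this table come from the scoring rule in Section 1. A minimal sketch of that rule is given here for reference; it is not the official `neurogolf_utils.py` code, and treating a non-positive cost as an error is an assumption.

```python
import math


def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score: max(1, 25 - ln(MACs + memory_bytes + params))."""
    cost = macs + memory_bytes + params
    if cost <= 0:
        # ln is undefined here; raising is an assumption, not official behavior
        raise ValueError("cost must be positive")
    return max(1.0, 25.0 - math.log(cost))


def cost_for_score(score: float) -> float:
    """Inverse of the rule above the 1.0 floor: the total cost that yields `score`."""
    return math.exp(25.0 - score)


# Unsolved tasks sit at the 1.0 floor, which is the 344-point row in the table.
unsolved_total = 344 * 1.0
```

Because the score is logarithmic in cost, each extra point requires cutting total cost by a factor of e, which is why a zero-MAC Gather scores so much higher than a Conv 1×1.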
131
 
132
+ ### Path to 3000+
133
  1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
133
  2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
134
  3. ✅ Color map Gather for permutations (+15 pts)
135
+ 4. 🔲 **Phase 1: Cheap wins** (opset 17 transforms, channel reduction, composition detectors)
136
+ 5. 🔲 **Phase 2: Fix arc-gen survival** (PCA, Lasso, skip bad ks, GD with early stopping)
137
+ 6. 🔲 **Phase 3: Hard tasks** (hash matchers, run-length detectors, LLM rescue)
138
+ 7. 🔲 **Phase 4: Score optimization** (ONNX optimizer, best-of-N selection)
140
+
141
+ **Blending with public datasets is EXPLICITLY excluded** (user's competitive philosophy). See LEARNING.md "What Others Do" for market intelligence only.
142
 
143
  ## 6. Submission Checklist
144
 
 
150
  - [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
151
  - [ ] submission.csv generated
152
  - [ ] Local estimated score calculated and compared to expected LB
153
+ - [ ] **A/B test**: ran both old and new solver on same tasks, new solver scores higher
154
 
155
  ## 7. Files & Locations
156
 
157
  | Location | Path | Notes |
158
  |----------|------|-------|
159
  | HF Repo | `rogermt/neurogolf-solver` | All code + data |
160
+ | Solver | `neurogolf_solver.py` | v4.2 (repo has unvalidated v5 code at 1919 lines; needs revert or validation) |
161
  | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
162
  | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
163
  | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
164
  | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
165
  | Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
166
+ | Roadmap | `TODO.md` | Experiment queue with status key |
167
+ | Learning | `LEARNING.md` | Knowledge accumulation; read before coding |
168
 
169
  ## 8. LEARNING.md Maintenance Rules
170