rogermt committed · Commit 4041952 · verified · 1 Parent(s): fa9d7c5

Update SKILL.md: v5.2 structure, new solvers, no excluded tasks, current scores

Files changed (1)
  1. SKILL.md +65 -51
SKILL.md CHANGED
@@ -32,8 +32,8 @@ Research → Design → Experiment → Analyze → Research → ...
  ## Quick Reference

  - **Repo**: `rogermt/neurogolf-solver`
- - **Current version**: v5 - refactored package, opset 17, currently running on Kaggle
- - **Previous best**: v4.3 - 50 arc-gen-validated tasks, est LB ~670
  - **Kaggle runtime**: 12 hours for submission
  - **Target**: 3000+ LB (our own solver, no blending)
  - **Detailed history, mistakes, analysis**: see `LEARNING.md`
@@ -48,7 +48,7 @@ Research → Design → Experiment → Analyze → Research → ...
  | Max file size | 1.44 MB per model |
  | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
  | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
- | Excluded tasks | {21, 55, 80, 184, 202, 366} - skip these |
  | Validation | Models checked against **train + test + arc-gen** (ALL splits) |
  | Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
 
@@ -64,10 +64,10 @@ Research → Design → Experiment → Analyze → Research → ...

  ## 3. Architecture

- ### Package Structure (v5)
  ```
  neurogolf_solver/
- ├── constants.py # Grid dims, opset, excluded tasks, limits
  ├── config.py # Runtime providers, opset factory
  ├── data_loader.py # Task loading, one-hot, example extraction
  ├── validators.py # Model validation against all splits
@@ -78,9 +78,12 @@ neurogolf_solver/
  ├── main.py # Entry point with argparse
  └── solvers/
      ├── analytical.py # identity, constant, color_map, transpose
-     ├── geometric.py # flip, rotate, shift, crop, gravity
      ├── tiling.py # tile, upscale, mirror, concat, spatial_gather
-     ├── conv.py # lstsq conv (fixed, variable, diffshape, var_diff)
      └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
  ```
 
@@ -92,13 +95,14 @@ Run with: `python -m neurogolf_solver.main [args]`
  identity → constant → color_map → transpose → flip → rotate →
  shift → tile → upscale → kronecker → nonuniform_scale →
  mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
- diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather
-
- 2. Conv solvers (lstsq fitted, validated against arc-gen):
- conv_fixed     - Slice→Conv→ArgMax→Equal+Cast→Pad
- conv_variable  - Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
- conv_diffshape - Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
- conv_var_diff  - Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
  ```

  ### ONNX Building Rules (opset 17)
@@ -111,59 +115,69 @@ Run with: `python -m neurogolf_solver.main [args]`
  - **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
  - **Pad** with tensor-based `pads` input (opset 11+ requirement)
  - **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)` - SVD can fail to converge

- ### Conv Fitting - THE #1 BLOCKER
-
- **We solve 307 locally but only ~50 survive arc-gen. This is CATASTROPHIC overfitting.**

- - Patch matrix P has n rows (patches) and p columns (10×ks² features)
- - **Root cause**: LOW effective rank of patch covariance (~10-40) due to few active colors
- - **Double descent**: ks=5,7,9 are at/near interpolation threshold where test error PEAKS

- **Current fitting strategy (v5):**
- - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
  - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
  - Try no-bias first, then bias
  - lstsq wrapped in try/except for SVD non-convergence
  - **Validate against arc-gen BEFORE accepting** - reject if fails

- **What does NOT help:**
- - ❌ Ridge/LOOCV λ tuning - theory predicts failure for low effective rank
- - ❌ More arc-gen examples in lstsq - adding constraints to underdetermined system doesn't fix wrong model
- - ❌ GPU/CuPy for lstsq - same O(n³) cost, crashes on memory

- **What MIGHT help (evidence-backed, needs testing):**
- - 🔲 Skip ks=5,7,9 - avoid interpolation threshold (double descent peak)
- - 🔲 PCA dimensionality reduction - project to top-20 components, ensure p_reduced << n
- - 🔲 Lasso (ℓ₁) instead of lstsq - matches sparse signal structure
- - 🔲 Gradient descent with early stopping - implicit regularization, don't interpolate

  ## 4. Performance

- **The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (30s locally, 60s on Kaggle).

  **Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.

- ## 5. Score Accounting

- | Category | Tasks (v4) | Avg Score | Notes |
- |----------|------------|-----------|-------|
- | Analytical (Slice/Gather) | ~25 | ~13-21 | v5 Slice-based should be ~20-25 |
- | Conv (arc-gen validated) | ~25 | ~11 | Unchanged in v5 |
- | Unsolved | ~350 | 1.0 | Minimum score |
- | **v4 Est LB** | | | **~670** |
- | **v5 Est LB** | | | **TBD (running)** |

  ### Path to 3000+
- 1. ✅ ARC-GEN validation (v4: +155 pts)
- 2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (v4: +8 tasks)
- 3. ✅ Color map Gather for permutations (v4: +15 pts)
- 4. ✅ Opset 17 Slice-based flip/rotate (v5: ~0 MACs for these transforms)
- 5. ✅ Refactored to modular package (v5)
- 6. ✅ lstsq crash fix - try/except for SVD non-convergence (v5)
- 7. 🔲 **Fix arc-gen survival** - PCA, Lasso, skip bad ks, GD with early stopping
- 8. 🔲 **Hard tasks** - hash matchers, run-length detectors, LLM rescue
- 9. 🔲 **Score optimization** - ONNX optimizer, best-of-N selection, channel reduction

  **Blending is EXPLICITLY excluded** - user's competitive philosophy.
 
@@ -171,7 +185,7 @@ Run with: `python -m neurogolf_solver.main [args]`

  Before submitting to Kaggle:
  - [ ] All models validated against train + test + arc-gen (locally)
- - [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
  - [ ] No GatherElements in any model
  - [ ] No banned ops
  - [ ] Each .onnx < 1.44 MB
@@ -184,7 +198,7 @@ Before submitting to Kaggle:
  | Location | Path | Notes |
  |----------|------|-------|
  | HF Repo | `rogermt/neurogolf-solver` | All code + data |
- | **Solver package** | `neurogolf_solver/` | **v5 - 16 files, modular** |
  | Legacy monolith | `neurogolf_solver.py` | v4, kept for reference - do not edit |
  | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
  | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
 
  ## Quick Reference

  - **Repo**: `rogermt/neurogolf-solver`
+ - **Current version**: v5.2 - 52 solved, ~710 score, est LB ~1058
+ - **Previous best on Kaggle**: v4.3 - 50 arc-gen-validated tasks, est LB ~670
  - **Kaggle runtime**: 12 hours for submission
  - **Target**: 3000+ LB (our own solver, no blending)
  - **Detailed history, mistakes, analysis**: see `LEARNING.md`
 
  | Max file size | 1.44 MB per model |
  | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
  | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
+ | Tasks | **All 400 count. There are NO excluded tasks.** |
  | Validation | Models checked against **train + test + arc-gen** (ALL splits) |
  | Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
 
 
  ## 3. Architecture

+ ### Package Structure (v5.2)
  ```
  neurogolf_solver/
+ ├── constants.py # Grid dims, opset, limits (NO excluded tasks)
  ├── config.py # Runtime providers, opset factory
  ├── data_loader.py # Task loading, one-hot, example extraction
  ├── validators.py # Model validation against all splits

  ├── main.py # Entry point with argparse
  └── solvers/
      ├── analytical.py # identity, constant, color_map, transpose
+     ├── geometric.py # flip, rotate, shift, crop, gravity (detect only)
      ├── tiling.py # tile, upscale, mirror, concat, spatial_gather
+     ├── conv.py # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
+     ├── gravity.py # Unrolled bubble-sort gravity (Conv+Where, 4 dirs) - Task 78
+     ├── edge.py # Laplacian edge detection (0 matches currently)
+     ├── mode.py # Mode fill (ReduceSum→ArgMax→Expand) - Task 129
      └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
  ```
 
 
  identity → constant → color_map → transpose → flip → rotate →
  shift → tile → upscale → kronecker → nonuniform_scale →
  mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
+ diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
+ gravity_unrolled → edge_detect → mode_fill
+
+ 2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
+ conv_fixed     - Slice→Conv→ArgMax→Equal+Cast→Pad
+ conv_variable  - Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
+ conv_diffshape - Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
+ conv_var_diff  - Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
  ```
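The solver order in the block above reads as "try each detector in turn and keep the first one that survives validation". A minimal sketch of that control flow follows. It is not the package's `solve_task()`: the names `SOLVERS_SKETCH` and `solve_task_sketch` are hypothetical, and plain Python functions stand in for the ONNX models the real solvers emit.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]
Transform = Callable[[Grid], Grid]


def solve_identity(examples: Sequence[Example]) -> Optional[Transform]:
    """Return a copy transform if every output equals its input."""
    if all(inp == out for inp, out in examples):
        return lambda g: [row[:] for row in g]
    return None


def solve_transpose(examples: Sequence[Example]) -> Optional[Transform]:
    """Return a transpose transform if it explains every example."""
    def transpose(g: Grid) -> Grid:
        return [list(col) for col in zip(*g)]

    if all(transpose(inp) == out for inp, out in examples):
        return transpose
    return None


# Hypothetical registry: cheap detectors first, in a fixed order.
SOLVERS_SKETCH = [("identity", solve_identity), ("transpose", solve_transpose)]


def solve_task_sketch(train: Sequence[Example],
                      held_out: Sequence[Example]) -> Optional[str]:
    """Return the name of the first solver that fits train AND held-out data."""
    for name, detect in SOLVERS_SKETCH:
        transform = detect(train)
        if transform is None:
            continue  # detector does not match the training pairs
        if all(transform(inp) == out for inp, out in held_out):
            return name  # accepted only after held-out validation
    return None
```

Here `held_out` plays the role that the arc-gen split plays in the notes: a candidate that fits the training pairs but fails held-out data is skipped rather than accepted.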
 
  ### ONNX Building Rules (opset 17)

  - **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
  - **Pad** with tensor-based `pads` input (opset 11+ requirement)
  - **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)` - SVD can fail to converge
+ - **ArgMax + Equal+Cast** before Pad to ensure clean one-hot in padded region (gravity solver lesson)
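The last rule is easier to see on a tiny tensor. The sketch below mimics ArgMax followed by Equal and Cast in plain Python on a `[channel][row][col]` nested list; it only illustrates the arithmetic and is not the ONNX graph the solvers build. The function name `clean_one_hot` is hypothetical.

```python
def clean_one_hot(scores):
    """Turn per-channel scores[c][y][x] into an exact 0.0/1.0 one-hot tensor."""
    channels = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            # ArgMax over the channel axis; max() keeps the first maximum on
            # ties, matching ONNX ArgMax with select_last_index=0.
            winner = max(range(channels), key=lambda c: scores[c][y][x])
            for c in range(channels):
                # Equal(channel index, winner) then Cast to float.
                out[c][y][x] = float(c == winner)
    return out
```

After this step every pixel holds exactly one 1.0, so a following zero Pad cannot mix with fractional leftovers from the previous ops.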
 
+ ### Conv Fitting

+ **Conv ceiling: ~25 tasks.** Regularization (Ridge, PCA/SVD, skip-ks) all tested and rejected.
+ Root cause: architecture mismatch - most unsolved tasks need non-local ops, not local conv patches.

+ **Current fitting strategy (v5.1+):**
+ - Composable primitives: `_build_patch_matrix` + `_solve_weights` + `_extract_weights`
+ - PCR fallback via `_solve_weights_pcr` (deferred 2nd pass, 0 new solves but no regressions)
  - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
  - Try no-bias first, then bias
  - lstsq wrapped in try/except for SVD non-convergence
  - **Validate against arc-gen BEFORE accepting** - reject if fails
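The fitting strategy above can be sketched with NumPy as follows. This is an illustration under assumptions, not the code in `conv.py`: `patch_matrix`, `fit_conv`, and `predict` are hypothetical stand-ins for the `_build_patch_matrix` / `_solve_weights` primitives, input and output grids are assumed to share a shape, only the no-bias variant is shown, and `held_out` stands in for the arc-gen split.

```python
import numpy as np

NUM_COLORS = 10


def one_hot(grid):
    """(H, W) color indices -> (H, W, 10) one-hot floats."""
    return np.eye(NUM_COLORS, dtype=np.float64)[np.asarray(grid)]


def patch_matrix(grid, ks):
    """One row per pixel, 10 * ks * ks one-hot features per row."""
    r = ks // 2
    x = np.pad(one_hot(grid), ((r, r), (r, r), (0, 0)))  # zero "same" padding
    h, w = len(grid), len(grid[0])
    rows = [x[i:i + ks, j:j + ks, :].ravel() for i in range(h) for j in range(w)]
    return np.stack(rows)


def predict(grid, ks, weights):
    """Apply fitted weights and take the per-pixel ArgMax."""
    h, w = len(grid), len(grid[0])
    return (patch_matrix(grid, ks) @ weights).argmax(axis=1).reshape(h, w).tolist()


def fit_conv(train, held_out, kernel_sizes=(1, 3, 5)):
    """Return (ks, weights) for the first kernel size that passes validation."""
    for ks in kernel_sizes:
        P = np.vstack([patch_matrix(inp, ks) for inp, _ in train])
        Y = np.vstack([one_hot(out).reshape(-1, NUM_COLORS) for _, out in train])
        try:
            weights, *_ = np.linalg.lstsq(P, Y, rcond=None)
        except (np.linalg.LinAlgError, ValueError):
            continue  # SVD did not converge; try the next kernel size
        examples = list(train) + list(held_out)
        if all(predict(inp, ks, weights) == out for inp, out in examples):
            return ks, weights  # accepted only after held-out validation
    return None
```

On a pure color swap a 1×1 kernel is already enough, so `fit_conv` stops at `ks=1`; a rule that needs wider context would fall through to larger kernels or return `None`.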
 
+ ### New Solver Architectures (v5.2)
+
+ **gravity.py** - Unrolled bubble-sort via Conv+Where
+ - 4 directions × 10 bg colors, max(IH,IW) steps
+ - Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
+ - Final: ArgMax + Equal+Cast + Pad (clean one-hot)
+ - Cost: ~16M (10×10 grid), score ~8.4
+ - **Validated: Task 78 (direction=up, bg=0)**
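Because `Loop` is a banned op, the gravity solver repeats one compare-and-swap step a fixed number of times. The plain-Python sketch below shows that idea for direction=up only. It works on color indices rather than one-hot tensors, and the function names are hypothetical; the two neighbor lookups play the role of the two shift convolutions, and the conditional assignments play the role of Where.

```python
def gravity_step_up(grid, bg=0):
    """One parallel step: every non-bg cell with bg directly above moves up."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for y in range(h):
        for x in range(w):
            cur = grid[y][x]
            above = grid[y - 1][x] if y > 0 else None      # shifted-down view
            below = grid[y + 1][x] if y + 1 < h else None  # shifted-up view
            if cur == bg and below is not None and below != bg:
                new[y][x] = below  # empty cell receives the cell beneath it
            elif cur != bg and above == bg:
                new[y][x] = bg     # cell has moved up, leave background behind
    return new


def gravity_up(grid, bg=0):
    """Apply the step max(H, W) times: a fixed unroll, no data-dependent loop."""
    steps = max(len(grid), len(grid[0]))
    for _ in range(steps):
        grid = gravity_step_up(grid, bg)
    return grid
```

All reads come from the old grid and all writes go to the new one, so each step is a pure function of the previous state, which is what lets it be expressed as a fixed chain of graph nodes.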
+
+ **edge.py** - Laplacian conv boundary detection
+ - Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
+ - Cost: ~16K MACs, score ~15
+ - **0 matches currently** - edge definition may be too strict
 
+ **mode.py** - Global majority color fill
+ - Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
+ - Cost: ~2K, score ~19.5
+ - **Validated: Task 129**
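The mode-fill chain amounts to "count each color, pick the most frequent, paint the whole grid with it". A small sketch on color indices, assuming 10 colors; `mode_fill` is a hypothetical name and the comments map each line to the op it imitates.

```python
def mode_fill(grid, num_colors=10):
    """Fill the grid with its most frequent color."""
    counts = [0] * num_colors
    for row in grid:
        for value in row:
            counts[value] += 1  # ReduceSum of the one-hot over axes [2, 3]
    # ArgMax over channels; the first maximum wins on ties.
    mode = max(range(num_colors), key=counts.__getitem__)
    # Expand the winning color back to the input's spatial shape.
    return [[mode] * len(row) for row in grid]
```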
 
 
  ## 4. Performance

+ **The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (5s locally, 60s on Kaggle).

  **Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.
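How `--conv_budget` is enforced inside the package is not shown here. One plausible shape, offered only as an assumption, is a deadline checked between kernel-size attempts; `run_with_budget` and `try_fit` below are hypothetical names.

```python
import time


def run_with_budget(candidates, try_fit, budget_s):
    """Try candidates in order until one succeeds or the time budget runs out."""
    deadline = time.monotonic() + budget_s
    for candidate in candidates:
        if time.monotonic() >= deadline:
            break  # budget exhausted: give up on this task
        result = try_fit(candidate)
        if result is not None:
            return result
    return None
```

The deadline is only checked between attempts, so a single slow fit can still overrun the budget; stopping a fit midway would need a different mechanism.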
 
+ ## 5. Score Accounting (v5.2)

+ | Category | Tasks | Avg Score | Notes |
+ |----------|-------|-----------|-------|
+ | Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
+ | Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
+ | Gravity | 1 | 8.4 | Task 78 |
+ | Mode fill | 1 | 19.5 | Task 129 |
+ | Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
+ | **Unsolved** | **348** | **1.0** | Minimum score |
+ | **Total** | **52/400** | | **~710 solved + 348 = ~1058 est LB** |
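The totals in this table follow from the per-task scoring rule `max(1.0, 25.0 - ln(MACs + memory + params))` plus a floor of 1.0 for every unsolved task. A short sketch of both calculations; the function names are hypothetical, and rejecting a non-positive cost is an assumption, since the notes do not say how the official scorer treats it.

```python
import math


def task_score(macs, memory, params):
    """Per-task score: max(1.0, 25.0 - ln(MACs + memory + params))."""
    cost = macs + memory + params
    if cost <= 0:
        # Assumption: the log is undefined here, so treat it as invalid input.
        raise ValueError("cost must be positive")
    return max(1.0, 25.0 - math.log(cost))


def estimated_lb(solved_total, solved_count, total_tasks=400, floor=1.0):
    """Sum of solved-task scores plus the floor score for each unsolved task."""
    return solved_total + (total_tasks - solved_count) * floor
```

A total cost of about 16M gives `25 - ln(16e6) ≈ 8.4`, which matches the gravity row, and `estimated_lb(710, 52)` reproduces the 1058 estimate.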
 
  ### Path to 3000+
+ 1. ✅ ARC-GEN validation (v4)
+ 2. ✅ New analytical solvers (v4)
+ 3. ✅ Opset 17 Slice-based transforms (v5)
+ 4. ✅ lstsq crash fix + modular package (v5)
+ 5. ✅ PCR fallback in conv (v5.1 - 0 new solves but clean code)
+ 6. ✅ Gravity solver (v5.2 - Task 78)
+ 7. ✅ Mode fill solver (v5.2 - Task 129)
+ 8. 🔲 **Phase 3 solvers**: flood fill, composition, color LUT, CumSum - see TODO.md
+ 9. 🔲 **Phase 1a**: Opset 17 conversions for existing analytical tasks (score optimization)
+ 10. 🔲 **Phase 4**: ONNX optimizer, best-of-N selection

  **Blending is EXPLICITLY excluded** - user's competitive philosophy.
 
 
  Before submitting to Kaggle:
  - [ ] All models validated against train + test + arc-gen (locally)
+ - [ ] **All 400 tasks attempted** (no exclusions)
  - [ ] No GatherElements in any model
  - [ ] No banned ops
  - [ ] Each .onnx < 1.44 MB
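The op and size items in this checklist are easy to automate. The sketch below checks a list of op-type names and a file size, so it runs without any ONNX dependency; in practice the names could come from `node.op_type` for each node in `onnx.load(path).graph.node`. `check_model` is a hypothetical helper, and reading "1.44 MB" as 1.44 × 1024 × 1024 bytes is an assumption.

```python
BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}
# GatherElements is listed separately in the checklist, so it is added here.
DISALLOWED_OPS = BANNED_OPS | {"GatherElements"}
# Assumption: "1.44 MB" is read as binary megabytes.
MAX_MODEL_BYTES = int(1.44 * 1024 * 1024)


def check_model(op_types, size_bytes):
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []
    bad_ops = sorted(set(op_types) & DISALLOWED_OPS)
    if bad_ops:
        problems.append("disallowed ops: " + ", ".join(bad_ops))
    if size_bytes >= MAX_MODEL_BYTES:
        problems.append("model too large: %d bytes" % size_bytes)
    return problems
```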
 
  | Location | Path | Notes |
  |----------|------|-------|
  | HF Repo | `rogermt/neurogolf-solver` | All code + data |
+ | **Solver package** | `neurogolf_solver/` | **v5.2 - 19 files, modular** |
  | Legacy monolith | `neurogolf_solver.py` | v4, kept for reference - do not edit |
  | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
  | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |