Split SKILL.md (rules/quick-ref) + LEARNING.md (history/mistakes/analysis)
SKILL.md
CHANGED
@@ -3,310 +3,127 @@ name: neurogolf-solver
description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
---
-# NeuroGolf Solver
-
-## 1. Competition Rules
-
-- **Input**: `"input"` float32 `[1, 10, 30, 30]` – one-hot encoded grid (10 color channels, 30×30 spatial)
-- **Output**: `"output"` float32 `[1, 10, 30, 30]` – same format
-- **Opset**: 10, IR version: 10 (but opset 17 ALSO works on Kaggle – see §3)
-- **Max file size**: 1.44 MB per model (floppy disk limit)
-- **Banned ops**: Loop, Scan, NonZero, Unique, Script, Function
-
-- `submission.zip` containing `task001.onnx` through `task400.onnx`
-- Models must pass validation against ALL examples: **train + test + arc-gen**
-- Optional: `submission.csv` with columns `task_id, total_cost`
-
-### ARC-GEN Data (CRITICAL)
-On Kaggle, each task JSON at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contains:
-```json
-{"train": [...], "test": [...], "arc-gen": [...]}
-```
-The `arc-gen` key has **up to 262 additional examples per task** (100K total across 400 tasks) generated by Google's ARC-GEN system. **Models are validated against ALL splits, including arc-gen.** A model that passes train+test but fails arc-gen scores ZERO on Kaggle.
-
-Locally, ARC-GEN data lives in separate files at `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` dicts and must be merged with the ARC-AGI task data.
-
-## 2. Current State (v3 → v4 in progress)
-
-### v3 Results: 307/400 solved locally, LB score ~501 (NOT ~3267)
-The massive gap (3267 local vs 501 LB) means **most of our conv models fail ARC-GEN validation on Kaggle**. The conv is fitted on ~6 train+test examples but must generalize to ~250 arc-gen examples of varying sizes. Many don't.
-
-### Solver Breakdown (v3)
-```
-conv_var: 125, conv_fixed: 107, conv_diff: 39, spatial_gather: 16,
-concat: 5, color_map: 4, concat_enhanced: 4, rotate: 3,
-transpose: 2, upscale: 1, varshape_spatial_gather: 1
-```
-
-### Repository
-- HF: `rogermt/neurogolf-solver`
-- Files: `neurogolf_solver.py`, `neurogolf_utils.py` (official Kaggle utils), `ARC-GEN-100K.zip`, `neurogolf-2026-solver-notebooks.zip`
-
-## 3. Key Differences: Our Solver vs High-Scoring Notebooks
-
-### The 4200-point notebook (`neurogolf-2026-tiny-onnx-solver`)
-This is a **BLEND notebook** – it does NOT solve tasks from scratch. It:
-1. **Phase 1**: Loads 12+ other notebooks' `submission.zip` files as inputs
-2. For each task, picks the cheapest valid model across all sources
-3. **Phase 2**: Tries loose ONNX files from dataset inputs
-4. **Phase 3**: Runs its own solver only on remaining unsolved tasks
-5. Validates EVERYTHING against train+test+arc-gen before including
-6. Result: 338/400 solved, est. score 4197.5
-
-**Critical insight**: The 4200 score comes from BLENDING many solutions, not from a single solver. The solver itself adds 0 new tasks in Phase 3; all 338 come from other notebooks' pre-built models.
-
-### The championship notebook (`the-2026-neurogolf-championship`)
-Also a blend, but with its own solver. Key differences from ours:
-- Uses **opset 17** (not 10!) – works fine on Kaggle
-- Has **shift detector**, **gravity detector**, **mirror detectors**, **fixed crop detector**, **outline detector**
-- Has **composition detectors**: rotation+color, transpose+color, flip+color
-- Has **channel reduction**: reduces 10→N channels for fewer colors → cheaper models
-- Uses **PyTorch learned conv**: multi-seed Adam training, ternary weight snapping
-- Uses **two-layer conv**: Conv→ReLU→Conv for complex patterns
-- Validates against `train + arc-gen[:30]` (capped at 30 arc-gen examples)
-- Result: 288 from own solver + more from blended inputs
-
-### What they have that we don't
-| Feature | Them | Us |
-|---------|------|-----|
-| ARC-GEN validation | ✅ validate against arc-gen | ❌ v3 ignores arc-gen |
-| ARC-GEN in fitting | ✅ uses arc-gen[:3] in detectors | ❌ fits only train+test |
-| Opset 17 | ✅ uses freely | ❌ stuck on opset 10 |
-| Shift detector | ✅ | ❌ |
-| Gravity detector | ✅ | ❌ |
-| Mirror detectors | ✅ (h, v, quad) | ❌ |
-| Fixed crop detector | ✅ | ❌ |
-| Extract outline | ✅ | ❌ |
-| Composition (rot+color) | ✅ | ❌ |
-| Channel reduction | ✅ (fewer channels = cheaper) | ❌ |
-| PyTorch learned conv | ✅ (multi-seed, ternary snap) | ❌ (lstsq only) |
-| Two-layer conv | ✅ (Conv→ReLU→Conv) | ❌ |
-| Blend from other notebooks | ✅ (12+ sources) | ❌ |
-
-## 4. The Submission Score Gap Problem
-
-### Why LB = 501 when local = 3267
-Our 307 solved tasks generate valid ONNX models locally. But on Kaggle:
-1. Models are validated against `train + test + arc-gen` (all splits)
-2. Conv models fitted on 6 examples often fail on 250+ arc-gen examples
-3. Failed models score 0 (not even the 1.0 minimum)
-4. Likely only ~40-50 of our 307 models actually pass on Kaggle
-
-### The fix priority
-1. **Validate locally against arc-gen** before submitting – only include models that pass
-2. **Include arc-gen examples in conv fitting** – more data = better generalization
-3. **Add more analytical solvers** (shift, mirror, gravity, crop) – these always generalize
-4. **Try opset 17** – unlocks more ops, may work fine on Kaggle
-
-## 5. Architecture & Code Structure
-
-### `neurogolf_solver.py` structure
-```
-Constants: BATCH=1, CH=10, GH=GW=30
-EXCLUDED_TASKS = {21, 55, 80, 184, 202, 366}
-
-load_tasks_dir(data_dir, arcgen_dir)  # Load + merge ARC-GEN
-to_onehot(grid)                       # Grid → [1,10,30,30]
-validate(path, td)                    # Check model on ALL splits
-score_network(path)                   # MACs + memory + params
-
-Analytical Solvers (priority order):
-  identity → constant → color_map → transpose → flip → rotate →
-  tile → upscale → kronecker → concat → concat_enhanced →
-  diagonal_tile → spatial_gather → varshape_spatial_gather
-
-Conv Solvers:
-  solve_conv_fixed()     – Fixed same-shape: Slice→Conv→ArgMax→Equal+Cast→Pad
-  solve_conv_variable()  – Variable same-shape: Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
-  solve_conv_diffshape() – Fixed diff-shape (output ≤ input)
-  solve_conv_var_diff()  – Variable diff-shape (output ≤ input)
-
-Main: solve_task() → run_tasks() → generate submission.zip + submission.csv
-```
-
-### ONNX Building Patterns (opset 10)
-```python
-from onnx import helper, TensorProto
-import numpy as np
-
-DT = TensorProto.FLOAT  # float32
-
-# Model skeleton
-def mk(nodes, inits=None):
-    x = helper.make_tensor_value_info("input", DT, [1, 10, 30, 30])
-    y = helper.make_tensor_value_info("output", DT, [1, 10, 30, 30])
-    g = helper.make_graph(nodes, "g", [x], [y], initializer=inits or [])
-    return helper.make_model(g, ir_version=10, opset_imports=[helper.make_opsetid("", 10)])
-
-# One-hot via Equal+Cast (NOT OneHot – has CUDA issues):
-classes = np.arange(10).reshape(1, 10, 1, 1)
-#   Equal(argmax_output, classes) → Cast(to=FLOAT)
-
-# Spatial remap via Gather (NOT GatherElements – requires opset 11!):
-#   Reshape([1,10,30,30] → [1,10,900]) → Gather(axis=2, indices=[900]) → Reshape back
-
-# Conv pattern:
-#   Conv(input, W, kernel_shape=[ks,ks], pads=[pad]*4) → ArgMax → Equal+Cast → Mul(mask)
-```
-
-| Op | Min opset | Notes |
-|----|-----------|-------|
-| OneHot | 9 | ⚠️ No CUDA kernel. Use Equal+Cast instead |
-| Conv | 1 | ✅ Safe |
-| ArgMax | 1 | ✅ Safe |
-| ReduceSum | 1 | ✅ Safe |
-| Pad | 2 (opset 10 syntax) | ✅ Use `pads` attribute for opset 10 |
-| Slice | 10 | ✅ With starts/ends as inputs |
-| Tile | 6 | ✅ Safe |
-| ScatterElements | 11 | ⚠️ Requires opset 11+ |
-
-## 6. Conv Fitting
-
-### lstsq fitting
-```python
-patches = []   # [N, 10*ks*ks] feature vectors
-targets = []   # [N] integer class labels
-P, T_oh = build_from_examples(exs)
-WT = np.linalg.lstsq(P, T_oh, rcond=None)[0]   # Closed-form optimal weights
-if np.argmax(P @ WT, 1) == T: SUCCESS          # Perfect fit check (pseudocode)
-```
-
-### PyTorch learned conv
-```python
-# Train with MSE or cross-entropy, export with torch.onnx.export(model, dummy, path, opset_version=10)
-# Then add argmax+equal+cast+mask post-processing in ONNX manually
-```
-- Can fit nonlinear patterns lstsq can't
-- Multi-seed training (0, 7, 42) for robustness
-- Ternary weight snapping: round weights to {-1, 0, 1} for smaller models
-
-### ARC-GEN for conv fitting
-The conv MUST generalize to arc-gen examples. Two approaches:
-1. **Include arc-gen in fitting data** – use `train + test + arc-gen[:20]` for lstsq
-2. **Validate against arc-gen after fitting** – only accept if it passes all splits
-
-## 7. Unsolved Tasks (94 in v3)
-
-### Categories
-| Category | Count | Why Unsolved |
-|----------|-------|--------------|
-| Variable diff-shape (output smaller) | ~60 | Output shape depends on input content |
-| Variable diff-shape (output larger) | ~17 | Same problem |
-| Same-shape, complex pattern | ~10 | Need larger kernels or multi-layer |
-| Fixed diff-shape, output larger | ~7 | Input-content-dependent patterns |
-
-### Fundamental Blocker
-Variable-shape tasks where output size depends on input CONTENT cannot be solved with a static ONNX graph. The only workaround: the conv learns to put valid content in the right region, masked by an input-derived spatial mask.
-
-## 8. Mistakes Log (DO NOT REPEAT)
-
-### GatherElements (opset 11) – Fixed in v3
-`GatherElements` requires opset 11. It works on Kaggle's old ORT but fails on ORT 1.25+. Replaced with `Gather` (opset 1) using 1D indices on the flattened spatial dim.
-
-### s_flip still used GatherElements – Fixed in v4
-The `s_flip` solver was still using `GatherElements`. It must use `_build_gather_model()` instead.
-
-### ARC-GEN not loaded – The #1 score killer
-v3 had `if 'arc-gen' in td` in validate() but never loaded arc-gen data into `td`. So validation always passed (there was no arc-gen to check), while Kaggle validated against arc-gen and most conv models failed.
-
-### Conv fitted on too few examples
-Fitting on 6 train+test examples overfits to a small sample. Must include arc-gen examples in the fitting data for better generalization.
-
-### No submission.csv
-Kaggle may need submission.csv alongside submission.zip.
-
-### Wrong score_network without onnx_tool
-Our fallback `score_network` returned `(0, 0, 0)` instead of real costs. Need a static profiler that matches Kaggle's calculation.
-
-### Ignored EXCLUDED tasks
-Wasted time trying to solve tasks 21, 55, 80, 184, 202, 366, which are officially excluded.
-
-## 9. Competitive Strategy
-
-### Path to 4800+ LB score
-1. **Fix ARC-GEN validation** – immediately recover ~200 points from models that actually work
-2. **Add missing analytical solvers** (shift, mirror, gravity, crop, composition) – +20-30 tasks at ~13 points each
-3. **PyTorch multi-layer conv** – solve 5-10 more complex same-shape tasks
-4. **Channel reduction** – reduce cost of existing solutions by 30-50%
-5. **Blend with other notebooks** – the 4200 notebook proves this is the meta-strategy
-
-### Quick wins
-- Transpose: score=25.0 (cost=0, just permute dims) – already have
-- Identity: score=25.0 – already have
-- Color map via channel Gather: cheaper than Conv 1×1 (params+nbytes only, no MACs)
-- Analytical solvers: ~13 points each (cost ≈ 165K)
-- Small conv (ks=1): ~11-13 points
-- Large conv (ks=29): ~7 points
-
-## 10. Data & File Locations
-
-### On Kaggle
-```
-/kaggle/input/competitions/neurogolf-2026/
-  task001.json ... task400.json   (with train+test+arc-gen)
-  neurogolf_utils/neurogolf_utils.py
-```
-
-### Locally
-```
-ARC-AGI/data/training/   # 400 hex-named .json files (train+test only)
-ARC-GEN-100K/            # 400 hex-named .json files (arc-gen examples)
-neurogolf-solver/
-  neurogolf_solver.py    # Main solver
-  neurogolf_utils.py     # Official Kaggle utils (needs onnx_tool, IPython)
-```
-
-### ARC-GEN file format
-```python
-# ARC-GEN-100K/{hex_id}.json is a LIST of examples:
-[{"input": [[...]], "output": [[...]]}, ...]
-# Must be merged into task data as td['arc-gen'] = list_of_examples
-```
-
-### ARC-GEN GitHub generator
-https://github.com/google/ARC-GEN – can generate MORE examples per task if needed.
-
-## 11. Reference Notebooks (in repo as neurogolf-2026-solver-notebooks.zip)
-
-| Notebook | LB Score | Tasks | Key Technique |
-|----------|----------|-------|---------------|
-| neurogolf-2026-tiny-onnx-solver | ~4200 | 338 | Mega-blend of 12+ notebooks |
-| 4200-v5-neurogolf-fix | ~5700 est | 341 | Same blend, manual LLM rescue tasks |
-| the-2026-neurogolf-championship | ~3200 est | 288 | Own solver + blend |
-| neurogolf-logic-driven-ensembling | – | 401 | Pure ensembling from zips |
-
-## 12. Testing Checklist
-
- [ ] All models validated against train + test + arc-gen (locally)
- [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
-- [ ] No GatherElements
-- [ ] No banned ops
-- [ ] Each .onnx < 1.44 MB
-- [ ] submission.zip < 1.44 MB total
- [ ] submission.csv generated
-- [ ] Local estimated score calculated
|
+# NeuroGolf Solver
+
+## Quick Reference
+
+- **Repo**: `rogermt/neurogolf-solver`
+- **Current version**: v4.1 – 50 arc-gen-validated tasks, est LB ~670
+- **Kaggle runtime**: 12 hours for submission
+- **Target**: 4800+ LB (first page)
+- **Detailed history, mistakes, analysis**: see `LEARNING.md`
+
+## 1. Competition Rules
+
+| Item | Value |
+|------|-------|
+| Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
+| Opset | 10 (IR 10). Opset 17 also works on Kaggle |
+| Max file size | 1.44 MB per model |
+| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
+| Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
+| Excluded tasks | {21, 55, 80, 184, 202, 366} – skip these |
+| Validation | Models checked against **train + test + arc-gen** (ALL splits) |
+| Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
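The scoring rule in the table is easy to sanity-check numerically. A minimal sketch, assuming the frontmatter formula is exact (the `task_score` helper is ours, not the competition API):

```python
import math

def task_score(macs: int, memory_bytes: int, params: int) -> float:
    # Per-task score: max(1, 25 - ln(MACs + memory_bytes + params))
    cost = macs + memory_bytes + params
    return max(1.0, 25.0 - math.log(cost))

# A cheap analytical model (cost ~165K) lands near the ~13 points quoted elsewhere:
print(round(task_score(0, 160_000, 5_000), 1))  # → 13.0
```

Note how flat the log makes things: halving cost buys only ln 2 ≈ 0.7 points, which is why eliminating zero-scoring models matters far more than shaving MACs.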
+
+## 2. ARC-GEN Data – THE Critical Factor
+
+**A model that passes train+test but fails arc-gen scores ZERO on Kaggle.**
+
+- Kaggle tasks at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contain `{"train": [...], "test": [...], "arc-gen": [...]}`
+- Up to 262 arc-gen examples per task (100K total)
+- Locally: ARC-GEN lives in `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` dicts – merge it into the task data
+- Conv fitting: include arc-gen examples **only when grid sizes match** train/test (otherwise lstsq fails)
+- Validation: always check against `arc-gen[:30]` minimum
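The local merge described above can be sketched as follows (the helper name and fallback behavior are ours, not the solver's actual API):

```python
import json
from pathlib import Path

def load_task_with_arcgen(task_path: str, arcgen_dir: str) -> dict:
    # Load an ARC-AGI task file ({"train": [...], "test": [...]})
    td = json.loads(Path(task_path).read_text())
    # Merge the matching ARC-GEN-100K/{hex_id}.json (a bare list of examples)
    gen_file = Path(arcgen_dir) / Path(task_path).name
    td["arc-gen"] = json.loads(gen_file.read_text()) if gen_file.exists() else []
    return td
```

Always setting the `arc-gen` key, even to an empty list, avoids the v3 failure mode where validation silently skipped a missing key.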
+
+## 3. Architecture
+
+### Solver Pipeline
```
+1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
+   identity → constant → color_map → transpose → flip → rotate →
+   shift → tile → upscale → kronecker → nonuniform_scale →
+   mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
+   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather
+
+2. Conv solvers (lstsq fitted, validated against arc-gen):
+   conv_fixed     – Slice→Conv→ArgMax→Equal+Cast→Pad
+   conv_variable  – Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
+   conv_diffshape – Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
+   conv_var_diff  – Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
```
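The priority order above amounts to a first-match dispatch loop; a minimal sketch (the signature and names are illustrative – the real `solve_task()` also tracks cost and time budgets):

```python
def solve_task(td, analytical_solvers, conv_solvers, validate):
    # Try solvers in priority order; accept the first model that
    # validates on ALL splits (train + test + arc-gen).
    for solver in list(analytical_solvers) + list(conv_solvers):
        model_path = solver(td)            # returns an .onnx path or None
        if model_path and validate(model_path, td):
            return model_path
    return None                            # unsolved: the task earns no score
```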
+
+### ONNX Building Rules
+- **Gather** (opset 1) for spatial remapping – NEVER use GatherElements (opset 11)
+- **Equal+Cast** for one-hot – NEVER use OneHot (no CUDA kernel)
+- **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
+- **Conv 1×1** for non-permutation color maps (has MACs but correct)
+- **ReduceSum(input, axes=[1])** for the variable-shape mask
+
+### Conv Fitting Strategy
+- lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
+- Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
+- Try no-bias first, then bias
+- **Validate against arc-gen BEFORE accepting** – reject if it fails
+- Bottleneck is algorithmic (O(n³) SVD), NOT device – GPU/CuPy doesn't help, just crashes
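The fit-then-check half of this strategy can be sketched as below (`fit_conv_weights` is an illustrative name; arc-gen validation still happens separately on the built ONNX model):

```python
import numpy as np

def fit_conv_weights(P: np.ndarray, T: np.ndarray):
    # P: [N, 10*ks*ks] flattened input patches; T: [N] target class labels.
    T_oh = np.eye(10, dtype=np.float64)[T]             # one-hot targets [N, 10]
    WT, *_ = np.linalg.lstsq(P, T_oh, rcond=None)      # closed-form weights
    if np.array_equal(np.argmax(P @ WT, axis=1), T):   # keep only perfect fits
        return WT
    return None                                        # reject: would fail validation
```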
+
+## 4. Performance Bottleneck
+
+**The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples, that is a 7056×8410 matrix SVD. This is pure math cost – moving to GPU (CuPy) doesn't help because:
+1. Same O(n³) algorithmic cost
+2. GPU memory fills up (~1GB for large matrices) and crashes
+3. Falls back to CPU anyway after a CUDA error
+
+**Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers, not faster conv.
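The 7056×8410 figure follows directly from how the patch matrix is built; a quick arithmetic check:

```python
def lstsq_shape(n_examples: int, h: int, w: int, ks: int) -> tuple:
    # One row per output pixel, one column per weight of a 10-channel ks×ks patch.
    return (n_examples * h * w, 10 * ks * ks)

print(lstsq_shape(16, 21, 21, 29))  # → (7056, 8410), the worst case cited above
```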
+
+## 5. Score Accounting (v4.1)
+
+| Category | Tasks | Avg Score | Total |
+|----------|-------|-----------|-------|
+| Analytical (gather, rotate, etc.) | 25 | ~16 | ~400 |
+| Conv (arc-gen validated) | 25 | ~11 | ~275 |
+| Unsolved (no passing model → no score) | 344 | 0 | 0 |
+| **Estimated LB** | | | **~670** |
+
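A quick cross-check of the solved rows, under the assumption (stated in the LB-gap analysis) that tasks without a passing model contribute nothing:

```python
# Rough LB estimate from the accounting table above.
analytical = 25 * 16   # ~25 analytical tasks at ~16 points
conv       = 25 * 11   # ~25 validated conv tasks at ~11 points
estimate   = analytical + conv
print(estimate)  # → 675, i.e. the "~670" ballpark
```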
+### Path to 4800+
+1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
+2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
+3. ✅ Color map Gather for permutations (+15 pts)
+4. 🔲 PyTorch multi-layer conv with ternary snap (est +20-50 tasks)
+5. 🔲 Channel reduction (fewer colors → cheaper models)
+6. 🔲 Composition detectors: rot+color, flip+color, transpose+color
+7. 🔲 Blend with other notebooks on Kaggle (the meta-strategy for 4000+)
+
+## 6. Submission Checklist
+
+Before submitting to Kaggle:
- [ ] All models validated against train + test + arc-gen (locally)
- [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
+- [ ] No GatherElements in any model
+- [ ] No banned ops
+- [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
- [ ] submission.csv generated
+- [ ] Local estimated score calculated and compared to expected LB
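Several of these checks can be automated before upload. A sketch (the byte limit and the `taskNNN.onnx` filename parsing are assumptions; adjust to the official checker if it differs):

```python
import os
import zipfile

FLOPPY_BYTES = 1_440_000  # assumed 1.44 MB limit in decimal bytes
EXCLUDED = {21, 55, 80, 184, 202, 366}

def check_submission(zip_path: str) -> list:
    # Return a list of human-readable checklist violations (empty = clean).
    problems = []
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            task_id = int(info.filename[4:7])     # 'taskNNN.onnx' → NNN
            if task_id in EXCLUDED:
                problems.append(f"{info.filename}: excluded task")
            if info.file_size > FLOPPY_BYTES:
                problems.append(f"{info.filename}: over 1.44 MB")
    if os.path.getsize(zip_path) > FLOPPY_BYTES:
        problems.append("submission.zip over 1.44 MB")
    return problems
```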
+
+## 7. Files & Locations
+
+| Location | Path | Notes |
+|----------|------|-------|
+| HF Repo | `rogermt/neurogolf-solver` | All code + data |
+| Solver | `neurogolf_solver.py` | v4.1, 1270 lines |
+| Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
+| ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
+| Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
+| Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
+| Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
+
+## 8. LEARNING.md Maintenance Rules
+
+`LEARNING.md` is the knowledge-accumulation file. Update it when:
+- A bug is found and fixed – add it to the Mistakes Log with its root cause
+- A new approach is tried – record what worked, what didn't, and why
+- Competition analysis reveals new insights – add them to Competitive Intelligence
+- A version milestone is reached – update the Version History table
+- Performance is measured – record concrete numbers
+
+Structure: chronological within sections, newest entries first. Always include dates and version numbers. The goal is that a fresh agent with zero context can read LEARNING.md and understand every mistake to avoid and every technique that works.