name: neurogolf-solver
description: >-
Build and improve an ONNX model generator for the NeuroGolf Championship
(Kaggle). Produces 400 tiny ONNX models (opset 17, IR 8, input/output
[1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs
+ memory_bytes + params)). Lower cost = higher score. Use this skill whenever
working on this competition, debugging submission failures, or starting a
fresh session.
# NeuroGolf Solver

## Development Methodology: The Closed-Loop

Research → Design → Experiment → Analyze → Research → ...

Rule: loop until we have a CONFIRMED increase in arc-gen-validated score.
| Phase | What | Exit Criteria |
|---|---|---|
| Research | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
| Design | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
| Experiment | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
| Analyze | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
| Research | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |
Critical rules:
- NEVER write >200 lines without running them first
- NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
- NEVER upload code to repo that hasn't been validated
- Theory from papers is NOT proof for our data – always test
- If a feature shows no improvement after testing, DELETE it – don't leave dead code
- Make surgical edits to individual files – NEVER rewrite the entire codebase in one shot
## Quick Reference
- Repo: rogermt/neurogolf-solver
- Current version: v5.2 – 52 solved, ~710 score, est LB ~1058
- Previous best on Kaggle: v4.3 – 50 arc-gen-validated tasks, est LB ~670
- Kaggle runtime: 12 hours for submission
- Target: 3000+ LB (our own solver, no blending)
- Detailed history, mistakes, analysis: see LEARNING.md
- Roadmap & experiment queue: see TODO.md
## 1. Competition Rules
| Item | Value |
|---|---|
| Input/Output | "input"/"output" float32 [1,10,30,30] one-hot |
| Opset | 17 (IR 8). Opset 10 also accepted on Kaggle |
| Max .onnx file size | 1.44 MB per ONNX file (not submission zip) |
| Static shapes | All tensors and parameters must have statically-defined shapes |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) per task |
| Tasks | All 400 count. There are NO excluded tasks. Unsolved = 1.0 pt. |
| Validation | Models checked against train + test + arc-gen (ALL splits) |
| Submission | submission.zip with task001.onnx–task400.onnx + optional submission.csv |
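
The scoring row above can be sketched in Python. `task_score` is an illustrative helper, not part of the official `neurogolf_utils.py`; the three cost terms are whatever the official profiler reports.

```python
import math

def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score: max(1.0, 25.0 - ln(MACs + memory + params)).

    Illustrative only; cost must be positive for the log to be defined.
    """
    cost = macs + memory_bytes + params
    return max(1.0, 25.0 - math.log(cost))

# A model with total cost ~250 scores roughly 19.5; a huge model
# floors out at the 1.0-point minimum, same as an unsolved task.
```

This makes the "lower cost = higher score" trade-off concrete: every factor of e (~2.7×) saved in total cost buys one more point, down to the 1.0 floor.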
## 2. ARC-GEN Data – THE Critical Factor
A model that passes train+test but fails arc-gen does NOT count as solved on Kaggle – it drops to the 1.0-point floor.
- Kaggle tasks at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contain `{"train":[], "test":[], "arc-gen":[]}`
- Up to 262 arc-gen examples per task (100K total)
- Locally: ARC-GEN in `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` – merge into task data
- Conv fitting: include arc-gen examples only when grid sizes match train/test (otherwise lstsq fails)
- Validation: always check against at least `arc-gen[:30]`
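
Under the local layout described above, merging and one-hot encoding might look like this sketch. The function names are hypothetical, and the all-zero padding outside the real grid is an assumption to verify against the official encoder.

```python
import json
import numpy as np

def to_onehot(grid):
    """[H,W] ints (colors 0-9) -> float32 [1,10,30,30], zero-padded.

    NOTE: cells outside the h×w grid are left all-zero here; confirm this
    matches the competition's padding convention before relying on it.
    """
    g = np.asarray(grid, dtype=np.int64)
    oh = np.zeros((1, 10, 30, 30), dtype=np.float32)
    h, w = g.shape
    oh[0, g, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0
    return oh

def load_task_with_arcgen(task_path, arcgen_path):
    """Merge local ARC-GEN examples into the task dict (hypothetical helper)."""
    with open(task_path) as f:
        task = json.load(f)
    with open(arcgen_path) as f:
        # local files hold a plain list of {input, output} pairs
        task.setdefault("arc-gen", []).extend(json.load(f))
    return task
```

The advanced-indexing line sets exactly one channel per real cell, so a grid with N cells yields a tensor whose total sum is N.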
## 3. Architecture

### Package Structure (v5.2)
```
neurogolf_solver/
├── constants.py        # Grid dims, opset, limits (NO excluded tasks)
├── config.py           # Runtime providers, opset factory
├── data_loader.py      # Task loading, one-hot, example extraction
├── validators.py       # Model validation against all splits
├── profiler.py         # Static cost profiler (onnx_tool fallback)
├── onnx_helpers.py     # Opset 17 builders: Slice, Pad, ReduceSum, mk()
├── gather_helpers.py   # Gather-based spatial remapping models
├── submission.py       # run_tasks (W&B logging), zip/csv generation
├── main.py             # Entry point with argparse
├── solvers/
│   ├── analytical.py   # identity, constant, color_map, transpose
│   ├── geometric.py    # flip, rotate, shift, crop, gravity (detect only)
│   ├── tiling.py       # tile, upscale, mirror, concat, spatial_gather
│   ├── conv.py         # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
│   ├── gravity.py      # Unrolled bubble-sort gravity (Conv+Where, 4 dirs) – Task 78
│   ├── edge.py         # Laplacian edge detection (0 matches currently)
│   └── mode.py         # Mode fill (ReduceSum→ArgMax→Expand) – Task 129
└── solver_registry.py  # ANALYTICAL_SOLVERS list + solve_task()
```
Run with: `python -m neurogolf_solver.main [args]`
### Solver Pipeline
1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
   identity → constant → color_map → transpose → flip → rotate →
   shift → tile → upscale → kronecker → nonuniform_scale →
   mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
   gravity_unrolled → edge_detect → mode_fill
2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
   - conv_fixed: Slice→Conv→ArgMax→Equal+Cast→Pad
   - conv_variable: Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
   - conv_diffshape: Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
   - conv_var_diff: Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
### ONNX Building Rules (opset 17)
- All shapes must be static – no dynamic dimensions
- Max 1.44 MB per .onnx file – checked by the Kaggle validator
- Slice(step=-1) for flip/rotate – zero MACs, replaces Gather for these transforms
- Gather (opset 1) for spatial remapping – used by concat, spatial_gather, mirrors, etc.
- NEVER use GatherElements (opset 11)
- Equal+Cast for one-hot – NEVER use OneHot (no CUDA kernel)
- Channel Gather for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
- Conv 1×1 for non-permutation color maps (has MACs but correct)
- ReduceSum with axes as a tensor input (opset 13+ requirement)
- Pad with tensor-based `pads` input (opset 11+ requirement)
- lstsq calls must be wrapped in `try/except (LinAlgError, ValueError)` – SVD can fail to converge
- ArgMax + Equal+Cast before Pad to ensure clean one-hot in the padded region (gravity solver lesson)
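
The ArgMax → Equal+Cast rule can be sanity-checked in numpy before wiring it into an ONNX graph. This mirrors the op semantics only, not the actual graph-builder code:

```python
import numpy as np

def argmax_equal_cast(x):
    """x: float32 [1,10,30,30] -> clean one-hot via ArgMax + Equal + Cast.

    Mirrors the ONNX pattern: ArgMax over the channel axis, Equal against
    a constant [0..9] channel-index tensor, then Cast back to float32.
    """
    idx = np.argmax(x, axis=1)[:, None, :, :]   # ArgMax -> [1,1,30,30]
    chans = np.arange(10).reshape(1, 10, 1, 1)  # constant channel indices
    return (idx == chans).astype(np.float32)    # Equal -> Cast
```

Whatever noise the upstream ops produce, the output has exactly one channel set per pixel, which is why this pattern is preferred over OneHot (which has no CUDA kernel).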
### Conv Fitting

Conv ceiling: ~25 tasks. Regularization (Ridge, PCA/SVD, skip-ks) all tested and rejected. Root cause: architecture mismatch – most unsolved tasks need non-local ops, not local conv patches.
Current fitting strategy (v5.1+):
- Composable primitives: `_build_patch_matrix` + `_solve_weights` + `_extract_weights`
- PCR fallback via `_solve_weights_pcr` (deferred 2nd pass, 0 new solves but no regressions)
- Kernel sizes: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]
- Try no-bias first, then bias
- lstsq wrapped in try/except for SVD non-convergence
- Validate against arc-gen BEFORE accepting – reject if it fails
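
A minimal sketch of the guarded lstsq fit (generic shapes; the real code uses the composable primitives listed above, so take the signature as illustrative):

```python
import numpy as np

def fit_conv_weights(patches, targets):
    """Least-squares fit of conv weights from im2col-style patches.

    patches: [N, C*k*k] design matrix, targets: [N, C_out].
    Returns the weight matrix, or None when the SVD inside lstsq
    fails to converge (the failure mode the rules above guard against).
    """
    try:
        w, *_ = np.linalg.lstsq(patches, targets, rcond=None)
        return w
    except (np.linalg.LinAlgError, ValueError):
        return None
```

On a well-conditioned system the exact linear map is recovered; on a degenerate one the caller gets `None` instead of a crash, matching the try/except rule in the ONNX building list.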
### New Solver Architectures (v5.2)
#### gravity.py – Unrolled bubble-sort via Conv+Where
- 4 directions × 10 bg colors, max(IH, IW) steps
- Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
- Final: ArgMax + Equal+Cast + Pad (clean one-hot)
- Cost: ~16M (10×10 grid), score ~8.4
- Validated: Task 78 (direction=up, bg=0)
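
As a numpy reference for what each unrolled step computes (illustrative only, direction=down with bg=0; the actual models express the same swap with Conv+Where ops):

```python
import numpy as np

def gravity_step_down(grid, bg=0):
    """One bubble-sort pass: each non-bg cell falls one row if the cell
    directly below is background. Repeating this max(H, W) times settles
    every column, matching the max(IH, IW)-step unrolling above."""
    g = np.asarray(grid).copy()
    # a cell can drop exactly when it is non-bg and sits on a bg gap
    fall = (g[:-1, :] != bg) & (g[1:, :] == bg)
    g[1:, :][fall] = g[:-1, :][fall]  # move the color down
    g[:-1, :][fall] = bg              # leave background behind
    return g
```

Because `fall` can never be true on two adjacent rows of the same column (the lower cell would have to be both bg and non-bg), the in-place swap is conflict-free within a step.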
#### edge.py – Laplacian conv boundary detection
- Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
- Cost: ~16K MACs, score ~15
- 0 matches currently – the edge definition may be too strict
#### mode.py – Global majority color fill
- Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
- Cost: ~2K, score ~19.5
- Validated: Task 129
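
A numpy sketch of the same pipeline (the final Pad back to 30×30 is omitted, and the function name is illustrative, not the actual `mode.py` API):

```python
import numpy as np

def mode_fill(onehot, h, w):
    """Majority color over the h×w region, broadcast to the whole output.

    Mirrors Slice -> ReduceSum(axes=[2,3]) -> ArgMax -> Equal+Cast -> Expand.
    """
    counts = onehot[:, :, :h, :w].sum(axis=(2, 3))             # Slice + ReduceSum -> [1,10]
    mode = counts.argmax(axis=1).reshape(1, 1, 1, 1)           # ArgMax
    chans = np.arange(10).reshape(1, 10, 1, 1)                 # constant channel ids
    onehot_mode = (mode == chans).astype(np.float32)           # Equal -> Cast, [1,10,1,1]
    return np.broadcast_to(onehot_mode, (1, 10, h, w)).copy()  # Expand
```

Counting per-channel ones over the spatial axes is exactly a color histogram, which is why this solver needs no learned weights and stays at ~2K cost.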
## 4. Performance

The lstsq conv solver is the speed bottleneck. Use `--conv_budget` to cap time per task (5s locally, 60s on Kaggle).

Do NOT try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not the device.
## 5. Score Accounting (v5.2)
| Category | Tasks | Avg Score | Notes |
|---|---|---|---|
| Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
| Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Gravity | 1 | 8.4 | Task 78 |
| Mode fill | 1 | 19.5 | Task 129 |
| Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
| Unsolved | 348 | 1.0 | Minimum score |
| Total | 52/400 | – | ~710 from solved tasks + 348 × 1.0 unsolved = ~1058 est LB |
### Path to 3000+
- ✅ ARC-GEN validation (v4)
- ✅ New analytical solvers (v4)
- ✅ Opset 17 Slice-based transforms (v5)
- ✅ lstsq crash fix + modular package (v5)
- ✅ PCR fallback in conv (v5.1 – 0 new solves but clean code)
- ✅ Gravity solver (v5.2 – Task 78)
- ✅ Mode fill solver (v5.2 – Task 129)
- 🔲 Phase 3 solvers: flood fill, composition, color LUT, CumSum – see TODO.md
- 🔲 Phase 1a: Opset 17 conversions for existing analytical tasks (score optimization)
- 🔲 Phase 4: ONNX optimizer, best-of-N selection
Blending is EXPLICITLY excluded – the user's competitive philosophy.
## 6. Submission Checklist
Before submitting to Kaggle:
- All models validated against train + test + arc-gen (locally)
- All 400 tasks attempted (no exclusions)
- No GatherElements in any model
- No banned ops (Loop, Scan, NonZero, Unique, Script, Function)
- All tensor shapes are static
- Each .onnx file < 1.44 MB
- Local estimated score calculated and compared to expected LB
- A/B test: ran both old and new solver on same tasks, new solver scores higher
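
A packaging sketch consistent with the checklist. The function name is hypothetical (the real code lives in `submission.py`), and the 1,440,000-byte reading of "1.44 MB" is an assumption to confirm against the Kaggle validator.

```python
import zipfile
from pathlib import Path

MAX_ONNX_BYTES = 1_440_000  # ASSUMPTION: 1.44 MB means 1,440,000 bytes

def build_submission(model_dir, out_zip="submission.zip", n_tasks=400):
    """Zip task001.onnx..taskNNN.onnx (+ optional submission.csv) for upload,
    enforcing the all-tasks-present and per-file-size checklist items."""
    model_dir = Path(model_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(1, n_tasks + 1):
            name = f"task{i:03d}.onnx"
            path = model_dir / name
            if not path.exists():
                raise FileNotFoundError(f"missing {name} - all tasks must be attempted")
            if path.stat().st_size >= MAX_ONNX_BYTES:
                raise ValueError(f"{name} exceeds the per-file size limit")
            zf.write(path, arcname=name)
        csv_path = model_dir / "submission.csv"
        if csv_path.exists():
            zf.write(csv_path, arcname="submission.csv")
    return out_zip
```

Running the size check at packaging time catches an oversized model locally instead of burning a 12-hour Kaggle submission on it.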
## 7. Files & Locations

| Location | Path | Notes |
|---|---|---|
| HF Repo | rogermt/neurogolf-solver | All code + data |
| Solver package | `neurogolf_solver/` | v5.2 – 19 files, modular |
| Legacy monolith | `neurogolf_solver.py` | v4, kept for reference – do not edit |
| Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
| ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
| Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
| Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
| Roadmap | `TODO.md` | Experiment queue with status key |
| Learning | `LEARNING.md` | Knowledge accumulation – read before coding |
## 8. LEARNING.md Maintenance Rules

LEARNING.md is the knowledge accumulation file. Update it when:
- A bug is found and fixed → add it to the Mistakes Log with the root cause
- A new approach is tried → record what worked, what didn't, and why
- Competition analysis reveals new insights → add them to Competitive Intelligence
- A version milestone is reached → update the Version History table
- Performance is measured → add concrete numbers

Structure: chronological within sections, newest entries first. Always include dates and version numbers.