---
name: neurogolf-solver
description: >-
  Build and improve an ONNX model generator for the NeuroGolf Championship
  (Kaggle). Produces 400 tiny ONNX models (opset 17, IR 8, input/output
  [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs
  + memory_bytes + params)). Lower cost = higher score. Use this skill whenever
  working on this competition, debugging submission failures, or starting a
  fresh session.
---

# NeuroGolf Solver

## Development Methodology: The Closed-Loop

Research → Design → Experiment → Analyze → Research → ...

Rule: Loop until we have a CONFIRMED increase in arc-gen-validated score.

| Phase | What | Exit Criteria |
|---|---|---|
| Research | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
| Design | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
| Experiment | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
| Analyze | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
| Research | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |

Critical rules:

  • NEVER write >200 lines without running them first
  • NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
  • NEVER upload code to repo that hasn't been validated
  • Theory from papers is NOT proof for our data — always test
  • If a feature shows no improvement after testing, DELETE it — don't leave dead code
  • Make surgical edits to individual files — NEVER rewrite the entire codebase in one shot

## Quick Reference

  • Repo: rogermt/neurogolf-solver
  • Current version: v5.2 — 52 solved, ~710 score, est LB ~1058
  • Previous best on Kaggle: v4.3 — 50 arc-gen-validated tasks, est LB ~670
  • Kaggle runtime: 12 hours for submission
  • Target: 3000+ LB (our own solver, no blending)
  • Detailed history, mistakes, analysis: see LEARNING.md
  • Roadmap & experiment queue: see TODO.md

## 1. Competition Rules

| Item | Value |
|---|---|
| Input/Output | `"input"`/`"output"` float32 [1,10,30,30] one-hot |
| Opset | 17 (IR 8). Opset 10 also accepted on Kaggle |
| Max .onnx file size | 1.44 MB per ONNX file (not the submission zip) |
| Static shapes | All tensors and parameters must have statically defined shapes |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) per task |
| Tasks | All 400 count. There are NO excluded tasks. Unsolved = 1.0 pt. |
| Validation | Models checked against train + test + arc-gen (ALL splits) |
| Submission | submission.zip with task001.onnx–task400.onnx + optional submission.csv |
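The scoring rule rewards exponentially smaller models: every factor-of-e reduction in total cost is worth one more point, floored at the unsolved score of 1.0. A quick sketch of the per-task formula (the function name is illustrative):

```python
import math

def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score from the rules table: max(1.0, 25.0 - ln(total cost)).
    The 1.0 floor equals the score an unsolved task receives."""
    total = macs + memory_bytes + params
    return max(1.0, 25.0 - math.log(total))
```

For example, a model with ~2K total cost lands around 17.4 points, while anything costing more than e^24 bottoms out at 1.0, the same as not solving the task at all.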

## 2. ARC-GEN Data — THE Critical Factor

A model that passes train+test but fails arc-gen scores ZERO on Kaggle.

  • Kaggle tasks at /kaggle/input/competitions/neurogolf-2026/taskNNN.json contain {"train":[], "test":[], "arc-gen":[]}
  • Up to 262 arc-gen examples per task (100K total)
  • Locally: ARC-GEN in ARC-GEN-100K/{hex_id}.json as a list of {input, output} pairs — merge into task data
  • Conv fitting: include arc-gen examples only when grid sizes match train/test (otherwise lstsq fails)
  • Validation: always check against arc-gen[:30] minimum
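A minimal sketch of the local merge described above (the helper name and the exact handling of a missing ARC-GEN file are assumptions; the real loading lives in data_loader.py):

```python
import json
from pathlib import Path

def load_task_with_arc_gen(task_path: str, arc_gen_dir: str, hex_id: str) -> dict:
    """Merge a local ARC-GEN file (a list of {input, output} pairs) into the
    task dict under an "arc-gen" key, mirroring the Kaggle task layout."""
    task = json.loads(Path(task_path).read_text())
    arc_gen_file = Path(arc_gen_dir) / f"{hex_id}.json"
    # Fall back to an empty list if no ARC-GEN examples exist for this task
    task["arc-gen"] = (
        json.loads(arc_gen_file.read_text()) if arc_gen_file.exists() else []
    )
    return task
```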

## 3. Architecture

### Package Structure (v5.2)

```
neurogolf_solver/
├── constants.py          # Grid dims, opset, limits (NO excluded tasks)
├── config.py             # Runtime providers, opset factory
├── data_loader.py        # Task loading, one-hot, example extraction
├── validators.py         # Model validation against all splits
├── profiler.py           # Static cost profiler (onnx_tool fallback)
├── onnx_helpers.py       # Opset 17 builders: Slice, Pad, ReduceSum, mk()
├── gather_helpers.py     # Gather-based spatial remapping models
├── submission.py         # run_tasks (W&B logging), zip/csv generation
├── main.py               # Entry point with argparse
└── solvers/
    ├── analytical.py     # identity, constant, color_map, transpose
    ├── geometric.py      # flip, rotate, shift, crop, gravity (detect only)
    ├── tiling.py         # tile, upscale, mirror, concat, spatial_gather
    ├── conv.py           # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
    ├── gravity.py        # Unrolled bubble-sort gravity (Conv+Where, 4 dirs) — Task 78
    ├── edge.py           # Laplacian edge detection (0 matches currently)
    ├── mode.py           # Mode fill (ReduceSum→ArgMax→Expand) — Task 129
    └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
```

Run with: `python -m neurogolf_solver.main [args]`

### Solver Pipeline

1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
   identity → constant → color_map → transpose → flip → rotate →
   shift → tile → upscale → kronecker → nonuniform_scale →
   mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
   gravity_unrolled → edge_detect → mode_fill

2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
   conv_fixed     — Slice→Conv→ArgMax→Equal+Cast→Pad
   conv_variable  — Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
   conv_diffshape — Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
   conv_var_diff  — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)

### ONNX Building Rules (opset 17)

  • All shapes must be static — no dynamic dimensions
  • Max 1.44 MB per .onnx file — checked by Kaggle validator
  • Slice(step=-1) for flip/rotate — zero MACs, replaces Gather for these transforms
  • Gather (opset 1) for spatial remapping — used by concat, spatial_gather, mirrors, etc.
  • NEVER use GatherElements (opset 11)
  • Equal+Cast for one-hot — NEVER use OneHot (no CUDA kernel)
  • Channel Gather for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
  • Conv 1×1 for non-permutation color maps (has MACs but correct)
  • ReduceSum with axes as tensor input (opset 13+ requirement)
  • Pad with tensor-based pads input (opset 11+ requirement)
  • lstsq calls must be wrapped in try/except (LinAlgError, ValueError) — SVD can fail to converge
  • ArgMax + Equal+Cast before Pad to ensure clean one-hot in padded region (gravity solver lesson)

### Conv Fitting

Conv ceiling: ~25 tasks. Regularization variants (Ridge, PCA/SVD, skip-ks) were all tested and rejected. Root cause: architecture mismatch — most unsolved tasks need non-local ops, not local conv patches.

Current fitting strategy (v5.1+):

  • Composable primitives: _build_patch_matrix + _solve_weights + _extract_weights
  • PCR fallback via _solve_weights_pcr (deferred 2nd pass, 0 new solves but no regressions)
  • Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
  • Try no-bias first, then bias
  • lstsq wrapped in try/except for SVD non-convergence
  • Validate against arc-gen BEFORE accepting — reject if it fails
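The core of the fitting step, sketched as a guarded least-squares solve (names are illustrative; the real code composes _build_patch_matrix, _solve_weights, and _extract_weights):

```python
import numpy as np

def fit_conv_weights(patches: np.ndarray, targets: np.ndarray):
    """Least-squares fit of conv weights from an (N, k*k*C) patch matrix to
    (N, C) one-hot targets, wrapped in try/except as the rules require,
    since the SVD inside lstsq can fail to converge."""
    try:
        weights, *_ = np.linalg.lstsq(patches, targets, rcond=None)
        return weights
    except (np.linalg.LinAlgError, ValueError):
        return None  # caller falls through to the PCR fallback
```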

### New Solver Architectures (v5.2)

#### gravity.py — Unrolled bubble-sort via Conv+Where

  • 4 directions × 10 bg colors, max(IH,IW) steps
  • Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
  • Final: ArgMax + Equal+Cast + Pad (clean one-hot)
  • Cost: ~16M (10×10 grid), score ~8.4
  • Validated: Task 78 (direction=up, bg=0)
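One settling sweep of that unrolled bubble sort, as a NumPy reference (illustrative only; the ONNX graph expresses the same update with Conv shifts and Where selects on one-hot tensors, and repeats it max(IH,IW) times):

```python
import numpy as np

def gravity_sweep_up(grid: np.ndarray, bg: int = 0) -> np.ndarray:
    """One settling sweep for direction=up: any non-background cell with a
    background cell directly above swaps upward one row. Repeating this
    max(H, W) times settles the whole grid."""
    out = grid.copy()
    for r in range(1, out.shape[0]):
        # Cells in row r that can rise into an empty (bg) cell in row r-1
        movable = (out[r] != bg) & (out[r - 1] == bg)
        vals = out[r][movable]
        out[r - 1][movable] = vals
        out[r][movable] = bg
    return out
```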

#### edge.py — Laplacian conv boundary detection

  • Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
  • Cost: ~16K MACs, score ~15
  • 0 matches currently — edge definition may be too strict
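A NumPy reference for the detection core (illustrative; the solver realizes this as Conv 1×1 then Conv 3×3 then Abs/Greater, and the threshold here is a stand-in):

```python
import numpy as np

def laplacian_edges(grid: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """3x3 Laplacian response on a collapsed single-channel grid;
    |response| > thresh marks boundary cells. Zero padding means the
    outer border of a non-empty grid also fires."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    padded = np.pad(grid.astype(float), 1)
    resp = sum(
        k[i, j] * padded[i:i + grid.shape[0], j:j + grid.shape[1]]
        for i in range(3) for j in range(3)
    )
    return np.abs(resp) > thresh
```

Since the solver currently matches 0 tasks, the threshold and the boundary definition itself are the obvious knobs to probe.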

#### mode.py — Global majority color fill

  • Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
  • Cost: ~2K, score ~19.5
  • Validated: Task 129
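A NumPy reference for that graph (illustrative; it skips the Slice/Pad crop handling the real model performs):

```python
import numpy as np

def mode_fill_reference(onehot: np.ndarray) -> np.ndarray:
    """Count each color over the grid, pick the majority, and broadcast its
    one-hot everywhere. The ONNX graph does the same with ReduceSum ->
    ArgMax -> Equal+Cast -> Expand. Input/output shape: [1, 10, H, W]."""
    counts = onehot.sum(axis=(2, 3))   # [1, 10] color histogram
    majority = counts.argmax(axis=1)   # [1] dominant color index
    filled = np.zeros_like(onehot)
    filled[:, majority[0], :, :] = 1.0  # one-hot fill of the whole grid
    return filled
```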

## 4. Performance

The lstsq conv solver is the speed bottleneck. Use --conv_budget to cap time per task (5s locally, 60s on Kaggle).

Do NOT try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(nΒ³) SVD), not device.

## 5. Score Accounting (v5.2)

| Category | Tasks | Avg Score | Notes |
|---|---|---|---|
| Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
| Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Gravity | 1 | 8.4 | Task 78 |
| Mode fill | 1 | 19.5 | Task 129 |
| Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
| Unsolved | 348 | 1.0 | Minimum score |
| Total | 52/400 | ~710 | solved + 348 unsolved = ~1058 est LB |

### Path to 3000+

  1. ✅ ARC-GEN validation (v4)
  2. ✅ New analytical solvers (v4)
  3. ✅ Opset 17 Slice-based transforms (v5)
  4. ✅ lstsq crash fix + modular package (v5)
  5. ✅ PCR fallback in conv (v5.1 — 0 new solves but clean code)
  6. ✅ Gravity solver (v5.2 — Task 78)
  7. ✅ Mode fill solver (v5.2 — Task 129)
  8. 🔲 Phase 3 solvers: flood fill, composition, color LUT, CumSum — see TODO.md
  9. 🔲 Phase 1a: Opset 17 conversions for existing analytical tasks (score optimization)
  10. 🔲 Phase 4: ONNX optimizer, best-of-N selection

Blending is EXPLICITLY excluded — user's competitive philosophy.

## 6. Submission Checklist

Before submitting to Kaggle:

  • All models validated against train + test + arc-gen (locally)
  • All 400 tasks attempted (no exclusions)
  • No GatherElements in any model
  • No banned ops (Loop, Scan, NonZero, Unique, Script, Function)
  • All tensor shapes are static
  • Each .onnx file < 1.44 MB
  • Local estimated score calculated and compared to expected LB
  • A/B test: ran both old and new solver on same tasks, new solver scores higher

## 7. Files & Locations

| Location | Path | Notes |
|---|---|---|
| HF Repo | `rogermt/neurogolf-solver` | All code + data |
| Solver package | `neurogolf_solver/` | v5.2 — 19 files, modular |
| Legacy monolith | `neurogolf_solver.py` | v4, kept for reference — do not edit |
| Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
| ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
| Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
| Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
| Roadmap | `TODO.md` | Experiment queue with status key |
| Learning | `LEARNING.md` | Knowledge accumulation — read before coding |

## 8. LEARNING.md Maintenance Rules

LEARNING.md is the knowledge accumulation file. Update it when:

  • A bug is found and fixed — add to Mistakes Log with root cause
  • A new approach is tried — record what worked, what didn't, and why
  • Competition analysis reveals new insights — add to Competitive Intelligence
  • Version milestones — update the Version History table
  • Performance measurements — add concrete numbers

Structure: chronological within sections, newest entries first. Always include dates and version numbers.