---
name: neurogolf-solver
description: >-
  Build and improve an ONNX model generator for the NeuroGolf Championship
  (Kaggle). Produces 400 tiny ONNX models (opset 17, IR 8, input/output
  [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs
  + memory_bytes + params)). Lower cost = higher score. Use this skill whenever
  working on this competition, debugging submission failures, or starting a
  fresh session.
---

# NeuroGolf Solver

## Development Methodology: The Closed-Loop

Research → Design → Experiment → Analyze → Research → ...

Rule: Loop until we have a CONFIRMED increase in arc-gen-validated score.

| Phase | What | Exit Criteria |
|---|---|---|
| Research | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
| Design | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
| Experiment | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
| Analyze | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
| Research | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |

Critical rules:

  • NEVER write >200 lines without running them first
  • NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
  • NEVER upload code to repo that hasn't been validated
  • Theory from papers is NOT proof for our data — always test
  • If a feature shows no improvement after testing, DELETE it — don't leave dead code
  • Make surgical edits to individual files — NEVER rewrite the entire codebase in one shot

## Quick Reference

  • Repo: rogermt/neurogolf-solver
  • Current version: v5.2 — 52 solved, ~710 score, est LB ~1058
  • Previous best on Kaggle: v4.3 — 50 arc-gen-validated tasks, est LB ~670
  • Kaggle runtime: 12 hours for submission
  • Target: 3000+ LB (our own solver, no blending)
  • Detailed history, mistakes, analysis: see LEARNING.md
  • Roadmap & experiment queue: see TODO.md

## 1. Competition Rules

| Item | Value |
|---|---|
| Input/Output | `"input"`/`"output"` float32 [1,10,30,30] one-hot |
| Opset | 17 (IR 8). Opset 10 also accepted on Kaggle |
| Max .onnx file size | 1.44 MB per ONNX file (not the submission zip) |
| Static shapes | All tensors and parameters must have statically defined shapes |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) per task |
| Tasks | All 400 count. There are NO excluded tasks. Unsolved = 1.0 pt. |
| Validation | Models checked against train + test + arc-gen (ALL splits) |
| Submission | submission.zip with task001.onnx–task400.onnx + optional submission.csv |
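The scoring rule rewards exponentially smaller models: every factor-of-e reduction in total cost is worth one more point, floored at the unsolved score of 1.0. A quick sketch of the per-task formula (the function name is illustrative):

```python
import math

def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score from the rules table: max(1.0, 25.0 - ln(total cost)).
    The 1.0 floor equals the score an unsolved task receives."""
    total = macs + memory_bytes + params
    return max(1.0, 25.0 - math.log(total))
```

For example, a model with ~2K total cost lands around 17.4 points, while anything costing more than e^24 bottoms out at 1.0, the same as not solving the task at all.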

## 2. ARC-GEN Data — THE Critical Factor

A model that passes train+test but fails arc-gen scores ZERO on Kaggle.

  • Kaggle tasks at /kaggle/input/competitions/neurogolf-2026/taskNNN.json contain {"train":[], "test":[], "arc-gen":[]}
  • Up to 262 arc-gen examples per task (100K total)
  • Locally: ARC-GEN in ARC-GEN-100K/{hex_id}.json as a list of {input, output} pairs — merge into task data
  • Conv fitting: include arc-gen examples only when grid sizes match train/test (otherwise lstsq fails)
  • Validation: always check against arc-gen[:30] minimum
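A minimal sketch of the local merge described above (the helper name and the exact handling of a missing ARC-GEN file are assumptions; the real loading lives in data_loader.py):

```python
import json
from pathlib import Path

def load_task_with_arc_gen(task_path: str, arc_gen_dir: str, hex_id: str) -> dict:
    """Merge a local ARC-GEN file (a list of {input, output} pairs) into the
    task dict under an "arc-gen" key, mirroring the Kaggle task layout."""
    task = json.loads(Path(task_path).read_text())
    arc_gen_file = Path(arc_gen_dir) / f"{hex_id}.json"
    # Fall back to an empty list if no ARC-GEN examples exist for this task
    task["arc-gen"] = (
        json.loads(arc_gen_file.read_text()) if arc_gen_file.exists() else []
    )
    return task
```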

## 3. Architecture

### Package Structure (v5.2)

```
neurogolf_solver/
├── constants.py          # Grid dims, opset, limits (NO excluded tasks)
├── config.py             # Runtime providers, opset factory
├── data_loader.py        # Task loading, one-hot, example extraction
├── validators.py         # Model validation against all splits
├── profiler.py           # Static cost profiler (onnx_tool fallback)
├── onnx_helpers.py       # Opset 17 builders: Slice, Pad, ReduceSum, mk()
├── gather_helpers.py     # Gather-based spatial remapping models
├── submission.py         # run_tasks (W&B logging), zip/csv generation
├── main.py               # Entry point with argparse
└── solvers/
    ├── analytical.py     # identity, constant, color_map, transpose
    ├── geometric.py      # flip, rotate, shift, crop, gravity (detect only)
    ├── tiling.py         # tile, upscale, mirror, concat, spatial_gather
    ├── conv.py           # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
    ├── gravity.py        # Unrolled bubble-sort gravity (Conv+Where, 4 dirs) — Task 78
    ├── edge.py           # Laplacian edge detection (0 matches currently)
    ├── mode.py           # Mode fill (ReduceSum→ArgMax→Expand) — Task 129
    └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
```

Run with: `python -m neurogolf_solver.main [args]`

### Solver Pipeline

1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
   identity → constant → color_map → transpose → flip → rotate →
   shift → tile → upscale → kronecker → nonuniform_scale →
   mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
   gravity_unrolled → edge_detect → mode_fill

2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
   conv_fixed     — Slice→Conv→ArgMax→Equal+Cast→Pad
   conv_variable  — Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
   conv_diffshape — Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
   conv_var_diff  — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)

### ONNX Building Rules (opset 17)

  • All shapes must be static — no dynamic dimensions
  • Max 1.44 MB per .onnx file — checked by Kaggle validator
  • Slice(step=-1) for flip/rotate — zero MACs, replaces Gather for these transforms
  • Gather (opset 1) for spatial remapping — used by concat, spatial_gather, mirrors, etc.
  • NEVER use GatherElements (opset 11)
  • Equal+Cast for one-hot — NEVER use OneHot (no CUDA kernel)
  • Channel Gather for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
  • Conv 1×1 for non-permutation color maps (has MACs but correct)
  • ReduceSum with axes as tensor input (opset 13+ requirement)
  • Pad with tensor-based pads input (opset 11+ requirement)
  • lstsq calls must be wrapped in try/except (LinAlgError, ValueError) — SVD can fail to converge
  • ArgMax + Equal+Cast before Pad to ensure clean one-hot in padded region (gravity solver lesson)

### Conv Fitting

Conv ceiling: ~25 tasks. Regularization variants (Ridge, PCA/SVD, skip-ks) were all tested and rejected. Root cause: architecture mismatch — most unsolved tasks need non-local ops, not local conv patches.

Current fitting strategy (v5.1+):

  • Composable primitives: _build_patch_matrix + _solve_weights + _extract_weights
  • PCR fallback via _solve_weights_pcr (deferred 2nd pass, 0 new solves but no regressions)
  • Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
  • Try no-bias first, then bias
  • lstsq wrapped in try/except for SVD non-convergence
  • Validate against arc-gen BEFORE accepting — reject if it fails
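The core of the fitting step, sketched as a guarded least-squares solve (names are illustrative; the real code composes _build_patch_matrix, _solve_weights, and _extract_weights):

```python
import numpy as np

def fit_conv_weights(patches: np.ndarray, targets: np.ndarray):
    """Least-squares fit of conv weights from an (N, k*k*C) patch matrix to
    (N, C) one-hot targets, wrapped in try/except as the rules require,
    since the SVD inside lstsq can fail to converge."""
    try:
        weights, *_ = np.linalg.lstsq(patches, targets, rcond=None)
        return weights
    except (np.linalg.LinAlgError, ValueError):
        return None  # caller falls through to the PCR fallback
```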

### New Solver Architectures (v5.2)

#### gravity.py — Unrolled bubble-sort via Conv+Where

  • 4 directions × 10 bg colors, max(IH,IW) steps
  • Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
  • Final: ArgMax + Equal+Cast + Pad (clean one-hot)
  • Cost: ~16M (10×10 grid), score ~8.4
  • Validated: Task 78 (direction=up, bg=0)
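One settling sweep of that unrolled bubble sort, as a NumPy reference (illustrative only; the ONNX graph expresses the same update with Conv shifts and Where selects on one-hot tensors, and repeats it max(IH,IW) times):

```python
import numpy as np

def gravity_sweep_up(grid: np.ndarray, bg: int = 0) -> np.ndarray:
    """One settling sweep for direction=up: any non-background cell with a
    background cell directly above swaps upward one row. Repeating this
    max(H, W) times settles the whole grid."""
    out = grid.copy()
    for r in range(1, out.shape[0]):
        # Cells in row r that can rise into an empty (bg) cell in row r-1
        movable = (out[r] != bg) & (out[r - 1] == bg)
        vals = out[r][movable]
        out[r - 1][movable] = vals
        out[r][movable] = bg
    return out
```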

#### edge.py — Laplacian conv boundary detection

  • Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
  • Cost: ~16K MACs, score ~15
  • 0 matches currently — edge definition may be too strict
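A NumPy reference for the detection core (illustrative; the solver realizes this as Conv 1×1 then Conv 3×3 then Abs/Greater, and the threshold here is a stand-in):

```python
import numpy as np

def laplacian_edges(grid: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """3x3 Laplacian response on a collapsed single-channel grid;
    |response| > thresh marks boundary cells. Zero padding means the
    outer border of a non-empty grid also fires."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    padded = np.pad(grid.astype(float), 1)
    resp = sum(
        k[i, j] * padded[i:i + grid.shape[0], j:j + grid.shape[1]]
        for i in range(3) for j in range(3)
    )
    return np.abs(resp) > thresh
```

Since the solver currently matches 0 tasks, the threshold and the boundary definition itself are the obvious knobs to probe.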

#### mode.py — Global majority color fill

  • Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
  • Cost: ~2K, score ~19.5
  • Validated: Task 129
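A NumPy reference for that graph (illustrative; it skips the Slice/Pad crop handling the real model performs):

```python
import numpy as np

def mode_fill_reference(onehot: np.ndarray) -> np.ndarray:
    """Count each color over the grid, pick the majority, and broadcast its
    one-hot everywhere. The ONNX graph does the same with ReduceSum ->
    ArgMax -> Equal+Cast -> Expand. Input/output shape: [1, 10, H, W]."""
    counts = onehot.sum(axis=(2, 3))   # [1, 10] color histogram
    majority = counts.argmax(axis=1)   # [1] dominant color index
    filled = np.zeros_like(onehot)
    filled[:, majority[0], :, :] = 1.0  # one-hot fill of the whole grid
    return filled
```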

## 4. Performance

The lstsq conv solver is the speed bottleneck. Use --conv_budget to cap time per task (5s locally, 60s on Kaggle).

Do NOT try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(nΒ³) SVD), not device.

## 5. Score Accounting (v5.2)

| Category | Tasks | Avg Score | Notes |
|---|---|---|---|
| Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
| Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
| Gravity | 1 | 8.4 | Task 78 |
| Mode fill | 1 | 19.5 | Task 129 |
| Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
| Unsolved | 348 | 1.0 | Minimum score |
| Total | 52/400 | ~710 | solved + 348 unsolved = ~1058 est LB |

### Path to 3000+

  1. ✅ ARC-GEN validation (v4)
  2. ✅ New analytical solvers (v4)
  3. ✅ Opset 17 Slice-based transforms (v5)
  4. ✅ lstsq crash fix + modular package (v5)
  5. ✅ PCR fallback in conv (v5.1 — 0 new solves but clean code)
  6. ✅ Gravity solver (v5.2 — Task 78)
  7. ✅ Mode fill solver (v5.2 — Task 129)
  8. 🔲 Phase 3 solvers: flood fill, composition, color LUT, CumSum — see TODO.md
  9. 🔲 Phase 1a: Opset 17 conversions for existing analytical tasks (score optimization)
  10. 🔲 Phase 4: ONNX optimizer, best-of-N selection

Blending is EXPLICITLY excluded — user's competitive philosophy.

## 6. Submission Checklist

Before submitting to Kaggle:

  • All models validated against train + test + arc-gen (locally)
  • All 400 tasks attempted (no exclusions)
  • No GatherElements in any model
  • No banned ops (Loop, Scan, NonZero, Unique, Script, Function)
  • All tensor shapes are static
  • Each .onnx file < 1.44 MB
  • Local estimated score calculated and compared to expected LB
  • A/B test: ran both old and new solver on same tasks, new solver scores higher

## 7. Files & Locations

| Location | Path | Notes |
|---|---|---|
| HF Repo | `rogermt/neurogolf-solver` | All code + data |
| Solver package | `neurogolf_solver/` | v5.2 — 19 files, modular |
| Legacy monolith | `neurogolf_solver.py` | v4, kept for reference — do not edit |
| Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
| ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
| Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
| Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
| Roadmap | `TODO.md` | Experiment queue with status key |
| Learning | `LEARNING.md` | Knowledge accumulation — read before coding |

## 8. LEARNING.md Maintenance Rules

LEARNING.md is the knowledge accumulation file. Update it when:

  • A bug is found and fixed — add to Mistakes Log with root cause
  • A new approach is tried — record what worked, what didn't, and why
  • Competition analysis reveals new insights — add to Competitive Intelligence
  • Version milestones — update the Version History table
  • Performance measurements — add concrete numbers

Structure: chronological within sections, newest entries first. Always include dates and version numbers.