rogermt/neurogolf-solver


rogermt committed 11 days ago

Commit 022a14c (verified) · 1 parent: 872fabe

Move own-solver/SKILL.md to own-solver/


Files changed (1)

own-solver/SKILL.md +222 -0

own-solver/SKILL.md ADDED

	@@ -0,0 +1,222 @@

+---
+name: neurogolf-solver
+description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 17, IR 8, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
+---
+# NeuroGolf Solver
+## Development Methodology: The Closed Loop
+```
+Research → Design → Experiment → Analyze → Research → ...
+```
+**Rule: Loop until we have a CONFIRMED increase in arc-gen validated score.**
+| Phase | What | Exit Criteria |
+|-------|------|---------------|
+| **Research** | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
+| **Design** | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
+| **Experiment** | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
+| **Analyze** | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
+| **Research** | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |
+**Critical rules:**
+- NEVER write >200 lines without running them first
+- NEVER claim a feature "works" until it has been arc-gen validated on ≥20 tasks
+- NEVER upload code to the repo that hasn't been validated
+- Theory from papers is NOT proof for our data — always test
+- If a feature shows no improvement after testing, DELETE it — don't leave dead code
+- Make surgical edits to individual files — NEVER rewrite the entire codebase in one shot
+## Quick Reference
+- **Repo**: `rogermt/neurogolf-solver`
+- **Current version**: v5.2 (52 tasks solved, ~710 points from solved tasks, estimated LB ~1058)
+- **Previous best on Kaggle**: v4.3 (50 arc-gen-validated tasks, estimated LB ~670)
+- **Kaggle runtime**: 12 hours for submission
+- **Target**: 3000+ LB (our own solver, no blending)
+- **Detailed history, mistakes, analysis**: see `LEARNING.md`
+- **Roadmap & experiment queue**: see `TODO.md`
+## 1. Competition Rules
+| Item | Value |
+|------|-------|
+| Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
+| Opset | 17 (IR 8). Opset 10 also accepted on Kaggle |
+| **Max .onnx file size** | **1.44 MB per ONNX file** (not submission zip) |
+| Static shapes | **All tensors and parameters must have statically-defined shapes** |
+| Banned ops | **Loop, Scan, NonZero, Unique, Script, Function** |
+| Scoring | `max(1.0, 25.0 - ln(MACs + memory_bytes + params))` per task |
+| Tasks | **All 400 count. There are NO excluded tasks. Unsolved = 1.0 pt.** |
+| Validation | Models checked against **train + test + arc-gen** (ALL splits) |
+| Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
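As a quick sanity check of the scoring rule in the table, the sketch below evaluates the formula for a few total costs. The name `task_score` and its single `cost` argument (the sum MACs + memory + params) are illustrative and are not taken from the official scoring library; rejecting non-positive costs is also an assumption, since the rules above do not say how a zero cost is handled.

```python
import math

def task_score(cost: float) -> float:
    """Per-task score: max(1.0, 25.0 - ln(cost)), where cost = MACs + memory + params."""
    if cost <= 0:
        # Assumption: ln is undefined here, so this sketch simply refuses the input.
        raise ValueError("cost must be positive")
    return max(1.0, 25.0 - math.log(cost))

for cost in (1e3, 16e3, 16e6, 1e12):
    print(f"cost={cost:>20,.0f}  score={task_score(cost):.2f}")
```

Because the cost enters through a natural log, cutting it by a factor of e (about 2.72×) is worth one point, and the score sits at the 1.0 floor once the cost reaches e^24 (roughly 2.6e10).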
+## 2. ARC-GEN Data — THE Critical Factor
+**A model that passes train+test but fails arc-gen earns NOTHING on Kaggle: the task counts as unsolved (1.0 pt, the minimum).**
+- Kaggle tasks at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contain `{"train":[], "test":[], "arc-gen":[]}`
+- Up to 262 arc-gen examples per task (100K total)
+- Locally: ARC-GEN in `ARC-GEN-100K/{hex_id}.json` as list of `{input, output}` — merge into task data
+- Conv fitting: include arc-gen examples **only when grid sizes match** train/test (otherwise lstsq fails)
+- Validation: always check against at least `arc-gen[:30]`
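The merge and filtering steps in the list above can be sketched in plain Python. All function names here are hypothetical, and "grid sizes match" is interpreted as "the (input shape, output shape) pair also occurs in train/test", which is an assumption about the intended rule rather than a description of `data_loader.py`.

```python
def grid_shape(grid):
    """(rows, cols) of a list-of-lists grid."""
    return (len(grid), len(grid[0]) if grid else 0)

def merge_arc_gen(task, arc_gen_examples):
    """Copy a {"train", "test"} task and attach the local ARC-GEN list as "arc-gen"."""
    merged = {split: list(task.get(split, [])) for split in ("train", "test")}
    merged["arc-gen"] = list(arc_gen_examples)
    return merged

def shape_key(example):
    return (grid_shape(example["input"]), grid_shape(example["output"]))

def fitting_examples(task):
    """Train/test examples plus the arc-gen examples whose shapes were seen in train/test."""
    base = task["train"] + task["test"]
    seen = {shape_key(ex) for ex in base}
    extra = [ex for ex in task.get("arc-gen", []) if shape_key(ex) in seen]
    return base + extra

def validation_examples(task, arc_gen_limit=30):
    """Every train/test example plus the first `arc_gen_limit` arc-gen examples."""
    return task["train"] + task["test"] + task.get("arc-gen", [])[:arc_gen_limit]

task = {"train": [{"input": [[1, 0], [0, 1]], "output": [[0, 1], [1, 0]]}], "test": []}
arc_gen = [
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},        # 2x2: usable for fitting
    {"input": [[1, 0, 0]] * 3, "output": [[0, 0, 1]] * 3},           # 3x3: validation only
]
merged = merge_arc_gen(task, arc_gen)
print(len(fitting_examples(merged)), len(validation_examples(merged)))
```

Keeping fitting and validation as separate selections means the size filter only restricts what the solver is fitted on; mismatched arc-gen examples are still used to reject a model.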
+## 3. Architecture
+### Package Structure (v5.2)
+```
+neurogolf_solver/
+├── constants.py          # Grid dims, opset, limits (NO excluded tasks)
+├── config.py             # Runtime providers, opset factory
+├── data_loader.py        # Task loading, one-hot, example extraction
+├── validators.py         # Model validation against all splits
+├── profiler.py           # Static cost profiler (onnx_tool fallback)
+├── onnx_helpers.py       # Opset 17 builders: Slice, Pad, ReduceSum, mk()
+├── gather_helpers.py     # Gather-based spatial remapping models
+├── submission.py         # run_tasks (W&B logging), zip/csv generation
+├── main.py               # Entry point with argparse
+└── solvers/
+    ├── analytical.py     # identity, constant, color_map, transpose
+    ├── geometric.py      # flip, rotate, shift, crop, gravity (detect only)
+    ├── tiling.py         # tile, upscale, mirror, concat, spatial_gather
+    ├── conv.py           # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
+    ├── gravity.py        # Unrolled bubble-sort gravity (Conv+Where, 4 dirs) — Task 78
+    ├── edge.py           # Laplacian edge detection (0 matches currently)
+    ├── mode.py           # Mode fill (ReduceSum→ArgMax→Expand) — Task 129
+    └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
+```
+Run with: `python -m neurogolf_solver.main [args]`
+### Solver Pipeline
+```
+1. Analytical solvers (instant, zero/low cost, always arc-gen safe):
+   identity → constant → color_map → transpose → flip → rotate →
+   shift → tile → upscale → kronecker → nonuniform_scale →
+   mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
+   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
+   gravity_unrolled → edge_detect → mode_fill
+2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
+   conv_fixed     — Slice→Conv→ArgMax→Equal+Cast→Pad
+   conv_variable  — Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
+   conv_diffshape — Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
+   conv_var_diff  — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
+```
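The ordering above amounts to a first-match loop with an arc-gen gate. The sketch below shows that control flow only: the solver functions, the `solve_first_match` name, and the use of plain Python callables in place of ONNX models are all stand-ins, not the contents of `solver_registry.py`.

```python
def identity_solver(examples):
    """Propose the identity transform if it explains every fitting example."""
    if all(ex["input"] == ex["output"] for ex in examples):
        return lambda grid: grid
    return None

def flip_h_solver(examples):
    """Propose a left-right flip if it explains every fitting example."""
    flip = lambda grid: [row[::-1] for row in grid]
    if all(flip(ex["input"]) == ex["output"] for ex in examples):
        return flip
    return None

def passes(model, examples):
    return all(model(ex["input"]) == ex["output"] for ex in examples)

def solve_first_match(solvers, fit_examples, arc_gen_examples):
    """Try solvers in order; accept the first candidate that also survives arc-gen."""
    for name, solver in solvers:
        model = solver(fit_examples)
        if model is not None and passes(model, arc_gen_examples):
            return name, model
    return None, None

solvers = [("identity", identity_solver), ("flip_h", flip_h_solver)]
fit = [{"input": [[1, 1]], "output": [[1, 1]]}]       # symmetric: both solvers match
arc_gen = [{"input": [[1, 2]], "output": [[2, 1]]}]   # only the flip generalizes
print(solve_first_match(solvers, fit, arc_gen)[0])    # flip_h
```

The example is chosen so that the cheapest solver fits train/test yet fails arc-gen, which is the failure mode the validation gate exists to catch.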
+### ONNX Building Rules (opset 17)
+- **All shapes must be static** — no dynamic dimensions
+- **Max 1.44 MB per .onnx file** — checked by Kaggle validator
+- **Slice(step=-1)** for flip/rotate — zero MACs, replaces Gather for these transforms
+- **Gather** (opset 1) for spatial remapping — used by concat, spatial_gather, mirrors, etc.
+- **NEVER** use GatherElements (opset 11)
+- **Equal+Cast** for one-hot — NEVER use OneHot (no CUDA kernel)
+- **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
+- **Conv 1×1** for non-permutation color maps (has MACs but correct)
+- **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
+- **Pad** with tensor-based `pads` input (opset 11+ requirement)
+- **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)` — SVD can fail to converge
+- **ArgMax + Equal+Cast** before Pad to ensure clean one-hot in padded region (gravity solver lesson)
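To see why the last rule puts ArgMax and Equal+Cast before Pad, the NumPy sketch below emulates what those ops compute. It is not ONNX graph-building code, the helper names are made up for illustration, and it assumes the padded region of a `[1,10,30,30]` tensor should be zero in every channel.

```python
import numpy as np

GRID = 30  # competition tensors are [1, 10, 30, 30]

def argmax_equal_cast(scores):
    """ArgMax over channels, Equal against channel ids, Cast to float32."""
    winner = np.argmax(scores, axis=1)[:, None, :, :]               # [1, 1, H, W]
    channel_ids = np.arange(scores.shape[1]).reshape(1, -1, 1, 1)   # [1, C, 1, 1]
    return (winner == channel_ids).astype(np.float32)               # [1, C, H, W]

def pad_to_grid(x):
    """Zero-pad the bottom and right of [1, C, H, W] up to 30x30."""
    _, _, h, w = x.shape
    return np.pad(x, ((0, 0), (0, 0), (0, GRID - h), (0, GRID - w)))

rng = np.random.default_rng(0)
scores = rng.normal(size=(1, 10, 3, 3))          # raw per-color scores for a 3x3 grid

clean = pad_to_grid(argmax_equal_cast(scores))   # one-hot first, then pad
dirty = argmax_equal_cast(pad_to_grid(scores))   # pad first: all-zero padding argmaxes to channel 0

print(clean[0, :, 5, 5].sum())  # 0.0
print(dirty[0, :, 5, 5].sum())  # 1.0
```

With the order reversed, every padded cell is marked as color 0, so the output no longer matches a target whose padding is empty.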
+### Conv Fitting
+**Conv ceiling: ~25 tasks.** Regularization (Ridge, PCA/SVD, skip-ks) all tested and rejected.
+Root cause: architecture mismatch — most unsolved tasks need non-local ops, not local conv patches.
+**Current fitting strategy (v5.1+):**
+- Composable primitives: `_build_patch_matrix` + `_solve_weights` + `_extract_weights`
+- PCR (principal component regression) fallback via `_solve_weights_pcr` (deferred second pass; 0 new solves, but no regressions)
+- Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
+- Try no-bias first, then bias
+- lstsq wrapped in try/except for SVD non-convergence
+- **Validate against arc-gen BEFORE accepting** — reject if fails
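A minimal NumPy sketch of this strategy for the same-shape case follows. The function names are hypothetical and do not mirror the signatures of `_build_patch_matrix` or `_solve_weights`; the short kernel list and the choice to try no-bias then bias within each kernel size are assumptions made to keep the example small.

```python
import numpy as np

def one_hot(grid, colors=10):
    g = np.asarray(grid)
    return (np.arange(colors)[:, None, None] == g[None]).astype(np.float64)  # [C, H, W]

def patch_matrix(x, k):
    """[C, H, W] -> [H*W, C*k*k]: one zero-padded k x k patch per pixel, row-major."""
    _, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    return np.stack([xp[:, i:i + k, j:j + k].ravel() for i in range(h) for j in range(w)])

def design(grid, k, bias):
    A = patch_matrix(one_hot(grid), k)
    return np.hstack([A, np.ones((A.shape[0], 1))]) if bias else A

def fit_conv(examples, k, bias=False):
    """Least-squares weights mapping patches to one-hot outputs, or None on failure."""
    A = np.concatenate([design(ex["input"], k, bias) for ex in examples])
    B = np.concatenate([one_hot(ex["output"]).reshape(10, -1).T for ex in examples])
    try:
        W, *_ = np.linalg.lstsq(A, B, rcond=None)
    except (np.linalg.LinAlgError, ValueError):  # e.g. SVD did not converge
        return None
    return W

def predict(W, grid, k, bias=False):
    h, w = np.asarray(grid).shape
    return (design(grid, k, bias) @ W).argmax(axis=1).reshape(h, w)

def fit_smallest_kernel(examples, holdout, kernel_sizes=(1, 3, 5)):
    """Return (k, bias, W) for the first fit that is exact on examples and holdout."""
    for k in kernel_sizes:
        for bias in (False, True):
            W = fit_conv(examples, k, bias)
            if W is None:
                continue
            if all(np.array_equal(predict(W, ex["input"], k, bias), np.asarray(ex["output"]))
                   for ex in examples + holdout):
                return k, bias, W
    return None

train = [{"input": [[1, 2], [0, 1]], "output": [[2, 1], [0, 2]]}]   # swap colors 1 and 2
holdout = [{"input": [[2, 2]], "output": [[1, 1]]}]                 # stands in for arc-gen
k, bias, W = fit_smallest_kernel(train, holdout)
print(k, bias, W.shape)
```

Passing the arc-gen examples as `holdout` makes acceptance conditional on them, so a fit that only memorizes train/test is rejected rather than returned.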
+### New Solver Architectures (v5.2)
+**gravity.py** — Unrolled bubble-sort via Conv+Where
+- 4 directions × 10 bg colors, max(IH,IW) steps
+- Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
+- Final: ArgMax + Equal+Cast + Pad (clean one-hot)
+- Cost: ~16M (10×10 grid), score ~8.4
+- **Validated: Task 78 (direction=up, bg=0)**
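The unrolled bubble-sort idea can be illustrated on plain integer grids. This is a reference sketch of the logic only (direction up, bg=0), with hypothetical function names; it does not build the Conv+Where graph. Each step moves every non-background cell up by one row if the cell above it is background, all cells at once, and the step is repeated a fixed `max(H, W)` times so no data-dependent loop is needed.

```python
def gravity_step_up(grid, bg=0):
    """One parallel step: a non-bg cell moves up iff the cell directly above is bg."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(1, h):
        for c in range(w):
            # Decisions read the old grid only, so all moves in a step are simultaneous.
            if grid[r][c] != bg and grid[r - 1][c] == bg:
                out[r - 1][c] = grid[r][c]
                out[r][c] = bg
    return out

def gravity_up(grid, bg=0):
    """Apply a fixed number of steps, max(H, W), as in the unrolled model."""
    for _ in range(max(len(grid), len(grid[0]))):
        grid = gravity_step_up(grid, bg)
    return grid

grid = [
    [0, 0, 3],
    [1, 0, 0],
    [0, 2, 3],
    [4, 0, 0],
]
for row in gravity_up(grid):
    print(row)
```

A fixed step count is what keeps the graph static: once every column has settled, further steps leave the grid unchanged, so over-unrolling costs MACs but not correctness.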
+**edge.py** — Laplacian conv boundary detection
+- Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
+- Cost: ~16K MACs, score ~15
+- **0 matches currently** — edge definition may be too strict
+**mode.py** — Global majority color fill
+- Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
+- Cost: ~2K, score ~19.5
+- **Validated: Task 129**
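In plain Python, the ReduceSum → ArgMax → Expand chain reduces to "count each color, pick the most frequent, fill the grid". The sketch below assumes every cell is counted, color 0 included, and that ties resolve to the lowest color index (ONNX ArgMax returns the first maximum by default); `mode_fill` is an illustrative name, not the function in `mode.py`.

```python
def mode_fill(grid, colors=10):
    """Fill the grid with its most frequent color; ties go to the lowest color index."""
    counts = [0] * colors
    for row in grid:
        for cell in row:
            counts[cell] += 1                           # per-color sum over the spatial axes
    mode = max(range(colors), key=counts.__getitem__)   # first maximum, like ArgMax
    return [[mode] * len(row) for row in grid]          # broadcast the winner to every cell

print(mode_fill([[1, 2, 2], [0, 2, 1]]))  # [[2, 2, 2], [2, 2, 2]]
```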
+## 4. Performance
+**The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (5s locally, 60s on Kaggle).
+**Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.
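One simple way to enforce a per-task time cap is to check a deadline between fitting attempts. How `--conv_budget` is actually implemented is not shown here, so the helper below, including its name and arguments, is only an assumed shape for such a loop.

```python
import time

def run_with_budget(candidates, try_fit, budget_s):
    """Try candidates in order until one fits or the time budget is used up."""
    deadline = time.monotonic() + budget_s
    for cand in candidates:
        if time.monotonic() >= deadline:
            break                       # out of time: give up on this task
        result = try_fit(cand)
        if result is not None:
            return cand, result
    return None

# Stand-in fitter: pretend only kernel sizes >= 5 produce a valid model.
fake_fit = lambda k: "model" if k >= 5 else None
print(run_with_budget([1, 3, 5, 7], fake_fit, budget_s=5.0))  # (5, 'model')
```

The deadline is only checked between attempts, so a single slow fit can still overrun the budget; trying kernel sizes smallest-first keeps the expensive attempts at the end.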
+## 5. Score Accounting (v5.2)
+| Category | Tasks | Avg Score | Notes |
+|----------|-------|-----------|-------|
+| Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
+| Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
+| Gravity | 1 | 8.4 | Task 78 |
+| Mode fill | 1 | 19.5 | Task 129 |
+| Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
+| **Unsolved** | **348** | **1.0** | Minimum score |
+| **Total** | **52/400** | | **~710 solved + 348 = ~1058 est LB** |
+### Path to 3000+
+1. ✅ ARC-GEN validation (v4)
+2. ✅ New analytical solvers (v4)
+3. ✅ Opset 17 Slice-based transforms (v5)
+4. ✅ lstsq crash fix + modular package (v5)
+5. ✅ PCR fallback in conv (v5.1 — 0 new solves but clean code)
+6. ✅ Gravity solver (v5.2 — Task 78)
+7. ✅ Mode fill solver (v5.2 — Task 129)
+8. 🔲 **Phase 3 solvers**: flood fill, composition, color LUT, CumSum — see TODO.md
+9. 🔲 **Phase 1a**: Opset 17 conversions for existing analytical tasks (score optimization)
+10. 🔲 **Phase 4**: ONNX optimizer, best-of-N selection
+**Blending is EXPLICITLY excluded** — user's competitive philosophy.
+## 6. Submission Checklist
+Before submitting to Kaggle:
+- [ ] All models validated against train + test + arc-gen (locally)
+- [ ] **All 400 tasks attempted** (no exclusions)
+- [ ] No GatherElements in any model
+- [ ] No banned ops (Loop, Scan, NonZero, Unique, Script, Function)
+- [ ] All tensor shapes are static
+- [ ] **Each .onnx file < 1.44 MB**
+- [ ] Local estimated score calculated and compared to expected LB
+- [ ] **A/B test**: ran both old and new solver on same tasks, new solver scores higher
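The file-name and file-size items of the checklist are easy to automate with the standard library. In the sketch below, `check_submission_dir` is a hypothetical helper, and reading "1.44 MB" as 1,440,000 bytes is a deliberately conservative assumption because the exact byte limit is not stated here. It does not inspect graph contents, so banned ops and static shapes still need a separate check.

```python
import os

MAX_ONNX_BYTES = 1_440_000  # assumption: conservative decimal reading of "1.44 MB"

def check_submission_dir(path, n_tasks=400, max_bytes=MAX_ONNX_BYTES):
    """Report missing, unexpected, and oversized .onnx files in a submission folder."""
    expected = {f"task{i:03d}.onnx" for i in range(1, n_tasks + 1)}
    present = {name for name in os.listdir(path) if name.endswith(".onnx")}
    too_large = [
        name for name in sorted(present & expected)
        if os.path.getsize(os.path.join(path, name)) >= max_bytes
    ]
    return {
        # Whether a missing file is acceptable for an unsolved task is not stated
        # in this document, so missing names are listed rather than treated as errors.
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
        "too_large": too_large,
    }
```

Calling `check_submission_dir("submission/")` on the folder that will be zipped returns three lists; empty `unexpected` and `too_large` lists cover the naming and size items before upload.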
+## 7. Files & Locations
+| Location | Path | Notes |
+|----------|------|-------|
+| HF Repo | `rogermt/neurogolf-solver` | All code + data |
+| **Solver package** | `neurogolf_solver/` | **v5.2 — 19 files, modular** |
+| Legacy monolith | `neurogolf_solver.py` | v4, kept for reference — do not edit |
+| Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
+| ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
+| Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
+| Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
+| Roadmap | `TODO.md` | Experiment queue with status key |
+| Learning | `LEARNING.md` | Knowledge accumulation — read before coding |
+## 8. LEARNING.md Maintenance Rules
+`LEARNING.md` is the knowledge accumulation file. Update it when:
+- A bug is found and fixed — add to Mistakes Log with root cause
+- A new approach is tried — record what worked, what didn't, and why
+- Competition analysis reveals new insights — add to Competitive Intelligence
+- Version milestones — update the Version History table
+- Performance measurements — add concrete numbers
+Structure: chronological within sections, newest entries first. Always include dates and version numbers.