rogermt/neurogolf-solver


rogermt committed 14 days ago

Commit 6941e70 (verified) · 1 Parent(s): 863483e

v4.3: Update SKILL.md with closed-loop methodology, development rules, updated status

Files changed (1)

SKILL.md +68 -19


@@ -1,24 +1,49 @@
 ---
 name: neurogolf-solver
-description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
 ---
 # NeuroGolf Solver
 ## Quick Reference
 - **Repo**: `rogermt/neurogolf-solver`
-- **Current version**: v4.1 — 50 arc-gen-validated tasks, est LB ~670
 - **Kaggle runtime**: 12 hours for submission
-- **Target**: 4800+ LB (first page)
 - **Detailed history, mistakes, analysis**: see `LEARNING.md`
 ## 1. Competition Rules
 | Item | Value |
 |------|-------|
 | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
-| Opset | 10 (IR 10). Opset 17 also works on Kaggle |
 | Max file size | 1.44 MB per model |
 | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
 | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
@@ -59,24 +84,43 @@ description: Build and improve an ONNX model generator for the NeuroGolf Champio
 - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
 - **Conv 1×1** for non-permutation color maps (has MACs but correct)
 - **ReduceSum(input, axes=[1])** for variable-shape mask
-### Conv Fitting Strategy
 - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
 - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
 - Try no-bias first, then bias
 - **Validate against arc-gen BEFORE accepting** — reject if fails
-- Bottleneck is algorithmic (O(n³) SVD), NOT device — GPU/CuPy doesn't help, just crashes
-## 4. Performance Bottleneck
-**The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples: 7056×8410 matrix SVD. This is pure math cost — moving to GPU (CuPy) doesn't help because:
-1. Same O(n³) algorithmic cost
-2. GPU memory fills up (~1GB for large matrices) and crashes
-3. Falls back to CPU anyway after CUDA error
-**Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers, not faster conv.
-## 5. Score Accounting (v4.1)
 | Category | Tasks | Avg Score | Total |
 |----------|-------|-----------|-------|
@@ -85,14 +129,16 @@ description: Build and improve an ONNX model generator for the NeuroGolf Champio
 | Unsolved | 344 | 1.0 | 344 |
 | **Estimated LB** | | | **~670** |
-### Path to 4800+
 1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
 2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
 3. ✅ Color map Gather for permutations (+15 pts)
-4. 🔲 PyTorch multi-layer conv with ternary snap (est +20-50 tasks)
-5. 🔲 Channel reduction (fewer colors → cheaper models)
-6. 🔲 Composition detectors: rot+color, flip+color, transpose+color
-7. 🔲 Blend with other notebooks on Kaggle (the meta-strategy for 4000+)
 ## 6. Submission Checklist
@@ -104,18 +150,21 @@ Before submitting to Kaggle:
 - [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
 - [ ] submission.csv generated
 - [ ] Local estimated score calculated and compared to expected LB
 ## 7. Files & Locations
 | Location | Path | Notes |
 |----------|------|-------|
 | HF Repo | `rogermt/neurogolf-solver` | All code + data |
-| Solver | `neurogolf_solver.py` | v4.1, 1270 lines |
 | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
 | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
 | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
 | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
 | Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
 ## 8. LEARNING.md Maintenance Rules

 ---
 name: neurogolf-solver
+description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10/17, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
 ---
 # NeuroGolf Solver
+## Development Methodology: The Closed-Loop
+```
+Research → Design → Experiment → Analyze → Research → ...
+```
+**Rule: Loop until we have a CONFIRMED increase in arc-gen validated score.**
+| Phase | What | Exit Criteria |
+|-------|------|---------------|
+| **Research** | Read papers, understand theory, find what works in similar regimes | Have a testable hypothesis with cited evidence |
+| **Design** | Write MINIMAL code to test the hypothesis | Code is <200 lines, focused on ONE feature |
+| **Experiment** | Run on representative task sample (≥20 tasks, or all 400 if cheap) | Full arc-gen validation completed |
+| **Analyze** | Compare with/without feature. Measure: tasks solved, arc-gen survival, total score | Data shows >10% improvement in arc-gen survival rate OR total score |
+| **Research** | If failed: why? Read more papers. If succeeded: can we combine with other wins? | Next hypothesis ready |
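The Analyze exit criterion can be expressed as a small gate function. This is a minimal sketch, not code from the repo: the function names are hypothetical, and it assumes the ">10% improvement" is meant as a relative gain over the baseline, which the table does not state.

```python
def survival_rate(validated: int, solved_locally: int) -> float:
    """Fraction of locally solved tasks that also pass arc-gen validation."""
    return validated / solved_locally if solved_locally else 0.0


def relative_gain(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline (assumed interpretation)."""
    if baseline == 0:
        return float("inf") if candidate > 0 else 0.0
    return (candidate - baseline) / baseline


def passes_analyze_gate(base_survival: float, new_survival: float,
                        base_score: float, new_score: float,
                        threshold: float = 0.10) -> bool:
    """True if survival rate OR total score improves by more than the threshold."""
    return (relative_gain(base_survival, new_survival) > threshold
            or relative_gain(base_score, new_score) > threshold)


if __name__ == "__main__":
    # Figures quoted elsewhere in this file: 307 solved locally, 50 survive arc-gen.
    print(f"baseline survival: {survival_rate(50, 307):.3f}")
```

Both runs should use the same task sample, so that the baseline and candidate numbers are directly comparable.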
+**Critical rules:**
+- NEVER write >200 lines without running them first
+- NEVER claim a feature "works" until arc-gen validated on ≥20 tasks
+- NEVER upload code to repo that hasn't been validated
+- NEVER overwrite neurogolf_solver.py with unvalidated code
+- Theory from papers is NOT proof for our data — always test
+- If a feature shows no improvement after testing, DELETE it — don't leave dead code
 ## Quick Reference
 - **Repo**: `rogermt/neurogolf-solver`
+- **Current version**: v4.3 — 50 arc-gen-validated tasks, est LB ~670
 - **Kaggle runtime**: 12 hours for submission
+- **Target**: 3000+ LB (our own solver, no blending)
 - **Detailed history, mistakes, analysis**: see `LEARNING.md`
+- **Roadmap & experiment queue**: see `TODO.md`
 ## 1. Competition Rules
 | Item | Value |
 |------|-------|
 | Input/Output | `"input"`/`"output"` float32 `[1,10,30,30]` one-hot |
+| Opset | 10 (IR 10). **Opset 17 also works on Kaggle** |
 | Max file size | 1.44 MB per model |
 | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
 | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
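The scoring row can be checked with a few lines of Python. This is a sketch of the formula as written in the table, not the official scorer in `neurogolf_utils.py`. The name `task_score` is hypothetical, and because the table does not say how a zero total cost is handled, the sketch simply rejects it.

```python
import math


def task_score(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score: max(1.0, 25.0 - ln(MACs + memory + params))."""
    cost = macs + memory_bytes + params
    if cost <= 0:
        # Assumption: the table does not define the zero-cost case, so refuse it.
        raise ValueError("total cost must be positive")
    return max(1.0, 25.0 - math.log(cost))


if __name__ == "__main__":
    for cost in (55, 163_000, 10**12):
        print(cost, round(task_score(cost, 0, 0), 2))
```

Because the cost enters through a logarithm, each unit of score corresponds to a factor of e (about 2.72) in total cost. A total cost near 55 scores about 21 and a total cost near 163,000 scores about 13, which is the ballpark quoted for Channel Gather versus Conv 1×1.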
 - **Channel Gather** for permutation color maps (0 MACs, score ~21 vs ~13 for Conv 1×1)
 - **Conv 1×1** for non-permutation color maps (has MACs but correct)
 - **ReduceSum(input, axes=[1])** for variable-shape mask
+- **Pad** (opset 17): use tensor-based `pads` input, NOT attribute-based (opset 10 style)
+### Conv Fitting — THE #1 BLOCKER
+**We solve 307 locally but only 50 survive arc-gen. This is CATASTROPHIC overfitting, not a hyperparameter problem.**
+- Patch matrix P has n rows (patches) and p columns (10×ks² features)
+- For ks=7 on 7×7 grid: n≈196, p=490 → underdetermined → min-norm among infinite fits → overfits
+- For ks=7 on 21×21 grid: n≈7056, p=490 → overdetermined, but arc-gen still fails
+- **Root cause**: LOW effective rank of patch covariance (~10-40) due to few active colors → noise concentrates in low-rank directions
+- **Double descent**: ks=5,7,9 are at/near interpolation threshold where test error PEAKS
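The n and p figures above can be reproduced with a short helper. This is a sketch under stated assumptions: one patch row per grid cell per example (same-padding), and example counts of 4 and 16 inferred from n≈196 and n≈7056. The function names are hypothetical and do not come from `neurogolf_solver.py`.

```python
from typing import Tuple

CHANNELS = 10  # one-hot color channels in the [1,10,30,30] tensors


def patch_matrix_shape(num_examples: int, height: int, width: int,
                       ks: int) -> Tuple[int, int]:
    """Return (n, p): rows = patches, columns = 10 * ks^2 features."""
    n = num_examples * height * width  # assumes one patch per output cell
    p = CHANNELS * ks * ks
    return n, p


def regime(n: int, p: int) -> str:
    """Classify the least-squares system by comparing features to rows."""
    if p > n:
        return "underdetermined"
    if p == n:
        return "interpolation threshold"
    return "overdetermined"


if __name__ == "__main__":
    # (examples, grid side) pairs are assumptions inferred from the bullets above.
    for examples, side in ((4, 7), (16, 21)):
        for ks in (3, 5, 7, 9, 29):
            n, p = patch_matrix_shape(examples, side, side, ks)
            print(f"{side}x{side} ks={ks}: n={n} p={p} p/n={p / n:.2f} {regime(n, p)}")
```

Printing p/n per kernel size shows which kernel sizes sit closest to p≈n for a given task, which is the region the double-descent bullet warns about.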
+**Current fitting strategy (v4.2):**
 - lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
 - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
 - Try no-bias first, then bias
 - **Validate against arc-gen BEFORE accepting** — reject if fails
+**What does NOT help lstsq overfitting:**
+- ❌ Ridge/LOOCV λ tuning — theory predicts failure for low effective rank (Bartlett et al., arXiv:2306.13185)
+- ❌ More arc-gen examples in lstsq — adding constraints to underdetermined system doesn't fix wrong model
+- ❌ GPU/CuPy for lstsq — same O(n³) cost, crashes on memory
+**What MIGHT help (evidence-backed, needs testing):**
+- 🔲 Skip ks=5,7,9 — avoid interpolation threshold (double descent peak)
+- 🔲 PCA dimensionality reduction — project to top-20 components, ensure p_reduced << n
+- 🔲 Lasso (ℓ₁) instead of lstsq — matches sparse signal structure (arXiv:2302.00257)
+- 🔲 Gradient descent with early stopping — implicit regularization, don't interpolate
+- 🔲 PyTorch conv trained on arc-gen data — needs GPU, multi-seed, ternary snap
+## 4. Performance
+**The lstsq conv solver is the speed bottleneck.** For ks=29 on 21×21 grids with 16 examples: 7056×8410 matrix SVD. This is pure math cost — moving to GPU (CuPy) doesn't help.
+**Do NOT** try to GPU-accelerate lstsq. Use `--conv_budget` to cap time per task (10-20s locally, 60s on Kaggle's 12hr runtime). The real win is more analytical solvers + fixing arc-gen survival, not faster conv.
+## 5. Score Accounting (v4.2)
 | Category | Tasks | Avg Score | Total |
 |----------|-------|-----------|-------|
 | Unsolved | 344 | 1.0 | 344 |
 | **Estimated LB** | | | **~670** |
+### Path to 3000+
 1. ✅ ARC-GEN validation (fixed: +155 pts by eliminating 0-scoring models)
 2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (+8 tasks)
 3. ✅ Color map Gather for permutations (+15 pts)
+4. 🔲 **Phase 1: Cheap wins** — opset 17 transforms, channel reduction, composition detectors
+5. 🔲 **Phase 2: Fix arc-gen survival** — PCA, Lasso, skip bad ks, GD with early stopping
+6. 🔲 **Phase 3: Hard tasks** — hash matchers, run-length detectors, LLM rescue
+7. 🔲 **Phase 4: Score optimization** — ONNX optimizer, best-of-N selection
+**Blending with public datasets is EXPLICITLY excluded** — user's competitive philosophy. See LEARNING.md "What Others Do" for market intelligence only.
 ## 6. Submission Checklist
 - [ ] Each .onnx < 1.44 MB, submission.zip < 1.44 MB
 - [ ] submission.csv generated
 - [ ] Local estimated score calculated and compared to expected LB
+- [ ] **A/B test**: ran both old and new solver on same tasks, new solver scores higher
 ## 7. Files & Locations
 | Location | Path | Notes |
 |----------|------|-------|
 | HF Repo | `rogermt/neurogolf-solver` | All code + data |
+| Solver | `neurogolf_solver.py` | v4.2 (repo has unvalidated v5 code at 1919 lines — needs revert or validation) |
 | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
 | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |
 | Notebooks | `neurogolf-2026-solver-notebooks.zip` | 5 reference notebooks |
 | Kaggle data | `/kaggle/input/competitions/neurogolf-2026/` | task JSONs with arc-gen |
 | Local ARC data | `ARC-AGI/data/training/` | 400 hex-named JSONs |
+| Roadmap | `TODO.md` | Experiment queue with status key |
+| Learning | `LEARNING.md` | Knowledge accumulation — read before coding |
 ## 8. LEARNING.md Maintenance Rules