Committed by rogermt
Commit c228c49 · verified · 1 parent: feaf007

Add SKILL.md - complete knowledge base for NeuroGolf solver

Files changed (1): SKILL.md (added, +312 lines)
---
name: neurogolf-solver
description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)). Lower cost = higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
---

# NeuroGolf Solver — Complete Knowledge Base

## 1. Competition Format

### What is NeuroGolf?
The IJCAI-ECAI 2026 NeuroGolf Challenge on Kaggle. You build 400 tiny ONNX neural networks, one per ARC-AGI task. Each network transforms a one-hot encoded grid into another grid. Scoring rewards small, efficient networks.

### ONNX Model Spec
- **Input**: `"input"` float32 `[1, 10, 30, 30]` — one-hot encoded grid (10 color channels, 30×30 spatial)
- **Output**: `"output"` float32 `[1, 10, 30, 30]` — same format
- **Opset**: 10, IR version: 10 (but opset 17 ALSO works on Kaggle — see §3)
- **Max file size**: 1.44 MB per model (floppy disk limit)
- **Banned ops**: Loop, Scan, NonZero, Unique, Script, Function

### Scoring Formula
```
score_per_task = max(1.0, 25.0 - ln(MACs + memory_bytes + params))
total_score = sum(score_per_task for all 400 tasks)
```
- Unsolved tasks score 1.0 (not 0!)
- Max possible per task: 25.0 (cost = 0, e.g. Identity)
- **Excluded tasks**: {21, 55, 80, 184, 202, 366} — officially excluded, score 0 regardless

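The formula above can be sketched in Python (the `score_per_task` helper name is mine; the zero-cost case follows the Identity example above):

```python
import math

def score_per_task(macs: int, memory_bytes: int, params: int) -> float:
    """Per-task score: max(1, 25 - ln(total cost)); cost 0 earns the full 25."""
    cost = macs + memory_bytes + params
    if cost <= 0:
        return 25.0  # e.g. a zero-cost Identity model
    return max(1.0, 25.0 - math.log(cost))
```

Note how flat the logarithm makes the curve: a cost around 165K still scores about 13, so savings matter most near the very low end.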
### Submission Format
- `submission.zip` containing `task001.onnx` through `task400.onnx`
- Models must pass validation against ALL examples: **train + test + arc-gen**
- Optional: `submission.csv` with columns `task_id, total_cost`

### ARC-GEN Data (CRITICAL)
On Kaggle, each task JSON at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contains:
```json
{"train": [...], "test": [...], "arc-gen": [...]}
```
The `arc-gen` key has **up to 262 additional examples per task** (100K total across 400 tasks) generated by Google's ARC-GEN system. **Models are validated against ALL splits including arc-gen.** A model that passes train+test but fails arc-gen scores ZERO on Kaggle.

Locally, ARC-GEN data lives in separate files at `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` dicts and must be merged with the ARC-AGI task data.

## 2. Current State (v3 → v4 in progress)

### v3 Results: 307/400 solved locally, LB score ~501 (NOT ~3267)
The massive gap (3267 local vs 501 LB) means **most of our conv models fail ARC-GEN validation on Kaggle**. Each conv is fitted on ~6 train+test examples but must generalize to ~250 arc-gen examples of varying sizes. Many don't.

### Solver Breakdown (v3)
```
conv_var: 125, conv_fixed: 107, conv_diff: 39, spatial_gather: 16,
concat: 5, color_map: 4, concat_enhanced: 4, rotate: 3,
transpose: 2, upscale: 1, varshape_spatial_gather: 1
```

### Repository
- HF: `rogermt/neurogolf-solver`
- Files: `neurogolf_solver.py`, `neurogolf_utils.py` (official Kaggle utils), `ARC-GEN-100K.zip`, `neurogolf-2026-solver-notebooks.zip`

## 3. Key Differences: Our Solver vs High-Scoring Notebooks

### The 4200-point notebook (`neurogolf-2026-tiny-onnx-solver`)
This is a **BLEND notebook** — it does NOT solve tasks from scratch. It:
1. **Phase 1**: Loads 12+ other notebooks' `submission.zip` files as inputs
2. For each task, picks the cheapest valid model across all sources
3. **Phase 2**: Tries loose ONNX files from dataset inputs
4. **Phase 3**: Runs its own solver only on remaining unsolved tasks
5. Validates EVERYTHING against train+test+arc-gen before including
6. Result: 338/400 solved, est. score 4197.5

**Critical insight**: The 4200 score comes from BLENDING many solutions, not from a single solver. The solver itself adds 0 new tasks in Phase 3; all 338 come from other notebooks' pre-built models.

### The championship notebook (`the-2026-neurogolf-championship`)
Also a blend but with its own solver. Key differences from ours:
- Uses **opset 17** (not 10!) — works fine on Kaggle
- Has **shift detector**, **gravity detector**, **mirror detectors**, **fixed crop detector**, **outline detector**
- Has **composition detectors**: rotation+color, transpose+color, flip+color
- Has **channel reduction**: reduces 10→N channels for fewer colors → cheaper models
- Uses **PyTorch learned conv**: multi-seed Adam training, ternary weight snapping
- Uses **two-layer conv**: Conv→ReLU→Conv for complex patterns
- Validates against `train + arc-gen[:30]` (capped at 30 arc-gen examples)
- Result: 288 from own solver + more from blended inputs

### What they have that we don't
| Feature | Them | Us |
|---------|------|----|
| ARC-GEN validation | ✅ validate against arc-gen | ❌ v3 ignores arc-gen |
| ARC-GEN in fitting | ✅ uses arc-gen[:3] in detectors | ❌ fits only train+test |
| Opset 17 | ✅ uses freely | ❌ stuck on opset 10 |
| Shift detector | ✅ | ❌ |
| Gravity detector | ✅ | ❌ |
| Mirror detectors | ✅ (h, v, quad) | ❌ |
| Fixed crop detector | ✅ | ❌ |
| Extract outline | ✅ | ❌ |
| Composition (rot+color) | ✅ | ❌ |
| Channel reduction | ✅ (fewer channels = cheaper) | ❌ |
| PyTorch learned conv | ✅ (multi-seed, ternary snap) | ❌ (lstsq only) |
| Two-layer conv | ✅ (Conv→ReLU→Conv) | ❌ |
| Blend from other notebooks | ✅ (12+ sources) | ❌ |

## 4. The Submission Score Gap Problem

### Why LB = 501 when local = 3267
Our 307 solved tasks generate ONNX models locally. But on Kaggle:
1. Models are validated against `train + test + arc-gen` (all splits)
2. Conv models fitted on 6 examples often fail on 250+ arc-gen examples
3. Failed models score 0 (not even the 1.0 minimum)
4. Likely only ~40-50 of our 307 models actually pass on Kaggle

### The fix priority
1. **Validate locally against arc-gen** before submitting — only include models that pass
2. **Include arc-gen examples in conv fitting** — more data = better generalization
3. **Add more analytical solvers** (shift, mirror, gravity, crop) — these always generalize
4. **Try opset 17** — unlocks more ops, may work fine on Kaggle

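Fix #1 can be sketched as a split-agnostic local validator. `run_model` stands in for a wrapped onnxruntime session; all names here are illustrative rather than the solver's actual API:

```python
import numpy as np

def to_onehot(grid, ch=10, gh=30, gw=30):
    """One-hot encode a grid into float32 [1, ch, gh, gw], zero-padded."""
    a = np.zeros((1, ch, gh, gw), dtype=np.float32)
    g = np.asarray(grid, dtype=np.int64)
    h, w = g.shape
    a[0, g, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0
    return a

def passes_all_splits(run_model, td, splits=("train", "test", "arc-gen")):
    """True only if the model reproduces every example in every split.

    run_model: callable mapping a [1,10,30,30] array to a [1,10,30,30] array.
    """
    for split in splits:
        for ex in td.get(split, []):
            pred = run_model(to_onehot(ex["input"]))
            got = np.argmax(pred, axis=1)[0]        # [30,30] color indices
            target = np.asarray(ex["output"])
            h, w = target.shape
            if not np.array_equal(got[:h, :w], target):
                return False
    return True
```

With onnxruntime, `run_model` would be `lambda x: sess.run(["output"], {"input": x})[0]` for an `InferenceSession` loaded from the candidate .onnx file.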
## 5. Architecture & Code Structure

### `neurogolf_solver.py` structure
```
Constants: BATCH=1, CH=10, GH=GW=30
EXCLUDED_TASKS = {21, 55, 80, 184, 202, 366}

load_tasks_dir(data_dir, arcgen_dir)  # Load + merge ARC-GEN
to_onehot(grid)                       # Grid → [1,10,30,30]
validate(path, td)                    # Check model on ALL splits
score_network(path)                   # MACs + memory + params

Analytical Solvers (priority order):
  identity → constant → color_map → transpose → flip → rotate →
  tile → upscale → kronecker → concat → concat_enhanced →
  diagonal_tile → spatial_gather → varshape_spatial_gather

Conv Solvers:
  solve_conv_fixed()    — Fixed same-shape: Slice→Conv→ArgMax→Equal+Cast→Pad
  solve_conv_variable() — Variable same-shape: Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
  solve_conv_diffshape()— Fixed diff-shape (output ≤ input)
  solve_conv_var_diff() — Variable diff-shape (output ≤ input)

Main: solve_task() → run_tasks() → generate submission.zip + submission.csv
```

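The `solve_task()` flow amounts to "try solvers in priority order, keep the cheapest validated model". A minimal sketch, with `validate` and `score_network` injected as parameters so the snippet stands alone (the real function's signature in `neurogolf_solver.py` may differ):

```python
def solve_task(td, solvers, validate, score_network):
    """Try each solver in priority order; keep the cheapest model that
    validates against every split. Returns (cost, model_path) or None."""
    best = None
    for solver in solvers:
        path = solver(td)                # a solver returns an .onnx path or None
        if path is None or not validate(path, td):
            continue
        cost = score_network(path)       # MACs + memory_bytes + params
        if best is None or cost < best[0]:
            best = (cost, path)
    return best
```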
141
+ ### ONNX Building Patterns (opset 10)
142
+ ```python
143
+ # Model skeleton
144
+ def mk(nodes, inits=None):
145
+ x = helper.make_tensor_value_info("input", DT, [1,10,30,30])
146
+ y = helper.make_tensor_value_info("output", DT, [1,10,30,30])
147
+ g = helper.make_graph(nodes, "g", [x], [y], initializer=inits or [])
148
+ return helper.make_model(g, ir_version=10, opset_imports=[helper.make_opsetid("", 10)])
149
+
150
+ # One-hot via Equal+Cast (NOT OneHot — has CUDA issues)
151
+ classes = np.arange(10).reshape(1,10,1,1)
152
+ Equal(argmax_output, classes) → Cast(to=FLOAT)
153
+
154
+ # Spatial remap via Gather (NOT GatherElements — requires opset 11!)
155
+ Reshape([1,10,30,30] → [1,10,900]) → Gather(axis=2, indices=[900]) → Reshape back
156
+
157
+ # Conv pattern
158
+ Conv(input, W, kernel_shape=[ks,ks], pads=[pad]*4) → ArgMax → Equal+Cast → Mul(mask)
159
+
160
+ # Mask for variable-shape: ReduceSum(input, axes=[1], keepdims=1) gives 1 where content exists
161
+ ```

### Critical Op Compatibility
| Op | Opset Required | Notes |
|----|----------------|-------|
| Gather | 1 | ✅ Safe. Use axis=2 on flattened [1,10,900] |
| GatherElements | 11 | ❌ DO NOT USE with opset 10. Will fail on ORT 1.25+ |
| OneHot | 9 | ⚠️ No CUDA kernel. Use Equal+Cast instead |
| Conv | 1 | ✅ Safe |
| ArgMax | 1 | ✅ Safe |
| ReduceSum | 1 | ✅ Safe |
| Pad | 2 (opset-10 syntax) | ✅ Use the `pads` attribute for opset 10 |
| Slice | 10 | ✅ With starts/ends as inputs |
| Tile | 6 | ✅ Safe |
| ScatterElements | 11 | ⚠️ Requires opset 11+ |

## 6. Conv Fitting: lstsq vs PyTorch

### Current: lstsq (single-layer, closed-form)
```python
import numpy as np

# P:    [N, 10*ks*ks] patch feature vectors
# T_oh: [N, 10] one-hot targets;  T: [N] integer class labels
# (build_from_examples is the solver's own patch-extraction helper)
P, T_oh, T = build_from_examples(exs)
WT = np.linalg.lstsq(P, T_oh, rcond=None)[0]   # closed-form optimal weights
if (np.argmax(P @ WT, axis=1) == T).all():     # perfect-fit check
    ...  # SUCCESS: emit a Conv node with weights WT
```
- Fast, deterministic, optimal for the linear case
- FAILS when: pattern is nonlinear, too few examples, kernel too small

### Needed: PyTorch gradient descent (multi-layer)
```python
import torch
import torch.nn as nn

class TinyARC(nn.Module):
    def __init__(self, hidden=32, ks=5):
        super().__init__()
        self.conv1 = nn.Conv2d(10, hidden, ks, padding=ks // 2)
        self.conv2 = nn.Conv2d(hidden, 10, ks, padding=ks // 2)

    def forward(self, x):
        return self.conv2(torch.relu(self.conv1(x)))

# Train with MSE or cross-entropy, export with
# torch.onnx.export(model, dummy, path, opset_version=10),
# then add argmax+equal+cast+mask post-processing in ONNX manually.
```
- Can fit nonlinear patterns lstsq can't
- Multi-seed training (0, 7, 42) for robustness
- Ternary weight snapping: round weights to {-1, 0, 1} for smaller models

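The snapping step can be sketched as follows (the 0.33 threshold is an assumption, not the championship notebook's actual value; a real implementation would re-validate the snapped weights before accepting them):

```python
import numpy as np

def ternary_snap(w, threshold=0.33):
    """Round weights to {-1, 0, 1}: zero out small magnitudes, keep signs.

    `threshold` is the fraction of the max magnitude below which a weight
    snaps to 0 (an illustrative guess).
    """
    w = np.asarray(w, dtype=np.float32)
    scale = np.abs(w).max()
    if scale == 0.0:
        return w.copy()
    snapped = np.where(np.abs(w) < threshold * scale, 0.0, np.sign(w))
    return snapped.astype(np.float32)
```

Fewer distinct weight values means the model compresses better and often prunes whole taps to zero, shrinking both params and MACs.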
### ARC-GEN for conv fitting
The conv MUST generalize to arc-gen examples. Two approaches:
1. **Include arc-gen in fitting data** — use `train + test + arc-gen[:20]` for lstsq
2. **Validate against arc-gen after fitting** — only accept if it passes all splits

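Approach 1 is essentially a one-liner; a sketch (`fitting_examples` and the cap of 20 are illustrative names/values):

```python
def fitting_examples(td, arcgen_cap=20):
    """Examples used to fit a conv: train + test + a capped arc-gen slice."""
    return (td.get("train", []) + td.get("test", [])
            + td.get("arc-gen", [])[:arcgen_cap])
```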
## 7. Unsolved Tasks (94 in v3)

### Categories
| Category | Count | Why Unsolved |
|----------|-------|--------------|
| Variable diff-shape (output smaller) | ~60 | Output shape depends on input content |
| Variable diff-shape (output larger) | ~17 | Same problem |
| Same-shape, complex pattern | ~10 | Need larger kernels or multi-layer |
| Fixed diff-shape, output larger | ~7 | Input-content-dependent patterns |

### Fundamental Blocker
Variable-shape tasks where output size depends on input CONTENT cannot be solved with a static ONNX graph. The only workaround: the conv learns to put valid content in the right region, masked by an input-derived spatial mask.

## 8. Mistakes Log (DO NOT REPEAT)

### GatherElements (opset 11) — Fixed in v3
`GatherElements` requires opset 11. It works on Kaggle's old ORT but fails on ORT 1.25+. Replaced with `Gather` (opset 1) using 1D indices on the flattened spatial dim.

### s_flip still used GatherElements — Fixed in v4
The `s_flip` solver was still using `GatherElements`. Must use `_build_gather_model()` instead.

### ARC-GEN not loaded — The #1 score killer
v3 had `if 'arc-gen' in td` in validate() but never loaded arc-gen data into `td`. So validation always passed (no arc-gen to check), but Kaggle validated against arc-gen and most conv models failed.

### Conv fitted on too few examples
Fitting on 6 train+test examples overfits to a small sample. Must include arc-gen examples in the fitting data for better generalization.

### No submission.csv
Kaggle may need submission.csv alongside submission.zip.

### Wrong score_network without onnx_tool
Our fallback `score_network` returned `(0, 0, 0)` instead of real costs. Need a static profiler that matches Kaggle's calculation.

### Ignored EXCLUDED tasks
Wasted time trying to solve tasks 21, 55, 80, 184, 202, 366, which are officially excluded.

## 9. Competitive Strategy

### Path to 4800+ LB score
1. **Fix ARC-GEN validation** — immediately recover ~200 points from models that actually work
2. **Add missing analytical solvers** (shift, mirror, gravity, crop, composition) — +20-30 tasks, ~13 points each
3. **PyTorch multi-layer conv** — solve 5-10 more complex same-shape tasks
4. **Channel reduction** — reduce cost of existing solutions by 30-50%
5. **Blend with other notebooks** — the 4200 notebook proves this is the meta-strategy

### Quick wins
- Transpose: score = 25.0 (cost = 0, just permute dims) — already have
- Identity: score = 25.0 — already have
- Color map via channel Gather: cheaper than Conv 1×1 (params+nbytes only, no MACs)
- Analytical solvers: ~13 points each (cost ≈ 165K)
- Small conv (ks=1): ~11-13 points
- Large conv (ks=29): ~7 points

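The color-map quick win first needs the mapping inferred from examples. A sketch (`infer_color_map` is illustrative, not the solver's actual detector; it requires the map to be injective, since a channel Gather cannot merge two input channels into one output channel):

```python
import numpy as np

def infer_color_map(examples):
    """Return Gather indices idx (len 10) such that output channel c is taken
    from input channel idx[c], or None if examples disagree or the map is
    not injective. Unused colors keep the identity mapping."""
    mapping = {}
    for ex in examples:
        inp, out = np.asarray(ex["input"]), np.asarray(ex["output"])
        if inp.shape != out.shape:
            return None
        for a, b in zip(inp.ravel().tolist(), out.ravel().tolist()):
            if mapping.setdefault(a, b) != b:
                return None  # same input color maps to two outputs
    if len(set(mapping.values())) != len(mapping):
        return None  # not injective: Gather can't merge channels
    idx = np.arange(10)
    for a, b in mapping.items():
        idx[b] = a
    return idx
```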
## 10. Data & File Locations

### On Kaggle
```
/kaggle/input/competitions/neurogolf-2026/
    task001.json ... task400.json       (with train+test+arc-gen)
    neurogolf_utils/neurogolf_utils.py
```

### Locally
```
ARC-AGI/data/training/   # 400 hex-named .json files (train+test only)
ARC-GEN-100K/            # 400 hex-named .json files (arc-gen examples)
neurogolf-solver/
    neurogolf_solver.py  # Main solver
    neurogolf_utils.py   # Official Kaggle utils (needs onnx_tool, IPython)
```

### ARC-GEN file format
```python
# ARC-GEN-100K/{hex_id}.json is a LIST of examples:
[{"input": [[...]], "output": [[...]]}, ...]
# Must be merged into task data as td['arc-gen'] = list_of_examples
```

### ARC-GEN GitHub generator
https://github.com/google/ARC-GEN — can generate MORE examples per task if needed.

## 11. Reference Notebooks (in repo as neurogolf-2026-solver-notebooks.zip)

| Notebook | LB Score | Tasks | Key Technique |
|----------|----------|-------|---------------|
| neurogolf-2026-tiny-onnx-solver | ~4200 | 338 | Mega-blend of 12+ notebooks |
| 4200-v5-neurogolf-fix | ~5700 est | 341 | Same blend, manual LLM rescue tasks |
| the-2026-neurogolf-championship | ~3200 est | 288 | Own solver + blend |
| neurogolf-logic-driven-ensembling | — | 401 | Pure ensembling from zips |

## 12. Testing Checklist

Before any Kaggle submission:
- [ ] All models validated against train + test + arc-gen (locally)
- [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
- [ ] No GatherElements (opset 11) in any model
- [ ] No banned ops (Loop, Scan, NonZero, Unique)
- [ ] Each .onnx file < 1.44 MB
- [ ] submission.zip < 1.44 MB total
- [ ] submission.csv generated
- [ ] Local estimated score calculated with static profiler
- [ ] Compared local score vs expected LB (should be close now)
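
The op-related checklist items can be automated with a small scan (names and structure are mine; `op_types` would come from `[n.op_type for n in model.graph.node]`):

```python
BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique"}
OPSET11_ONLY = {"GatherElements", "ScatterElements"}

def check_ops(op_types, opset=10):
    """Return a list of problem descriptions for a model's node op types."""
    problems = []
    for op in op_types:
        if op in BANNED_OPS:
            problems.append(f"banned op: {op}")
        elif opset < 11 and op in OPSET11_ONLY:
            problems.append(f"{op} needs opset 11 but model declares {opset}")
    return problems
```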