Add SKILL.md - complete knowledge base for NeuroGolf solver
---
name: neurogolf-solver
description: Build and improve an ONNX model generator for the NeuroGolf Championship (Kaggle). Produces 400 tiny ONNX models (opset 10, IR 10, input/output [1,10,30,30] one-hot float32) for ARC-AGI tasks. Scoring = max(1, 25 - ln(MACs + memory_bytes + params)); lower cost means a higher score. Use this skill whenever working on this competition, debugging submission failures, or starting a fresh session.
---

# NeuroGolf Solver — Complete Knowledge Base

## 1. Competition Format

### What is NeuroGolf?
The IJCAI-ECAI 2026 NeuroGolf Challenge on Kaggle. You build 400 tiny ONNX neural networks, one per ARC-AGI task. Each network transforms a one-hot encoded grid into another grid. Scoring rewards small, efficient networks.

### ONNX Model Spec
- **Input**: `"input"` float32 `[1, 10, 30, 30]` — one-hot encoded grid (10 color channels, 30×30 spatial)
- **Output**: `"output"` float32 `[1, 10, 30, 30]` — same format
- **Opset**: 10, IR version: 10 (but opset 17 ALSO works on Kaggle — see §3)
- **Max file size**: 1.44 MB per model (floppy-disk limit)
- **Banned ops**: Loop, Scan, NonZero, Unique, Script, Function
### Scoring Formula
```
score_per_task = max(1.0, 25.0 - ln(MACs + memory_bytes + params))
total_score = sum(score_per_task for all 400 tasks)
```
- Unsolved tasks score 1.0 (not 0!)
- Max possible per task: 25.0 (cost = 0, e.g. an Identity model)
- **Excluded tasks**: {21, 55, 80, 184, 202, 366} — officially excluded; they score 0 regardless

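The formula can be sketched in Python to sanity-check cost/score trade-offs (a sketch; `score_per_task` is a hypothetical helper, and on Kaggle the cost terms come from the official profiler):

```python
import math

def score_per_task(macs: int, memory_bytes: int, params: int) -> float:
    """Score for one task: cheaper networks score higher, with a floor of 1.0."""
    cost = macs + memory_bytes + params
    if cost == 0:
        return 25.0  # e.g. a pure Identity model has no cost at all
    return max(1.0, 25.0 - math.log(cost))

# cost ≈ 165K gives roughly 13 points — the "analytical solver" regime,
# while an astronomically expensive model bottoms out at the 1.0 floor.
cheap = score_per_task(0, 165_000, 0)
floor = score_per_task(10**20, 0, 0)
```

Note the floor means an unsolved task (1.0) and an absurdly expensive solved task score the same, so shaving cost only pays off below `e^24` total cost.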
### Submission Format
- `submission.zip` containing `task001.onnx` through `task400.onnx`
- Models must pass validation against ALL examples: **train + test + arc-gen**
- Optional: `submission.csv` with columns `task_id, total_cost`

### ARC-GEN Data (CRITICAL)
On Kaggle, each task JSON at `/kaggle/input/competitions/neurogolf-2026/taskNNN.json` contains:
```json
{"train": [...], "test": [...], "arc-gen": [...]}
```
The `arc-gen` key holds **up to 262 additional examples per task** (100K total across 400 tasks) generated by Google's ARC-GEN system. **Models are validated against ALL splits including arc-gen.** A model that passes train+test but fails arc-gen scores ZERO on Kaggle.

Locally, ARC-GEN data lives in separate files at `ARC-GEN-100K/{hex_id}.json` as a list of `{input, output}` dicts, and must be merged with the ARC-AGI task data.

## 2. Current State (v3 → v4 in progress)

### v3 Results: 307/400 solved locally, LB score ~501 (NOT ~3267)
The massive gap (3267 local vs 501 LB) means **most of our conv models fail ARC-GEN validation on Kaggle**. Each conv is fitted on ~6 train+test examples but must generalize to ~250 arc-gen examples of varying sizes; many don't.

### Solver Breakdown (v3)
```
conv_var: 125, conv_fixed: 107, conv_diff: 39, spatial_gather: 16,
concat: 5, color_map: 4, concat_enhanced: 4, rotate: 3,
transpose: 2, upscale: 1, varshape_spatial_gather: 1
```

### Repository
- HF: `rogermt/neurogolf-solver`
- Files: `neurogolf_solver.py`, `neurogolf_utils.py` (official Kaggle utils), `ARC-GEN-100K.zip`, `neurogolf-2026-solver-notebooks.zip`

## 3. Key Differences: Our Solver vs High-Scoring Notebooks

### The 4200-point notebook (`neurogolf-2026-tiny-onnx-solver`)
This is a **BLEND notebook** — it does NOT solve tasks from scratch. It:
1. **Phase 1**: Loads 12+ other notebooks' `submission.zip` files as inputs
2. For each task, picks the cheapest valid model across all sources
3. **Phase 2**: Tries loose ONNX files from dataset inputs
4. **Phase 3**: Runs its own solver only on remaining unsolved tasks
5. Validates EVERYTHING against train+test+arc-gen before including
6. Result: 338/400 solved, est. score 4197.5

**Critical insight**: The 4200 score comes from BLENDING many solutions, not from a single solver. The solver itself adds 0 new tasks in Phase 3; all 338 come from other notebooks' pre-built models.

### The championship notebook (`the-2026-neurogolf-championship`)
Also a blend, but with its own solver. Key differences from ours:
- Uses **opset 17** (not 10!) — works fine on Kaggle
- Has **shift detector**, **gravity detector**, **mirror detectors**, **fixed crop detector**, **outline detector**
- Has **composition detectors**: rotation+color, transpose+color, flip+color
- Has **channel reduction**: reduces 10→N channels for fewer colors → cheaper models
- Uses **PyTorch learned conv**: multi-seed Adam training, ternary weight snapping
- Uses **two-layer conv**: Conv→ReLU→Conv for complex patterns
- Validates against `train + arc-gen[:30]` (capped at 30 arc-gen examples)
- Result: 288 from its own solver + more from blended inputs

### What they have that we don't
| Feature | Them | Us |
|---------|------|-----|
| ARC-GEN validation | ✅ validate against arc-gen | ❌ v3 ignores arc-gen |
| ARC-GEN in fitting | ✅ uses arc-gen[:3] in detectors | ❌ fits only train+test |
| Opset 17 | ✅ uses freely | ❌ stuck on opset 10 |
| Shift detector | ✅ | ❌ |
| Gravity detector | ✅ | ❌ |
| Mirror detectors | ✅ (h, v, quad) | ❌ |
| Fixed crop detector | ✅ | ❌ |
| Extract outline | ✅ | ❌ |
| Composition (rot+color) | ✅ | ❌ |
| Channel reduction | ✅ (fewer channels = cheaper) | ❌ |
| PyTorch learned conv | ✅ (multi-seed, ternary snap) | ❌ (lstsq only) |
| Two-layer conv | ✅ (Conv→ReLU→Conv) | ❌ |
| Blend from other notebooks | ✅ (12+ sources) | ❌ |

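The channel-reduction idea from the table can be sketched as: find the colors a task actually uses across its examples and fit the conv only over those channels, re-expanding to 10 channels at the very end (`used_colors` is a hypothetical helper, not code from either notebook):

```python
import numpy as np

def used_colors(examples):
    """Collect the sorted set of colors appearing in any input or output grid."""
    colors = set()
    for ex in examples:
        colors.update(np.unique(ex["input"]).tolist())
        colors.update(np.unique(ex["output"]).tolist())
    return sorted(colors)

exs = [{"input": [[0, 3], [3, 0]], "output": [[3, 0], [0, 3]]}]
palette = used_colors(exs)
# A conv over len(palette)=2 channels instead of 10 shrinks MACs and params
# roughly 5x here, which directly raises 25 - ln(cost).
```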
## 4. The Submission Score Gap Problem

### Why LB = 501 when local = 3267
Our 307 solved tasks generate valid ONNX models locally. But on Kaggle:
1. Models are validated against `train + test + arc-gen` (all splits)
2. Conv models fitted on 6 examples often fail on 250+ arc-gen examples
3. Failed models score 0 (not even the 1.0 minimum)
4. Likely only ~40-50 of our 307 models actually pass on Kaggle

### The fix priority
1. **Validate locally against arc-gen** before submitting — only include models that pass
2. **Include arc-gen examples in conv fitting** — more data = better generalization
3. **Add more analytical solvers** (shift, mirror, gravity, crop) — these always generalize
4. **Try opset 17** — unlocks more ops, may work fine on Kaggle

## 5. Architecture & Code Structure

### `neurogolf_solver.py` structure
```
Constants: BATCH=1, CH=10, GH=GW=30
EXCLUDED_TASKS = {21, 55, 80, 184, 202, 366}

load_tasks_dir(data_dir, arcgen_dir)  # Load + merge ARC-GEN
to_onehot(grid)                       # Grid → [1,10,30,30]
validate(path, td)                    # Check model on ALL splits
score_network(path)                   # MACs + memory + params

Analytical Solvers (priority order):
  identity → constant → color_map → transpose → flip → rotate →
  tile → upscale → kronecker → concat → concat_enhanced →
  diagonal_tile → spatial_gather → varshape_spatial_gather

Conv Solvers:
  solve_conv_fixed()    — Fixed same-shape: Slice→Conv→ArgMax→Equal+Cast→Pad
  solve_conv_variable() — Variable same-shape: Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
  solve_conv_diffshape()— Fixed diff-shape (output ≤ input)
  solve_conv_var_diff() — Variable diff-shape (output ≤ input)

Main: solve_task() → run_tasks() → generate submission.zip + submission.csv
```

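A minimal NumPy sketch of `to_onehot` consistent with the spec above (the real helper lives in `neurogolf_solver.py`; the all-zero-padding behavior is an assumption that matches the variable-shape mask trick described later):

```python
import numpy as np

def to_onehot(grid):
    """Encode an ARC grid (rows of ints 0-9) as float32 [1, 10, 30, 30].

    The grid sits in the top-left corner; cells outside the grid stay
    all-zero across every channel, which is how input-derived masks
    distinguish content from padding.
    """
    g = np.asarray(grid, dtype=np.int64)
    h, w = g.shape
    out = np.zeros((1, 10, 30, 30), dtype=np.float32)
    # Fancy indexing: channel g[i, j] gets a 1 at spatial position (i, j).
    out[0, g, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0
    return out

x = to_onehot([[0, 1], [2, 3]])  # exactly 4 ones, one per grid cell
```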
### ONNX Building Patterns (opset 10)
```python
# Model skeleton
def mk(nodes, inits=None):
    x = helper.make_tensor_value_info("input", DT, [1, 10, 30, 30])
    y = helper.make_tensor_value_info("output", DT, [1, 10, 30, 30])
    g = helper.make_graph(nodes, "g", [x], [y], initializer=inits or [])
    return helper.make_model(g, ir_version=10, opset_imports=[helper.make_opsetid("", 10)])

# One-hot via Equal+Cast (NOT OneHot — has CUDA issues):
classes = np.arange(10).reshape(1, 10, 1, 1)
#   Equal(argmax_output, classes) → Cast(to=FLOAT)

# Spatial remap via Gather (NOT GatherElements — requires opset 11!):
#   Reshape([1,10,30,30] → [1,10,900]) → Gather(axis=2, indices=[900]) → Reshape back

# Conv pattern:
#   Conv(input, W, kernel_shape=[ks,ks], pads=[pad]*4) → ArgMax → Equal+Cast → Mul(mask)

# Mask for variable-shape: ReduceSum(input, axes=[1], keepdims=1) gives 1 where content exists
```

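The flattened-Gather remap can be simulated in NumPy to sanity-check an index table before baking it into an ONNX initializer (a sketch, using a horizontal flip as the example remap; `apply_remap` is a hypothetical helper):

```python
import numpy as np

H = W = 30
# Precompute a 900-entry table mapping each output pixel to an input pixel.
rows, cols = np.divmod(np.arange(H * W), W)
flip_idx = rows * W + (W - 1 - cols)  # horizontal flip: col -> W-1-col

def apply_remap(x, idx):
    """NumPy mirror of Reshape -> Gather(axis=2) -> Reshape on [1,10,30,30]."""
    flat = x.reshape(1, 10, H * W)               # Reshape to [1,10,900]
    return flat[:, :, idx].reshape(1, 10, H, W)  # Gather along axis=2, reshape back

x = np.zeros((1, 10, H, W), dtype=np.float32)
x[0, 3, 0, 0] = 1.0            # color 3 at the top-left corner
y = apply_remap(x, flip_idx)   # lands at the top-right corner
```

The same `flip_idx` array, stored as an int64 initializer, drives the ONNX `Gather` node, so any bug shows up here first.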
### Critical Op Compatibility
| Op | Opset Required | Notes |
|----|----------------|-------|
| Gather | 1 | ✅ Safe. Use axis=2 on flattened [1,10,900] |
| GatherElements | 11 | ❌ DO NOT USE with opset 10. Will fail on ORT 1.25+ |
| OneHot | 9 | ⚠️ No CUDA kernel. Use Equal+Cast instead |
| Conv | 1 | ✅ Safe |
| ArgMax | 1 | ✅ Safe |
| ReduceSum | 1 | ✅ Safe |
| Pad | 2 | ✅ Use the `pads` attribute form for opset 10 |
| Slice | 10 | ✅ With starts/ends as inputs |
| Tile | 6 | ✅ Safe |
| ScatterElements | 11 | ⚠️ Requires opset 11+ |

## 6. Conv Fitting: lstsq vs PyTorch

### Current: lstsq (single-layer, closed-form)
```python
# patches: [N, 10*ks*ks] feature vectors; targets T: [N] integer class labels
P, T_oh = build_from_examples(exs)             # T_oh: one-hot targets [N, 10]
WT, *_ = np.linalg.lstsq(P, T_oh, rcond=None)  # closed-form least-squares weights
pred = np.argmax(P @ WT, axis=1)
if np.array_equal(pred, T):                    # accept only a perfect fit
    success()
```
- Fast, deterministic, optimal for the linear case
- FAILS when the pattern is nonlinear, there are too few examples, or the kernel is too small

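A toy end-to-end run of the lstsq fit (hypothetical data; a 1×1 "kernel" where each patch is just the pixel's own one-hot vector, learning a pure color swap):

```python
import numpy as np

# Feature rows: one-hot input color (ks=1, so the patch is the pixel itself).
# Targets: the output color; this mapping swaps colors 1 and 2.
P = np.eye(10, dtype=np.float32)[[0, 1, 2, 1, 2, 0]]  # 6 sample pixels
T = np.array([0, 2, 1, 2, 1, 0])                      # swapped class labels
T_oh = np.eye(10, dtype=np.float32)[T]

WT, *_ = np.linalg.lstsq(P, T_oh, rcond=None)  # closed-form weights, [10, 10]
pred = np.argmax(P @ WT, axis=1)
assert np.array_equal(pred, T)  # perfect fit -> this conv would be accepted
```

With consistent examples the fit is exact; an inconsistent or nonlinear mapping would break the final assertion, which is exactly the solver's accept/reject test.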
### Needed: PyTorch gradient descent (multi-layer)
```python
import torch
import torch.nn as nn

class TinyARC(nn.Module):
    def __init__(self, hidden=32, ks=5):
        super().__init__()
        self.conv1 = nn.Conv2d(10, hidden, ks, padding=ks // 2)
        self.conv2 = nn.Conv2d(hidden, 10, ks, padding=ks // 2)

    def forward(self, x):
        return self.conv2(torch.relu(self.conv1(x)))

# Train with MSE or cross-entropy, export with:
#   torch.onnx.export(model, dummy, path, opset_version=10)
# Then add the ArgMax → Equal+Cast → Mul(mask) post-processing in ONNX manually.
```
- Can fit nonlinear patterns lstsq can't
- Multi-seed training (0, 7, 42) for robustness
- Ternary weight snapping: round weights to {-1, 0, 1} for smaller models

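Ternary snapping can be sketched as rounding each weight to the nearest of {-1, 0, 1} and then re-checking that the snapped model still solves every example (`snap_ternary` and the 0.5 threshold are assumptions, not the championship notebook's exact code):

```python
import numpy as np

def snap_ternary(w, thresh=0.5):
    """Round weights to {-1, 0, 1}: small magnitudes -> 0, the rest -> their sign."""
    return np.where(np.abs(w) < thresh, 0.0, np.sign(w)).astype(np.float32)

w = np.array([[0.9, -0.2], [-1.4, 0.45]])
snapped = snap_ternary(w)  # [[1, 0], [-1, 0]]
# Keep the snapped weights only if the model still passes all splits;
# otherwise fall back to the original float weights.
```

Ternary tensors compress extremely well in the .onnx file and tend to trim cost, but the re-validation step is mandatory since snapping can break a fragile fit.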
### ARC-GEN for conv fitting
The conv MUST generalize to arc-gen examples. Two approaches:
1. **Include arc-gen in the fitting data** — use `train + test + arc-gen[:20]` for lstsq
2. **Validate against arc-gen after fitting** — only accept a model if it passes all splits

## 7. Unsolved Tasks (94 in v3)

### Categories
| Category | Count | Why Unsolved |
|----------|-------|--------------|
| Variable diff-shape (output smaller) | ~60 | Output shape depends on input content |
| Variable diff-shape (output larger) | ~17 | Same problem |
| Same-shape, complex pattern | ~10 | Needs larger kernels or multi-layer conv |
| Fixed diff-shape, output larger | ~7 | Input-content-dependent patterns |

### Fundamental Blocker
Variable-shape tasks where the output size depends on input CONTENT cannot be solved with a static ONNX graph. The only workaround: the conv learns to put valid content in the right region, masked by an input-derived spatial mask.

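The input-derived mask can be checked in NumPy (this mirrors `ReduceSum` over the channel axis; `content_mask` is a hypothetical name):

```python
import numpy as np

def content_mask(x):
    """1 where the input grid has a cell (some channel is set), 0 in the padding."""
    return x.sum(axis=1, keepdims=True)  # [1,1,30,30], values in {0, 1}

x = np.zeros((1, 10, 30, 30), dtype=np.float32)
x[0, 0, :5, :7] = 1.0          # a 5x7 grid, all color 0
mask = content_mask(x)
# Multiplying the conv's one-hot output by this mask zeroes everything
# outside the input's occupied region.
```

This only recovers the *input's* footprint, which is why it works for variable *same-shape* tasks but not for tasks whose output footprint differs from the input's.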
## 8. Mistakes Log (DO NOT REPEAT)

### GatherElements (opset 11) — Fixed in v3
`GatherElements` requires opset 11. It works on Kaggle's old ORT but fails on ORT 1.25+. Replaced with `Gather` (opset 1) using 1D indices on the flattened spatial dim.

### s_flip still used GatherElements — Fixed in v4
The `s_flip` solver was still using `GatherElements`. It must use `_build_gather_model()` instead.

### ARC-GEN not loaded — The #1 score killer
v3 had `if 'arc-gen' in td` in `validate()` but never loaded arc-gen data into `td`. So local validation always passed (there was no arc-gen to check), while Kaggle validated against arc-gen and most conv models failed.

### Conv fitted on too few examples
Fitting on 6 train+test examples overfits to a small sample. Arc-gen examples must be included in the fitting data for better generalization.

### No submission.csv
Kaggle may need `submission.csv` alongside `submission.zip`.

### Wrong score_network without onnx_tool
Our fallback `score_network` returned `(0, 0, 0)` instead of real costs. We need a static profiler that matches Kaggle's calculation.

### Ignored EXCLUDED tasks
We wasted time trying to solve tasks 21, 55, 80, 184, 202, 366, which are officially excluded.

## 9. Competitive Strategy

### Path to 4800+ LB score
1. **Fix ARC-GEN validation** — immediately recover ~200 points from models that actually work
2. **Add missing analytical solvers** (shift, mirror, gravity, crop, composition) — +20-30 tasks at ~13 points each
3. **PyTorch multi-layer conv** — solve 5-10 more complex same-shape tasks
4. **Channel reduction** — cut the cost of existing solutions by 30-50%
5. **Blend with other notebooks** — the 4200 notebook proves this is the meta-strategy

### Quick wins
- Transpose: score 25.0 (cost 0, just permute dims) — already have
- Identity: score 25.0 — already have
- Color map via channel Gather: cheaper than a 1×1 Conv (params+nbytes only, no MACs)
- Analytical solvers: ~13 points each (cost ≈ 165K)
- Small conv (ks=1): ~11-13 points
- Large conv (ks=29): ~7 points

## 10. Data & File Locations

### On Kaggle
```
/kaggle/input/competitions/neurogolf-2026/
  task001.json ... task400.json       (with train+test+arc-gen)
  neurogolf_utils/neurogolf_utils.py
```

### Locally
```
ARC-AGI/data/training/   # 400 hex-named .json files (train+test only)
ARC-GEN-100K/            # 400 hex-named .json files (arc-gen examples)
neurogolf-solver/
  neurogolf_solver.py    # Main solver
  neurogolf_utils.py     # Official Kaggle utils (needs onnx_tool, IPython)
```

### ARC-GEN file format
```python
# ARC-GEN-100K/{hex_id}.json is a LIST of examples:
#   [{"input": [[...]], "output": [[...]]}, ...]
# It must be merged into the task data as td['arc-gen'] = list_of_examples
```

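The merge step can be sketched with the stdlib (`merge_arcgen` is a hypothetical helper; the real loader is `load_tasks_dir` in `neurogolf_solver.py`):

```python
import json
from pathlib import Path

def merge_arcgen(task: dict, arcgen_path: Path) -> dict:
    """Attach the ARC-GEN example list to a task dict under the 'arc-gen' key."""
    if arcgen_path.exists():
        task["arc-gen"] = json.loads(arcgen_path.read_text())
    else:
        task["arc-gen"] = []  # no extra examples shipped for this task
    return task

# Usage sketch: pair each ARC-AGI task file with its same-named ARC-GEN file.
# td = merge_arcgen(json.loads(task_path.read_text()),
#                   Path("ARC-GEN-100K") / task_path.name)
```

After this, local `validate()` sees the same three splits Kaggle does, which is the whole point of §4's fix priority.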
### ARC-GEN GitHub generator
https://github.com/google/ARC-GEN — can generate MORE examples per task if needed.

## 11. Reference Notebooks (in repo as neurogolf-2026-solver-notebooks.zip)

| Notebook | LB Score | Tasks | Key Technique |
|----------|----------|-------|---------------|
| neurogolf-2026-tiny-onnx-solver | ~4200 | 338 | Mega-blend of 12+ notebooks |
| 4200-v5-neurogolf-fix | ~5700 est | 341 | Same blend, manual LLM rescue tasks |
| the-2026-neurogolf-championship | ~3200 est | 288 | Own solver + blend |
| neurogolf-logic-driven-ensembling | — | 401 | Pure ensembling from zips |

## 12. Testing Checklist

Before any Kaggle submission:
- [ ] All models validated against train + test + arc-gen (locally)
- [ ] EXCLUDED tasks {21, 55, 80, 184, 202, 366} not included
- [ ] No GatherElements (opset 11) in any model
- [ ] No banned ops (Loop, Scan, NonZero, Unique)
- [ ] Each .onnx file < 1.44 MB
- [ ] submission.zip < 1.44 MB total
- [ ] submission.csv generated
- [ ] Local estimated score calculated with a static profiler
- [ ] Local score compared vs expected LB (they should be close now)
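The file-level items on this checklist can be automated with the stdlib (a sketch; `check_submission` is a hypothetical helper, the 1.44 MB limit is assumed to mean decimal megabytes, and model-level validation still needs onnxruntime):

```python
import re
import zipfile

EXCLUDED = {21, 55, 80, 184, 202, 366}
FLOPPY_BYTES = 1_440_000  # 1.44 MB floppy limit (assumed decimal MB)

def check_submission(path="submission.zip"):
    """File-level checks only: names, excluded tasks, per-member size."""
    problems = []
    with zipfile.ZipFile(path) as z:
        for info in z.infolist():
            m = re.fullmatch(r"task(\d{3})\.onnx", info.filename)
            if not m:
                problems.append(f"unexpected file: {info.filename}")
                continue
            if int(m.group(1)) in EXCLUDED:
                problems.append(f"excluded task included: {info.filename}")
            if info.file_size > FLOPPY_BYTES:
                problems.append(f"too big: {info.filename} ({info.file_size} B)")
    return problems
```

Running this (plus a total-size check on the zip itself) before upload catches the cheap failures; the op-level and arc-gen checks still require loading each model.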