Replace TODO with roadmap (2026-04-25)

TODO.md CHANGED
@@ -1,29 +1,74 @@

Removed (old TODO):

- What the model_runner.py code IS
- That's the 2024 ARC Prize winning LLM infrastructure – it fine-tunes a language model to generate ARC solutions as text. It's a completely different competition format (text answers, not ONNX models). It won't help us directly.
- 1. ARC-GEN additional training data – the ARC-GEN paper generates extra input/output examples for each task. More examples → our lstsq conv solver gets more

Added (new TODO.md):

# NeuroGolf Solver – Roadmap

> Current: v4.2 · 50 arc-gen validated · ~670 LB · Target: 3000+

## Phase 1: Cheap Wins (est +400 pts → ~1100)

- [ ] **Switch to opset 17** – replace all Gather-index models with Slice+Transpose builders
  - Rotation: `Crop → Transpose → Slice(step=-1)` = ~0 cost (was ~165K)
  - Flip: `Crop → Slice(step=-1)` = ~0 cost (was ~165K)
  - Transpose: `Crop → Transpose(perm)` = ~0 cost (was ~36K)
  - ~25 analytical tasks go from ~15 pts → ~25 pts each
- [ ] **Channel reduction wrapper** – `Conv1x1(10→N) → transform → Conv1x1(N→10)` when <8 colors are used
  - Saves ~20-40% MACs on conv tasks with few colors
- [ ] **Composition detectors** – rotation+color, flip+color, transpose+color
  - These are tasks where two operations are combined (e.g. rotate, then recolor)
  - Top notebooks have these; we don't
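The three rewrites above are pure data movement, so they should profile at ~0 MACs. A minimal numpy sketch of the claimed equivalences (the grid values are illustrative, not from a real task):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])  # a cropped 2x3 grid

# Rotation (90° clockwise) = Transpose, then Slice(step=-1) on the last axis
rot = grid.T[:, ::-1]
assert np.array_equal(rot, np.rot90(grid, k=-1))

# Flip (left-right) = Slice(step=-1) on the last axis alone
flip = grid[:, ::-1]
assert np.array_equal(flip, np.fliplr(grid))

# Transpose = a single Transpose(perm=[1, 0]) node, no arithmetic at all
assert np.array_equal(grid.T, np.transpose(grid, (1, 0)))
```

Since none of these perform multiply-accumulates, a MAC-counting profiler scores the resulting graphs near zero, which is where the ~165K savings comes from.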

## Phase 2: Fix Arc-Gen Survival (est +100-150 tasks → ~2000-2500)

This is the #1 blocker: we solve 307 tasks locally, but only 50 survive arc-gen validation.

- [ ] **PyTorch learned conv on GPU** – train on train+test+arc-gen data
  - Multi-seed Adam (seeds 0, 7, 42), 3000 steps, lr=0.03
  - Try ks=1,3,5 single-layer, plus ks=(3,1) and (5,1) two-layer with ReLU
  - **Ternary weight snap** – after training, snap weights to {-1, 0, 1} and re-validate
  - Must include arc-gen examples in the training data (not just validation)
  - Needs a GPU (T4 minimum) – CPU is too slow for 400 tasks × 3 seeds × multiple ks
- [ ] **Increase arc-gen in lstsq fitting** – currently capped at 10 examples; try 20-50 for fixed-size tasks
  - More data = more constraints = less overfitting in underdetermined systems
- [ ] **Generate MORE arc-gen data** – use the ARC-GEN generator (github.com/google/ARC-GEN) to produce 1000+ examples per task instead of ~250
  - More fitting data = better generalization
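Of the steps above, the ternary snap is easy to pin down before any GPU work. A minimal numpy sketch – the `ternary_snap` name and the 0.5 threshold are assumptions, not project code:

```python
import numpy as np

def ternary_snap(w, thresh=0.5):
    """Snap trained weights to {-1, 0, 1}: zero out weights with
    |w| < thresh, keep only the sign of the rest.
    (thresh=0.5 is an assumed default; tune it per task.)"""
    return (np.sign(w) * (np.abs(w) >= thresh)).astype(w.dtype)

w = np.array([[0.93, 0.12],
              [-1.40, 0.48]])
snapped = ternary_snap(w)
```

The snap only pays off if the snapped model still solves every example, hence the re-validate step: keep the ternary weights when validation passes, otherwise fall back to the float weights.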

## Phase 3: Hard Tasks – Hash Matchers & LLM Rescue (est +20-50 tasks → ~2500-3000)

For tasks no automated solver can handle.

- [ ] **Hash-based matcher builder** – an automated version of the LLM rescue pattern
  - Flatten input → MatMul(hash_weights) → match against all known examples → apply the stored delta
  - Requires opset 17 (ScatterND)
  - Works for ANY task whose examples all fit in a 1.44MB model
  - Build a generic `build_hash_matcher(task_data) → onnx_bytes` function
- [ ] **Per-task LLM rescue** – for the ~20 hardest tasks with algorithmic patterns
  - Feed the task JSON plus a Python solution to an LLM; get back an ONNX builder function
  - Priority tasks: gravity, flood fill, outline extraction, pattern counting
- [ ] **Run-length / gap pattern detector** – like task096 in the notebooks
  - Depthwise conv to detect runs of N and gap patterns
  - A template for a whole class of "count and classify" tasks
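The hash-matcher flow can be prototyped in plain numpy before emitting MatMul/ScatterND nodes. Everything here – `hash_weights`, the nearest-key lookup, the toy examples – is an illustrative stand-in for the eventual ONNX graph, not project code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stored (flattened input, output grid) pairs for one task (toy data).
examples = [
    (np.array([0., 1., 0., 1.]), [[2, 2], [0, 0]]),
    (np.array([1., 0., 1., 0.]), [[0, 0], [3, 3]]),
]

# A fixed random projection plays the role of MatMul(hash_weights).
hash_weights = rng.standard_normal((4, 8))
keys = np.stack([x @ hash_weights for x, _ in examples])

def hash_match(x):
    """Hash the query, pick the nearest stored key, emit its stored output."""
    dists = np.linalg.norm(keys - x @ hash_weights, axis=1)
    return examples[int(np.argmin(dists))][1]
```

In the real builder the stored outputs (or deltas against the input) would live in an initializer tensor and be selected with Gather/ScatterND, which is why opset 17 is a prerequisite.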

## Phase 4: Score Optimization (est +200-500 pts on existing tasks)

- [ ] **ONNX optimizer pass** – `onnxoptimizer.optimize()` with dead-code elimination and identity removal
  - Top notebooks do this; it can shrink models 5-20%
- [ ] **Best-of-N model selection** – for each task, generate multiple candidate models (different ks, bias/no-bias, etc.) and keep the cheapest valid one
  - Already partially done, but could be more aggressive
- [ ] **Validate with the official `neurogolf_utils.score_network()`** – use `onnx_tool` for exact cost matching
  - Our static profiler is close but may diverge on edge cases
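Best-of-N selection reduces to "cheapest candidate that still validates". A sketch with hypothetical candidate tuples – the builder names, costs, and validation flags are all made up for illustration:

```python
# (builder name, estimated cost, passed strict validation?) – illustrative only
candidates = [
    ("conv ks=3 + bias",  165_000, True),
    ("conv ks=1 no bias",  36_000, False),  # cheapest, but fails one pair
    ("conv ks=5 + bias",  410_000, True),
    ("slice/transpose",     1_200, True),
]

def best_of_n(cands):
    """Return the cheapest candidate that validated, or None if none did."""
    valid = [c for c in cands if c[2]]
    return min(valid, key=lambda c: c[1], default=None)
```

Being "more aggressive" here just means widening the candidate list (more ks values, bias variants, opset-17 rewrites) before this selection step, since invalid or expensive candidates are filtered out anyway.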

## Optional: Blend Pipeline

If the above isn't enough, we can build our own blend pipeline:

- [ ] Upload our solver's `submission.zip` as a Kaggle dataset
- [ ] Create a blend notebook that loads our own output and runs a second-pass solver
- [ ] Attach public datasets (see LEARNING.md for the full list of 24 sources)
- [ ] `strict_validate()` every model through `neurogolf_utils` before submission

## Status Key

| Symbol | Meaning     |
|--------|-------------|
| `[ ]`  | Not started |
| `[~]`  | In progress |
| `[x]`  | Done        |
| `[!]`  | Blocked     |