rogermt committed
Commit 5a08c65 · verified · 1 Parent(s): 2d9804f

Replace TODO with roadmap (2026-04-25)

Files changed (1): TODO.md +67 -22

TODO.md CHANGED
@@ -1,29 +1,74 @@
- Here's what I need from you and what the situation looks like:
-
- Current Status: 306/400 ✅ (was 294)
-
- What the model_runner.py code IS
- That's the 2024 ARC Prize winning LLM infrastructure — it fine-tunes a language model to generate ARC solutions as text. It's a completely different competition format (text answers, not ONNX models). It won't help us directly.
-
- What WOULD help — things you can search for:
- 1. ARC-GEN additional training data — The ARC-GEN paper generates extra input/output examples for each task. More examples → our lstsq conv solver gets more data → better fitting → more tasks solved. Search Kaggle/GitHub for "arc-gen" data files or the generator code.
- 2. NeuroGolf / Code Golf Championship notebooks — Search Kaggle for "neurogolf 2026" or "code golf championship ARC ONNX". Other competitors' approaches would be gold.
- 3. CompressARC code (paper) — This 76K-param approach trains a tiny network PER TASK via gradient descent. If we could adapt it to produce opset-10 ONNX models, it could solve many of our 94 remaining tasks. Search GitHub for "CompressARC" or the authors' repos.
-
- What's actually blocking us (94 unsolved tasks)
-
- | Category | Count | Example | Why unsolved |
- |----------|-------|---------|--------------|
- | Variable diff-shape (output smaller) | ~60 | Extract subregion from grid | Output shape depends on input content — can't build static ONNX |
- | Variable diff-shape (output larger) | ~17 | Tile/upscale by variable factor | Same problem |
- | Same-shape but complex | ~10 | Multi-step reasoning, flood fill | Conv can't learn non-local/algorithmic patterns |
- | Fixed diff-shape (output larger) | ~7 | Input-driven block placement | Output depends on input VALUES, not just positions |
-
- The fundamental challenge: the 94 unsolved tasks require reasoning that depends on input content (not just a fixed pixel remapping or local conv pattern). Our current ONNX opset 10 toolkit (Conv, Gather, ArgMax, etc.) can only express fixed mappings. We'd need to find tasks where the mapping IS fixed but our solver just hasn't found it yet — likely by adding more training examples (ARC-GEN) or trying bigger conv kernels with more time budget.
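The lstsq conv solver referenced above fits a fixed mapping to the example pairs by least squares, which is why extra examples help. A minimal sketch of the idea, reduced to the 1x1 (per-pixel color remap) case; the helper names are illustrative, not the solver's real API:

```python
import numpy as np

def one_hot(grid, n_colors=10):
    # (H, W) integer grid -> (H*W, n_colors) one-hot rows
    return np.eye(n_colors)[np.asarray(grid).ravel()]

def fit_color_map(examples, n_colors=10):
    # Stack every pixel of every example into one least-squares system
    # out_onehot ~= in_onehot @ W, where W acts like a 1x1 conv kernel.
    # More examples -> more rows -> a better-determined system.
    X = np.vstack([one_hot(i, n_colors) for i, _ in examples])
    Y = np.vstack([one_hot(o, n_colors) for _, o in examples])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def apply_color_map(W, grid, n_colors=10):
    scores = one_hot(grid, n_colors) @ W
    return scores.argmax(axis=1).reshape(np.asarray(grid).shape)

# Toy task: swap colors 1 and 2, keep 0 unchanged.
train = [([[1, 2], [0, 1]], [[2, 1], [0, 2]]),
         ([[2, 2], [1, 0]], [[1, 1], [2, 0]])]
W = fit_color_map(train)
# apply_color_map(W, [[0, 1], [2, 2]]) recovers the swap: [[0, 2], [1, 1]]
```

With only a few pixels per color the system is underdetermined for unseen colors, which is exactly the overfitting failure mode the roadmap below attributes to too little arc-gen data.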
+ # NeuroGolf Solver — Roadmap
+
+ > Current: v4.2 · 50 arc-gen validated · ~670 LB · Target: 3000+
+
+ ## Phase 1: Cheap Wins (est +400 pts → ~1100)
 
+
+ - [ ] **Switch to opset 17** — replace all Gather-index models with Slice+Transpose builders
+   - Rotation: `Crop → Transpose → Slice(step=-1)` = ~0 cost (was ~165K)
+   - Flip: `Crop → Slice(step=-1)` = ~0 cost (was ~165K)
+   - Transpose: `Crop → Transpose(perm)` = ~0 cost (was ~36K)
+   - ~25 analytical tasks go from ~15 pts → ~25 pts each
+ - [ ] **Channel reduction wrapper** — `Conv1x1(10→N) → transform → Conv1x1(N→10)` when <8 colors used
+   - Saves ~20-40% MACs on conv tasks with few colors
+ - [ ] **Composition detectors** — rotation+color, flip+color, transpose+color
+   - These are tasks where two operations are combined (e.g. rotate then recolor)
+   - Top notebooks have these; we don't
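Why the opset-17 builders are near-free: rotation and flip decompose into an axis permutation (Transpose) plus a negative-step Slice, with no arithmetic at all. A plain-Python sketch of the same composition (the Crop stage is omitted for brevity, and the rotation direction depends on which axis you slice):

```python
def transpose(grid):
    # Corresponds to ONNX Transpose(perm=[1, 0])
    return [list(row) for row in zip(*grid)]

def reverse_rows(grid):
    # Corresponds to ONNX Slice(axes=[0], steps=[-1])
    return grid[::-1]

def rot90_ccw(grid):
    # 90-degree counter-clockwise rotation = Transpose, then Slice(step=-1)
    return reverse_rows(transpose(grid))

def flip_vertical(grid):
    # Vertical flip = a single Slice(step=-1)
    return reverse_rows(grid)

# rot90_ccw([[1, 2], [3, 4]]) == [[2, 4], [1, 3]]
```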
 
+
+ ## Phase 2: Fix Arc-Gen Survival (est +100-150 tasks → ~2000-2500)
+
+ This is the #1 blocker. We solve 307 tasks locally but only 50 survive arc-gen.
+
+ - [ ] **PyTorch learned conv on GPU** — train on train+test+arc-gen data
+   - Multi-seed Adam (seeds 0, 7, 42), 3000 steps, lr=0.03
+   - Try ks=1,3,5 single-layer, plus ks=(3,1) and (5,1) two-layer with ReLU
+   - **Ternary weight snap** — after training, snap weights to {-1, 0, 1}, then re-validate
+   - Must include arc-gen examples in the training data (not just validation)
+   - Needs a GPU (T4 minimum) — CPU is too slow for 400 tasks × 3 seeds × multiple ks
+ - [ ] **Increase arc-gen in lstsq fitting** — currently capped at 10 examples; try 20-50 for fixed-size tasks
+   - More data = more constraints = less overfitting in underdetermined systems
+ - [ ] **Generate MORE arc-gen data** — use the ARC-GEN generator (github.com/google/ARC-GEN) to produce 1000+ examples per task instead of ~250
+   - More fitting data = better generalization
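The ternary snap step can be sketched as below. The threshold rule (keep weights above half the largest magnitude) is an assumption for illustration; the roadmap doesn't pin one down, which is why re-validation after snapping is required:

```python
import numpy as np

def ternary_snap(w, frac=0.5):
    # Snap weights to {-1, 0, 1}: keep the sign where |w| is at least
    # `frac` of the largest magnitude, zero the rest. NOTE: `frac` is a
    # placeholder threshold; tune it and re-validate, as the plan says.
    thr = frac * np.abs(w).max()
    return np.where(np.abs(w) >= thr, np.sign(w), 0.0)

w = np.array([[0.9, -0.1], [-0.7, 0.2]])
# ternary_snap(w) keeps the 0.9 and -0.7 entries (as 1 and -1), zeroes the rest
```

Ternary weights help the score because constant {-1, 0, 1} tensors compress well and zeroed taps can be pruned from the graph.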
 
+
+ ## Phase 3: Hard Tasks — Hash Matchers & LLM Rescue (est +20-50 tasks → ~2500-3000)
+
+ For tasks no automated solver can handle.
+
+ - [ ] **Hash-based matcher builder** — automated version of the LLM rescue pattern
+   - Flatten input → MatMul(hash_weights) → match against all known examples → apply stored delta
+   - Requires opset 17 (ScatterND)
+   - Works for ANY task where all examples fit in a 1.44MB model
+   - Build a generic `build_hash_matcher(task_data) → onnx_bytes` function
+ - [ ] **Per-task LLM rescue** — for the ~20 hardest tasks with algorithmic patterns
+   - Feed task JSON + a Python solution to an LLM, get back an ONNX builder function
+   - Priority tasks: gravity, flood fill, outline extraction, pattern counting
+ - [ ] **Run-length / gap pattern detector** — like task096 in the notebooks
+   - Depthwise conv to detect runs of N and gap patterns
+   - Template for a class of "count and classify" tasks
+
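A numpy sketch of the hash-matcher flow. The real deliverable would emit these steps as opset-17 ONNX nodes; this simplification returns the stored output directly rather than applying a delta, and all names (`HASH_DIM`, `hash_w`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
HASH_DIM = 8
hash_w = rng.standard_normal((4, HASH_DIM))  # hash weights for 2x2 inputs

def grid_hash(grid):
    # Flatten input -> MatMul(hash_weights), as in the matcher graph.
    return np.asarray(grid, dtype=float).ravel() @ hash_w

def build_hash_matcher(task_data):
    # Store a hash for every known example input; at inference, return the
    # stored output whose input hash is nearest to the query's hash.
    keys = np.stack([grid_hash(i) for i, _ in task_data])
    outs = [o for _, o in task_data]
    def match(grid):
        d = np.linalg.norm(keys - grid_hash(grid), axis=1)
        return outs[int(d.argmin())]
    return match

task = [([[1, 0], [0, 0]], [[9]]),
        ([[0, 0], [0, 1]], [[7]])]
match = build_hash_matcher(task)
# match([[1, 0], [0, 0]]) -> [[9]]
```

The "fit in 1.44MB" condition comes from storing every example's hash key and output inside the model as constant tensors.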
+ ## Phase 4: Score Optimization (est +200-500 pts on existing tasks)
+
+ - [ ] **ONNX optimizer pass** — `onnxoptimizer.optimize()` with dead-code elimination and identity removal
+   - Top notebooks do this; it can shrink models 5-20%
+ - [ ] **Best-of-N model selection** — for each task, generate multiple candidate models (different ks, bias/no-bias, etc.) and keep the cheapest valid one
+   - Already partially done, but could be more aggressive
+ - [ ] **Validate with the official `neurogolf_utils.score_network()`** — use `onnx_tool` for exact cost matching
+   - Our static profiler is close but may diverge on edge cases
+
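The best-of-N selection reduces to a small helper. A sketch, assuming each candidate carries its profiled cost and a validation flag (the tuple layout is illustrative):

```python
def pick_cheapest_valid(candidates):
    # candidates: iterable of (cost, is_valid, model_bytes).
    # Keep the cheapest model that passed validation; None if none did.
    valid = [c for c in candidates if c[1]]
    return min(valid, key=lambda c: c[0])[2] if valid else None

cands = [(120, True, b"ks3-bias"),
         (90, False, b"ks1"),   # cheapest candidate, but failed validation
         (100, True, b"ks5")]
# pick_cheapest_valid(cands) -> b"ks5"
```

Validity must be checked per candidate before cost comparison, otherwise the cheapest-but-wrong model wins.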
+ ## Optional: Blend Pipeline
+
+ If the above isn't enough, we can build our own blend pipeline:
+
+ - [ ] Upload our solver's `submission.zip` as a Kaggle dataset
+ - [ ] Create a blend notebook that loads our own output and runs a second-pass solver
+ - [ ] Attach public datasets (see LEARNING.md for the full list of 24 sources)
+ - [ ] `strict_validate()` every model through `neurogolf_utils` before submission
+
+ ## Status Key
+
+ | Symbol | Meaning |
+ |--------|---------|
+ | `[ ]` | Not started |
+ | `[~]` | In progress |
+ | `[x]` | Done |
+ | `[!]` | Blocked |