rogermt/neurogolf-solver


rogermt committed 12 days ago

Commit 4041952 · verified · 1 Parent(s): fa9d7c5

Update SKILL.md: v5.2 structure, new solvers, no excluded tasks, current scores

Files changed (1)

SKILL.md +65 -51

SKILL.md CHANGED

@@ -32,8 +32,8 @@ Research → Design → Experiment → Analyze → Research → ...
 ## Quick Reference
 - **Repo**: `rogermt/neurogolf-solver`
-- **Current version**: v5 — refactored package, opset 17, currently running on Kaggle
-- **Previous best**: v4.3 — 50 arc-gen-validated tasks, est LB ~670
 - **Kaggle runtime**: 12 hours for submission
 - **Target**: 3000+ LB (our own solver, no blending)
 - **Detailed history, mistakes, analysis**: see `LEARNING.md`
@@ -48,7 +48,7 @@ Research → Design → Experiment → Analyze → Research → ...
 | Max file size | 1.44 MB per model |
 | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
 | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
-| Excluded tasks | {21, 55, 80, 184, 202, 366} — skip these |
 | Validation | Models checked against **train + test + arc-gen** (ALL splits) |
 | Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
@@ -64,10 +64,10 @@ Research → Design → Experiment → Analyze → Research → ...
 ## 3. Architecture
-### Package Structure (v5)
 ```
 neurogolf_solver/
-├── constants.py          # Grid dims, opset, excluded tasks, limits
 ├── config.py             # Runtime providers, opset factory
 ├── data_loader.py        # Task loading, one-hot, example extraction
 ├── validators.py         # Model validation against all splits
@@ -78,9 +78,12 @@ neurogolf_solver/
 ├── main.py               # Entry point with argparse
 └── solvers/
     ├── analytical.py     # identity, constant, color_map, transpose
-    ├── geometric.py      # flip, rotate, shift, crop, gravity
     ├── tiling.py         # tile, upscale, mirror, concat, spatial_gather
-    ├── conv.py           # lstsq conv (fixed, variable, diffshape, var_diff)
     └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
 ```
@@ -92,13 +95,14 @@ Run with: `python -m neurogolf_solver.main [args]`
    identity → constant → color_map → transpose → flip → rotate →
    shift → tile → upscale → kronecker → nonuniform_scale →
    mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
-   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather
-2. Conv solvers (lstsq fitted, validated against arc-gen):
-   conv_fixed    — Slice→Conv→ArgMax→Equal+Cast→Pad
-   conv_variable — Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
-   conv_diffshape— Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
-   conv_var_diff — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
 ```
 ### ONNX Building Rules (opset 17)
@@ -111,59 +115,69 @@ Run with: `python -m neurogolf_solver.main [args]`
 - **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
 - **Pad** with tensor-based `pads` input (opset 11+ requirement)
 - **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)` — SVD can fail to converge
-### Conv Fitting — THE #1 BLOCKER
-**We solve 307 locally but only ~50 survive arc-gen. This is CATASTROPHIC overfitting.**
-- Patch matrix P has n rows (patches) and p columns (10×ks² features)
-- **Root cause**: LOW effective rank of patch covariance (~10-40) due to few active colors
-- **Double descent**: ks=5,7,9 are at/near interpolation threshold where test error PEAKS
-**Current fitting strategy (v5):**
-- lstsq on train+test (+arc-gen when same grid size, capped at 10 examples)
 - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
 - Try no-bias first, then bias
 - lstsq wrapped in try/except for SVD non-convergence
 - **Validate against arc-gen BEFORE accepting** — reject if fails
-**What does NOT help:**
-- ❌ Ridge/LOOCV λ tuning — theory predicts failure for low effective rank
-- ❌ More arc-gen examples in lstsq — adding constraints to underdetermined system doesn't fix wrong model
-- ❌ GPU/CuPy for lstsq — same O(n³) cost, crashes on memory
-**What MIGHT help (evidence-backed, needs testing):**
-- 🔲 Skip ks=5,7,9 — avoid interpolation threshold (double descent peak)
-- 🔲 PCA dimensionality reduction — project to top-20 components, ensure p_reduced << n
-- 🔲 Lasso (ℓ₁) instead of lstsq — matches sparse signal structure
-- 🔲 Gradient descent with early stopping — implicit regularization, don't interpolate
 ## 4. Performance
-**The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (30s locally, 60s on Kaggle).
 **Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.
-## 5. Score Accounting
-| Category | Tasks (v4) | Avg Score | Notes |
-|----------|------------|-----------|-------|
-| Analytical (Slice/Gather) | ~25 | ~13-21 | v5 Slice-based should be ~20-25 |
-| Conv (arc-gen validated) | ~25 | ~11 | Unchanged in v5 |
-| Unsolved | ~350 | 1.0 | Minimum score |
-| **v4 Est LB** | | | **~670** |
-| **v5 Est LB** | | | **TBD (running)** |
 ### Path to 3000+
-1. ✅ ARC-GEN validation (v4: +155 pts)
-2. ✅ New analytical solvers: shift, mirror, crop, quad_mirror (v4: +8 tasks)
-3. ✅ Color map Gather for permutations (v4: +15 pts)
-4. ✅ Opset 17 Slice-based flip/rotate (v5: ~0 MACs for these transforms)
-5. ✅ Refactored to modular package (v5)
-6. ✅ lstsq crash fix — try/except for SVD non-convergence (v5)
-7. 🔲 **Fix arc-gen survival** — PCA, Lasso, skip bad ks, GD with early stopping
-8. 🔲 **Hard tasks** — hash matchers, run-length detectors, LLM rescue
-9. 🔲 **Score optimization** — ONNX optimizer, best-of-N selection, channel reduction
 **Blending is EXPLICITLY excluded** — user's competitive philosophy.
@@ -171,7 +185,7 @@ Run with: `python -m neurogolf_solver.main [args]`
 Before submitting to Kaggle:
 - [ ] All models validated against train + test + arc-gen (locally)
-- [ ] EXCLUDED tasks {21,55,80,184,202,366} not included
 - [ ] No GatherElements in any model
 - [ ] No banned ops
 - [ ] Each .onnx < 1.44 MB
@@ -184,7 +198,7 @@ Before submitting to Kaggle:
 | Location | Path | Notes |
 |----------|------|-------|
 | HF Repo | `rogermt/neurogolf-solver` | All code + data |
-| **Solver package** | `neurogolf_solver/` | **v5 — 16 files, modular** |
 | Legacy monolith | `neurogolf_solver.py` | v4, kept for reference — do not edit |
 | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
 | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |

 ## Quick Reference
 - **Repo**: `rogermt/neurogolf-solver`
+- **Current version**: v5.2 — 52 solved, ~710 score, est LB ~1058
+- **Previous best on Kaggle**: v4.3 — 50 arc-gen-validated tasks, est LB ~670
 - **Kaggle runtime**: 12 hours for submission
 - **Target**: 3000+ LB (our own solver, no blending)
 - **Detailed history, mistakes, analysis**: see `LEARNING.md`
 | Max file size | 1.44 MB per model |
 | Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
 | Scoring | `max(1.0, 25.0 - ln(MACs + memory + params))` per task |
+| Tasks | **All 400 count. There are NO excluded tasks.** |
 | Validation | Models checked against **train + test + arc-gen** (ALL splits) |
 | Submission | `submission.zip` with `task001.onnx`–`task400.onnx` + optional `submission.csv` |
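The scoring row above can be checked with a few lines of Python. This is a minimal sketch of the stated formula only; the function name `task_score` and the handling of non-positive cost are assumptions, and the official `neurogolf_utils.py` scorer may measure MACs, memory, and params differently.

```python
import math


def task_score(macs: float, memory: float, params: float) -> float:
    """Per-task score as stated: max(1.0, 25.0 - ln(MACs + memory + params))."""
    cost = macs + memory + params
    if cost <= 0:
        # Assumption: ln() is undefined here, so reject the input.
        # The official scorer may treat this case differently.
        raise ValueError("cost must be positive")
    return max(1.0, 25.0 - math.log(cost))


# A total cost of about 16M gives a score of about 8.4.
print(round(task_score(16_000_000, 0, 0), 2))  # 8.41

# Very expensive models are floored at the minimum score.
print(task_score(10**12, 0, 0))  # 1.0
```

Because the cost enters through a logarithm, halving the cost of a model adds only about 0.69 points, while solving a previously unsolved task replaces a floor score of 1.0.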
 ## 3. Architecture
+### Package Structure (v5.2)
 ```
 neurogolf_solver/
+├── constants.py          # Grid dims, opset, limits (NO excluded tasks)
 ├── config.py             # Runtime providers, opset factory
 ├── data_loader.py        # Task loading, one-hot, example extraction
 ├── validators.py         # Model validation against all splits
 ├── main.py               # Entry point with argparse
 └── solvers/
     ├── analytical.py     # identity, constant, color_map, transpose
+    ├── geometric.py      # flip, rotate, shift, crop, gravity (detect only)
     ├── tiling.py         # tile, upscale, mirror, concat, spatial_gather
+    ├── conv.py           # lstsq conv (fixed, variable, diffshape, var_diff) + PCR fallback
+    ├── gravity.py        # Unrolled bubble-sort gravity (Conv+Where, 4 dirs) — Task 78
+    ├── edge.py           # Laplacian edge detection (0 matches currently)
+    ├── mode.py           # Mode fill (ReduceSum→ArgMax→Expand) — Task 129
     └── solver_registry.py # ANALYTICAL_SOLVERS list + solve_task()
 ```
    identity → constant → color_map → transpose → flip → rotate →
    shift → tile → upscale → kronecker → nonuniform_scale →
    mirror_h → mirror_v → quad_mirror → concat → concat_enhanced →
+   diagonal_tile → fixed_crop → spatial_gather → varshape_spatial_gather →
+   gravity_unrolled → edge_detect → mode_fill
+2. Conv solvers (lstsq fitted, validated against arc-gen, PCR fallback):
+   conv_fixed     — Slice→Conv→ArgMax→Equal+Cast→Pad
+   conv_variable  — Conv(30×30)→ArgMax→Equal+Cast→Mul(mask)
+   conv_diffshape — Slice→Conv→Slice(crop)→ArgMax→Equal+Cast→Pad
+   conv_var_diff  — Conv(30×30)→ArgMax→Equal+Cast→Mul(input_mask)
 ```
 ### ONNX Building Rules (opset 17)
 - **ReduceSum** with axes as **tensor input** (opset 13+ requirement)
 - **Pad** with tensor-based `pads` input (opset 11+ requirement)
 - **lstsq calls** must be wrapped in `try/except (LinAlgError, ValueError)` — SVD can fail to converge
+- **ArgMax + Equal+Cast** before Pad to ensure clean one-hot in padded region (gravity solver lesson)
+### Conv Fitting
+**Conv ceiling: ~25 tasks.** The regularization variants tried (Ridge, PCA/SVD, skip-ks) were all tested and rejected.
+Root cause: architecture mismatch — most unsolved tasks need non-local ops, not local conv patches.
+**Current fitting strategy (v5.1+):**
+- Composable primitives: `_build_patch_matrix` + `_solve_weights` + `_extract_weights`
+- PCR (principal component regression) fallback via `_solve_weights_pcr` (deferred 2nd pass; 0 new solves but no regressions)
 - Kernel sizes: [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]
 - Try no-bias first, then bias
 - lstsq wrapped in try/except for SVD non-convergence
 - **Validate against arc-gen BEFORE accepting** — reject if fails
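The guarded fit and the search order in the list above might look roughly like the following. This is a sketch under stated assumptions: `safe_lstsq` and `candidate_configs` are hypothetical names, not the repo's `_solve_weights` helpers, and the loop nesting (bias toggled inside each kernel size) is one possible reading of "no-bias first, then bias".

```python
import numpy as np

# Odd kernel sizes 1..29, matching the list in the strategy above.
KERNEL_SIZES = list(range(1, 30, 2))


def safe_lstsq(P: np.ndarray, Y: np.ndarray):
    """Least-squares fit that returns None instead of raising.

    np.linalg.lstsq raises LinAlgError when the SVD does not converge
    (and for incompatible shapes), so the call is wrapped.
    """
    try:
        W, _residuals, _rank, _sv = np.linalg.lstsq(P, Y, rcond=None)
    except (np.linalg.LinAlgError, ValueError):
        return None
    return W


def candidate_configs():
    """Yield (kernel_size, use_bias) pairs, no-bias before bias.

    Assumption: bias is toggled per kernel size; the real solver may
    instead run all no-bias fits first.
    """
    for ks in KERNEL_SIZES:
        for use_bias in (False, True):
            yield ks, use_bias


# Tiny overdetermined system with an exact solution [2, 3].
P = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = P @ np.array([2.0, 3.0])
print(safe_lstsq(P, Y))
```

A `None` result would simply move the search on to the next candidate; any weights that are returned still have to pass arc-gen validation before being accepted.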
+### New Solver Architectures (v5.2)
+**gravity.py** — Unrolled bubble-sort via Conv+Where
+- 4 directions × 10 bg colors, max(IH,IW) steps
+- Per step: 2× Conv(3×3 shift), 3× ReduceSum, 3× Greater, 2× And, 2× Where
+- Final: ArgMax + Equal+Cast + Pad (clean one-hot)
+- Cost: ~16M (10×10 grid), score ~8.4
+- **Validated: Task 78 (direction=up, bg=0)**
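The unrolled bubble-sort idea can be prototyped in NumPy before building the ONNX graph. The sketch below is a reference for the validated case only (direction=up, bg=0) and works on an integer color grid rather than a one-hot tensor; `gravity_up` is a hypothetical name, and it unrolls over the grid height, whereas the solver is described as using max(IH, IW) steps (presumably input height and width).

```python
import numpy as np


def gravity_up(grid: np.ndarray, bg: int = 0) -> np.ndarray:
    """Let every non-background cell rise until it hits the top or another cell.

    Each pass is one parallel bubble-sort step: a background cell with a
    non-background cell directly below it swaps with that cell.
    """
    g = grid.copy()
    for _ in range(g.shape[0]):  # height passes are enough to settle a column
        # Value of the cell directly below (bottom row sees background).
        below = np.full_like(g, bg)
        below[:-1] = g[1:]
        # Cells that pull a value up from below.
        receives = (g == bg) & (below != bg)
        # Cells that hand their value to the cell above.
        gives = np.zeros_like(receives)
        gives[1:] = receives[:-1]
        # Two nested selects, mirroring the two Where ops per step.
        g = np.where(receives, below, np.where(gives, bg, g))
    return g


grid = np.array([[0, 3],
                 [0, 0],
                 [4, 0]])
print(gravity_up(grid))
```

A cell cannot both receive and give in the same pass (receiving requires background, giving requires non-background), so the parallel update never conflicts with itself.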
+**edge.py** — Laplacian conv boundary detection
+- Conv 1×1 (channel collapse) → Conv 3×3 (Laplacian) → Abs → Greater → And → Where
+- Cost: ~16K MACs, score ~15
+- **0 matches currently** — edge definition may be too strict
+**mode.py** — Global majority color fill
+- Slice → ReduceSum(axes=[2,3]) → ArgMax → Equal+Cast → Expand → Pad
+- Cost: ~2K, score ~19.5
+- **Validated: Task 129**
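The mode-fill pipeline maps closely onto NumPy operations, which gives a quick reference to compare an ONNX model against. This is a sketch, not the code in `mode.py`: `mode_fill` and `to_onehot` are hypothetical helpers, the leading Slice and trailing Pad are omitted, and every color (including any background) is counted.

```python
import numpy as np


def to_onehot(grid: np.ndarray, num_colors: int = 10) -> np.ndarray:
    """Encode an (H, W) integer grid as a (1, C, H, W) one-hot tensor."""
    return np.eye(num_colors, dtype=np.float32)[grid].transpose(2, 0, 1)[None]


def mode_fill(onehot: np.ndarray) -> np.ndarray:
    """Fill the whole grid with its most frequent color."""
    _, c, _, _ = onehot.shape
    counts = onehot.sum(axis=(2, 3))                    # ReduceSum(axes=[2,3]) -> (N, C)
    winner = counts.argmax(axis=1)                      # ArgMax -> (N,)
    mask = np.arange(c)[None, :] == winner[:, None]     # Equal -> (N, C)
    mask = mask.astype(onehot.dtype)                    # Cast
    # Expand the per-channel mask over all spatial positions.
    return np.broadcast_to(mask[:, :, None, None], onehot.shape).copy()


grid = np.array([[1, 1, 2],
                 [1, 0, 2]])
filled = mode_fill(to_onehot(grid))
print(filled.argmax(axis=1)[0])  # every cell becomes color 1
```

Ties resolve to the lowest color index here, because `argmax` returns the first maximum; ONNX ArgMax behaves the same way with its default `select_last_index=0`.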
 ## 4. Performance
+**The lstsq conv solver is the speed bottleneck.** Use `--conv_budget` to cap time per task (5s locally, 60s on Kaggle).
 **Do NOT** try to GPU-accelerate lstsq. The bottleneck is algorithmic (O(n³) SVD), not device.
+## 5. Score Accounting (v5.2)
+| Category | Tasks | Avg Score | Notes |
+|----------|-------|-----------|-------|
+| Analytical | 24 | ~16 | identity, constant, color_map, transpose, flip, rotate, shift, tile, mirrors, etc. |
+| Conv (lstsq) | 25 | ~10.5 | conv_fixed, conv_var, conv_diff, conv_var_diff |
+| Gravity | 1 | 8.4 | Task 78 |
+| Mode fill | 1 | 19.5 | Task 129 |
+| Timing artifact | 1 | 8.2 | Task 61 (conv_var, only on slow hardware) |
+| **Unsolved** | **348** | **1.0** | Minimum score |
+| **Total** | **52/400** | | **~710 solved + 348 = ~1058 est LB** |
 ### Path to 3000+
+1. ✅ ARC-GEN validation (v4)
+2. ✅ New analytical solvers (v4)
+3. ✅ Opset 17 Slice-based transforms (v5)
+4. ✅ lstsq crash fix + modular package (v5)
+5. ✅ PCR fallback in conv (v5.1 — 0 new solves but clean code)
+6. ✅ Gravity solver (v5.2 — Task 78)
+7. ✅ Mode fill solver (v5.2 — Task 129)
+8. 🔲 **Phase 3 solvers**: flood fill, composition, color LUT, CumSum — see TODO.md
+9. 🔲 **Phase 1a**: Opset 17 conversions for existing analytical tasks (score optimization)
+10. 🔲 **Phase 4**: ONNX optimizer, best-of-N selection
 **Blending is EXPLICITLY excluded** — user's competitive philosophy.
 Before submitting to Kaggle:
 - [ ] All models validated against train + test + arc-gen (locally)
+- [ ] **All 400 tasks attempted** (no exclusions)
 - [ ] No GatherElements in any model
 - [ ] No banned ops
 - [ ] Each .onnx < 1.44 MB
 | Location | Path | Notes |
 |----------|------|-------|
 | HF Repo | `rogermt/neurogolf-solver` | All code + data |
+| **Solver package** | `neurogolf_solver/` | **v5.2 — 19 files, modular** |
 | Legacy monolith | `neurogolf_solver.py` | v4, kept for reference — do not edit |
 | Official utils | `neurogolf_utils.py` | Kaggle scoring lib (needs onnx_tool) |
 | ARC-GEN data | `ARC-GEN-100K.zip` | 400 files, 100K examples |