ARC-AGI / TODO.md
Update TODO.md: σ=0 achieved, all blockers resolved, document findings
e2ddfc5 verified

TODO (Prioritised)

✅ σ=0 ACHIEVED — Solver works!

The solver now finds Id∘KroneckerSelfSimilar at depth 1 and achieves σ=0 on all 6 pairs of ARC task 007bbfb7.

What was wrong

  1. Wrong target: the repo's example1 target had 4 incorrect cells (row 7). This was found by cross-referencing against the real ARC dataset (data/training/007bbfb7.json). Fixed.
  2. Limited transform library: the beam only had vanilla tile, fill_enclosed, rotate90, and reflect_h, none of which can express the Kronecker self-similar pattern. Fixed: added 19 new transforms.
  3. Beam only tried the resized input: shape-changing transforms (Kronecker: 3×3→9×9) need the original input, not the tiled 9×9 intermediate. The beam now uses a dual strategy in which each transform is tried on both the resized field and the original input.
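
The dual strategy from item 3 can be sketched roughly as follows. This is an illustration only, not the code in solver_core.py: `resize_by_tiling`, `sigma`, and `expand_dual` are hypothetical names, and σ is assumed here to be a count of mismatched cells, with a shape mismatch counted as a full miss.

```python
import numpy as np

def resize_by_tiling(grid, target_shape):
    """Tile `grid` and crop so the result has `target_shape` (assumed resize step)."""
    reps = (-(-target_shape[0] // grid.shape[0]),   # ceiling division
            -(-target_shape[1] // grid.shape[1]))
    return np.tile(grid, reps)[:target_shape[0], :target_shape[1]]

def sigma(candidate, target):
    """Assumed error measure: mismatched cells; wrong shape counts every target cell."""
    if candidate.shape != target.shape:
        return target.size
    return int(np.sum(candidate != target))

def expand_dual(original, target, transforms):
    """Score every transform on both the resized field and the original input."""
    resized = resize_by_tiling(original, target.shape)
    scored = []
    for name, fn in transforms.items():
        for source_name, source in (("resized", resized), ("original", original)):
            scored.append((sigma(fn(source), target), name, source_name))
    return sorted(scored)  # lowest error first

# Illustrative 3x3 grid and a target built with the Kronecker rule.
original = np.array([[0, 7, 7],
                     [7, 7, 7],
                     [0, 7, 7]])
target = np.kron((original != 0).astype(int), original)
transforms = {
    "identity": lambda g: g,
    "kronecker_self_similar": lambda g: np.kron((g != 0).astype(int), g),
}
ranked = expand_dual(original, target, transforms)
print(ranked[0])
```

In this toy setup the Kronecker transform only scores zero when it is applied to the original 3×3 input. Applied to the tiled 9×9 field it produces an 81×81 grid, which is why a beam restricted to the resized field could not find it.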

The actual transformation (ARC task 007bbfb7)

output = np.kron((input != 0).astype(int), input) — a Kronecker product where the input's own nonzero mask determines the meta-layout for placing copies of itself.
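
As a quick check of the formula above, here is a minimal NumPy sketch. The function name `kronecker_self_similar` and the sample grid are illustrative assumptions; the repo's KroneckerSelfSimilar transform may be structured differently.

```python
import numpy as np

def kronecker_self_similar(grid):
    """Place a copy of `grid` in every block where `grid` itself is nonzero."""
    grid = np.asarray(grid)
    mask = (grid != 0).astype(grid.dtype)  # 1 where a copy goes, 0 where the block stays empty
    return np.kron(mask, grid)

example = np.array([[0, 7, 7],
                    [7, 7, 7],
                    [0, 7, 7]])
out = kronecker_self_similar(example)
print(out.shape)  # (9, 9)
```

Block (i, j) of the output equals the input when input[i, j] is nonzero and is all zeros otherwise, so a 3×3 grid with 7 nonzero cells yields a 9×9 grid with 49 nonzero cells.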


Immediate (blockers) — ALL RESOLVED

  • Add candidate snapshot to beam logs ✅ Done.
  • Ensure gate values are booleans ✅ Done.
  • Make tile transform nontrivial ✅ Done — ShiftedTile wired in + 19 new transforms including KroneckerSelfSimilar.
  • Implement robust fill_enclosed ✅ Done — BFS in solver_core.py.
  • Fix σ=98 flatline ✅ Done — σ=0 on all 6 pairs.
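
For reference, one common way to implement a BFS-based fill_enclosed is to flood-fill the background from the grid border and recolor whatever background the flood never reaches. The sketch below assumes 4-connectivity and a hypothetical signature (`fill_color`, `background`); it is not a copy of the version in solver_core.py.

```python
from collections import deque
import numpy as np

def fill_enclosed(grid, fill_color, background=0):
    """Recolor background cells that cannot reach the border via 4-connected background."""
    grid = np.asarray(grid)
    rows, cols = grid.shape
    reachable = np.zeros(grid.shape, dtype=bool)
    queue = deque()

    # Seed the search with every background cell on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and grid[r, c] == background:
                reachable[r, c] = True
                queue.append((r, c))

    # Breadth-first search through background cells only.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not reachable[nr, nc]
                    and grid[nr, nc] == background):
                reachable[nr, nc] = True
                queue.append((nr, nc))

    out = grid.copy()
    out[(grid == background) & ~reachable] = fill_color
    return out

ring = np.array([[0, 0, 0, 0, 0],
                 [0, 3, 3, 3, 0],
                 [0, 3, 0, 3, 0],
                 [0, 3, 3, 3, 0],
                 [0, 0, 0, 0, 0]])
filled = fill_enclosed(ring, fill_color=4)
```

Only the single cell inside the ring is recolored. If the ring has a gap, the interior becomes reachable from the border and nothing is filled.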

Short term

  • Add CLI entrypoint with --use_wandb flag ✅ Done — scripts/entrypoint.py.
  • Add unit tests for transforms ✅ Done — tests/test_transforms.py (40 tests, all pass).
  • Add a small visualization notebook for phi_best, diff maps, and Layer‑1 masks.

Medium term

  • Improve Layer‑1 mask generation ✅ Done.
  • Add a toggle to include/exclude candidate_array in logs to control log size.
  • Create a reproducible benchmark harness ✅ Done — experiment_driver.sweep() + results.csv.
  • Expand to more ARC tasks — test on other 3×3→9×9 tasks and different task families.
  • Benchmark the enriched library — run sweep across multiple ARC tasks, measure solve rate.

Long term

  • Integrate a safe external W&B uploader ✅ Done.
  • Build task loader for full ARC dataset — load any task from fchollet/ARC-AGI by ID.
  • Add more transform families — connected components, object extraction, voronoi fill (see Icecuber DSL: arxiv:2402.03507).
  • Automated evaluation harness — run solver on all 400 ARC training tasks, report solve rate.
  • Document reproducibility steps and expected outputs for each example task.
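
A task loader along the lines of the long-term item above could start from something like this sketch. It assumes the public ARC JSON layout (top-level "train" and "test" lists of {"input", "output"} grids) and the data/training/<id>.json path already used in this repo; `load_arc_task` is a hypothetical name.

```python
import json
from pathlib import Path
import numpy as np

def load_arc_task(task_id, data_root="data/training"):
    """Load one ARC task by ID and return its pairs as NumPy arrays."""
    path = Path(data_root) / f"{task_id}.json"
    with path.open() as handle:
        raw = json.load(handle)

    def to_pairs(split):
        pairs = []
        for pair in raw.get(split, []):
            # Some splits may omit the output grid; keep None in that case.
            output = np.array(pair["output"]) if "output" in pair else None
            pairs.append((np.array(pair["input"]), output))
        return pairs

    return {"train": to_pairs("train"), "test": to_pairs("test")}
```

With a local copy of the dataset, `load_arc_task("007bbfb7")` would return the train and test pairs for the task discussed above.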

Code hygiene (completed)

  • Duplicate Transform class in transforms.py ✅ Fixed.
  • Duplicate imports/paste blocks in solver_core.py ✅ Fixed.
  • Lambda closure bug in default_atomic_factory ✅ Fixed.
  • wandb_runner.py int(generate_id(), 36) crash ✅ Fixed.
  • minimal_runner.py TARGET was all‑zeros ✅ Fixed.
  • README.md referenced non‑existent paths ✅ Fixed.
  • Committed .pyc files ✅ Fixed.
  • itt_solver/README.md.md double extension ✅ Fixed.
  • Wrong target in example1 (4 cells off) ✅ Fixed — corrected in entrypoint.py, experiments_analysis.py, fix_and_inspect_logs.py, minimal_runner.py.
  • Beam search only applied transforms to resized field ✅ Fixed — dual-strategy (resized + original).
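
The lambda closure bug listed above is most likely the usual Python late-binding pitfall. The original default_atomic_factory code is not reproduced here, so the sketch below uses hypothetical helper names to show the general pattern and the standard fix.

```python
def make_shifters_buggy(offsets):
    # Each lambda looks up `k` when it is called, so all of them see the last value.
    return [lambda x: x + k for k in offsets]

def make_shifters_fixed(offsets):
    # A default argument captures the current `k` when the lambda is created.
    return [lambda x, k=k: x + k for k in offsets]

buggy = [f(0) for f in make_shifters_buggy([1, 2, 3])]
fixed = [f(0) for f in make_shifters_fixed([1, 2, 3])]
print(buggy)  # [3, 3, 3]
print(fixed)  # [1, 2, 3]
```

In a transform factory, the buggy form would make every generated transform behave like the last one in the loop, which quietly shrinks the effective library to a single parameter setting.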

New transforms added (19 total)

| Transform | Description |
|---|---|
| KroneckerSelfSimilar | kron((I≠0), I): self-similar meta-layout |
| KroneckerSelfSimilarInv | kron(I, (I≠0)): mirror variant |
| MirrorTileH | [abc\|cba] horizontal mirror |
| MirrorTileV | Vertical mirror stack |
| MirrorTile4Way | Full kaleidoscope (D4) |
| Upscale(2) / Upscale(3) | Pixel-repeat zoom |
| Downscale(2) | Subsample (inverse of upscale) |
| StackH(3) / StackV(3) | Tile horizontally/vertically |
| RetainColor(c) | Keep only color c |
| RemoveColor(c) | Zero out color c |
| InvertColors | Swap black ↔ top color |
| GravityDown / GravityUp | Pixels fall/rise in columns |
| OverlayTransparent(bg) | Transparent overlay on background |
| CropToContent | Crop to non-zero bounding box |
| Transpose | Matrix transpose |
| ShiftedTile | Tile with roll offset |
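
To make a few of these descriptions concrete, here are minimal NumPy sketches of Upscale, Downscale, CropToContent, and GravityDown. The function names and signatures are illustrative assumptions; the classes in transforms.py may differ in interface and edge-case handling.

```python
import numpy as np

def upscale(grid, factor):
    """Pixel-repeat zoom: every cell becomes a factor x factor block."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)

def downscale(grid, factor):
    """Subsample every `factor`-th row and column (undoes `upscale`)."""
    return grid[::factor, ::factor]

def crop_to_content(grid):
    """Crop to the bounding box of nonzero cells; an all-zero grid is returned unchanged."""
    rows = np.flatnonzero(np.any(grid != 0, axis=1))
    cols = np.flatnonzero(np.any(grid != 0, axis=0))
    if rows.size == 0:
        return grid.copy()
    return grid[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def gravity_down(grid):
    """Let nonzero cells fall to the bottom of their column, keeping their order."""
    out = np.zeros_like(grid)
    height = grid.shape[0]
    for c in range(grid.shape[1]):
        column = grid[:, c]
        pixels = column[column != 0]
        if pixels.size:
            out[height - pixels.size:, c] = pixels
    return out

g = np.array([[1, 0],
              [0, 2],
              [0, 0]])
fallen = gravity_down(g)
zoomed = upscale(g, 2)
```

Here `fallen` has both pixels resting on the bottom row, and `downscale(zoomed, 2)` recovers `g`.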