File size: 12,830 Bytes
b48dd06
 
 
 
 
 
 
 
 
9b5600c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
01918a5
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b48dd06
9b5600c
 
 
 
01918a5
9b5600c
 
 
01918a5
b48dd06
 
9b5600c
 
b48dd06
9b5600c
 
 
01918a5
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
# Lessons Learned (Critical Notes)

## Core lessons
- **Log the candidate arrays early.** Without the actual candidate field, debugging accepted candidates is guesswork. Always include a small, quantized snapshot in logs for reproducibility.
- **Avoid side effects inside the solver run.** External logging (W&B) must be optional and executed outside the core run to avoid recursion and hidden failures.
- **Transforms must produce visible edits.** If transforms are no‑ops for typical inputs, the beam cannot reduce residue. Unit‑test transforms with small inputs.
- **Coerce logged gate values to booleans.** Strings like `"True"` break automated gate analysis and mislead debugging.
- **Quantize and resize consistently.** Residue must be computed on the same quantized and tiled array the beam evaluates (use `np.rint` and `tile_transform`).

## Lessons from the σ=0 breakthrough

### Verify your ground truth first
The example1 target had **4 wrong cells** in row 7. Every experiment showed σ=98 not because the solver was broken, but because the target was wrong. Cross‑reference task data against the **canonical source** (e.g. `fchollet/ARC-AGI` on GitHub). A wrong target makes every downstream analysis meaningless.

### Shape‑changing transforms need the original input
The beam was resizing `phi_in` (3×3) to the target shape (9×9) before applying transforms. This meant Kronecker (3×3 → 9×9) was applied to a 9×9 field, producing 81×81, which was then tiled back — garbage. The fix: a **dual‑strategy beam** that tries each transform on both the resized field AND the original input. Any solver that composes shape‑changing primitives needs this.

### The Kronecker self‑similar pattern
ARC task 007bbfb7 has one of the cleanest rules in the dataset: `output = np.kron((input != 0), input)`. The input's own nonzero mask becomes the meta‑layout for placing copies of itself. This is a **single depth‑1 transform** — but it was unreachable because (a) the transform didn't exist in the library, and (b) the beam couldn't feed it the right input shape.

### Lambda closures capture by reference, not value
`lambda p: tile_transform(p, (task['target_shape'][0], ...))` captures `task` by reference. In a sweep loop where `task` changes, all previously‑created lambdas silently point to the new task. Fix: bind with default args — `lambda p, _h=h, _w=w: ...`.

### Idempotent transforms are invisible to the beam
`tile_to_target ∘ tile_to_target` = `tile_to_target`. When the only accepted transform is idempotent, σ flatlines across all depths. The symptom is a constant sigma trace like `[98, 98, 98, 98]`. When you see this, the library is too limited — not the search.

### Duplicate class definitions cause subtle bugs
`solver_core.py` defined `Transform` with `.compose()` and `params`. `transforms.py` defined its own `Transform` without those. Both had `.apply()`, so things ran — but composed transforms could silently lose parameters. Rule: **one canonical class, import everywhere else**.

### Don't commit `.pyc` files
They cause stale‑import bugs when the source changes. Add `__pycache__/` and `*.pyc` to `.gitignore` from day one.

## Lessons from the 10% evaluation (object layer + greedy stacker)

### Brute-force DSL works but hits a ceiling fast
Going from 19 → 33 transforms only gained 9 tasks (31 → 40). Each new transform has diminishing returns because most unsolved tasks need **task-level reasoning** (analyzing all training pairs together), not just more primitives. The DSL beam search treats each pair independently — it can't learn "frame size determines fill color" because it never compares pairs.

### The greedy stacker unlocks overlay composition
Sequential composition `T2(T1(x))` fails when the output is two independent views overlaid. The greedy stacker tries `overlay(T1(x), T2(x))` — two transforms applied independently to the input, then combined. Task `ded97339` was solved this way: `overlay(ConnectSameColorV, ConnectSameColorH)`. Without the stacker, no amount of beam depth would have found this.

### Object extraction is necessary but not sufficient
Adding `ExtractLargestObject`, `ExtractSmallestObject`, etc. solved tasks like `1f85a75f` and `23b5c85d`. But most object-centric tasks (40% of ARC) need more than extraction — they need **object movement**, **recoloring by attribute**, and **conditional placement**. Extraction without manipulation is like having eyes without hands.

### Every new transform should be tested on the full 400
When we added 14 transforms, 6 contributed to new solves. The other 8 didn't hurt (zero regressions) but didn't help either. Running the full 400-task eval after each change is cheap (51 seconds) and tells you immediately whether a transform matters.

## Lessons from the original ITT source repo

### Read the source repo first
The original ITT implementation ([Sensei-Intent-Tensor/0.0_ARC_AGI](https://github.com/Sensei-Intent-Tensor/0.0_ARC_AGI)) contains architectural decisions we should have adopted from the start. We built a DSL brute-force solver when the PEMF framework is actually a **physics-first** solver with derived operations. The source has `ITT_PURE_SOLVER.py` (1339 lines), `ITT_ARC_FOUNDATION.py` (635 lines), `itt_primitives.py` (546 lines), and three solver versions (v4, v5B, v5C).

### The dual-field split is not optional
The original uses **two fields simultaneously**:
- `Φ_q` = integer grid (ARC colors 0–9) for **semantic decisions** (what color? which object?)
- `Φ̃` = smoothed float field (2-step discrete diffusion of Φ_q) for **operator computation** (gradient, Laplacian, boundary charge)

Rule: *Read Φ_q for semantics. Compute on Φ̃ for math.* We've been using a single float array for everything. This conflates semantic truth with numerical stability. The smooth field makes operators (gradient, Laplacian, boundary charge) stable and continuous; the quantized field keeps color identity exact.

### ρ_q is third-order, not FFT imaginary
Our `layer_minus_one.py` uses FFT imaginary components to find edit zones. The original defines boundary charge as `ρ_q = |∇(∇²Φ̃)|` — the gradient of the Laplacian of the smooth field. This is a third-order derivative that detects where **curvature changes**, not just where values change. The threshold is physics-derived: `μ + 1.5σ` on nonzero ρ_q values (statistical outlier detection), not an arbitrary percentile.

### Fan Signatures route tasks before search
The original classifies each task with a 6-bit signature `[Δ₁..Δ₆]`:
- Δ₁ (∇Φ): gradient/boundary active
- Δ₂ (∇×F): curl/rotation/reflection active
- Δ₃ (+∇²Φ): expansion/tiling
- Δ₄ (−∇²Φ): compression/interior detection
- Δ₅ (∂Φ/∂t): temporal/period detection
- Δ₆ (Φ₀): scalar anchor/color remapping

This routes the task to the correct solver family **before** any transform enumeration. Only 2⁶ = 64 possible signatures, and ~15–20 correspond to real ARC patterns. We brute-force all 33 transforms on every task — the Fan Signature would eliminate 80%+ of irrelevant transforms per task.

### SigmaResidue types the transformation, not just measures it
The original doesn't just compute `σ = Σ|Φ'(p) − Φ(p)|`. It classifies the residue into: `fill/enclosed`, `expansion/size_increase`, `compression/size_decrease`, `recolor/substitution`, `erase/removal`, `identity/none`, `mixed/complex`. This classification determines which rule to try. We skip this and try everything — which works for 33 transforms but won't scale.

### TransformationRule.learn() replaces brute-force with targeted learning
The original learns rules from training pairs: `tile_pattern` (which reflection per block), `size_to_color` (frame size → fill color), `frame_to_fill` (fallback mapping), `color_map`, `shape_to_color` (Laplacian eigenspectrum → output color). This handles multi-region fill, shape indicator, and periodic extension tasks that our beam search can't reach because the rule depends on comparing **across training pairs**, not just minimizing σ on one pair.

### Shape matching via Laplacian eigenspectrum
The eigenvalues of the restricted Laplacian on an object's positions form a **translation and rotation invariant** shape fingerprint. The original uses this to solve "shape indicator" tasks where the shape of a small object determines the output color. Our BFS-based object extraction can find objects but can't compare their shapes this way.

### Spectral region separation — no BFS
The original partitions connected components via eigendecomposition of the graph Laplacian (Fiedler vector), not BFS flood-fill. The number of near-zero eigenvalues equals the number of components. This is derived from the field operators, not an algorithm bolted on.

### The "smuggling" self-audit is a model of intellectual honesty
`docs/ITT_DERIVATION_GAPS.md` explicitly lists which concepts are properly derived from ITT axioms and which are "smuggled" (borrowed from external algorithms). v4 claims to close most gaps: explicit Φ_q (not `np.round`), spectral cuts (not adjacency growth), level sets (not flood-fill), physics-derived thresholds (not arbitrary). This audit discipline is worth adopting.

## Architecture direction: ITT-first, DSL-fallback

The plan going forward is to integrate the ITT physics as a **first-pass solver** before the DSL beam search:

1. Compute PhiField (Φ_q + Φ̃) for each task
2. Compute Fan Signature → route to pattern class
3. Run TransformationRule.learn() on training pairs → typed rule
4. If learned rule achieves σ=0 on ALL train pairs → use it (high confidence)
5. Otherwise → fall through to DSL beam search + greedy stacker (our existing 33-transform pipeline)

This means ITT handles tasks it can **derive** from the field (fill, tile, recolor, shape indicator, period), and DSL handles the rest (object extraction, gravity, mirror, compress). No regression risk — the DSL path is unchanged, ITT only adds capability.

The 10-phase implementation plan is in `TODO.md`.

## Debugging checklist
1. Confirm `experiments/` contains `*_phi_best.npy` and `*_logs.json`.
2. Run `scripts/fix_and_inspect_logs.py` to coerce gates and attach candidate snapshot.
3. Inspect `logs[0]` for `candidate_array` and recomputed residue.
4. Run `itt_solver.tests.run_atomic_effects()` to verify transforms change the input.
5. Run `tests/test_transforms.py` to verify all transform unit tests pass.
6. Run a relaxed smoke beam (lock_coeff=0, max_fraction=1.0, beam_width≥6) and inspect `sigmas`.
7. **If σ flatlines:** check whether transforms are idempotent on the input the beam feeds them. Print the shape of the field each transform receives.
8. **If σ is nonzero but close:** diff `phi_best` against target cell by cell. The pattern of mismatches usually reveals the missing transform.
9. **If ITT rule-learning fails:** check Fan Signature routing — is the task classified correctly? Print `SigmaResidue.change_type` for each training pair to verify the transformation is typed correctly.

## Practical tips
- After editing package files, **clear Python module cache** or restart the interpreter.
- Keep `candidate_array` optional in logs to avoid huge files; include it for debugging runs.
- Use small, deterministic transforms for initial debugging (Rotate, Reflect, ShiftedTile).
- **Always cross‑check targets** against the original ARC dataset before concluding the solver is wrong.
- When adding a new transform, write a unit test immediately — transforms that silently return the input waste hours of debugging.
- The `default_atomic_factory` is just a starting point. For new ARC task families, inspect the input→output mapping manually (decompose into sub‑blocks, check symmetries, test Kronecker/mirror/upscale) and add targeted transforms.
- **Run the full 400-task eval** after every change. It takes 51 seconds and catches regressions instantly.
- **Read the original ITT source** before implementing anything. The physics derivation often suggests a cleaner approach than ad-hoc DSL heuristics.

## Key references
- Original ITT ARC solver: [Sensei-Intent-Tensor/0.0_ARC_AGI](https://github.com/Sensei-Intent-Tensor/0.0_ARC_AGI)
- ITT physics textbook: [Sensei-Intent-Tensor/0.0._Executable_Physics](https://github.com/Sensei-Intent-Tensor/0.0._Executable_Physics)
- Zenodo record: [https://zenodo.org/records/18077258](https://zenodo.org/records/18077258)
- ARC dataset: [fchollet/ARC-AGI](https://github.com/fchollet/ARC-AGI)
- Icecuber DSL analysis: [arxiv:2402.03507](https://arxiv.org/abs/2402.03507)
- SOAR (52% ARC-1): [arxiv:2507.14172](https://arxiv.org/abs/2507.14172)
- arc-dsl primitives: [michaelhodel/arc-dsl](https://github.com/michaelhodel/arc-dsl)