Update LEARNING.md — add ITT source repo findings, object layer lessons, architecture direction
### Don't commit `.pyc` files
They cause stale‑import bugs when the source changes. Add `__pycache__/` and `*.pyc` to `.gitignore` from day one.

## Lessons from the 10% evaluation (object layer + greedy stacker)

### Brute-force DSL works but hits a ceiling fast
Going from 19 → 33 transforms only gained 9 tasks (31 → 40). Each new transform has diminishing returns because most unsolved tasks need **task-level reasoning** (analyzing all training pairs together), not just more primitives. The DSL beam search treats each pair independently — it can't learn "frame size determines fill color" because it never compares pairs.

### The greedy stacker unlocks overlay composition
Sequential composition `T2(T1(x))` fails when the output is two independent views overlaid. The greedy stacker tries `overlay(T1(x), T2(x))` — two transforms applied independently to the input, then combined. Task `ded97339` was solved this way: `overlay(ConnectSameColorV, ConnectSameColorH)`. Without the stacker, no amount of beam depth would have found this.
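
A minimal sketch of the idea, assuming transforms are plain grid→grid functions; the combination rule shown here (nonzero cells of the second output painted over the first) is an assumption, not necessarily the exact rule in our stacker:

```python
import numpy as np
from itertools import permutations

def overlay(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Assumed combination rule: nonzero cells of b are painted over a.
    out = a.copy()
    out[b != 0] = b[b != 0]
    return out

def greedy_stack(x, target, transforms):
    # Try overlay(T1(x), T2(x)) for every ordered pair of transforms.
    # Both transforms see the ORIGINAL input; nothing is chained.
    for t1, t2 in permutations(transforms, 2):
        a, b = t1(x), t2(x)
        if a.shape == b.shape == target.shape and \
                np.array_equal(overlay(a, b), target):
            return t1, t2
    return None
```

This is why `ded97339` becomes reachable: each connector runs on the untouched input, so neither destroys the evidence the other needs.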

### Object extraction is necessary but not sufficient
Adding `ExtractLargestObject`, `ExtractSmallestObject`, etc. solved tasks like `1f85a75f` and `23b5c85d`. But most object-centric tasks (40% of ARC) need more than extraction — they need **object movement**, **recoloring by attribute**, and **conditional placement**. Extraction without manipulation is like having eyes without hands.

### Every new transform should be tested on the full 400
When we added 14 transforms, 6 contributed to new solves. The other 8 didn't hurt (zero regressions) but didn't help either. Running the full 400-task eval after each change is cheap (51 seconds) and tells you immediately whether a transform matters.

## Lessons from the original ITT source repo

### Read the source repo first
The original ITT implementation ([Sensei-Intent-Tensor/0.0_ARC_AGI](https://github.com/Sensei-Intent-Tensor/0.0_ARC_AGI)) contains architectural decisions we should have adopted from the start. We built a DSL brute-force solver, even though the PEMF framework is actually a **physics-first** solver with derived operations. The source has `ITT_PURE_SOLVER.py` (1339 lines), `ITT_ARC_FOUNDATION.py` (635 lines), `itt_primitives.py` (546 lines), and three solver versions (v4, v5B, v5C).

### The dual-field split is not optional
The original uses **two fields simultaneously**:
- `Φ_q` = integer grid (ARC colors 0–9) for **semantic decisions** (what color? which object?)
- `Φ̃` = smoothed float field (2-step discrete diffusion of Φ_q) for **operator computation** (gradient, Laplacian, boundary charge)

Rule: *Read Φ_q for semantics. Compute on Φ̃ for math.* We've been using a single float array for everything. This conflates semantic truth with numerical stability. The smooth field makes operators (gradient, Laplacian, boundary charge) stable and continuous; the quantized field keeps color identity exact. A sketch of the split follows.
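
A minimal sketch, assuming a 5-point averaging stencil with replicated edges for each diffusion step (the source repo's exact stencil may differ):

```python
import numpy as np

def diffuse(phi: np.ndarray) -> np.ndarray:
    # One discrete diffusion step: average each cell with its 4-neighbours.
    # Edge padding by replication keeps the grid shape unchanged.
    p = np.pad(phi, 1, mode="edge")
    return (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
            + p[1:-1, :-2] + p[1:-1, 2:]) / 5.0

def phi_fields(grid: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    phi_q = grid.astype(np.int64)                  # Φ_q: read for semantics
    phi_s = diffuse(diffuse(phi_q.astype(float)))  # Φ̃: compute operators here
    return phi_q, phi_s
```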

### ρ_q is third-order, not FFT imaginary
Our `layer_minus_one.py` uses FFT imaginary components to find edit zones. The original defines boundary charge as `ρ_q = |∇(∇²Φ̃)|` — the gradient magnitude of the Laplacian of the smooth field. This is a third-order derivative that detects where **curvature changes**, not just where values change. The threshold is physics-derived: `μ + 1.5σ` over the nonzero ρ_q values (statistical outlier detection), not an arbitrary percentile.
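
A direct numpy transcription of the formula, reusing the diffusion sketch's conventions; `np.gradient` and a 5-point Laplacian stand in for whatever discretization the source repo actually uses:

```python
import numpy as np

def laplacian(phi: np.ndarray) -> np.ndarray:
    # 5-point discrete Laplacian with replicated edges.
    p = np.pad(phi, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def edit_zone(phi_s: np.ndarray) -> np.ndarray:
    # ρ_q = |∇(∇²Φ̃)|: gradient magnitude of the Laplacian (third order).
    gy, gx = np.gradient(laplacian(phi_s))
    rho = np.hypot(gy, gx)
    nz = rho[rho > 0]
    if nz.size == 0:
        return np.zeros_like(rho, dtype=bool)
    # Threshold μ + 1.5σ over the nonzero charges: statistical outliers.
    return rho > nz.mean() + 1.5 * nz.std()
```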

### Fan Signatures route tasks before search
The original classifies each task with a 6-bit signature `[Δ₁..Δ₆]`:
- Δ₁ (∇Φ): gradient/boundary active
- Δ₂ (∇×F): curl/rotation/reflection active
- Δ₃ (+∇²Φ): expansion/tiling
- Δ₄ (−∇²Φ): compression/interior detection
- Δ₅ (∂Φ/∂t): temporal/period detection
- Δ₆ (Φ₀): scalar anchor/color remapping

This routes the task to the correct solver family **before** any transform enumeration. Only 2⁶ = 64 possible signatures, and ~15–20 correspond to real ARC patterns. We brute-force all 33 transforms on every task — the Fan Signature would eliminate 80%+ of irrelevant transforms per task. A toy version is sketched below.
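
A toy illustration of the routing idea only: the stand-in predicates below are guesses, since the real Δ bits are computed from the field operators in the source repo.

```python
import numpy as np

def pair_bits(x: np.ndarray, y: np.ndarray) -> tuple:
    # Stand-in predicates for Δ1..Δ6 (the real ones are operator-derived).
    same = x.shape == y.shape
    d1 = same and bool(np.any(x != y))                    # boundary edits
    d2 = (np.array_equal(np.fliplr(x), y)
          or np.array_equal(np.rot90(x), y))              # reflection/rotation
    d3 = y.size > x.size                                  # expansion/tiling
    d4 = y.size < x.size                                  # compression
    d5 = False                                            # periodicity: omitted here
    d6 = bool(set(np.unique(y).tolist())
              - set(np.unique(x).tolist()))               # new colors appear
    return tuple(int(b) for b in (d1, d2, d3, d4, d5, d6))

def fan_signature(train_pairs) -> tuple:
    # OR the bits over all training pairs; the 6-bit result selects a
    # solver family before any transform enumeration happens.
    return tuple(int(any(bits)) for bits in
                 zip(*(pair_bits(x, y) for x, y in train_pairs)))
```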

### SigmaResidue types the transformation, not just measures it
The original doesn't just compute `σ = Σ|Φ'(p) − Φ(p)|`. It classifies the residue into: `fill/enclosed`, `expansion/size_increase`, `compression/size_decrease`, `recolor/substitution`, `erase/removal`, `identity/none`, `mixed/complex`. This classification determines which rule to try. We skip this and try everything — which works for 33 transforms but won't scale.
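
A sketch using the source repo's category names; the decision rules below are our guess at how each type could be detected, not the original logic:

```python
import numpy as np

def classify_residue(x: np.ndarray, y: np.ndarray) -> str:
    if x.shape != y.shape:
        return ("expansion/size_increase" if y.size > x.size
                else "compression/size_decrease")
    if np.array_equal(x, y):
        return "identity/none"
    changed = x != y
    if np.all(y[changed] == 0):
        return "erase/removal"      # content removed, background left behind
    if np.all(x[changed] == 0):
        return "fill/enclosed"      # background cells gained color
    # If every changed source color maps to exactly one target color,
    # the change is a pure substitution.
    pairs = set(zip(x[changed].tolist(), y[changed].tolist()))
    if len({src for src, _ in pairs}) == len(pairs):
        return "recolor/substitution"
    return "mixed/complex"
```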

### TransformationRule.learn() replaces brute-force with targeted learning
The original learns rules from training pairs: `tile_pattern` (which reflection per block), `size_to_color` (frame size → fill color), `frame_to_fill` (fallback mapping), `color_map`, `shape_to_color` (Laplacian eigenspectrum → output color). This handles multi-region fill, shape indicator, and periodic extension tasks that our beam search can't reach, because the rule depends on comparing **across training pairs**, not just minimizing σ on one pair.
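
A sketch of the cross-pair idea for `size_to_color`, using `scipy.ndimage.label` for the illustration and assuming "enclosed region" means a background component that does not touch the border (the source repo derives regions from the field instead):

```python
import numpy as np
from scipy import ndimage

def learn_size_to_color(train_pairs) -> dict:
    # Compare each input against its own output: map the size of every
    # enclosed background region to the single color it gets filled with.
    # No per-pair search can discover this; it only emerges across pairs.
    mapping = {}
    for x, y in train_pairs:
        labels, n = ndimage.label(x == 0)
        border = (set(labels[0]) | set(labels[-1])
                  | set(labels[:, 0]) | set(labels[:, -1]))
        for k in range(1, n + 1):
            if k in border:
                continue                    # touches the border: not enclosed
            mask = labels == k
            colors = np.unique(y[mask])
            if colors.size == 1 and colors[0] != 0:
                mapping[int(mask.sum())] = int(colors[0])
    return mapping                          # e.g. {4: 2, 9: 8} (hypothetical)
```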

### Shape matching via Laplacian eigenspectrum
The eigenvalues of the Laplacian restricted to an object's positions form a **translation- and rotation-invariant** shape fingerprint. The original uses this to solve "shape indicator" tasks where the shape of a small object determines the output color. Our BFS-based object extraction can find objects but can't compare their shapes this way.
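
A self-contained sketch: build the Laplacian of the object's 4-connectivity graph and take its eigenvalues. Padding the spectrum to a fixed length is our choice for easy comparison, and rare non-congruent shapes can share a spectrum (isospectral graphs), so this is a practical fingerprint rather than a proof of congruence.

```python
import numpy as np

def shape_spectrum(cells, k: int = 8) -> np.ndarray:
    # cells: iterable of (row, col) positions belonging to one object.
    idx = {c: i for i, c in enumerate(sorted(cells))}
    L = np.zeros((len(idx), len(idx)))
    for (r, c), i in idx.items():
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in idx:                 # 4-connected neighbour inside object
                L[i, i] += 1              # degree on the diagonal
                L[i, idx[nb]] -= 1        # -1 per adjacency
    vals = np.sort(np.linalg.eigvalsh(L))
    out = np.zeros(k)                     # pad/truncate to fixed length k
    n = min(k, len(idx))
    out[:n] = vals[:n]
    return out

# Same shape (up to translation/rotation) gives the same spectrum:
# np.allclose(shape_spectrum(obj_a), shape_spectrum(obj_b))
```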

### Spectral region separation — no BFS
The original partitions connected components via eigendecomposition of the graph Laplacian (the Fiedler vector), not BFS flood-fill. The number of near-zero eigenvalues equals the number of components. This is derived from the field operators, not an algorithm bolted on.
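
A sketch of the counting half of that claim, reusing the Laplacian construction from the shape fingerprint above (dense eigendecomposition is cheap at ARC grid sizes):

```python
import numpy as np

def count_components(mask: np.ndarray, tol: float = 1e-8) -> int:
    # The multiplicity of the (near-)zero eigenvalue of the graph Laplacian
    # equals the number of connected components; no flood-fill involved.
    cells = [tuple(c) for c in np.argwhere(mask)]
    if not cells:
        return 0
    idx = {c: i for i, c in enumerate(cells)}
    L = np.zeros((len(cells), len(cells)))
    for (r, c), i in idx.items():
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in idx:
                L[i, i] += 1
                L[i, idx[nb]] -= 1
    return int(np.sum(np.linalg.eigvalsh(L) < tol))
```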

### The "smuggling" self-audit is a model of intellectual honesty
`docs/ITT_DERIVATION_GAPS.md` explicitly lists which concepts are properly derived from ITT axioms and which are "smuggled" (borrowed from external algorithms). v4 claims to close most gaps: explicit Φ_q (not `np.round`), spectral cuts (not adjacency growth), level sets (not flood-fill), physics-derived thresholds (not arbitrary). This audit discipline is worth adopting.

## Architecture direction: ITT-first, DSL-fallback
The plan going forward is to integrate the ITT physics as a **first-pass solver** before the DSL beam search:

1. Compute PhiField (Φ_q + Φ̃) for each task
2. Compute Fan Signature → route to pattern class
3. Run TransformationRule.learn() on training pairs → typed rule
4. If the learned rule achieves σ=0 on ALL train pairs → use it (high confidence)
5. Otherwise → fall through to DSL beam search + greedy stacker (our existing 33-transform pipeline)

This means ITT handles tasks it can **derive** from the field (fill, tile, recolor, shape indicator, period), and DSL handles the rest (object extraction, gravity, mirror, compress). No regression risk — the DSL path is unchanged, ITT only adds capability. A dispatch sketch follows.
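
A dispatch sketch of steps 1–5. Every name here (`PhiField`, `fan_signature`, `TransformationRule`, `sigma`, `dsl_beam_search`) is a planned interface from this plan, not existing code:

```python
def solve(task):
    # Steps 1-2: build the dual fields and the routing signature once.
    fields = [(PhiField(x), PhiField(y)) for x, y in task.train]
    signature = fan_signature(fields)
    # Step 3: learn a typed rule by comparing across ALL training pairs.
    rule = TransformationRule.learn(task.train, signature)
    # Step 4: accept only on perfect residue over every train pair.
    if rule is not None and all(sigma(rule.apply(x), y) == 0
                                for x, y in task.train):
        return [rule.apply(x) for x in task.test]   # high-confidence ITT path
    # Step 5: fall through to the existing pipeline; no regression risk.
    return dsl_beam_search(task)
```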

The 10-phase implementation plan is in `TODO.md`.

## Debugging checklist
1. Confirm `experiments/` contains `*_phi_best.npy` and `*_logs.json`.
2. Run `scripts/fix_and_inspect_logs.py` to coerce gates and attach candidate snapshot.
3. Inspect `logs[0]` for `candidate_array` and recomputed residue.
4. Run `itt_solver.tests.run_atomic_effects()` to verify transforms change the input.
5. Run `tests/test_transforms.py` to verify all transform unit tests pass.
6. Run a relaxed smoke beam (lock_coeff=0, max_fraction=1.0, beam_width≥6) and inspect `sigmas`.
7. **If σ flatlines:** check whether transforms are idempotent on the input the beam feeds them. Print the shape of the field each transform receives.
8. **If σ is nonzero but close:** diff `phi_best` against target cell by cell. The pattern of mismatches usually reveals the missing transform.
9. **If ITT rule-learning fails:** check Fan Signature routing — is the task classified correctly? Print `SigmaResidue.change_type` for each training pair to verify the transformation is typed correctly.

## Practical tips
- After editing package files, **clear Python module cache** or restart the interpreter.
- **Always cross‑check targets** against the original ARC dataset before concluding the solver is wrong.
- When adding a new transform, write a unit test immediately — transforms that silently return the input waste hours of debugging.
- The `default_atomic_factory` is just a starting point. For new ARC task families, inspect the input→output mapping manually (decompose into sub‑blocks, check symmetries, test Kronecker/mirror/upscale) and add targeted transforms.
- **Run the full 400-task eval** after every change. It takes 51 seconds and catches regressions instantly.
- **Read the original ITT source** before implementing anything. The physics derivation often suggests a cleaner approach than ad-hoc DSL heuristics.

## Key references
- Original ITT ARC solver: [Sensei-Intent-Tensor/0.0_ARC_AGI](https://github.com/Sensei-Intent-Tensor/0.0_ARC_AGI)
- ITT physics textbook: [Sensei-Intent-Tensor/0.0._Executable_Physics](https://github.com/Sensei-Intent-Tensor/0.0._Executable_Physics)
- Zenodo record: [zenodo.org/records/18077258](https://zenodo.org/records/18077258)
- ARC dataset: [fchollet/ARC-AGI](https://github.com/fchollet/ARC-AGI)
- Icecuber DSL analysis: [arXiv:2402.03507](https://arxiv.org/abs/2402.03507)
- SOAR (52% on ARC-1): [arXiv:2507.14172](https://arxiv.org/abs/2507.14172)
- arc-dsl primitives: [michaelhodel/arc-dsl](https://github.com/michaelhodel/arc-dsl)