Roger MT committed on
Commit
feb08d1
·
1 Parent(s): 387183f

move files into pemf folder

pemf/README_PEMF.md ADDED
@@ -0,0 +1,40 @@
+ # Pre‑Emergence Mechanics Framework (PEMF) — ARC‑AGI
+
+ Short summary
+ The Pre‑Emergence Mechanics Framework (PEMF) frames ARC tasks as a boundary‑constrained field problem, solved by minimizing the irreducible residue (σ) under writability gates. PEMF implements four core primitives, **Scalar Potential (Φ)**, **Gradient Ordering (∇)**, **Residue (σ)**, and **Boundary Charge (ρ_q)**, and composes atomic transforms (tile, shifted tile, fill_enclosed, rotate, reflect, etc.) in a beam search to drain residue and produce stable outputs.
+
+ Why this matters
+ PEMF approaches ARC tasks mechanically (σ‑minimization plus gates) rather than through symbolic heuristics. The approach maps CTS/ITT primitives to executable operators (potential fields, gradients, Dirichlet masks, complex projections) and yields a reproducible solver recipe.
+
+ Key concepts (one line each)
+ - **Scalar Potential (Φ):** represent the grid as a numeric potential field (`initialize_potential`).
+ - **Gradient Ordering (∇):** discrete gradients direct admissible edits.
+ - **Residue (σ):** L1 misalignment after quantize + tile; this is the objective to minimize.
+ - **Boundary Charge (ρ_q):** Dirichlet boundary mask that enforces writability gates.
+ - **Layer −1 diagnostics:** complex projection (imaginary FFT component) to find latent edit zones when the real signal is weak.
+
+ Files and examples
+ - **Skill artifacts:** `SKILLS/pre_emergence_mechanics_framework/` contains the how‑to, a runnable example (`references/examples/verify_pemf.py`), and the skill's README.
+ - **Postprocess logs:** `experiments/postprocess_logs.py` coerces gate values to booleans and attaches candidate snapshots for offline inspection.
+ - **Headless entry:** `scripts/entrypoint.py` runs experiments from the CLI; the `--use_wandb` flag is optional and defaults to off.
+
+ Quick verification (headless)
+ 1. Run the PEMF example to verify the primitives and a tiny compositional loop:
+ ```bash
+ python SKILLS/pre_emergence_mechanics_framework/references/examples/verify_pemf.py
+ ```
+ 2. Run a single experiment (example):
+ ```bash
+ python scripts/entrypoint.py --task example1 --out_dir experiments
+ ```
+ 3. Postprocess the logs to attach candidate snapshots and coerce gate values to booleans:
+ ```bash
+ python experiments/postprocess_logs.py
+ ```
+
+ Acceptance checks
+ - `verify_pemf.py` prints a residue trace and reports at least one admissible edit zone from the complex projection.
+ - `experiments/*_phi_best.npy` and `experiments/*_logs.fixed.json` exist after a run and contain the candidate snapshot and boolean gates for inspection.
+
+ References and provenance
+ This README summarizes the executable PEMF recipe derived from the ARC‑AGI exposition (PEMF / CTS / ITT). See `SKILLS/pre_emergence_mechanics_framework/references/` for runnable examples and a step‑by‑step how‑to.
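To make the README's residue definition concrete, here is a minimal sketch; it is not the repository's implementation. It assumes that plain `np.tile` can stand in for the solver's tile transform, and it rebuilds the 9×9 target for task 007bbfb7 (the grid listed in `experiments_analysis.py` in this commit) as a Kronecker product. `l1_residue` is a hypothetical helper name.

```python
import numpy as np

# 3x3 input used in experiments_analysis.py (task 007bbfb7).
phi_in = np.array([[0, 7, 7],
                   [7, 7, 7],
                   [0, 7, 7]], dtype=float)

# The target is Kronecker self-similar: each non-zero input cell becomes a
# full copy of the input, each zero cell becomes a block of zeros.
target = np.kron((phi_in != 0).astype(float), phi_in)


def l1_residue(phi: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical helper: L1 misalignment between a candidate and the target."""
    return float(np.sum(np.abs(phi - target)))


# Assumption: np.tile approximates the solver's tile-to-target transform.
tiled = np.tile(phi_in, (3, 3))
residue = l1_residue(tiled, target)
print(residue)
```

Under these assumptions the residue is 98.0, the same value as `final_sigma` in the `example1` result files in this commit. That is consistent with the flat `sigma_trace`: tiling alone does not appear to drain the residue for this task.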
pemf/arc_results/RESULTS.md ADDED
@@ -0,0 +1,34 @@
+ # PEMF Solver — ARC-AGI Training Set Evaluation
+
+ ## Results (v4 — ITT + Predicate + DSL)
+
+ | Metric | v1 | v2 | v3 | **v4** |
+ |---|---|---|---|---|
+ | **Tasks solved** | 31 (7.8%) | 40 (10.0%) | 47 (11.8%) | **70 (17.5%)** |
+ | via ITT | — | — | 16 | **16** |
+ | via Predicate | — | — | — | **25** |
+ | via DSL | 31 | 40 | 31 | **29** |
+ | Total time | 17s | 51s | 36s | **38s** |
+ | Regressions | — | 0 | 0 | **0** |
+
+ ## Predicate Engine Breakdown (25 new solves)
+
+ | Rule Type | Tasks | Description |
+ |---|---|---|
+ | neighborhood_rule | 20 | CA-style: (center_color, neighbor_signature) → output_color |
+ | global_enclosed_fill | 2 | Fill all bg regions not reachable from border |
+ | object predicate×action | 2 | E.g. "remove smallest object" |
+ | per_object_enclosed_fill | 1 | Fill each object's interior with its color |
+
+ ## Architecture: 3-Pass Pipeline
+
+ ```
+ Task → ITT Physics → Predicate Enumeration → DSL Beam Search
+        (16 tasks)    (25 tasks)              (29 tasks)
+ ```
+
+ 1. **ITT** (PhiField + σ-analysis + Fan Signatures → rule learning)
+ 2. **Predicate** (enclosed fill → neighborhood rules → object predicate×action)
+ 3. **DSL** (33 transforms + dual-strategy beam + greedy stacker)
+
+ Each pass runs only if the previous one fails, so a later pass cannot overwrite a solution found by an earlier one. The results table above reports zero regressions across versions.
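The fall-through behaviour of the three passes can be sketched as follows. This only illustrates the control flow, under the assumption that each pass returns a grid on success and `None` on failure; `run_pipeline`, `itt_pass`, `predicate_pass`, and `dsl_pass` are hypothetical names, not the repository's API.

```python
from typing import Callable, Optional

Grid = list[list[int]]
# A pass returns a predicted grid, or None when it cannot solve the task.
SolverPass = Callable[[dict], Optional[Grid]]


def run_pipeline(task: dict, passes: list[tuple[str, SolverPass]]):
    """Try each pass in order and stop at the first one that succeeds."""
    for name, solve in passes:
        result = solve(task)
        if result is not None:
            return name, result
    return None, None


# Hypothetical stand-ins for the three passes.
def itt_pass(task: dict) -> Optional[Grid]:
    return None  # pretend ITT rule learning found nothing


def predicate_pass(task: dict) -> Optional[Grid]:
    return [[1]]  # pretend a neighborhood rule matched


def dsl_pass(task: dict) -> Optional[Grid]:
    return [[2]]  # never reached in this example


solved_by, grid = run_pipeline(
    {"train": []},
    [("itt", itt_pass), ("predicate", predicate_pass), ("dsl", dsl_pass)],
)
print(solved_by, grid)
```

Because the loop returns at the first success, adding a new pass at the end of the list cannot change the answer for any task an earlier pass already solves.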
pemf/arc_results/already_solved.json ADDED
@@ -0,0 +1 @@
+ ["007bbfb7", "00d62c1b", "0d3d703e", "1190e5a7", "1cf80156", "1e0a9b12", "1f85a75f", "2013d3e2", "22168020", "22eb0ac0", "239be575", "23b5c85d", "28bf18c6", "2dee498d", "3618c87e", "3906de3d", "3aa6fb7a", "3af2c5a8", "3c9b0459", "42a50994", "4347f46a", "50cb2852", "6150a2bd", "62c24649", "67385a82", "67a3c6ac", "67e8384a", "68b16354", "6d0aefbc", "6f8cd79b", "6fa7a44f", "746b3537", "74dd1130", "7b7f7511", "7e0986d6", "7f4411dc", "868de0fa", "8be77c9e", "8d5021e8", "91714a58", "9172f3a0", "9565186b", "9dfd6313", "a416b8f3", "a5313dff", "a699fb00", "aabf363d", "aedd82e4", "b1948b0a", "b6afb2da", "ba97ae07", "bb43febb", "bda2d7a6", "be94b721", "c0f76784", "c59eb873", "c8f0f002", "c9e6f938", "d10ecb37", "d23f8c26", "d511f180", "d631b094", "d90796e8", "d9fac9be", "de1cd16c", "ded97339", "e26a3af2", "eb5a1d5d", "ed36ccf7", "f76d97a5"]
pemf/experiments/example1_20260428T172250Z_logs.json ADDED
@@ -0,0 +1 @@
+ [[{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}], [{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", 
"passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, 
"B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}], [{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, 
"energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform 
Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, 
"shape": [9, 9]}]]
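In the log above, `A_boundary` is a JSON boolean while the other gates are the string `"True"`. The sketch below shows one way such values could be coerced. It is an assumption about what a postprocessing step might do, not the contents of `experiments/postprocess_logs.py`, which is not shown in this diff; `coerce_bool` and `coerce_gates` are hypothetical names.

```python
import json


def coerce_bool(value):
    """Map the strings "True"/"False" to real booleans; leave other values alone."""
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    return value


def coerce_gates(logs):
    """Return a copy of the nested log list with every gate value coerced."""
    fixed = []
    for depth in logs:
        fixed_depth = []
        for entry in depth:
            entry = dict(entry)  # shallow copy so the input is not mutated
            gates = entry.get("gates", {})
            entry["gates"] = {k: coerce_bool(v) for k, v in gates.items()}
            fixed_depth.append(entry)
        fixed.append(fixed_depth)
    return fixed


# One entry copied from the log above.
sample = json.loads(
    '[[{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, '
    '"energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", '
    '"C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}]]'
)
fixed = coerce_gates(sample)
print(fixed[0][0]["gates"])
```

Coercing on read keeps downstream checks such as `if gates["passed"]:` from treating the non-empty string `"False"` as truthy.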
pemf/experiments/example1_20260428T172250Z_phi_best.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:660ada98c4dfce4cdf016cac4f3432f7e589a0c758e0a74a97f5719f4972caee
+ size 776
pemf/experiments/example1_20260428T172250Z_result.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "task_name": "example1",
+   "params": {
+     "beam_width": 6,
+     "max_depth": 3,
+     "lock_coeff": 0.0,
+     "max_fraction": 1.0,
+     "enable_layer_minus_one": true,
+     "boundary_source": "target",
+     "wandb_project": "itt_solver",
+     "wandb_anonymous": "allow"
+   },
+   "final_sigma": 98.0,
+   "sigma_trace": [
+     98.0,
+     98.0,
+     98.0,
+     98.0
+   ],
+   "time_s": 0.008741617202758789,
+   "transform": "<Transform Id\u2218tile_to_target\u2218tile_to_target\u2218tile_to_target>",
+   "states_count": 4
+ }
pemf/experiments/example1_20260428T172311Z_logs.json ADDED
@@ -0,0 +1 @@
+ [[{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}], [{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}]]
pemf/experiments/example1_20260428T172311Z_phi_best.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:660ada98c4dfce4cdf016cac4f3432f7e589a0c758e0a74a97f5719f4972caee
+ size 776
pemf/experiments/example1_20260428T172311Z_result.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "task_name": "example1",
+   "params": {
+     "beam_width": 4,
+     "max_depth": 2,
+     "lock_coeff": 0.0,
+     "max_fraction": 0.5,
+     "enable_layer_minus_one": true,
+     "boundary_source": "target",
+     "use_symmetry": false
+   },
+   "final_sigma": 98.0,
+   "sigma_trace": [
+     98.0,
+     98.0,
+     98.0
+   ],
+   "time_s": 0.0020961761474609375,
+   "transform": "<Transform Id\u2218tile_to_target\u2218tile_to_target>",
+   "states_count": 3
+ }
pemf/experiments/results.csv ADDED
@@ -0,0 +1,5 @@
+ task_name,params,final_sigma,time_s,transform,sigma_trace
+ example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": false, ""boundary_source"": ""target"", ""use_symmetry"": true}",98.0,0.003506183624267578,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
+ example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": false, ""boundary_source"": ""target"", ""use_symmetry"": false}",98.0,0.0017173290252685547,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
+ example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": true, ""boundary_source"": ""target"", ""use_symmetry"": true}",98.0,0.0046575069427490234,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
+ example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": true, ""boundary_source"": ""target"", ""use_symmetry"": false}",98.0,0.0020961761474609375,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
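The `params` and `sigma_trace` columns of `results.csv` hold JSON text inside CSV cells, so they need a second decoding step after the CSV parse. A minimal sketch using only the standard library, with the header and one row copied from the file above (the variable names are illustrative):

```python
import csv
import io
import json

# Header and one row copied from results.csv above.
raw = (
    'task_name,params,final_sigma,time_s,transform,sigma_trace\n'
    'example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, '
    '""max_fraction"": 0.5, ""enable_layer_minus_one"": true, '
    '""boundary_source"": ""target"", ""use_symmetry"": false}",98.0,'
    '0.0020961761474609375,<Transform Id∘tile_to_target∘tile_to_target>,'
    '"[98.0, 98.0, 98.0]"\n'
)

rows = []
for row in csv.DictReader(io.StringIO(raw)):
    row["params"] = json.loads(row["params"])            # JSON object in a CSV cell
    row["sigma_trace"] = json.loads(row["sigma_trace"])  # JSON list in a CSV cell
    row["final_sigma"] = float(row["final_sigma"])
    rows.append(row)

print(rows[0]["params"]["beam_width"], rows[0]["sigma_trace"])
```

With the parameters decoded, the four rows can be compared directly, for example to confirm that `use_symmetry` and `enable_layer_minus_one` leave `final_sigma` unchanged at 98.0 in this sweep.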
pemf/experiments_analysis.py ADDED
@@ -0,0 +1,154 @@
+ """
+ Quick diagnostics for itt_solver experiments.
+
+ Usage (from notebook or shell):
+     python experiments_analysis.py
+
+ It will:
+ - list recent files in experiments/
+ - print the latest result.json
+ - print depth-0 logs (candidates, gates, residues)
+ - load the latest phi_best and compute L1 vs a provided target (if you set TARGET_GRID below)
+ - test atomic transforms from default_atomic_factory to see if they change the input
+ """
+
+ import os
+ import glob
+ import json
+ import numpy as np
+ from pprint import pprint
+
+ # === Corrected target from real ARC task 007bbfb7 (Kronecker self-similar) ===
+ TARGET_GRID = [
+     [0,0,0,0,7,7,0,7,7],
+     [0,0,0,7,7,7,7,7,7],
+     [0,0,0,0,7,7,0,7,7],
+     [0,7,7,0,7,7,0,7,7],
+     [7,7,7,7,7,7,7,7,7],
+     [0,7,7,0,7,7,0,7,7],
+     [0,0,0,0,7,7,0,7,7],
+     [0,0,0,7,7,7,7,7,7],
+     [0,0,0,0,7,7,0,7,7],
+ ]
+
+ EXPERIMENTS_DIR = "experiments"
+
+ def list_recent_files(n=20):
+     files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*")))
+     print(f"Recent files (last {n}):")
+     for f in files[-n:]:
+         print(" ", f)
+     return files
+
+ def load_latest_result():
+     res_files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*_result.json")))
+     if not res_files:
+         print("No result.json files found in experiments/")
+         return None, None
+     latest = res_files[-1]
+     print("\nLatest result file:", latest)
+     with open(latest) as fh:
+         data = json.load(fh)
+     pprint(data)
+     return latest, data
+
+ def load_latest_logs():
+     logs_files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*_logs.json")))
+     if not logs_files:
+         print("No logs.json files found in experiments/")
+         return None, None
+     latest = logs_files[-1]
+     print("\nLatest logs file:", latest)
+     with open(latest) as fh:
+         logs = json.load(fh)
+     if logs and isinstance(logs, list) and len(logs) > 0:
+         print("\nDepth 0 log entries (summary):")
+         for i, entry in enumerate(logs[0]):
+             atomic = entry.get('atomic')
+             accepted = entry.get('accepted')
+             residue = entry.get('residue')
+             energy = entry.get('energy')
+             gates = entry.get('gates')
+             print(f"{i}: {atomic} | accepted={accepted} | residue={residue} | energy={energy} | gates={gates}")
+     else:
+         print("Logs format unexpected or empty.")
+     return latest, logs
+
+ def load_latest_phi():
+     phi_files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*_phi_best.npy")))
+     if not phi_files:
+         print("No phi_best.npy files found in experiments/")
+         return None, None
+     latest = phi_files[-1]
+     print("\nLatest phi_best file:", latest)
+     phi = np.load(latest)
+     print("phi_best shape:", phi.shape, "unique values:", np.unique(phi))
+     return latest, phi
+
+ def l1_residue_check(phi, target_grid):
+     if phi is None:
+         print("No phi provided for residue check.")
+         return
+     target = np.array(target_grid, dtype=phi.dtype)
+     if phi.shape != target.shape:
+         print("phi and target shapes differ:", phi.shape, target.shape)
+         try:
+             from itt_solver.solver_core import tile_transform
+             target_resized = tile_transform(target, phi.shape)
+             print("Resized target to phi shape for comparison.")
+         except Exception:
+             print("Could not resize target automatically.")
+             return
+     else:
+         target_resized = target
+     l1 = float(np.sum(np.abs(phi - target_resized)))
+     print("L1 residue between phi_best and target:", l1)
+     return l1
+
+ def test_atomic_effects():
+     print("\nTesting atomic transforms from default_atomic_factory...")
+     try:
+         from itt_solver.experiment_driver import default_atomic_factory
+         from itt_solver.solver_core import initialize_potential, tile_transform
+     except Exception as e:
+         print("Could not import default_atomic_factory or solver_core:", e)
+         return
+     params = {'beam_width':6,'max_depth':3,'lock_coeff':0.0,'max_fraction':1.0,'enable_layer_minus_one':True,'boundary_source':'target'}
+     task_stub = {'target_shape': (9,9)}
+     atomic_library = default_atomic_factory(params, task_stub)
+     phi_in = initialize_potential([[0,7,7],[7,7,7],[0,7,7]])
+     print("Input shape:", phi_in.shape, "unique:", np.unique(phi_in))
+     for T in atomic_library:
+         try:
+             out = T.apply(phi_in.copy())
+         except Exception as e:
+             print(repr(T), "apply() raised:", e)
+             continue
+         out_resized = out
+         if out.shape != phi_in.shape:
+             try:
+                 out_resized = tile_transform(out, phi_in.shape)
+             except Exception:
+                 try:
+                     out_resized = np.broadcast_to(out, phi_in.shape)
+                 except Exception:
+                     out_resized = None
+         if out_resized is None:
+             changed = None
+         else:
+             changed = int(np.sum(out_resized != phi_in))
+         print(repr(T), "-> out shape", out.shape, "changed cells (compared to input):", changed)
+
+ def main():
+     print("=== experiments_analysis.py diagnostics ===")
+     list_recent_files()
+     load_latest_result()
+     load_latest_logs()
+     _, phi = load_latest_phi()
+     if phi is not None:
+         l1_residue_check(phi, TARGET_GRID)
+     test_atomic_effects()
+     print("\nDone.")
+
+ if __name__ == "__main__":
+     main()
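The `load_latest_*` helpers above pick the last file in name order, which matches the newest run only while every file belongs to the same task. Once several task names share the directory, one option is to sort on the timestamp embedded in the file name. The sketch below assumes the `<task>_<YYYYMMDD>T<HHMMSS>Z_<kind>` pattern seen in this commit; `run_timestamp` and the `aaa_task` file name are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches names such as example1_20260428T172311Z_result.json
STAMP = re.compile(r"_(\d{8}T\d{6}Z)_")


def run_timestamp(path: str) -> datetime:
    """Extract the UTC timestamp embedded in an experiment file name."""
    match = STAMP.search(path)
    if match is None:
        raise ValueError(f"no timestamp in {path!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return stamp.replace(tzinfo=timezone.utc)


files = [
    "experiments/example1_20260428T172311Z_result.json",
    "experiments/example1_20260428T172250Z_result.json",
    # Hypothetical second task, to show why sorting by name is not enough.
    "experiments/aaa_task_20260429T000000Z_result.json",
]
latest = max(files, key=run_timestamp)
print(latest)
```

Here name order would select the `example1` file from 17:23:11, while the timestamp key selects the later `aaa_task` run.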
pemf/notebooks/pemf_llm_lightning.ipynb ADDED
@@ -0,0 +1,303 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# PEMF ARC-AGI — LLM Solver (Lightning.ai / Multi-GPU)\n",
8
+ "\n",
9
+ "Runs Ollama with auto multi-GPU sharding for local inference.\n",
10
+ "\n",
11
+ "| GPU Config | Model | VRAM | Quality |\n",
12
+ "|---|---|---|---|\n",
13
+ "| 2xA10G (48GB) | qwen2.5-coder:32b | ~20GB q4 | Best |\n",
14
+ "| 2xL4 (48GB) | qwen2.5-coder:32b | ~20GB q4 | Best |\n",
15
+ "| 2xT4 (32GB) | qwen2.5-coder:14b | ~10GB q4 | Good |\n",
16
+ "| 1xA10G (24GB) | qwen2.5-coder:14b | ~10GB | Good |\n",
17
+ "| 4xA10G (96GB) | qwen2.5-coder:32b fp16 | ~65GB | Best+fast |"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "code",
22
+ "execution_count": null,
23
+ "metadata": {},
24
+ "outputs": [],
25
+ "source": [
26
+ "# ============ CONFIGURATION ============\n",
27
+ "MODEL = 'qwen2.5-coder:32b'\n",
28
+ "# MODEL = 'qwen2.5-coder:14b' # fallback for less VRAM\n",
29
+ "N_CANDIDATES = 8"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "code",
34
+ "execution_count": null,
35
+ "metadata": {},
36
+ "outputs": [],
37
+ "source": [
38
+ "import subprocess, os, time, json, re, glob\n",
39
+ "import numpy as np, urllib.request\n",
40
+ "from collections import Counter\n",
41
+ "\n",
42
+ "# Check GPUs\n",
43
+ "!nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader\n",
44
+ "gpu_count = len(subprocess.run(['nvidia-smi','-L'], capture_output=True, text=True).stdout.strip().split('\\n'))\n",
45
+ "print(f'GPUs: {gpu_count}')"
46
+ ]
47
+ },
48
+ {
49
+ "cell_type": "code",
50
+ "execution_count": null,
51
+ "metadata": {},
52
+ "outputs": [],
53
+ "source": [
54
+ "# Install Ollama\n",
55
+ "try:\n",
56
+ " subprocess.run(['ollama','--version'], capture_output=True, check=True)\n",
57
+ " print('Ollama installed')\n",
58
+ "except: \n",
59
+ " !curl -fsSL https://ollama.com/install.sh | sh\n",
60
+ "\n",
61
+ "# Start server (auto-detects all GPUs)\n",
62
+ "subprocess.run(['pkill','-f','ollama'], capture_output=True)\n",
63
+ "time.sleep(2)\n",
64
+ "env = os.environ.copy()\n",
65
+ "env['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in range(gpu_count))\n",
66
+ "server = subprocess.Popen(['ollama','serve'],\n",
67
+ " stdout=open('/tmp/ollama.log','w'), stderr=subprocess.STDOUT, env=env)\n",
68
+ "time.sleep(5)\n",
69
+ "print(f'Server PID {server.pid}, GPUs: {env[\"CUDA_VISIBLE_DEVICES\"]}')\n",
70
+ "\n",
71
+ "# Pull model\n",
72
+ "print(f'Pulling {MODEL}...')\n",
73
+ "r = subprocess.run(['ollama','pull',MODEL], capture_output=True, text=True, timeout=3600)\n",
74
+ "if r.returncode != 0:\n",
75
+ " print(f'Failed, trying 14b...'); MODEL='qwen2.5-coder:14b'\n",
76
+ " subprocess.run(['ollama','pull',MODEL], capture_output=True, text=True, timeout=3600)\n",
77
+ "print(f'{MODEL} ready')\n",
78
+ "\n",
79
+ "# Test\n",
80
+ "r = subprocess.run(['ollama','run',MODEL,'Say hello'], capture_output=True, text=True, timeout=60)\n",
81
+ "print(f'Test: {r.stdout.strip()[:80]}')\n",
82
+ "!nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader"
83
+ ]
84
+ },
85
+ {
86
+ "cell_type": "code",
87
+ "execution_count": null,
88
+ "metadata": {},
89
+ "outputs": [],
90
+ "source": [
91
+ "# Download ARC data\n",
92
+ "if not os.path.exists('arc_data/training'):\n",
93
+ " !git clone --depth 1 https://github.com/fchollet/ARC-AGI.git /tmp/arc\n",
94
+ " os.makedirs('arc_data', exist_ok=True)\n",
95
+ " !cp -r /tmp/arc/data/training arc_data/training\n",
96
+ "print(f'Tasks: {len(glob.glob(\"arc_data/training/*.json\"))}')\n",
97
+ "\n",
98
+ "ALREADY_SOLVED = {\n",
99
+ " '007bbfb7','00d62c1b','0d3d703e','1190e5a7','1cf80156','1e0a9b12','1f85a75f',\n",
100
+ " '2013d3e2','22168020','22eb0ac0','239be575','23b5c85d','28bf18c6','2dee498d',\n",
101
+ " '3618c87e','3906de3d','3aa6fb7a','3af2c5a8','3c9b0459','42a50994','4347f46a',\n",
102
+ " '50cb2852','6150a2bd','62c24649','67385a82','67a3c6ac','67e8384a','68b16354',\n",
103
+ " '6d0aefbc','6f8cd79b','6fa7a44f','746b3537','74dd1130','7b7f7511','7e0986d6',\n",
104
+ " '7f4411dc','868de0fa','8be77c9e','8d5021e8','91714a58','9172f3a0','9565186b',\n",
105
+ " '9dfd6313','a416b8f3','a5313dff','a699fb00','aabf363d','aedd82e4','b1948b0a',\n",
106
+ " 'b6afb2da','ba97ae07','bb43febb','bda2d7a6','be94b721','c0f76784','c59eb873',\n",
107
+ " 'c8f0f002','c9e6f938','d10ecb37','d23f8c26','d511f180','d631b094','d90796e8',\n",
108
+ " 'd9fac9be','de1cd16c','ded97339','e26a3af2','eb5a1d5d','ed36ccf7','f76d97a5',\n",
109
+ "}\n",
110
+ "task_files = sorted(glob.glob('arc_data/training/*.json'))\n",
111
+ "unsolved = [(os.path.basename(f).replace('.json',''),f) for f in task_files\n",
112
+ " if os.path.basename(f).replace('.json','') not in ALREADY_SOLVED]\n",
113
+ "print(f'Symbolic: {len(ALREADY_SOLVED)}, LLM to try: {len(unsolved)}')"
114
+ ]
115
+ },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# LLM Engine\n",
+     "def call_ollama(prompt, model, temperature=0.7):\n",
+     "    payload = {'model':model,'prompt':prompt,'stream':False,\n",
+     "               'options':{'temperature':temperature,'num_predict':2048}}\n",
+     "    req = urllib.request.Request('http://localhost:11434/api/generate',\n",
+     "        data=json.dumps(payload).encode(), headers={'Content-Type':'application/json'}, method='POST')\n",
+     "    try:\n",
+     "        with urllib.request.urlopen(req, timeout=180) as resp:\n",
+     "            return json.loads(resp.read().decode()).get('response','')\n",
+     "    except Exception as e: return f'ERROR: {e}'\n",
+     "\n",
+     "def build_prompt(task):\n",
+     "    pairs = task.get('train',[])\n",
+     "    ex = '\\n'.join(f\"Example {i+1}:\\n  Input:  {json.dumps(p['input'])}\\n  Output: {json.dumps(p['output'])}\"\n",
+     "                   for i,p in enumerate(pairs))\n",
+     "    inps = [np.array(p['input']) for p in pairs]\n",
+     "    outs = [np.array(p['output']) for p in pairs]\n",
+     "    same = all(i.shape==o.shape for i,o in zip(inps,outs))\n",
+     "    ic = sorted(set(c for i in inps for c in np.unique(i).tolist()))\n",
+     "    oc = sorted(set(c for o in outs for c in np.unique(o).tolist()))\n",
+     "    a = f\"  Same shape: {same}\\n  Colors in: {ic}, out: {oc}\\n\"\n",
+     "    if not same: a += f\"  Shape: {inps[0].shape} -> {outs[0].shape}\\n\"\n",
+     "    return f\"\"\"Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.\n",
+     "\n",
+     "{ex}\n",
+     "\n",
+     "Analysis:\n",
+     "{a}\n",
+     "```python\n",
+     "import numpy as np\n",
+     "from collections import Counter, deque\n",
+     "\n",
+     "def transform(grid: list[list[int]]) -> list[list[int]]:\n",
+     "    grid = np.array(grid)\n",
+     "\"\"\"\n",
+     "\n",
+     "def extract_code(resp):\n",
+     "    for pat in [r'```python\\s*(.*?)```', r'```\\s*(.*?)```']:\n",
+     "        for m in re.findall(pat, resp, re.DOTALL):\n",
+     "            if 'def transform' in m: return m.strip()\n",
+     "    idx = resp.find('def transform')\n",
+     "    if idx >= 0:\n",
+     "        before = resp[:idx]\n",
+     "        s = max(before.rfind('import '), before.rfind('from '))\n",
+     "        code = resp[s if s>=0 else idx:]\n",
+     "        end = code.find('```')\n",
+     "        if end>0: code=code[:end]\n",
+     "        return code.strip()\n",
+     "    s = resp.strip()\n",
+     "    if s.startswith(('import','def transform','from')): return s\n",
+     "    return None\n",
+     "\n",
+     "def verify(code, pairs):\n",
+     "    ns = {'np':np,'numpy':np,'Counter':Counter,'deque':__import__('collections').deque}\n",
+     "    try:\n",
+     "        import scipy.ndimage; ns['scipy']=__import__('scipy')\n",
+     "    except ImportError: pass\n",
+     "    try: exec(code, ns)\n",
+     "    except Exception: return False\n",
+     "    if 'transform' not in ns: return False\n",
+     "    fn = ns['transform']\n",
+     "    for p in pairs:\n",
+     "        try:\n",
+     "            r = np.array(fn([row[:] for row in p['input']]), dtype=int)\n",
+     "            e = np.array(p['output'], dtype=int)\n",
+     "            if r.shape!=e.shape or not np.array_equal(r,e): return False\n",
+     "        except Exception: return False\n",
+     "    return True\n",
+     "\n",
+     "def apply_prog(code, inp):\n",
+     "    ns = {'np':np,'numpy':np,'Counter':Counter,'deque':__import__('collections').deque}\n",
+     "    try:\n",
+     "        import scipy.ndimage; ns['scipy']=__import__('scipy')\n",
+     "    except ImportError: pass\n",
+     "    try:\n",
+     "        exec(code, ns)\n",
+     "        r = ns['transform']([row[:] for row in inp])\n",
+     "        if r is not None: return np.array(r,dtype=int).tolist()\n",
+     "    except Exception: pass\n",
+     "    return None\n",
+     "\n",
+     "print('Engine ready')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Quick test\n",
+     "with open(f'arc_data/training/{unsolved[0][0]}.json') as f: t=json.load(f)\n",
+     "print(f'Test on {unsolved[0][0]}...')\n",
+     "s=time.time(); r=call_ollama(build_prompt(t),MODEL,0.1); e=time.time()-s\n",
+     "code=extract_code(r)\n",
+     "if code: print(f'{e:.1f}s, {len(code)}ch, verified: {\"Y\" if verify(code,t[\"train\"]) else \"N\"}')\n",
+     "else: print(f'{e:.1f}s, no code')\n",
+     "est = e*N_CANDIDATES*len(unsolved)/3600\n",
+     "print(f'Est total: {est:.1f}h for {len(unsolved)} tasks x {N_CANDIDATES} candidates')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# === MAIN LOOP (crash-safe, resumable) ===\n",
+     "results = {}\n",
+     "solved = 0\n",
+     "total_time = 0\n",
+     "\n",
+     "if os.path.exists('llm_results.json'):\n",
+     "    with open('llm_results.json') as f: prev=json.load(f)\n",
+     "    results=prev.get('results',{})\n",
+     "    solved=sum(1 for r in results.values() if r['status']=='solved')\n",
+     "    total_time=prev.get('total_time_s',0)\n",
+     "    print(f'Resuming: {solved} LLM-solved, {len(results)} attempted')\n",
+     "\n",
+     "for idx,(tid,tf) in enumerate(unsolved):\n",
+     "    if tid in results: continue\n",
+     "    with open(tf) as f: task=json.load(f)\n",
+     "    print(f'[{idx+1:3d}/{len(unsolved)}] {tid}:',end=' ',flush=True)\n",
+     "    s=time.time(); prompt=build_prompt(task); ok=False\n",
+     "    for i in range(N_CANDIDATES):\n",
+     "        temp=0.1 if i==0 else min(0.4+0.15*i,1.2)\n",
+     "        resp=call_ollama(prompt,MODEL,temp)\n",
+     "        if resp.startswith('ERROR:'): continue\n",
+     "        code=extract_code(resp)\n",
+     "        if code and verify(code,task['train']):\n",
+     "            e=time.time()-s; total_time+=e; solved+=1\n",
+     "            to=[apply_prog(code,t['input']) for t in task.get('test',[])]\n",
+     "            results[tid]={'status':'solved','rule':f'llm_c{i+1}','code':code,\n",
+     "                          'test_outputs':to,'time_s':round(e,2)}\n",
+     "            print(f'✅ c{i+1} ({e:.1f}s) [{len(ALREADY_SOLVED)+solved}/{len(task_files)}]')\n",
+     "            ok=True; break\n",
+     "    if not ok:\n",
+     "        e=time.time()-s; total_time+=e\n",
+     "        results[tid]={'status':'failed','time_s':round(e,2)}\n",
+     "        print(f'❌ ({e:.1f}s)')\n",
+     "    if (idx+1)%5==0 or ok:\n",
+     "        with open('llm_results.json','w') as f:\n",
+     "            json.dump({'model':MODEL,'n_candidates':N_CANDIDATES,'llm_solved':solved,\n",
+     "                'attempted':len(results),'symbolic_solved':len(ALREADY_SOLVED),\n",
+     "                'total_solved':len(ALREADY_SOLVED)+solved,'total_tasks':len(task_files),\n",
+     "                'solve_rate':round(100*(len(ALREADY_SOLVED)+solved)/len(task_files),2),\n",
+     "                'total_time_s':round(total_time,1),'results':results},f,indent=2)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Final save + summary\n",
+     "with open('llm_results.json','w') as f:\n",
+     "    json.dump({'model':MODEL,'n_candidates':N_CANDIDATES,'llm_solved':solved,\n",
+     "        'attempted':len(results),'symbolic_solved':len(ALREADY_SOLVED),\n",
+     "        'total_solved':len(ALREADY_SOLVED)+solved,'total_tasks':len(task_files),\n",
+     "        'solve_rate':round(100*(len(ALREADY_SOLVED)+solved)/len(task_files),2),\n",
+     "        'total_time_s':round(total_time,1),'results':results},f,indent=2)\n",
+     "\n",
+     "print(f'\\n{\"=\"*60}')\n",
+     "print(f'LLM solved:  {solved}')\n",
+     "print(f'Symbolic:    {len(ALREADY_SOLVED)}')\n",
+     "print(f'TOTAL:       {len(ALREADY_SOLVED)+solved}/{len(task_files)} ({100*(len(ALREADY_SOLVED)+solved)/len(task_files):.1f}%)')\n",
+     "print(f'Time:        {total_time/3600:.1f}h')\n",
+     "print(f'\\nDownload llm_results.json, then run:')\n",
+     "print(f'  python scripts/merge_results.py arc_results/summary_v4.json llm_results.json')\n",
+     "\n",
+     "subprocess.run(['pkill','-f','ollama'], capture_output=True)"
+    ]
+   }
+  ],
+  "metadata": {
+   "kernelspec": {"display_name":"Python 3","language":"python","name":"python3"},
+   "language_info": {"name":"python","version":"3.10.0"}
+  },
+  "nbformat": 4,
+  "nbformat_minor": 4
+ }
pemf/notebooks/pemf_llm_solver.ipynb ADDED
@@ -0,0 +1,490 @@
+ {
+  "cells": [
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "# PEMF ARC-AGI: LLM Program Synthesis\n",
+     "\n",
+     "Uses NVIDIA NIM (free) with GLM 4.7 / DeepSeek V4 to solve ARC tasks.\n",
+     "\n",
+     "**Pipeline:** For each unsolved task → build prompt → LLM generates Python `transform()` → verify against ALL training pairs → apply to test.\n",
+     "\n",
+     "**Prerequisites:**\n",
+     "- NVIDIA NIM API key from https://build.nvidia.com/settings/api-keys\n",
+     "- Internet access enabled"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 1. Setup"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# ============================================================\n",
+     "# CONFIGURATION: EDIT THESE\n",
+     "# ============================================================\n",
+     "\n",
+     "NVIDIA_API_KEY = \"nvapi-YOUR-KEY-HERE\"  # Get from https://build.nvidia.com/settings/api-keys\n",
+     "\n",
+     "MODEL = \"z-ai/glm4.7\"           # Default: GLM 4.7\n",
+     "# MODEL = \"deepseek-ai/deepseek-v4-pro\"  # Alternative: DeepSeek V4\n",
+     "\n",
+     "N_CANDIDATES = 8                  # Candidates per task (more = better but slower)\n",
+     "RATE_LIMIT_SLEEP = 2              # Seconds between API calls"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Download ARC dataset\n",
+     "import os, subprocess\n",
+     "\n",
+     "if not os.path.exists('arc_data/training'):\n",
+     "    print('Downloading ARC dataset...')\n",
+     "    subprocess.run(['git', 'clone', '--depth', '1', 'https://github.com/fchollet/ARC-AGI.git', '/tmp/arc'],\n",
+     "                   capture_output=True)\n",
+     "    os.makedirs('arc_data', exist_ok=True)\n",
+     "    subprocess.run(['cp', '-r', '/tmp/arc/data/training', 'arc_data/training'], capture_output=True)\n",
+     "    print(f'Downloaded {len(os.listdir(\"arc_data/training\"))} tasks')\n",
+     "else:\n",
+     "    print(f'ARC data already present: {len(os.listdir(\"arc_data/training\"))} tasks')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Already solved by symbolic pipeline (70 tasks)\n",
+     "ALREADY_SOLVED = {\n",
+     "    \"007bbfb7\",\"00d62c1b\",\"0d3d703e\",\"1190e5a7\",\"1cf80156\",\"1e0a9b12\",\"1f85a75f\",\n",
+     "    \"2013d3e2\",\"22168020\",\"22eb0ac0\",\"239be575\",\"23b5c85d\",\"28bf18c6\",\"2dee498d\",\n",
+     "    \"3618c87e\",\"3906de3d\",\"3aa6fb7a\",\"3af2c5a8\",\"3c9b0459\",\"42a50994\",\"4347f46a\",\n",
+     "    \"50cb2852\",\"6150a2bd\",\"62c24649\",\"67385a82\",\"67a3c6ac\",\"67e8384a\",\"68b16354\",\n",
+     "    \"6d0aefbc\",\"6f8cd79b\",\"6fa7a44f\",\"746b3537\",\"74dd1130\",\"7b7f7511\",\"7e0986d6\",\n",
+     "    \"7f4411dc\",\"868de0fa\",\"8be77c9e\",\"8d5021e8\",\"91714a58\",\"9172f3a0\",\"9565186b\",\n",
+     "    \"9dfd6313\",\"a416b8f3\",\"a5313dff\",\"a699fb00\",\"aabf363d\",\"aedd82e4\",\"b1948b0a\",\n",
+     "    \"b6afb2da\",\"ba97ae07\",\"bb43febb\",\"bda2d7a6\",\"be94b721\",\"c0f76784\",\"c59eb873\",\n",
+     "    \"c8f0f002\",\"c9e6f938\",\"d10ecb37\",\"d23f8c26\",\"d511f180\",\"d631b094\",\"d90796e8\",\n",
+     "    \"d9fac9be\",\"de1cd16c\",\"ded97339\",\"e26a3af2\",\"eb5a1d5d\",\"ed36ccf7\",\"f76d97a5\",\n",
+     "}\n",
+     "print(f'Already solved by symbolic pipeline: {len(ALREADY_SOLVED)} tasks')"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 2. LLM Engine"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "import json\n",
+     "import time\n",
+     "import re\n",
+     "import glob\n",
+     "import numpy as np\n",
+     "import urllib.request\n",
+     "from collections import Counter\n",
+     "\n",
+     "\n",
+     "def call_nvidia(prompt, api_key, model=\"z-ai/glm4.7\", temperature=0.7):\n",
+     "    \"\"\"Call NVIDIA NIM API.\"\"\"\n",
+     "    url = \"https://integrate.api.nvidia.com/v1/chat/completions\"\n",
+     "    payload = {\n",
+     "        \"model\": model,\n",
+     "        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
+     "        \"max_tokens\": 2048,\n",
+     "        \"temperature\": temperature,\n",
+     "    }\n",
+     "    data = json.dumps(payload).encode('utf-8')\n",
+     "    req = urllib.request.Request(url, data=data,\n",
+     "                                 headers={\"Content-Type\": \"application/json\",\n",
+     "                                          \"Authorization\": f\"Bearer {api_key}\"},\n",
+     "                                 method='POST')\n",
+     "    try:\n",
+     "        with urllib.request.urlopen(req, timeout=120) as resp:\n",
+     "            result = json.loads(resp.read().decode())\n",
+     "            return result['choices'][0]['message']['content']\n",
+     "    except Exception as e:\n",
+     "        return f\"ERROR: {e}\"\n",
+     "\n",
+     "\n",
+     "def build_prompt(task):\n",
+     "    \"\"\"Build prompt for ARC task.\"\"\"\n",
+     "    train_pairs = task.get('train', [])\n",
+     "    examples = []\n",
+     "    for i, pair in enumerate(train_pairs):\n",
+     "        examples.append(\n",
+     "            f\"Example {i+1}:\\n\"\n",
+     "            f\"  Input:  {json.dumps(pair['input'])}\\n\"\n",
+     "            f\"  Output: {json.dumps(pair['output'])}\"\n",
+     "        )\n",
+     "    examples_str = \"\\n\".join(examples)\n",
+     "\n",
+     "    inputs = [np.array(p['input']) for p in train_pairs]\n",
+     "    outputs = [np.array(p['output']) for p in train_pairs]\n",
+     "    same_shape = all(i.shape == o.shape for i, o in zip(inputs, outputs))\n",
+     "    in_colors = sorted(set(c for i in inputs for c in np.unique(i).tolist()))\n",
+     "    out_colors = sorted(set(c for o in outputs for c in np.unique(o).tolist()))\n",
+     "\n",
+     "    analysis = f\"  Same input/output shape: {same_shape}\\n\"\n",
+     "    analysis += f\"  Input colors: {in_colors}, Output colors: {out_colors}\\n\"\n",
+     "    if not same_shape:\n",
+     "        for i, o in zip(inputs[:1], outputs[:1]):\n",
+     "            analysis += f\"  Shape: {i.shape} -> {o.shape}\\n\"\n",
+     "\n",
+     "    return f\"\"\"Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.\n",
+     "\n",
+     "{examples_str}\n",
+     "\n",
+     "Analysis:\n",
+     "{analysis}\n",
+     "```python\n",
+     "import numpy as np\n",
+     "from collections import Counter, deque\n",
+     "\n",
+     "def transform(grid: list[list[int]]) -> list[list[int]]:\n",
+     "    grid = np.array(grid)\n",
+     "\"\"\"\n",
+     "\n",
+     "\n",
+     "def extract_code(response):\n",
+     "    \"\"\"Extract Python function from LLM response.\"\"\"\n",
+     "    for pattern in [r'```python\\s*(.*?)```', r'```\\s*(.*?)```']:\n",
+     "        matches = re.findall(pattern, response, re.DOTALL)\n",
+     "        for match in matches:\n",
+     "            if 'def transform' in match:\n",
+     "                return match.strip()\n",
+     "    idx = response.find('def transform')\n",
+     "    if idx >= 0:\n",
+     "        before = response[:idx]\n",
+     "        import_start = max(before.rfind('import '), before.rfind('from '))\n",
+     "        start = import_start if import_start >= 0 else idx\n",
+     "        code = response[start:]\n",
+     "        end = code.find('```')\n",
+     "        if end > 0:\n",
+     "            code = code[:end]\n",
+     "        return code.strip()\n",
+     "    stripped = response.strip()\n",
+     "    if stripped.startswith(('import', 'def transform', 'from')):\n",
+     "        return stripped\n",
+     "    return None\n",
+     "\n",
+     "\n",
+     "def verify_program(code, train_pairs):\n",
+     "    \"\"\"Execute program and verify against all training pairs.\"\"\"\n",
+     "    namespace = {'np': np, 'numpy': np, 'Counter': Counter,\n",
+     "                 'deque': __import__('collections').deque}\n",
+     "    try:\n",
+     "        import scipy.ndimage\n",
+     "        namespace['scipy'] = __import__('scipy')\n",
+     "    except ImportError:\n",
+     "        pass\n",
+     "    try:\n",
+     "        exec(code, namespace)\n",
+     "    except Exception:\n",
+     "        return False\n",
+     "    if 'transform' not in namespace:\n",
+     "        return False\n",
+     "    fn = namespace['transform']\n",
+     "    for pair in train_pairs:\n",
+     "        try:\n",
+     "            result = fn([row[:] for row in pair['input']])\n",
+     "            if result is None:\n",
+     "                return False\n",
+     "            r = np.array(result, dtype=int)\n",
+     "            e = np.array(pair['output'], dtype=int)\n",
+     "            if r.shape != e.shape or not np.array_equal(r, e):\n",
+     "                return False\n",
+     "        except Exception:\n",
+     "            return False\n",
+     "    return True\n",
+     "\n",
+     "\n",
+     "def apply_program(code, test_input):\n",
+     "    \"\"\"Apply verified program to test input.\"\"\"\n",
+     "    namespace = {'np': np, 'numpy': np, 'Counter': Counter,\n",
+     "                 'deque': __import__('collections').deque}\n",
+     "    try:\n",
+     "        import scipy.ndimage\n",
+     "        namespace['scipy'] = __import__('scipy')\n",
+     "    except ImportError:\n",
+     "        pass\n",
+     "    try:\n",
+     "        exec(code, namespace)\n",
+     "        result = namespace['transform']([row[:] for row in test_input])\n",
+     "        if result is not None:\n",
+     "            return np.array(result, dtype=int).tolist()\n",
+     "    except Exception:\n",
+     "        pass\n",
+     "    return None\n",
+     "\n",
+     "\n",
+     "print('LLM engine ready.')"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 3. Quick Test (1 task)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Quick test: verify the API works before running all 330 tasks\n",
+     "test_tid = '0520fde7'\n",
+     "with open(f'arc_data/training/{test_tid}.json') as f:\n",
+     "    test_task = json.load(f)\n",
+     "\n",
+     "print(f'Testing on {test_tid}...')\n",
+     "for i, p in enumerate(test_task['train']):\n",
+     "    inp = np.array(p['input']); out = np.array(p['output'])\n",
+     "    print(f'  Pair {i}: {inp.shape} -> {out.shape}')\n",
+     "\n",
+     "prompt = build_prompt(test_task)\n",
+     "print(f'Prompt: {len(prompt)} chars')\n",
+     "\n",
+     "response = call_nvidia(prompt, NVIDIA_API_KEY, MODEL, temperature=0.1)\n",
+     "if response.startswith('ERROR:'):\n",
+     "    print(f'\\n❌ API Error: {response}')\n",
+     "    print('Check your NVIDIA_API_KEY and MODEL settings above.')\n",
+     "else:\n",
+     "    code = extract_code(response)\n",
+     "    if code:\n",
+     "        ok = verify_program(code, test_task['train'])\n",
+     "        print(f'\\nCode extracted: {len(code)} chars')\n",
+     "        print(f'Verified: {\"✅\" if ok else \"❌\"}')\n",
+     "        if ok:\n",
+     "            print('API working and generating correct code!')\n",
+     "        else:\n",
+     "            print('API working but code failed verification (normal; the full run tries more candidates)')\n",
+     "    else:\n",
+     "        print(f'\\nNo code extracted from response ({len(response)} chars)')\n",
+     "        print('API working but response format unexpected. Will retry with different temperatures in full run.')"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 4. Run on All Unsolved Tasks"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Load all unsolved tasks\n",
+     "task_files = sorted(glob.glob('arc_data/training/*.json'))\n",
+     "unsolved = []\n",
+     "for tf in task_files:\n",
+     "    tid = os.path.basename(tf).replace('.json', '')\n",
+     "    if tid not in ALREADY_SOLVED:\n",
+     "        unsolved.append((tid, tf))\n",
+     "\n",
+     "print(f'Total tasks: {len(task_files)}')\n",
+     "print(f'Already solved (symbolic): {len(ALREADY_SOLVED)}')\n",
+     "print(f'To attempt with LLM: {len(unsolved)}')\n",
+     "print(f'Model: {MODEL}')\n",
+     "print(f'Candidates per task: {N_CANDIDATES}')\n",
+     "print(f'\\nStarting...')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Main loop\n",
+     "results = {}\n",
+     "solved = 0\n",
+     "total_time = 0\n",
+     "\n",
+     "# Resume from previous run if exists\n",
+     "if os.path.exists('llm_results.json'):\n",
+     "    with open('llm_results.json') as f:\n",
+     "        prev = json.load(f)\n",
+     "    results = prev.get('results', {})\n",
+     "    solved = sum(1 for r in results.values() if r['status'] == 'solved')\n",
+     "    total_time = prev.get('total_time_s', 0)  # keep accumulated time across resumes\n",
+     "    print(f'Resuming from previous run: {solved} already solved by LLM')\n",
+     "\n",
+     "for idx, (tid, tf) in enumerate(unsolved):\n",
+     "    # Skip if already attempted\n",
+     "    if tid in results:\n",
+     "        continue\n",
+     "    \n",
+     "    with open(tf) as f:\n",
+     "        task = json.load(f)\n",
+     "    \n",
+     "    print(f'[{idx+1:3d}/{len(unsolved)}] {tid}:', end=' ', flush=True)\n",
+     "    start = time.time()\n",
+     "    \n",
+     "    prompt = build_prompt(task)\n",
+     "    task_solved = False\n",
+     "    \n",
+     "    for i in range(N_CANDIDATES):\n",
+     "        temp = 0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)\n",
+     "        response = call_nvidia(prompt, NVIDIA_API_KEY, MODEL, temp)\n",
+     "        \n",
+     "        if response.startswith('ERROR:'):\n",
+     "            if '429' in response or 'rate' in response.lower():\n",
+     "                time.sleep(10)  # Rate limit: wait longer\n",
+     "            continue\n",
+     "        \n",
+     "        code = extract_code(response)\n",
+     "        if code is None:\n",
+     "            time.sleep(RATE_LIMIT_SLEEP)  # still pace the next API call\n",
+     "            continue\n",
+     "        \n",
+     "        if verify_program(code, task['train']):\n",
+     "            elapsed = time.time() - start\n",
+     "            total_time += elapsed\n",
+     "            solved += 1\n",
+     "            \n",
+     "            test_outputs = [apply_program(code, t['input']) for t in task.get('test', [])]\n",
+     "            results[tid] = {\n",
+     "                'status': 'solved', 'rule': f'llm_c{i+1}_t{temp:.1f}',\n",
+     "                'code': code, 'test_outputs': test_outputs,\n",
+     "                'time_s': round(elapsed, 2),\n",
+     "            }\n",
+     "            print(f'✅ c{i+1} ({elapsed:.1f}s) [total: {len(ALREADY_SOLVED)+solved}/{len(task_files)}]')\n",
+     "            task_solved = True\n",
+     "            break\n",
+     "        \n",
+     "        time.sleep(RATE_LIMIT_SLEEP)\n",
+     "    \n",
+     "    if not task_solved:\n",
+     "        elapsed = time.time() - start\n",
+     "        total_time += elapsed\n",
+     "        results[tid] = {'status': 'failed', 'time_s': round(elapsed, 2)}\n",
+     "        print(f'❌ ({elapsed:.1f}s)')\n",
+     "    \n",
+     "    # Save progress every 10 tasks\n",
+     "    if (idx + 1) % 10 == 0:\n",
+     "        with open('llm_results.json', 'w') as f:\n",
+     "            json.dump({\n",
+     "                'model': MODEL, 'n_candidates': N_CANDIDATES,\n",
+     "                'llm_solved': solved, 'attempted': len(results),\n",
+     "                'symbolic_solved': len(ALREADY_SOLVED),\n",
+     "                'total_solved': len(ALREADY_SOLVED) + solved,\n",
+     "                'total_tasks': len(task_files),\n",
+     "                'solve_rate': round(100 * (len(ALREADY_SOLVED) + solved) / len(task_files), 2),\n",
+     "                'total_time_s': round(total_time, 1),\n",
+     "                'results': results,\n",
+     "            }, f, indent=2)\n",
+     "        print(f'  [Saved: {len(ALREADY_SOLVED)+solved}/{len(task_files)} total]')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Final save\n",
+     "with open('llm_results.json', 'w') as f:\n",
+     "    json.dump({\n",
+     "        'model': MODEL, 'n_candidates': N_CANDIDATES,\n",
+     "        'llm_solved': solved, 'attempted': len(results),\n",
+     "        'symbolic_solved': len(ALREADY_SOLVED),\n",
+     "        'total_solved': len(ALREADY_SOLVED) + solved,\n",
+     "        'total_tasks': len(task_files),\n",
+     "        'solve_rate': round(100 * (len(ALREADY_SOLVED) + solved) / len(task_files), 2),\n",
+     "        'total_time_s': round(total_time, 1),\n",
+     "        'results': results,\n",
+     "    }, f, indent=2)\n",
+     "\n",
+     "print(f'\\n{\"=\"*60}')\n",
+     "print(f'FINAL RESULTS')\n",
+     "print(f'{\"=\"*60}')\n",
+     "print(f'LLM solved: {solved}')\n",
+     "print(f'Symbolic solved: {len(ALREADY_SOLVED)}')\n",
+     "print(f'TOTAL SOLVED: {len(ALREADY_SOLVED)+solved}/{len(task_files)} ({100*(len(ALREADY_SOLVED)+solved)/len(task_files):.1f}%)')\n",
+     "print(f'Time: {total_time:.0f}s')\n",
+     "print(f'\\nResults saved to: llm_results.json')"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 5. Results Analysis"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Load and analyze results\n",
+     "with open('llm_results.json') as f:\n",
+     "    data = json.load(f)\n",
+     "\n",
+     "print(f'Model: {data[\"model\"]}')\n",
+     "print(f'Candidates per task: {data[\"n_candidates\"]}')\n",
+     "print(f'\\nSymbolic solved: {data[\"symbolic_solved\"]}')\n",
+     "print(f'LLM solved: {data[\"llm_solved\"]}')\n",
+     "print(f'TOTAL: {data[\"total_solved\"]}/{data[\"total_tasks\"]} ({data[\"solve_rate\"]}%)')\n",
+     "\n",
+     "llm_solved_tasks = [tid for tid, r in data['results'].items() if r['status'] == 'solved']\n",
+     "print(f'\\nLLM-solved tasks ({len(llm_solved_tasks)}):')\n",
+     "for tid in sorted(llm_solved_tasks):\n",
+     "    rule = data['results'][tid].get('rule', '?')\n",
+     "    t = data['results'][tid].get('time_s', 0)\n",
+     "    print(f'  {tid}: {rule} ({t}s)')"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## 6. Download Results\n",
+     "\n",
+     "Download `llm_results.json` from the notebook output, then merge with symbolic results:\n",
+     "\n",
+     "```bash\n",
+     "python scripts/merge_results.py arc_results/summary_v4.json llm_results.json\n",
+     "```"
+    ]
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "Python 3",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "name": "python",
+    "version": "3.10.0"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 4
+ }
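The main loop in the notebook above varies the sampling temperature per candidate: the first attempt is near-greedy (0.1) and later attempts ramp up linearly to a cap of 1.2. A minimal sketch of that schedule, assuming the same formula as the notebook; the helper name `candidate_temperature` is hypothetical and used only for illustration:

```python
def candidate_temperature(i: int) -> float:
    """Temperature for the i-th candidate (0-based), using the notebook's formula."""
    # First candidate is near-deterministic; later ones ramp linearly, capped at 1.2.
    return 0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)


# Schedule for the default N_CANDIDATES = 8, rounded for display.
schedule = [round(candidate_temperature(i), 2) for i in range(8)]
print(schedule)
```

With eight candidates the cap is reached from the seventh attempt onward, so raising `N_CANDIDATES` much further would add samples at the same temperature rather than more varied ones.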
pemf/pyproject.toml ADDED
@@ -0,0 +1,42 @@
+ [project]
+ name = "pemf-arc-agi"
+ version = "0.4.0"
+ description = "Pre-Emergence Mechanics Framework (PEMF) solver for ARC-AGI"
+ requires-python = ">=3.10"
+ license = {text = "MIT"}
+
+ dependencies = [
+ "numpy>=1.24",
+ "scipy>=1.10",
+ ]
+
+ [project.optional-dependencies]
+ viz = [
+ "matplotlib>=3.7",
+ ]
+ wandb = [
+ "wandb>=0.15",
+ "matplotlib>=3.7",
+ ]
+ llm = [
+ "huggingface-hub>=0.20",
+ ]
+ all = [
+ "numpy>=1.24",
+ "scipy>=1.10",
+ "matplotlib>=3.7",
+ "wandb>=0.15",
+ "huggingface-hub>=0.20",
+ ]
+
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [tool.hatch.build.targets.wheel]
+ packages = ["itt_solver"]
+
+ [dependency-groups]
+ dev = [
+ "pytest>=7.0",
+ ]
pemf/scripts/entrypoint.py ADDED
@@ -0,0 +1,84 @@
+ """
+ Headless entrypoint for running a single experiment or a sweep.
+
+ Usage:
+ python scripts/entrypoint.py --task example1 --out_dir experiments
+ python scripts/entrypoint.py --task example1 --out_dir experiments --use_wandb
+
+ By default Weights & Biases logging is disabled. Use --use_wandb to enable it.
+ """
+ import argparse
+ import json
+ import os
+ import importlib
+
+ def main():
+     parser = argparse.ArgumentParser(description="Run ARC-AGI experiment (headless).")
+     parser.add_argument("--task", type=str, required=True, help="Task name or path to task JSON")
+     parser.add_argument("--out_dir", type=str, default="experiments", help="Output directory")
+     parser.add_argument("--use_wandb", action="store_true", help="Enable Weights & Biases logging (default: off)")
+     parser.add_argument("--params", type=str, default=None, help="Optional JSON string of params")
+     args = parser.parse_args()
+
+     os.makedirs(args.out_dir, exist_ok=True)
+
+     # lazy imports to avoid heavy startup cost
+     import itt_solver.experiment_driver as ed
+     import itt_solver.solver_core as sc
+
+     # load task: if args.task is a JSON file path, load it; otherwise expect a built-in name
+     if os.path.exists(args.task):
+         with open(args.task) as fh:
+             task = json.load(fh)
+     else:
+         # minimal built-in example if user passed 'example1'
+         # Corrected target from real ARC task 007bbfb7 (Kronecker self-similar)
+         if args.task == "example1":
+             task = {
+                 'name': 'example1',
+                 'input': [[0,7,7],[7,7,7],[0,7,7]],
+                 'target': [
+                     [0,0,0,0,7,7,0,7,7],
+                     [0,0,0,7,7,7,7,7,7],
+                     [0,0,0,0,7,7,0,7,7],
+                     [0,7,7,0,7,7,0,7,7],
+                     [7,7,7,7,7,7,7,7,7],
+                     [0,7,7,0,7,7,0,7,7],
+                     [0,0,0,0,7,7,0,7,7],
+                     [0,0,0,7,7,7,7,7,7],
+                     [0,0,0,0,7,7,0,7,7],
+                 ],
+                 'target_shape': (9,9)
+             }
+         else:
+             raise SystemExit(f"Unknown task identifier: {args.task}")
+
+     # parse params if provided
+     params = {}
+     if args.params:
+         try:
+             params = json.loads(args.params)
+         except Exception:
+             print("Warning: could not parse --params JSON; ignoring.")
+
+     # build atomic library using default factory
+     atomic_library = ed.default_atomic_factory(params, task)
+
+     # run single experiment
+     result = ed.run_single(task, atomic_library, params, out_dir=args.out_dir)
+
+     # optionally run W&B logging externally (only if requested)
+     if args.use_wandb:
+         try:
+             from itt_solver.wandb_runner import run_and_log_wandb
+             run_and_log_wandb(task, atomic_library, params, out_dir=args.out_dir,
+                               wandb_project=params.get('wandb_project','itt_solver'),
+                               wandb_entity=None, resume="allow")
+         except Exception as e:
+             print("W&B logging failed or not configured:", e)
+
+     print("Run finished. Result summary:")
+     print(json.dumps(result, indent=2))
+
+ if __name__ == "__main__":
+     main()
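The built-in `example1` task in `scripts/entrypoint.py` above is described as Kronecker self-similar: each non-zero cell of the 3x3 input is replaced by a copy of the whole input, and each zero cell by a 3x3 block of zeros. A minimal sketch checking that reading with `numpy.kron`; the function name `kron_self_similar` is hypothetical and is not part of `itt_solver`:

```python
import numpy as np


def kron_self_similar(grid):
    """Replace each non-zero cell with a copy of the grid, each zero cell with zeros."""
    a = np.array(grid, dtype=int)
    mask = (a != 0).astype(int)  # 1 where a copy of the grid should be placed
    return np.kron(mask, a)      # block (i, j) of the result equals mask[i, j] * a


# Input and target copied from the example1 task definition.
example_input = [[0, 7, 7], [7, 7, 7], [0, 7, 7]]
example_target = [
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
    [0, 0, 0, 7, 7, 7, 7, 7, 7],
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
    [0, 7, 7, 0, 7, 7, 0, 7, 7],
    [7, 7, 7, 7, 7, 7, 7, 7, 7],
    [0, 7, 7, 0, 7, 7, 0, 7, 7],
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
    [0, 0, 0, 7, 7, 7, 7, 7, 7],
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
]

result = kron_self_similar(example_input)
print(np.array_equal(result, np.array(example_target)))
```

This only illustrates the structure of the target grid; it says nothing about how the solver's own atomic transforms reach it.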
pemf/scripts/fix_and_inspect_logs.py ADDED
@@ -0,0 +1,104 @@
+ import glob, json, numpy as np, os
+ from pprint import pprint
+
+ def load_latest(pattern):
+     files = sorted(glob.glob(pattern))
+     return files[-1] if files else None
+
+ logs_path = load_latest("experiments/*_logs.json")
+ phi_path = load_latest("experiments/*_phi_best.npy")
+ res_path = load_latest("experiments/*_result.json")
+
+ print("logs:", logs_path)
+ print("phi_best:", phi_path)
+ print("result:", res_path)
+
+ if not logs_path:
+     raise SystemExit("No logs file found")
+
+ # load with context managers so the file handles are closed promptly
+ with open(logs_path) as fh:
+     logs = json.load(fh)
+ res = {}
+ if res_path:
+     with open(res_path) as fh:
+         res = json.load(fh)
+
+ # coerce gate values to booleans for all depth entries
+ def coerce_gates(g):
+     if not isinstance(g, dict):
+         return g
+     out = {}
+     for k,v in g.items():
+         if isinstance(v, str):
+             lv = v.strip().lower()
+             if lv in ("true","1","yes"):
+                 out[k] = True
+             elif lv in ("false","0","no"):
+                 out[k] = False
+             else:
+                 try:
+                     out[k] = bool(int(v))
+                 except Exception:
+                     out[k] = v
+         else:
+             out[k] = v
+     return out
+
+ for depth_idx, depth in enumerate(logs):
+     for entry in depth:
+         if 'gates' in entry:
+             entry['gates'] = coerce_gates(entry['gates'])
+
+ # attach phi_best to the first accepted entry at depth 0 (if not already present)
+ accepted_entry = None
+ for entry in (logs[0] if logs else []):  # guard against an empty log list
+     if entry.get('accepted'):
+         accepted_entry = entry
+         break
+
+ phi = np.load(phi_path) if phi_path else None
+ if accepted_entry is not None and 'candidate_array' not in accepted_entry:
+     accepted_entry['candidate_array'] = phi.tolist() if phi is not None else None
+
+ # Corrected target from real ARC task 007bbfb7 (Kronecker self-similar)
+ TARGET_GRID = [
+     [0,0,0,0,7,7,0,7,7],
+     [0,0,0,7,7,7,7,7,7],
+     [0,0,0,0,7,7,0,7,7],
+     [0,7,7,0,7,7,0,7,7],
+     [7,7,7,7,7,7,7,7,7],
+     [0,7,7,0,7,7,0,7,7],
+     [0,0,0,0,7,7,0,7,7],
+     [0,0,0,7,7,7,7,7,7],
+     [0,0,0,0,7,7,0,7,7],
+ ]
+ TARGET = np.array(TARGET_GRID, dtype=int)
+
+ def tile_transform(phi, out_shape):
+     a = np.array(phi)
+     h_out, w_out = out_shape
+     h_in, w_in = a.shape
+     reps_h = (h_out + h_in - 1) // h_in
+     reps_w = (w_out + w_in - 1) // w_in
+     tiled = np.tile(a, (reps_h, reps_w))
+     return tiled[:h_out, :w_out]
+
+ if accepted_entry is not None and accepted_entry.get('candidate_array') is not None:
+     cand = np.array(accepted_entry['candidate_array'], dtype=float)
+     if cand.shape != TARGET.shape:
+         cand_resized = tile_transform(cand, TARGET.shape)
+     else:
+         cand_resized = cand
+     cand_q = np.rint(cand_resized).astype(int)
+     l1 = float(np.sum(np.abs(cand_q - TARGET)))
+     print("Recomputed L1 residue for first accepted candidate:", l1)
+     print("Candidate unique values:", np.unique(cand_q))
+     diff = (cand_q != TARGET).astype(int)
+     print("Changed cells count:", int(diff.sum()))
+     print("Diff map (1=diff):")
+     print(diff)
+ else:
+     print("No candidate array available in logs or phi_best missing.")
+
+ # write fixed logs copy
+ fixed_path = logs_path.replace("_logs.json", "_logs.fixed.json")
+ with open(fixed_path, "w") as fh:
+     json.dump(logs, fh, indent=2)
+ print("Wrote fixed logs to", fixed_path)
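
The script describes the 007bbfb7 target as "Kronecker self-similar", while `tile_transform` resizes a candidate by plain tiling. A minimal sketch of the difference, assuming the 3x3 seed can be read off the centre block of `TARGET` (the seed is not given in the script, so this is an inference from the grid itself):

```python
import numpy as np

TARGET = np.array([
    [0,0,0,0,7,7,0,7,7],
    [0,0,0,7,7,7,7,7,7],
    [0,0,0,0,7,7,0,7,7],
    [0,7,7,0,7,7,0,7,7],
    [7,7,7,7,7,7,7,7,7],
    [0,7,7,0,7,7,0,7,7],
    [0,0,0,0,7,7,0,7,7],
    [0,0,0,7,7,7,7,7,7],
    [0,0,0,0,7,7,0,7,7],
], dtype=int)

# Assumption: the seed pattern is the centre 3x3 block of the target.
seed = TARGET[3:6, 3:6]

# Plain tiling repeats the seed in every block.
tiled = np.tile(seed, (3, 3))

# Kronecker construction: place a copy of the seed only where the seed is non-zero.
kron = np.kron((seed != 0).astype(int), seed)

l1_tile = int(np.abs(tiled - TARGET).sum())
l1_kron = int(np.abs(kron - TARGET).sum())
print("L1 residue, plain tile:", l1_tile)
print("L1 residue, Kronecker: ", l1_kron)
```

Under this assumption the Kronecker candidate reproduces the target exactly, whereas plain tiling leaves a non-zero residue in the two blocks that should be empty. A non-zero L1 reported by the script for a tiled candidate is therefore expected and does not by itself indicate a logging problem.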
pemf/scripts/kaggle_llm_solver.py ADDED
@@ -0,0 +1,452 @@
+ """
+ PEMF ARC-AGI — LLM Program Synthesis via Ollama (Kaggle Edition)
+ ================================================================
+
+ Self-contained script for Kaggle GPU notebooks.
+ Pulls a model via Ollama, runs LLM synthesis on unsolved ARC tasks.
+
+ Usage on Kaggle:
+     1. Enable GPU (T4 x2 or P100)
+     2. Enable internet access
+     3. Upload this file + arc_data/ + already_solved.json
+     4. Run all cells
+
+ The script:
+     - Installs Ollama
+     - Pulls the model (qwen2.5-coder:32b or smaller)
+     - Loads ARC tasks
+     - For each unsolved task: generates Python transform(), verifies against training pairs
+     - Saves results to llm_results.json
+ """
+
+ import subprocess
+ import sys
+ import os
+ import json
+ import time
+ import re
+ import signal
+ import numpy as np
+ from typing import Dict, List, Optional, Tuple
+ from collections import Counter
+ from pathlib import Path
+
+
+ # =============================================================================
+ # 1. OLLAMA SETUP
+ # =============================================================================
+
+ def install_ollama():
+     """Install Ollama on Kaggle/Linux."""
+     print("Installing Ollama...")
+     subprocess.run("curl -fsSL https://ollama.com/install.sh | sh",
+                    shell=True, check=True, capture_output=True)
+     print("Ollama installed.")
+
+
+ def start_ollama():
+     """Start Ollama server in background."""
+     print("Starting Ollama server...")
+     proc = subprocess.Popen(
+         ["ollama", "serve"],
+         stdout=subprocess.DEVNULL,
+         stderr=subprocess.DEVNULL,
+     )
+     time.sleep(3)  # Wait for server to start
+     print(f"Ollama server started (PID {proc.pid})")
+     return proc
+
+
+ def pull_model(model_name: str):
+     """Pull a model via Ollama."""
+     print(f"Pulling model {model_name}... (this may take several minutes)")
+     result = subprocess.run(
+         ["ollama", "pull", model_name],
+         capture_output=True, text=True, timeout=1800
+     )
+     if result.returncode != 0:
+         print(f"Pull failed: {result.stderr}")
+         raise RuntimeError(f"Failed to pull {model_name}")
+     print(f"Model {model_name} ready.")
+
+
+ def call_ollama(prompt: str, model: str = "qwen2.5-coder:32b",
+                 temperature: float = 0.7, timeout_s: int = 120) -> str:
+     """Call Ollama API and return response text."""
+     import urllib.request
+
+     payload = {
+         "model": model,
+         "prompt": prompt,
+         "stream": False,
+         "options": {
+             "temperature": temperature,
+             "num_predict": 2048,
+         }
+     }
+
+     data = json.dumps(payload).encode('utf-8')
+     req = urllib.request.Request(
+         "http://localhost:11434/api/generate",
+         data=data,
+         headers={"Content-Type": "application/json"},
+         method='POST'
+     )
+
+     try:
+         with urllib.request.urlopen(req, timeout=timeout_s) as resp:
+             result = json.loads(resp.read().decode())
+             return result.get('response', '')
+     except Exception as e:
+         return f"ERROR: {e}"
+
+
+ # =============================================================================
+ # 2. PROMPT BUILDING
+ # =============================================================================
+
+ def build_prompt(task: Dict) -> str:
+     """Build prompt for ARC task."""
+     train_pairs = task.get('train', [])
+
+     examples = []
+     for i, pair in enumerate(train_pairs):
+         examples.append(
+             f"Example {i+1}:\n"
+             f" Input: {json.dumps(pair['input'])}\n"
+             f" Output: {json.dumps(pair['output'])}"
+         )
+     examples_str = "\n".join(examples)
+
+     # Basic analysis
+     inputs = [np.array(p['input']) for p in train_pairs]
+     outputs = [np.array(p['output']) for p in train_pairs]
+     same_shape = all(i.shape == o.shape for i, o in zip(inputs, outputs))
+     in_colors = sorted(set(c for i in inputs for c in np.unique(i).tolist()))
+     out_colors = sorted(set(c for o in outputs for c in np.unique(o).tolist()))
+
+     analysis = f" Same input/output shape: {same_shape}\n"
+     analysis += f" Input colors: {in_colors}\n"
+     analysis += f" Output colors: {out_colors}\n"
+     if not same_shape:
+         ratios = [(o.shape[0]/i.shape[0], o.shape[1]/i.shape[1])
+                   for i, o in zip(inputs, outputs)]
+         analysis += f" Shape ratios (h,w): {ratios}\n"
+
+     prompt = f"""Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.
+
+ {examples_str}
+
+ Analysis:
+ {analysis}
+ Write a complete Python function that transforms any input grid to its output.
+ The function MUST work correctly for ALL examples above.
+
+ ```python
+ import numpy as np
+ from collections import Counter
+
+ def transform(grid: list[list[int]]) -> list[list[int]]:
+     grid = np.array(grid)
+ """
+     return prompt
+
+
+ # =============================================================================
+ # 3. CODE EXTRACTION AND VERIFICATION
+ # =============================================================================
+
+ def extract_code(response: str) -> Optional[str]:
+     """Extract Python function from LLM response."""
+     # Try ```python blocks
+     for pattern in [r'```python\s*(.*?)```', r'```\s*(.*?)```']:
+         matches = re.findall(pattern, response, re.DOTALL)
+         for match in matches:
+             if 'def transform' in match:
+                 return match.strip()
+
+     # Try finding def transform directly
+     idx = response.find('def transform')
+     if idx >= 0:
+         # Start at the first import line above the function, so that every
+         # import is kept and "from x import y" is not cut in half
+         first_import = re.search(r'^(?:import\s+\w|from\s+[\w.]+\s+import\s)',
+                                  response[:idx], re.MULTILINE)
+         code = response[first_import.start():] if first_import else response[idx:]
+         # Trim at the next ``` fence, if any
+         end = code.find('```')
+         if end > 0:
+             code = code[:end]
+         return code.strip()
+
+     # If response itself looks like code (starts with import or def)
+     stripped = response.strip()
+     if stripped.startswith('import') or stripped.startswith('def transform'):
+         return stripped
+
+     return None
+
+
+ def verify_program(code: str, train_pairs: List[Dict]) -> bool:
+     """Execute program and verify against all training pairs."""
+     namespace = {'np': np, 'numpy': np, 'Counter': Counter,
+                  'collections': __import__('collections')}
+
+     try:
+         exec(code, namespace)
+     except Exception:
+         return False
+
+     if 'transform' not in namespace:
+         return False
+
+     transform_fn = namespace['transform']
+
+     for pair in train_pairs:
+         try:
+             inp = [row[:] for row in pair['input']]  # deep copy
+             result = transform_fn(inp)
+             if result is None:
+                 return False
+             result_arr = np.array(result, dtype=int)
+             expected_arr = np.array(pair['output'], dtype=int)
+             if result_arr.shape != expected_arr.shape:
+                 return False
+             if not np.array_equal(result_arr, expected_arr):
+                 return False
+         except Exception:
+             return False
+
+     return True
+
+
+ def apply_program(code: str, test_input: List[List[int]]) -> Optional[List[List[int]]]:
+     """Apply verified program to test input."""
+     namespace = {'np': np, 'numpy': np, 'Counter': Counter,
+                  'collections': __import__('collections')}
+     try:
+         exec(code, namespace)
+         result = namespace['transform']([row[:] for row in test_input])
+         if result is not None:
+             return [list(row) for row in np.array(result, dtype=int).tolist()]
+     except Exception:
+         pass
+     return None
+
+
+ # =============================================================================
+ # 4. SYNTHESIS ENGINE
+ # =============================================================================
+
+ def synthesize_task(task: Dict, model: str = "qwen2.5-coder:32b",
+                     n_candidates: int = 8, verbose: bool = False) -> Optional[Tuple[str, str]]:
+     """
+     Try to solve a task via LLM.
+     Returns (rule_name, code) if successful, None otherwise.
+     """
+     train_pairs = task.get('train', [])
+     if not train_pairs:
+         return None
+
+     prompt = build_prompt(task)
+
+     for i in range(n_candidates):
+         # First try at low temperature, then increase. Clamp here so the value
+         # sent to the API is the same one recorded in the rule name.
+         temp = 0.1 if i == 0 else min(0.5 + 0.1 * i, 1.0)
+         response = call_ollama(prompt, model=model, temperature=temp)
+
+         if response.startswith("ERROR:"):
+             if verbose:
+                 print(f" Candidate {i+1}: API error")
+             continue
+
+         code = extract_code(response)
+         if code is None:
+             if verbose:
+                 print(f" Candidate {i+1}: No code extracted")
+             continue
+
+         if verbose:
+             print(f" Candidate {i+1}: {len(code)} chars", end="")
+
+         if verify_program(code, train_pairs):
+             if verbose:
+                 print(f" ✅")
+             return (f"llm_c{i+1}_t{temp:.1f}", code)
+         else:
+             if verbose:
+                 print(f" ❌")
+
+     return None
+
+
+ # =============================================================================
+ # 5. MAIN RUNNER
+ # =============================================================================
+
+ def main():
+     # --- Configuration ---
+     MODEL = os.environ.get("OLLAMA_MODEL", "qwen2.5-coder:32b")
+     # For smaller GPUs, use:
+     #   MODEL = "qwen2.5-coder:14b" (fits T4 16GB)
+     #   MODEL = "qwen2.5-coder:7b" (fits any GPU)
+
+     N_CANDIDATES = int(os.environ.get("N_CANDIDATES", "8"))
+     ARC_DIR = os.environ.get("ARC_DIR", "arc_data/training")
+     ALREADY_SOLVED_FILE = os.environ.get("ALREADY_SOLVED", "already_solved.json")
+     OUTPUT_FILE = os.environ.get("OUTPUT_FILE", "llm_results.json")
+
+     print("=" * 60)
+     print("PEMF ARC-AGI — LLM Program Synthesis (Kaggle/Ollama)")
+     print("=" * 60)
+     print(f"Model: {MODEL}")
+     print(f"Candidates per task: {N_CANDIDATES}")
+     print(f"ARC data: {ARC_DIR}")
+     print()
+
+     # --- Install & start Ollama ---
+     try:
+         subprocess.run(["ollama", "--version"], capture_output=True, check=True)
+         print("Ollama already installed.")
+     except (FileNotFoundError, subprocess.CalledProcessError):
+         install_ollama()
+
+     server = start_ollama()
+
+     try:
+         pull_model(MODEL)
+     except Exception as e:
+         print(f"Failed to pull {MODEL}: {e}")
+         print("Trying smaller model...")
+         MODEL = "qwen2.5-coder:7b"
+         pull_model(MODEL)
+
+     # --- Load already solved tasks ---
+     already_solved = set()
+     if os.path.exists(ALREADY_SOLVED_FILE):
+         with open(ALREADY_SOLVED_FILE) as f:
+             already_solved = set(json.load(f))
+     print(f"Already solved (symbolic): {len(already_solved)} tasks")
+
+     # --- Load ARC tasks ---
+     import glob
+     task_files = sorted(glob.glob(os.path.join(ARC_DIR, "*.json")))
+     print(f"Total ARC tasks: {len(task_files)}")
+
+     unsolved_files = []
+     for tf in task_files:
+         tid = os.path.basename(tf).replace('.json', '')
+         if tid not in already_solved:
+             unsolved_files.append((tid, tf))
+     print(f"Unsolved tasks to try: {len(unsolved_files)}")
+     print()
+
+     # --- Run synthesis ---
+     results = {}
+     solved = 0
+     total_time = 0
+
+     for idx, (tid, tf) in enumerate(unsolved_files):
+         with open(tf) as f:
+             task = json.load(f)
+
+         print(f"[{idx+1:3d}/{len(unsolved_files)}] {tid}:", end=" ", flush=True)
+         start = time.time()
+
+         result = synthesize_task(task, model=MODEL, n_candidates=N_CANDIDATES, verbose=False)
+         elapsed = time.time() - start
+         total_time += elapsed
+
+         if result:
+             rule_name, code = result
+             solved += 1
+
+             # Apply to test pairs
+             test_outputs = []
+             for test in task.get('test', []):
+                 out = apply_program(code, test['input'])
+                 test_outputs.append(out)
+
+             results[tid] = {
+                 'status': 'solved',
+                 'rule': rule_name,
+                 'code': code,
+                 'test_outputs': test_outputs,
+                 'time_s': round(elapsed, 2),
+             }
+             print(f"✅ {rule_name} ({elapsed:.1f}s)")
+         else:
+             results[tid] = {
+                 'status': 'failed',
+                 'time_s': round(elapsed, 2),
+             }
+             print(f"❌ ({elapsed:.1f}s)")
+
+         # Save progress periodically
+         if (idx + 1) % 10 == 0:
+             with open(OUTPUT_FILE, 'w') as f:
+                 json.dump({
+                     'model': MODEL,
+                     'n_candidates': N_CANDIDATES,
+                     'solved': solved,
+                     'attempted': idx + 1,
+                     'total_time_s': round(total_time, 1),
+                     'results': results,
+                 }, f, indent=2)
+             print(f" [Progress saved: {solved}/{idx+1} solved]")
+
+     # --- Final save ---
+     # Guard the rate computations against an empty ARC_DIR (no task files found)
+     n_total = max(1, len(task_files))
+     with open(OUTPUT_FILE, 'w') as f:
+         json.dump({
+             'model': MODEL,
+             'n_candidates': N_CANDIDATES,
+             'solved': solved,
+             'attempted': len(unsolved_files),
+             'total_time_s': round(total_time, 1),
+             'already_solved_symbolic': len(already_solved),
+             'total_solved': len(already_solved) + solved,
+             'total_tasks': len(task_files),
+             'solve_rate': round(100 * (len(already_solved) + solved) / n_total, 2),
+             'results': results,
+         }, f, indent=2)
+
+     # --- Summary ---
+     print()
+     print("=" * 60)
+     print("FINAL RESULTS")
+     print("=" * 60)
+     print(f"LLM solved: {solved}/{len(unsolved_files)} unsolved tasks")
+     print(f"Symbolic solved: {len(already_solved)}")
+     print(f"TOTAL SOLVED: {len(already_solved) + solved}/{len(task_files)} ({100*(len(already_solved)+solved)/n_total:.1f}%)")
+     print(f"Total LLM time: {total_time:.0f}s ({total_time/max(1,len(unsolved_files)):.1f}s/task)")
+     print(f"Results saved to: {OUTPUT_FILE}")
+
+     # Cleanup
+     server.terminate()
+
+
+ # =============================================================================
+ # 6. GENERATE already_solved.json FROM SYMBOLIC RESULTS
+ # =============================================================================
+
+ def generate_already_solved(summary_file: str, output_file: str = "already_solved.json"):
+     """
+     Generate already_solved.json from a v4 summary file.
+     Run this BEFORE running on Kaggle.
+     """
+     with open(summary_file) as f:
+         data = json.load(f)
+     solved = [r['task_id'] for r in data['results'] if r.get('all_train_solved')]
+     with open(output_file, 'w') as f:
+         json.dump(solved, f)
+     print(f"Wrote {len(solved)} solved task IDs to {output_file}")
+
+
+ if __name__ == "__main__":
+     # If run with --generate-solved, create the already_solved.json
+     if len(sys.argv) > 1 and sys.argv[1] == "--generate-solved":
+         summary = sys.argv[2] if len(sys.argv) > 2 else "arc_results/summary_v4.json"
+         generate_already_solved(summary)
+     else:
+         main()
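
`verify_program` and `apply_program` above call `exec` on model-generated code in the current process, so a candidate containing an infinite loop would stall the whole run (the script imports `signal` but never uses it). A minimal sketch of one way to bound each call, assuming it is acceptable to run every candidate in a fresh Python subprocess; `run_with_timeout` and `_RUNNER` are hypothetical names, not part of the script:

```python
import json
import subprocess
import sys

# Child-process script: reads {"code", "grid"} as JSON from stdin,
# runs transform(grid) and writes the result as JSON on the last stdout line.
_RUNNER = r'''
import json, sys
payload = json.loads(sys.stdin.read())
namespace = {}
exec(payload["code"], namespace)
result = namespace["transform"](payload["grid"])
if hasattr(result, "tolist"):  # e.g. a NumPy array
    result = result.tolist()
sys.stdout.write("\n" + json.dumps(result))
'''


def run_with_timeout(code, grid, timeout_s=5.0):
    """Run transform(grid) from `code` in a subprocess; return None on any failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _RUNNER],
            input=json.dumps({"code": code, "grid": grid}),
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # candidate did not finish in time
    if proc.returncode != 0 or not proc.stdout.strip():
        return None  # candidate raised, or produced no output
    try:
        # Only the last line is the result; earlier lines may be candidate prints
        return json.loads(proc.stdout.strip().splitlines()[-1])
    except ValueError:
        return None


flip = "def transform(grid):\n    return [row[::-1] for row in grid]"
print(run_with_timeout(flip, [[1, 2], [3, 4]]))
```

Unlike the in-process version, the child starts with an empty namespace, so generated code has to import `numpy` itself. A subprocess also costs interpreter start-up time per call and isolates crashes and hangs, but it is not a security sandbox.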
pemf/scripts/llm_solver_cloud.py ADDED
@@ -0,0 +1,483 @@
+ """
+ PEMF ARC-AGI — LLM Program Synthesis (Multi-Provider)
+ =====================================================
+
+ Supports:
+     - NVIDIA NIM (free — DeepSeek V4 Pro, GLM-4, Qwen, Llama)
+     - Google Gemini (free tier: 15 RPM)
+     - DeepSeek direct API (very cheap)
+     - GLM/Zhipu direct API (free tier)
+     - Ollama local (any model)
+
+ Usage:
+     # NVIDIA NIM — FREE, best option (GLM 4.7 default)
+     export LLM_PROVIDER=nvidia
+     export NVIDIA_API_KEY=nvapi-xxxxx
+     python llm_solver_cloud.py
+     # Get key: https://build.nvidia.com/settings/api-keys
+     # Default model: z-ai/glm4.7
+
+     # NVIDIA NIM with DeepSeek V4
+     export LLM_PROVIDER=nvidia
+     export NVIDIA_API_KEY=nvapi-xxxxx
+     export LLM_MODEL=deepseek-ai/deepseek-v4-pro
+     python llm_solver_cloud.py
+
+     # Gemini (free)
+     export LLM_PROVIDER=gemini
+     export GEMINI_API_KEY=your_key
+     python llm_solver_cloud.py
+
+     # Ollama local
+     export LLM_PROVIDER=ollama
+     export OLLAMA_MODEL=qwen2.5-coder:32b
+     python llm_solver_cloud.py
+ """
+
+ import os
+ import sys
+ import json
+ import time
+ import re
+ import glob
+ import numpy as np
+ from typing import Dict, List, Optional, Tuple
+ from collections import Counter
+ import urllib.request
+
+
+ # =============================================================================
+ # PROVIDER CONFIGS
+ # =============================================================================
+
+ PROVIDERS = {
+     "nvidia": {
+         "name": "NVIDIA NIM (free — DeepSeek V4, GLM 4.7, Qwen, Llama)",
+         "base_url": "https://integrate.api.nvidia.com/v1/chat/completions",
+         "default_model": "z-ai/glm4.7",
+         "env_key": "NVIDIA_API_KEY",
+         "free_tier": "Free for NVIDIA Developer Program members",
+         "get_key_url": "https://build.nvidia.com/settings/api-keys",
+         "models": {
+             "glm4.7": "z-ai/glm4.7",
+             "deepseek-v4": "deepseek-ai/deepseek-v4-pro",
+         },
+     },
+     "gemini": {
+         "name": "Google Gemini",
+         "base_url": "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent",
+         "default_model": "gemini-2.0-flash",
+         "env_key": "GEMINI_API_KEY",
+         "free_tier": "15 RPM, 1M tokens/day",
+         "get_key_url": "https://aistudio.google.com/apikey",
+     },
+     "deepseek": {
+         "name": "DeepSeek (direct API)",
+         "base_url": "https://api.deepseek.com/v1/chat/completions",
+         "default_model": "deepseek-chat",
+         "env_key": "DEEPSEEK_API_KEY",
+         "free_tier": "$0.07/M input, $0.27/M output",
+         "get_key_url": "https://platform.deepseek.com/api_keys",
+     },
+     "glm": {
+         "name": "GLM (Zhipu AI direct)",
+         "base_url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
+         "default_model": "glm-4-flash",
+         "env_key": "GLM_API_KEY",
+         "free_tier": "glm-4-flash is free",
+         "get_key_url": "https://open.bigmodel.cn/usercenter/apikeys",
+     },
+     "ollama": {
+         "name": "Ollama (local)",
+         "base_url": "http://localhost:11434/api/generate",
+         "default_model": "qwen2.5-coder:32b",
+         "env_key": None,
+     },
+ }
+
+
+ # =============================================================================
+ # API CALLERS
+ # =============================================================================
+
+ def call_nvidia(prompt: str, api_key: str, model: str = "deepseek-ai/deepseek-v4-pro",
+                 temperature: float = 0.7) -> str:
+     """Call NVIDIA NIM API (OpenAI-compatible). Hosts DeepSeek V4, GLM, Qwen, Llama."""
+     url = "https://integrate.api.nvidia.com/v1/chat/completions"
+     payload = {
+         "model": model,
+         "messages": [{"role": "user", "content": prompt}],
+         "max_tokens": 2048,
+         "temperature": temperature,
+     }
+     data = json.dumps(payload).encode('utf-8')
+     req = urllib.request.Request(url, data=data,
+                                  headers={"Content-Type": "application/json",
+                                           "Authorization": f"Bearer {api_key}"},
+                                  method='POST')
+     try:
+         with urllib.request.urlopen(req, timeout=120) as resp:
+             result = json.loads(resp.read().decode())
+             return result['choices'][0]['message']['content']
+     except Exception as e:
+         return f"ERROR: {e}"
+
+
+ def call_gemini(prompt: str, api_key: str, model: str = "gemini-2.0-flash",
+                 temperature: float = 0.7) -> str:
+     """Call Google Gemini API."""
+     url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={api_key}"
+     payload = {
+         "contents": [{"parts": [{"text": prompt}]}],
+         "generationConfig": {
+             "temperature": temperature,
+             "maxOutputTokens": 2048,
+         }
+     }
+     data = json.dumps(payload).encode('utf-8')
+     req = urllib.request.Request(url, data=data,
+                                  headers={"Content-Type": "application/json"},
+                                  method='POST')
+     try:
+         with urllib.request.urlopen(req, timeout=120) as resp:
+             result = json.loads(resp.read().decode())
+             candidates = result.get('candidates', [])
+             if candidates:
+                 parts = candidates[0].get('content', {}).get('parts', [])
+                 if parts:
+                     return parts[0].get('text', '')
+             return "ERROR: No response content"
+     except Exception as e:
+         return f"ERROR: {e}"
+
+
+ def call_deepseek(prompt: str, api_key: str, model: str = "deepseek-chat",
+                   temperature: float = 0.7) -> str:
+     """Call DeepSeek API (OpenAI-compatible)."""
+     url = "https://api.deepseek.com/v1/chat/completions"
+     payload = {
+         "model": model,
+         "messages": [{"role": "user", "content": prompt}],
+         "max_tokens": 2048,
+         "temperature": temperature,
+     }
+     data = json.dumps(payload).encode('utf-8')
+     req = urllib.request.Request(url, data=data,
+                                  headers={"Content-Type": "application/json",
+                                           "Authorization": f"Bearer {api_key}"},
+                                  method='POST')
+     try:
+         with urllib.request.urlopen(req, timeout=120) as resp:
+             result = json.loads(resp.read().decode())
+             return result['choices'][0]['message']['content']
+     except Exception as e:
+         return f"ERROR: {e}"
+
+
+ def call_glm(prompt: str, api_key: str, model: str = "glm-4-flash",
+              temperature: float = 0.7) -> str:
+     """Call GLM/Zhipu API (OpenAI-compatible)."""
+     url = "https://open.bigmodel.cn/api/paas/v4/chat/completions"
+     payload = {
+         "model": model,
+         "messages": [{"role": "user", "content": prompt}],
+         "max_tokens": 2048,
+         "temperature": temperature,
+     }
+     data = json.dumps(payload).encode('utf-8')
+     req = urllib.request.Request(url, data=data,
+                                  headers={"Content-Type": "application/json",
+                                           "Authorization": f"Bearer {api_key}"},
+                                  method='POST')
+     try:
+         with urllib.request.urlopen(req, timeout=120) as resp:
+             result = json.loads(resp.read().decode())
+             return result['choices'][0]['message']['content']
+     except Exception as e:
+         return f"ERROR: {e}"
+
+
+ def call_ollama(prompt: str, model: str = "qwen2.5-coder:32b",
+                 temperature: float = 0.7) -> str:
+     """Call local Ollama."""
+     url = "http://localhost:11434/api/generate"
+     payload = {
+         "model": model,
+         "prompt": prompt,
+         "stream": False,
+         "options": {"temperature": temperature, "num_predict": 2048},
+     }
+     data = json.dumps(payload).encode('utf-8')
+     req = urllib.request.Request(url, data=data,
+                                  headers={"Content-Type": "application/json"},
+                                  method='POST')
+     try:
+         with urllib.request.urlopen(req, timeout=180) as resp:
+             result = json.loads(resp.read().decode())
+             return result.get('response', '')
+     except Exception as e:
+         return f"ERROR: {e}"
+
+
+ def call_llm(prompt: str, provider: str, api_key: str = "",
+              model: str = "", temperature: float = 0.7) -> str:
+     """Unified LLM caller. An empty model falls back to the provider's default_model."""
+     if provider not in PROVIDERS:
+         return f"ERROR: Unknown provider {provider}"
+     # Use the PROVIDERS table as the single source of defaults, so the NVIDIA
+     # fallback matches the documented default (z-ai/glm4.7)
+     model = model or PROVIDERS[provider]["default_model"]
+     if provider == "nvidia":
+         return call_nvidia(prompt, api_key, model, temperature)
+     elif provider == "gemini":
+         return call_gemini(prompt, api_key, model, temperature)
+     elif provider == "deepseek":
+         return call_deepseek(prompt, api_key, model, temperature)
+     elif provider == "glm":
+         return call_glm(prompt, api_key, model, temperature)
+     else:  # "ollama"
+         return call_ollama(prompt, model, temperature)
+
+
+ # =============================================================================
+ # PROMPT, EXTRACTION, VERIFICATION (same as before)
+ # =============================================================================
+
+ def build_prompt(task: Dict) -> str:
+     train_pairs = task.get('train', [])
+     examples = []
+     for i, pair in enumerate(train_pairs):
+         examples.append(
+             f"Example {i+1}:\n"
+             f" Input: {json.dumps(pair['input'])}\n"
+             f" Output: {json.dumps(pair['output'])}"
+         )
+     examples_str = "\n".join(examples)
+
+     inputs = [np.array(p['input']) for p in train_pairs]
+     outputs = [np.array(p['output']) for p in train_pairs]
+     same_shape = all(i.shape == o.shape for i, o in zip(inputs, outputs))
+     in_colors = sorted(set(c for i in inputs for c in np.unique(i).tolist()))
+     out_colors = sorted(set(c for o in outputs for c in np.unique(o).tolist()))
+
+     analysis = f" Same input/output shape: {same_shape}\n"
+     analysis += f" Input colors: {in_colors}, Output colors: {out_colors}\n"
+     if not same_shape:
+         for i, o in zip(inputs[:1], outputs[:1]):
+             analysis += f" Shape: {i.shape} -> {o.shape}\n"
+
+     return f"""Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.
+
+ {examples_str}
+
+ Analysis:
+ {analysis}
+ ```python
+ import numpy as np
+ from collections import Counter, deque
+ from scipy.ndimage import label
+
+ def transform(grid: list[list[int]]) -> list[list[int]]:
+     grid = np.array(grid)
+ """
+
+
+ def extract_code(response: str) -> Optional[str]:
+     for pattern in [r'```python\s*(.*?)```', r'```\s*(.*?)```']:
+         matches = re.findall(pattern, response, re.DOTALL)
+         for match in matches:
+             if 'def transform' in match:
+                 return match.strip()
+     idx = response.find('def transform')
+     if idx >= 0:
+         # Start at the first import line above the function; taking the last
+         # "import " would cut "from x import y" in half and drop earlier imports
+         first_import = re.search(r'^(?:import\s+\w|from\s+[\w.]+\s+import\s)',
+                                  response[:idx], re.MULTILINE)
+         start = first_import.start() if first_import else idx
+         code = response[start:]
+         end = code.find('```')
+         if end > 0:
+             code = code[:end]
+         return code.strip()
+     stripped = response.strip()
+     if stripped.startswith(('import', 'def transform', 'from')):
+         return stripped
+     return None
+
+
+ def verify_program(code: str, train_pairs: List[Dict]) -> bool:
+     namespace = {'np': np, 'numpy': np, 'Counter': Counter,
+                  'deque': __import__('collections').deque}
+     try:
+         # Allow scipy import in generated code
+         try:
+             import scipy.ndimage
+             namespace['scipy'] = __import__('scipy')
+         except ImportError:
+             pass
+         exec(code, namespace)
+     except Exception:
+         return False
+     if 'transform' not in namespace:
+         return False
+     fn = namespace['transform']
+     for pair in train_pairs:
+         try:
+             result = fn([row[:] for row in pair['input']])
+             if result is None:
+                 return False
+             r = np.array(result, dtype=int)
+             e = np.array(pair['output'], dtype=int)
+             if r.shape != e.shape or not np.array_equal(r, e):
+                 return False
+         except Exception:
+             return False
+     return True
+
+
+ def apply_program(code: str, test_input):
+     namespace = {'np': np, 'numpy': np, 'Counter': Counter,
+                  'deque': __import__('collections').deque}
+     try:
+         import scipy.ndimage
+         namespace['scipy'] = __import__('scipy')
+     except ImportError:
+         pass
+     try:
+         exec(code, namespace)
+         result = namespace['transform']([row[:] for row in test_input])
+         if result is not None:
+             return np.array(result, dtype=int).tolist()
+     except Exception:
+         pass
+     return None
+
+
+ # =============================================================================
+ # SYNTHESIS + MAIN
+ # =============================================================================
+
+ def synthesize_task(task, provider, api_key, model, n_candidates=8, verbose=False):
+     prompt = build_prompt(task)
+     for i in range(n_candidates):
+         temp = 0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)
+         response = call_llm(prompt, provider, api_key, model, temp)
+         if response.startswith("ERROR:"):
+             if verbose: print(f" C{i+1}: {response[:60]}")
+             # Rate limited: back off, then move on to the next candidate (the failed call is not retried)
+             if "429" in response or "rate" in response.lower():
+                 time.sleep(5)
+             continue
+         code = extract_code(response)
+         if code is None:
+             if verbose: print(f" C{i+1}: no code")
+             continue
+         if verbose: print(f" C{i+1}: {len(code)}ch", end="")
+         if verify_program(code, task['train']):
+             if verbose: print(" ✅")
+             return (f"llm_c{i+1}", code)
+         else:
+             if verbose: print(" ❌")
+     return None
+
+
+ def main():
+     PROVIDER = os.environ.get("LLM_PROVIDER", "gemini")
+     config = PROVIDERS.get(PROVIDER, {})
+     API_KEY = os.environ.get(config.get("env_key", ""), "") if config.get("env_key") else ""
+     MODEL = os.environ.get("LLM_MODEL", config.get("default_model", ""))
+     N_CANDIDATES = int(os.environ.get("N_CANDIDATES", "8"))
+     ARC_DIR = os.environ.get("ARC_DIR", "arc_data/training")
+     ALREADY_SOLVED = os.environ.get("ALREADY_SOLVED", "already_solved.json")
+     OUTPUT = os.environ.get("OUTPUT_FILE", "llm_results.json")
+
+     print("=" * 60)
+     print(f"PEMF ARC-AGI — LLM Synthesis ({config.get('name', PROVIDER)})")
+     print("=" * 60)
+     print(f"Provider: {PROVIDER}")
+     print(f"Model: {MODEL}")
+     print(f"Candidates/task: {N_CANDIDATES}")
+     if not API_KEY and PROVIDER != "ollama":
+         print(f"\n⚠️ No API key! Set {config.get('env_key', '???')}")
+         print(f" Get key: {config.get('get_key_url', '?')}")
+         return
+     print()
+
+     # Load already solved
+     already_solved = set()
+     if os.path.exists(ALREADY_SOLVED):
+         with open(ALREADY_SOLVED) as f:
+             already_solved = set(json.load(f))
+     print(f"Symbolic solved: {len(already_solved)}")
+
+     # Load tasks
+     task_files = sorted(glob.glob(os.path.join(ARC_DIR, "*.json")))
+     unsolved = [(os.path.basename(tf).replace('.json',''), tf)
+                 for tf in task_files
+                 if os.path.basename(tf).replace('.json','') not in already_solved]
+     print(f"Total tasks: {len(task_files)}, unsolved: {len(unsolved)}")
+     print()
+
+     # Run
+     results = {}
+     solved = 0
+     total_time = 0
+
+     for idx, (tid, tf) in enumerate(unsolved):
+         with open(tf) as f:
+             task = json.load(f)
+         print(f"[{idx+1:3d}/{len(unsolved)}] {tid}:", end=" ", flush=True)
+         start = time.time()
+         result = synthesize_task(task, PROVIDER, API_KEY, MODEL, N_CANDIDATES, verbose=False)
+         elapsed = time.time() - start
+         total_time += elapsed
+
+         if result:
+             rule, code = result
+             solved += 1
+             test_outputs = [apply_program(code, t['input']) for t in task.get('test', [])]
+             results[tid] = {'status': 'solved', 'rule': rule, 'code': code,
+                             'test_outputs': test_outputs, 'time_s': round(elapsed, 2)}
+             print(f"✅ ({elapsed:.1f}s)")
+         else:
+             results[tid] = {'status': 'failed', 'time_s': round(elapsed, 2)}
+             print(f"❌ ({elapsed:.1f}s)")
+
+         # Rate limit respect
+         if PROVIDER == "gemini":
+             time.sleep(4)  # 15 RPM = 1 every 4s
+         elif PROVIDER == "nvidia":
+             time.sleep(2)  # NIM free tier: ~30 RPM
+         elif PROVIDER in ("deepseek", "glm"):
+             time.sleep(1)
+
+         # Save every 10
+         if (idx + 1) % 10 == 0:
+             _save(OUTPUT, PROVIDER, MODEL, N_CANDIDATES, solved, idx+1,
+                   total_time, already_solved, len(task_files), results)
+             print(f" [Saved: {solved}/{idx+1}, total {len(already_solved)+solved}/{len(task_files)}]")
+
+     # Final save
+     _save(OUTPUT, PROVIDER, MODEL, N_CANDIDATES, solved, len(unsolved),
+           total_time, already_solved, len(task_files), results)
+
+     print(f"\n{'='*60}")
+     print(f"LLM solved: {solved}/{len(unsolved)}")
+     print(f"Symbolic: {len(already_solved)}")
+     print(f"TOTAL: {len(already_solved)+solved}/{len(task_files)} ({100*(len(already_solved)+solved)/len(task_files):.1f}%)")
+     print(f"Saved: {OUTPUT}")
+
+
+ def _save(path, provider, model, n_cand, solved, attempted, total_time,
+           already_solved, total_tasks, results):
+     with open(path, 'w') as f:
+         json.dump({
+             'provider': provider, 'model': model, 'n_candidates': n_cand,
+             'llm_solved': solved, 'attempted': attempted,
+             'total_time_s': round(total_time, 1),
+             'symbolic_solved': len(already_solved),
+             'total_solved': len(already_solved) + solved,
+             'total_tasks': total_tasks,
+             'solve_rate': round(100*(len(already_solved)+solved)/total_tasks, 2),
+             'results': results,
+         }, f, indent=2)
+
+
+ if __name__ == "__main__":
+     main()
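Review note on the candidate loop in `synthesize_task`: on a rate-limit error it sleeps and then advances to the next candidate, so a throttled call still uses up one of the `n_candidates` attempts. The substring test `"rate" in response.lower()` also matches unrelated words such as "generate". The sketch below shows one way a retry with backoff could look, together with the temperature schedule the loop uses. It assumes the same `"ERROR:"` string convention as `call_llm`; `is_rate_limited`, `call_with_retry`, `temperature_schedule`, `max_retries`, `base_delay` and the injected `sleep` are hypothetical names that are not part of this commit.

```python
import time
from typing import Callable, List


def is_rate_limited(response: str) -> bool:
    """Mirror the check in synthesize_task: an ERROR string mentioning 429 or 'rate'."""
    return response.startswith("ERROR:") and (
        "429" in response or "rate" in response.lower()
    )


def call_with_retry(call: Callable[[], str], max_retries: int = 3,
                    base_delay: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical helper: re-issue a zero-argument LLM call with exponential backoff."""
    response = call()
    for attempt in range(max_retries):
        if not is_rate_limited(response):
            break
        sleep(base_delay * (2 ** attempt))  # 5s, 10s, 20s with the defaults
        response = call()
    return response


def temperature_schedule(n_candidates: int) -> List[float]:
    """Temperature per candidate index, as computed in synthesize_task."""
    return [0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2) for i in range(n_candidates)]
```

Passing `sleep` as a parameter keeps the helper testable without real delays. In the loop above it could wrap the existing call as `call_with_retry(lambda: call_llm(prompt, provider, api_key, model, temp))`.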
pemf/scripts/merge_results.py ADDED
@@ -0,0 +1,53 @@
+ """
+ Merge LLM results with symbolic results to get final solve count.
+
+ Usage:
+     python merge_results.py arc_results/summary_v4.json llm_results.json
+ """
+ import json
+ import sys
+
+
+ def merge(symbolic_file: str, llm_file: str, output_file: str = "arc_results/summary_final.json"):
+     with open(symbolic_file) as f:
+         symbolic = json.load(f)
+     with open(llm_file) as f:
+         llm = json.load(f)
+
+     symbolic_solved = {r['task_id'] for r in symbolic['results'] if r.get('all_train_solved')}
+     llm_solved = {tid for tid, r in llm['results'].items() if r['status'] == 'solved'}
+
+     total_solved = symbolic_solved | llm_solved
+     new_from_llm = llm_solved - symbolic_solved
+
+     print(f"Symbolic solved: {len(symbolic_solved)}")
+     print(f"LLM solved: {len(llm_solved)}")
+     print(f"New from LLM: {len(new_from_llm)}")
+     print(f"TOTAL SOLVED: {len(total_solved)}/{symbolic['total_tasks']} ({100*len(total_solved)/symbolic['total_tasks']:.1f}%)")
+
+     print(f"\nNew tasks solved by LLM:")
+     for tid in sorted(new_from_llm):
+         rule = llm['results'][tid].get('rule', '?')
+         print(f" {tid}: {rule}")
+
+     # Save merged
+     merged = {
+         'total_tasks': symbolic['total_tasks'],
+         'symbolic_solved': len(symbolic_solved),
+         'llm_solved': len(llm_solved),
+         'new_from_llm': len(new_from_llm),
+         'total_solved': len(total_solved),
+         'solve_rate': round(100 * len(total_solved) / symbolic['total_tasks'], 2),
+         'symbolic_tasks': sorted(symbolic_solved),
+         'llm_tasks': sorted(llm_solved),
+         'new_llm_tasks': sorted(new_from_llm),
+     }
+     with open(output_file, 'w') as f:
+         json.dump(merged, f, indent=2)
+     print(f"\nMerged results saved to {output_file}")
+
+
+ if __name__ == "__main__":
+     sym = sys.argv[1] if len(sys.argv) > 1 else "arc_results/summary_v4.json"
+     llm = sys.argv[2] if len(sys.argv) > 2 else "llm_results.json"
+     merge(sym, llm)
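Review note: the merge is plain set arithmetic over task IDs, a union for the total and a difference for tasks that only the LLM pass solved. A minimal sketch on in-memory data follows. The record shapes match the two JSON layouts read by `merge`, while `merge_solved` and the task IDs `task_a`, `task_b`, `task_c` are made up for illustration.

```python
def merge_solved(symbolic_results, llm_results):
    """Combine solved-task sets the same way merge() does, without file I/O."""
    symbolic_solved = {r['task_id'] for r in symbolic_results if r.get('all_train_solved')}
    llm_solved = {tid for tid, r in llm_results.items() if r['status'] == 'solved'}
    return {
        'total_solved': sorted(symbolic_solved | llm_solved),   # union
        'new_from_llm': sorted(llm_solved - symbolic_solved),   # LLM-only tasks
    }


# Illustrative records only; real IDs come from the ARC task file names.
symbolic_results = [
    {'task_id': 'task_a', 'all_train_solved': True},
    {'task_id': 'task_b', 'all_train_solved': False},
]
llm_results = {
    'task_a': {'status': 'solved'},
    'task_b': {'status': 'solved'},
    'task_c': {'status': 'failed'},
}
summary = merge_solved(symbolic_results, llm_results)
print(summary)
```

Because the union deduplicates, a task solved by both passes is counted once in the total and is excluded from the LLM-only list.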
pemf/scripts/run_all_arc.py ADDED
@@ -0,0 +1,183 @@
+ """
+ Run the PEMF solver on all ARC-AGI tasks and report solve rates.
+
+ For each task, the solver tries every training pair. A task is "solved"
+ if the solver achieves σ=0 on ALL training pairs.
+
+ Usage:
+     1. Download the ARC dataset into arc_data/training/:
+         git clone https://github.com/fchollet/ARC-AGI.git /tmp/arc
+         cp -r /tmp/arc/data/training arc_data/training
+     2. Run:
+         python scripts/run_all_arc.py
+
+ Outputs:
+     arc_results/summary.json — per-task results
+     arc_results/report.txt — human-readable report
+ """
+ import os, json, time, glob
+
+ import numpy as np
+ from itt_solver.solver_core import initialize_potential, sigma_l1
+ from itt_solver.beam_logging import beam_minimize_with_log
+ from itt_solver.experiment_driver import default_atomic_factory
+
+ ARC_DIR = os.environ.get("ARC_DIR", "arc_data/training")
+ OUT_DIR = os.environ.get("OUT_DIR", "arc_results")
+ os.makedirs(OUT_DIR, exist_ok=True)
+
+ PARAMS = {
+     'beam_width': 8,
+     'max_depth': 2,
+     'lock_coeff': 0.0,
+     'max_fraction': 1.0,
+     'use_symmetry': True,
+     'use_gravity': True,
+     'use_color_ops': True,
+     'boundary_source': 'target',
+ }
+
+ def solve_pair(inp, out, params):
+     """Run solver on one input→output pair. Returns (sigma, transform_name, time_s)."""
+     h, w = len(out), len(out[0])
+     task = {
+         'name': 'pair',
+         'input': inp,
+         'target': out,
+         'target_shape': (h, w),
+     }
+     atomic_lib = default_atomic_factory(params, task)
+     phi_in = initialize_potential(inp)
+     phi_target = initialize_potential(out)
+
+     start = time.time()
+     T_best, phi_best, states, sigmas, logs = beam_minimize_with_log(
+         phi_in, phi_target, atomic_lib,
+         beam_width=params['beam_width'],
+         max_depth=params['max_depth'],
+         lock_coeff=params['lock_coeff'],
+         max_fraction=params['max_fraction'],
+         allowed_symbols=list(range(10)),
+         enable_layer_minus_one=False,
+         boundary_source=params['boundary_source'],
+     )
+     elapsed = time.time() - start
+     final_sigma = float(sigmas[-1]) if sigmas else float('inf')
+     return final_sigma, repr(T_best), elapsed
+
+ def run_all():
+     task_files = sorted(glob.glob(os.path.join(ARC_DIR, "*.json")))
+     print(f"Running solver on {len(task_files)} ARC training tasks...")
+     print(f"Params: beam_width={PARAMS['beam_width']}, max_depth={PARAMS['max_depth']}")
+     print()
+
+     results = []
+     solved_count = 0
+     partial_count = 0
+     total_time = 0
+
+     for ti, tf in enumerate(task_files):
+         task_id = os.path.basename(tf).replace('.json', '')
+         with open(tf) as fh:
+             task_data = json.load(fh)
+
+         train_pairs = task_data.get('train', [])
+         test_pairs = task_data.get('test', [])
+
+         pair_results = []
+         all_zero = bool(train_pairs)  # a task without training pairs must not count as solved
+         best_sigma = float('inf')
+         best_transform = None
+
+         for pi, pair in enumerate(train_pairs):
+             sigma, transform, elapsed = solve_pair(pair['input'], pair['output'], PARAMS)
+             total_time += elapsed
+             pair_results.append({
+                 'pair': pi, 'sigma': sigma,
+                 'transform': transform, 'time_s': round(elapsed, 4),
+             })
+             if sigma > 0:
+                 all_zero = False
+             if sigma < best_sigma:
+                 best_sigma = sigma
+                 best_transform = transform
+
+         test_results = []
+         test_solved = None
+         for pi, pair in enumerate(test_pairs):
+             if 'output' in pair:
+                 sigma, transform, elapsed = solve_pair(pair['input'], pair['output'], PARAMS)
+                 total_time += elapsed
+                 test_results.append({
+                     'pair': pi, 'sigma': sigma,
+                     'transform': transform, 'time_s': round(elapsed, 4),
+                 })
+                 if test_solved is None:
+                     test_solved = True
+                 if sigma > 0:
+                     test_solved = False
+
+         # PARTIAL covers any finite best σ, matching the partial_count branch below
+         status = "SOLVED" if all_zero else "PARTIAL" if best_sigma < float('inf') else "FAILED"
+         if all_zero:
+             solved_count += 1
+         elif best_sigma < float('inf'):
+             partial_count += 1
+
+         results.append({
+             'task_id': task_id, 'status': status,
+             'train_pairs': len(train_pairs), 'all_train_solved': all_zero,
+             'best_sigma': best_sigma, 'best_transform': best_transform,
+             'pair_results': pair_results,
+             'test_results': test_results, 'test_solved': test_solved,
+         })
+
+         if (ti + 1) % 20 == 0 or all_zero:
+             marker = "✅" if all_zero else " "
+             print(f"[{ti+1:3d}/{len(task_files)}] {task_id}: {status} (best σ={best_sigma:.1f}) {marker}")
+
+     failed_count = len(task_files) - solved_count - partial_count
+     print(f"\n{'='*60}")
+     print(f"RESULTS: {len(task_files)} tasks")
+     print(f" SOLVED (σ=0 all train pairs): {solved_count} ({100*solved_count/len(task_files):.1f}%)")
+     print(f" PARTIAL (σ>0 but finite): {partial_count}")
+     print(f" FAILED: {failed_count}")
+     print(f" Total time: {total_time:.1f}s ({total_time/len(task_files):.2f}s/task)")
+
+     summary = {
+         'total_tasks': len(task_files), 'solved': solved_count,
+         'partial': partial_count, 'failed': failed_count,
+         'solve_rate': round(100 * solved_count / len(task_files), 2),
+         'params': PARAMS, 'total_time_s': round(total_time, 2),
+         'results': results,
+     }
+     with open(os.path.join(OUT_DIR, 'summary.json'), 'w') as fh:
+         json.dump(summary, fh, indent=2)
+
+     solved_tasks = [r for r in results if r['all_train_solved']]
+     print(f"\nSolved tasks:")
+     for r in solved_tasks:
+         print(f" {r['task_id']}: {r['best_transform']}")
+
+     partial_tasks = sorted(
+         [r for r in results if not r['all_train_solved'] and r['best_sigma'] < float('inf')],
+         key=lambda r: r['best_sigma']
+     )
+     print(f"\nTop 20 closest-to-solving:")
+     for r in partial_tasks[:20]:
+         print(f" {r['task_id']}: σ={r['best_sigma']:.1f} ({r['best_transform']})")
+
+     with open(os.path.join(OUT_DIR, 'report.txt'), 'w') as fh:
+         fh.write(f"PEMF Solver — ARC-AGI Training Set Results\n{'='*60}\n")
+         fh.write(f"Total tasks: {len(task_files)}\n")
+         fh.write(f"Solved: {solved_count} ({100*solved_count/len(task_files):.1f}%)\n")
+         fh.write(f"Partial: {partial_count}\nFailed: {failed_count}\n")
+         fh.write(f"Time: {total_time:.1f}s\n\n")
+         fh.write(f"Params: {json.dumps(PARAMS, indent=2)}\n\n")
+         fh.write(f"Solved tasks:\n")
+         for r in solved_tasks:
+             fh.write(f" {r['task_id']}: {r['best_transform']}\n")
+
+     print(f"\nResults saved to {OUT_DIR}/")
+
+ if __name__ == '__main__':
+     run_all()
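Review note: the per-task status depends only on the list of per-pair residues, so the classification can be isolated and checked without running the beam search. The sketch below is an illustration under stated assumptions. `l1_residue` is a hypothetical stand-in based on the README's description of the residue as an L1 misalignment, and it is not the `sigma_l1` function from `itt_solver`, whose signature is not shown in this commit. `classify` is likewise a hypothetical helper.

```python
import numpy as np


def l1_residue(pred, target) -> float:
    """Hypothetical L1 residue: sum of absolute cell differences, inf on shape mismatch."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        return float('inf')
    return float(np.abs(pred - target).sum())


def classify(sigmas) -> str:
    """SOLVED needs at least one pair and zero residue on all of them."""
    if sigmas and all(s == 0 for s in sigmas):
        return "SOLVED"
    if sigmas and min(sigmas) < float('inf'):
        return "PARTIAL"
    return "FAILED"


print(classify([0.0, 0.0]))   # every training pair matched exactly
print(classify([0.0, 3.0]))   # one pair matched, one did not
print(classify([]))           # no training pairs at all
```

The empty list is the edge case worth pinning down: Python's `all([])` is `True`, so a task file with no training pairs would be reported as solved unless it is excluded explicitly.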
pemf/tests/test_transforms.py ADDED
@@ -0,0 +1,156 @@
+ """
+ Unit tests for all transforms in itt_solver.transforms.
+
+ Usage:
+     python tests/test_transforms.py
+
+ 40 tests covering: Kronecker, mirror tiles, upscale, downscale, stack,
+ rotate, reflect, color ops, gravity, crop, transpose, shifted tile,
+ fill enclosed.
+ """
+ import numpy as np
+ from itt_solver import transforms as tr
+
+ INP = np.array([[0,7,7],[7,7,7],[0,7,7]], dtype=float)
+
+ tests_passed = 0
+ tests_failed = 0
+
+ def check(name, condition):
+     global tests_passed, tests_failed
+     if condition:
+         print(f" ✅ {name}")
+         tests_passed += 1
+     else:
+         print(f" ❌ {name}")
+         tests_failed += 1
+
+ print("=== Kronecker Self-Similar ===")
+ T = tr.KroneckerSelfSimilar()
+ out = T.apply(INP)
+ check("Output shape is 9x9", out.shape == (9, 9))
+ check("σ=0 vs known target", np.array_equal(out, np.kron((INP!=0).astype(float), INP)))
+
+ print("\n=== KroneckerSelfSimilarInv ===")
+ T = tr.KroneckerSelfSimilarInv()
+ out = T.apply(INP)
+ check("Output shape is 9x9", out.shape == (9, 9))
+
+ print("\n=== MirrorTileH ===")
+ T = tr.MirrorTileH()
+ out = T.apply(INP)
+ check("Shape is 3x6", out.shape == (3, 6))
+ check("Left half is input", np.array_equal(out[:, :3], INP))
+ check("Right half is fliplr(input)", np.array_equal(out[:, 3:], np.fliplr(INP)))
+
+ print("\n=== MirrorTileV ===")
+ T = tr.MirrorTileV()
+ out = T.apply(INP)
+ check("Shape is 6x3", out.shape == (6, 3))
+ check("Top half is input", np.array_equal(out[:3, :], INP))
+ check("Bottom half is flipud(input)", np.array_equal(out[3:, :], np.flipud(INP)))
+
+ print("\n=== MirrorTile4Way ===")
+ T = tr.MirrorTile4Way()
+ out = T.apply(INP)
+ check("Shape is 6x6", out.shape == (6, 6))
+
+ print("\n=== Upscale 2x ===")
+ T = tr.Upscale(2)
+ out = T.apply(INP)
+ check("Shape is 6x6", out.shape == (6, 6))
+ check("Top-left 2x2 block is INP[0,0]", np.all(out[:2, :2] == INP[0, 0]))
+
+ print("\n=== Upscale 3x ===")
+ T = tr.Upscale(3)
+ out = T.apply(INP)
+ check("Shape is 9x9", out.shape == (9, 9))
+ check("Top-left 3x3 block is INP[0,0]", np.all(out[:3, :3] == INP[0, 0]))
+
+ print("\n=== Downscale 2x ===")
+ T = tr.Downscale(2)
+ big = np.kron(INP, np.ones((2, 2)))
+ out = T.apply(big)
+ check("Downscale of upscaled recovers original", np.array_equal(out, INP))
+
+ print("\n=== StackH 3 ===")
+ T = tr.StackH(3)
+ out = T.apply(INP)
+ check("Shape is 3x9", out.shape == (3, 9))
+ check("First third is input", np.array_equal(out[:, :3], INP))
+
+ print("\n=== StackV 3 ===")
+ T = tr.StackV(3)
+ out = T.apply(INP)
+ check("Shape is 9x3", out.shape == (9, 3))
+ check("First third is input", np.array_equal(out[:3, :], INP))
+
+ print("\n=== Rotate 90/180/270 ===")
+ for k in [1, 2, 3]:
+     T = tr.Rotate(k)
+     out = T.apply(INP)
+     check(f"Rotate_{90*k} matches np.rot90", np.array_equal(out, np.rot90(INP, k)))
+
+ print("\n=== Reflect h/v ===")
+ T = tr.Reflect('h')
+ check("Reflect_h matches flipud", np.array_equal(T.apply(INP), np.flipud(INP)))
+ T = tr.Reflect('v')
+ check("Reflect_v matches fliplr", np.array_equal(T.apply(INP), np.fliplr(INP)))
+
+ print("\n=== RetainColor ===")
+ T = tr.RetainColor(7)
+ out = T.apply(INP)
+ check("Only 7s remain", np.all(out[INP == 7] == 7))
+ check("Non-7 positions are 0", np.all(out[INP != 7] == 0))
+
+ print("\n=== RemoveColor ===")
+ T = tr.RemoveColor(7)
+ out = T.apply(INP)
+ check("7s are removed", np.all(out[INP == 7] == 0))
+ check("0s stay 0", np.all(out[INP == 0] == 0))
+
+ print("\n=== InvertColors ===")
+ T = tr.InvertColors()
+ out = T.apply(INP)
+ check("0→7 swap", np.all(out[INP == 0] == 7))
+ check("7→0 swap", np.all(out[INP == 7] == 0))
+
+ print("\n=== GravityDown ===")
+ T = tr.GravityDown()
+ col_in = np.array([[0,7,0],[0,0,7],[7,0,0]], dtype=float)
+ out = T.apply(col_in)
+ check("Col 0: 7 at bottom", out[2, 0] == 7 and out[0, 0] == 0 and out[1, 0] == 0)
+ check("Col 1: 7 at bottom", out[2, 1] == 7 and out[0, 1] == 0)
+
+ print("\n=== GravityUp ===")
+ T = tr.GravityUp()
+ out = T.apply(col_in)
+ check("Col 0: 7 at top", out[0, 0] == 7 and out[1, 0] == 0 and out[2, 0] == 0)
+
+ print("\n=== CropToContent ===")
+ T = tr.CropToContent()
+ padded = np.array([[0,0,0,0],[0,7,7,0],[0,7,7,0],[0,0,0,0]], dtype=float)
+ out = T.apply(padded)
+ check("Crops to 2x2", out.shape == (2, 2))
+ check("All 7s", np.all(out == 7))
+
+ print("\n=== Transpose ===")
+ T = tr.Transpose()
+ out = T.apply(INP)
+ check("Shape is transposed", out.shape == (3, 3))
+ check("Values match transpose", np.array_equal(out, INP.T))
+
+ print("\n=== ShiftedTile ===")
+ T = tr.tile_to_target_shifted(shift=(1, 1), tile_factor=3)
+ out = T.apply(INP)
+ check("Shape is 9x9", out.shape == (9, 9))
+ check("Differs from vanilla tile", not np.array_equal(out, np.tile(INP, (3, 3))))
+
+ print("\n=== FillEnclosedHarmonic ===")
+ T = tr.FillEnclosedHarmonic()
+ enclosed = np.array([[7,7,7],[7,0,7],[7,7,7]], dtype=float)
+ out = T.apply(enclosed)
+ check("Center hole filled", out[1, 1] == 7)
+
+ print(f"\n{'='*50}")
+ print(f"Results: {tests_passed} passed, {tests_failed} failed")
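Review note: several checks above compare a transform against a plain NumPy expression (`np.fliplr`, `np.flipud`, `np.kron`). For readers without `itt_solver` installed, the sketch below writes those reference expressions out as standalone functions. These are NumPy equivalents inferred from the assertions and should not be read as the implementations in `itt_solver.transforms`. The function names are hypothetical, and strided sampling in `downscale` is only one of several ways a downscale could be defined.

```python
import numpy as np


def mirror_tile_h(grid):
    """Input on the left, its left-right mirror on the right."""
    return np.hstack([grid, np.fliplr(grid)])


def mirror_tile_v(grid):
    """Input on top, its up-down mirror below."""
    return np.vstack([grid, np.flipud(grid)])


def upscale(grid, k):
    """Replace every cell with a k x k block of the same value."""
    return np.kron(grid, np.ones((k, k)))


def downscale(grid, k):
    """Keep every k-th cell; inverts upscale() for block-constant grids."""
    return grid[::k, ::k]


def kronecker_self_similar(grid):
    """Place a copy of the grid wherever the grid itself is non-zero."""
    return np.kron((grid != 0).astype(float), grid)


INP = np.array([[0, 7, 7], [7, 7, 7], [0, 7, 7]], dtype=float)
print(mirror_tile_h(INP).shape, upscale(INP, 2).shape, kronecker_self_similar(INP).shape)
```

Using the same 3x3 `INP` as the test file makes the expected shapes easy to cross-check against the `check(...)` messages.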