Roger MT committed on
Commit
7944555
·
1 Parent(s): feb08d1

delete fles

README_PEMF.md DELETED
@@ -1,40 +0,0 @@
- # Pre‑Emergence Mechanics Framework (PEMF) — ARC‑AGI
-
- Short summary
- The Pre‑Emergence Mechanics Framework (PEMF) frames ARC tasks as a boundary‑constrained field problem solved by minimizing irreducible residue (σ) under writability gates. PEMF implements four core primitives — **Scalar Potential (Φ)**, **Gradient Ordering (∇)**, **Residue (σ)**, and **Boundary Charge (ρ_q)** — and composes atomic transforms (tile, shifted tile, fill_enclosed, rotate, reflect, etc.) in a beam search to drain residue and produce stable outputs.
-
- Why this matters
- PEMF shows how ARC tasks can be solved mechanically (σ‑minimization + gates) rather than by symbolic heuristics. The approach maps CTS/ITT primitives to executable operators (potential fields, gradients, Dirichlet masks, complex projections) and yields a reproducible solver recipe.
-
- Key concepts (one line each)
- - **Scalar Potential (Φ):** represent grid as numeric potential field (initialize_potential).
- - **Gradient Ordering (∇):** discrete gradients direct admissible edits.
- - **Residue (σ):** L1 misalignment after quantize+tile; objective to minimize.
- - **Boundary Charge (ρ_q):** Dirichlet boundary mask that enforces writability gates.
- - **Layer −1 diagnostics:** complex projection (FFT imag component) to find latent edit zones when real signal is weak.
-
- Files and examples
- - **Skill artifacts:** `SKILLS/pre_emergence_mechanics_framework/` — howto, runnable example `references/examples/verify_pemf.py`, and README for the skill.
- - **Postprocess logs:** `experiments/postprocess_logs.py` — coerce gate booleans and attach candidate snapshots for offline inspection.
- - **Headless entry:** `scripts/entrypoint.py` — run experiments from CLI; `--use_wandb` flag is optional and defaults to off.
-
- Quick verification (headless)
- 1. Run the PEMF example to verify primitives and a tiny compositional loop:
- ```bash
- python SKILLS/pre_emergence_mechanics_framework/references/examples/verify_pemf.py
- ```
- 2. Run a single experiment (example):
- ```bash
- python scripts/entrypoint.py --task example1 --out_dir experiments
- ```
- 3. Postprocess logs to attach candidate snapshot and coerce gates:
- ```bash
- python experiments/postprocess_logs.py
- ```
-
- Acceptance checks
- - `verify_pemf.py` prints a residue trace and reports at least one admissible edit zone from the complex projection.
- - `experiments/*_phi_best.npy` and `experiments/*_logs.fixed.json` exist after a run and contain candidate snapshot and boolean gates for inspection.
-
- References and provenance
- This README summarizes the executable PEMF recipe derived from the ARC‑AGI exposition (PEMF / CTS / ITT). See `SKILLS/pre_emergence_mechanics_framework/references/` for runnable examples and a step‑by‑step how‑to.
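The deleted README describes composing atomic transforms in a beam search that minimizes an L1 residue. The following is a minimal sketch of that idea only, not the repository's solver: the `ATOMICS` table, the `residue` helper and the `beam_search` signature are hypothetical names chosen for illustration, and the writability gates are left out entirely.

```python
import numpy as np

def residue(phi, target):
    """L1 misalignment between a candidate grid and the target."""
    if phi.shape != target.shape:
        return float("inf")  # shapes differ: treat as not comparable
    return float(np.abs(phi - target).sum())

# Hypothetical atomic library; the names are illustrative only.
ATOMICS = {
    "rotate_90": np.rot90,
    "reflect_h": np.fliplr,
    "tile_3x3": lambda g: np.tile(g, (3, 3)),
}

def beam_search(phi0, target, beam_width=4, max_depth=2):
    """Compose atomics, keeping the beam_width lowest-residue states per depth."""
    best = (residue(phi0, target), ("Id",), phi0)
    beam = [best]
    for _depth in range(max_depth):
        candidates = []
        for _sigma, names, phi in beam:
            for name, fn in ATOMICS.items():
                out = fn(phi)
                if max(out.shape) > 30:  # ARC grids are at most 30x30
                    continue
                candidates.append((residue(out, target), names + (name,), out))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0])  # stable sort on residue only
        beam = candidates[:beam_width]
        if beam[0][0] < best[0]:
            best = beam[0]
        if best[0] == 0.0:
            break
    return best

phi0 = np.array([[0, 7], [7, 7]])
target = np.tile(np.fliplr(phi0), (3, 3))
sigma, names, phi_best = beam_search(phi0, target)
print(sigma, " ∘ ".join(names))
```

Returning the best state seen so far, rather than the last beam head, keeps the result from getting worse when a deeper level only produces higher-residue candidates.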
 
arc_results/RESULTS.md DELETED
@@ -1,34 +0,0 @@
- # PEMF Solver — ARC-AGI Training Set Evaluation
-
- ## Results (v4 — ITT + Predicate + DSL)
-
- | Metric | v1 | v2 | v3 | **v4** |
- |---|---|---|---|---|
- | **Tasks solved** | 31 (7.8%) | 40 (10.0%) | 47 (11.8%) | **70 (17.5%)** |
- | via ITT | — | — | 16 | **16** |
- | via Predicate | — | — | — | **25** |
- | via DSL | 31 | 40 | 31 | **29** |
- | Total time | 17s | 51s | 36s | **38s** |
- | Regressions | — | 0 | 0 | **0** |
-
- ## Predicate Engine Breakdown (25 new solves)
-
- | Rule Type | Tasks | Description |
- |---|---|---|
- | neighborhood_rule | 20 | CA-style: (center_color, neighbor_signature) → output_color |
- | global_enclosed_fill | 2 | Fill all bg regions not reachable from border |
- | object predicate×action | 2 | E.g. "remove smallest object" |
- | per_object_enclosed_fill | 1 | Fill each object's interior with its color |
-
- ## Architecture: 3-Pass Pipeline
-
- ```
- Task → ITT Physics → Predicate Enumeration → DSL Beam Search
-        (16 tasks)    (25 tasks)              (29 tasks)
- ```
-
- 1. **ITT** (PhiField + σ-analysis + Fan Signatures → rule learning)
- 2. **Predicate** (enclosed fill → neighborhood rules → object predicate×action)
- 3. **DSL** (33 transforms + dual-strategy beam + greedy stacker)
-
- Each pass runs only if the previous one fails, so a later pass never overrides an earlier solve (0 regressions from v2 through v4).
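The `neighborhood_rule` row above describes a cellular-automaton-style lookup from (center_color, neighbor_signature) to output_color. A minimal sketch of how such a table could be learned from training pairs follows. The signature used here (the sorted colors of the four orthogonal neighbors, with -1 for out-of-grid) is an assumption made for illustration, as are the function names `signature`, `learn_rule` and `apply_rule`; the deleted file does not say how the solver defines its signature.

```python
import numpy as np

def signature(grid, r, c):
    """Key for one cell: (center color, sorted 4-neighbor colors; -1 = off-grid)."""
    h, w = grid.shape
    neigh = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < h and 0 <= cc < w
        neigh.append(int(grid[rr, cc]) if inside else -1)
    return (int(grid[r, c]), tuple(sorted(neigh)))

def learn_rule(pairs):
    """Build the lookup table; return None if the pairs contradict each other."""
    rule = {}
    for inp, out in pairs:
        inp, out = np.array(inp), np.array(out)
        if inp.shape != out.shape:
            return None  # a per-cell rule needs same-shape grids
        for r in range(inp.shape[0]):
            for c in range(inp.shape[1]):
                key, val = signature(inp, r, c), int(out[r, c])
                if rule.setdefault(key, val) != val:
                    return None  # same key maps to two colors
    return rule

def apply_rule(rule, grid):
    """Apply the table cell by cell; unseen keys keep their input color."""
    grid = np.array(grid)
    out = grid.copy()
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            out[r, c] = rule.get(signature(grid, r, c), int(grid[r, c]))
    return out

train = [([[0, 1, 0], [1, 0, 1], [0, 1, 0]],
          [[0, 1, 0], [1, 2, 1], [0, 1, 0]])]
rule = learn_rule(train)
result = apply_rule(rule, [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0]])
print(result.tolist())
```

Rejecting contradictory tables outright is what would let a pass like this fail cleanly and hand the task to the next stage of the pipeline.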
 
arc_results/already_solved.json DELETED
@@ -1 +0,0 @@
- ["007bbfb7", "00d62c1b", "0d3d703e", "1190e5a7", "1cf80156", "1e0a9b12", "1f85a75f", "2013d3e2", "22168020", "22eb0ac0", "239be575", "23b5c85d", "28bf18c6", "2dee498d", "3618c87e", "3906de3d", "3aa6fb7a", "3af2c5a8", "3c9b0459", "42a50994", "4347f46a", "50cb2852", "6150a2bd", "62c24649", "67385a82", "67a3c6ac", "67e8384a", "68b16354", "6d0aefbc", "6f8cd79b", "6fa7a44f", "746b3537", "74dd1130", "7b7f7511", "7e0986d6", "7f4411dc", "868de0fa", "8be77c9e", "8d5021e8", "91714a58", "9172f3a0", "9565186b", "9dfd6313", "a416b8f3", "a5313dff", "a699fb00", "aabf363d", "aedd82e4", "b1948b0a", "b6afb2da", "ba97ae07", "bb43febb", "bda2d7a6", "be94b721", "c0f76784", "c59eb873", "c8f0f002", "c9e6f938", "d10ecb37", "d23f8c26", "d511f180", "d631b094", "d90796e8", "d9fac9be", "de1cd16c", "ded97339", "e26a3af2", "eb5a1d5d", "ed36ccf7", "f76d97a5"]
 
experiments/example1_20260428T172250Z_logs.json DELETED
@@ -1 +0,0 @@
- [[{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}], [{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", 
"passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, 
"B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}], [{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, 
"energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform 
Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Rotate_90>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform Reflect_h>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, 
"shape": [9, 9]}]]
 
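In the log above, `A_boundary` is a JSON boolean while the other gates are the string `"True"`, which is presumably why the README mentions a postprocess step that coerces gate booleans. The sketch below shows one way such a coercion could look. It is not the contents of `experiments/postprocess_logs.py`; `coerce_bool` and `coerce_gates` are hypothetical helpers, and the sample entry is an abbreviated copy of one log record.

```python
import json

def coerce_bool(value):
    """Turn the strings "True"/"False" into real booleans; leave anything else alone."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    return value

def coerce_gates(logs):
    """Return a copy of the nested [depth][candidate] log with boolean gates."""
    fixed = []
    for depth_entries in logs:
        fixed_depth = []
        for entry in depth_entries:
            entry = dict(entry)  # shallow copy so the input stays untouched
            gates = entry.get("gates", {})
            entry["gates"] = {k: coerce_bool(v) for k, v in gates.items()}
            fixed_depth.append(entry)
        fixed.append(fixed_depth)
    return fixed

sample = [[{"atomic": "<Transform tile_to_target>", "residue": 98.0,
            "gates": {"A_boundary": True, "B_localization": "True",
                      "C_quantization": "True", "passed": "True"},
            "accepted": True}]]
fixed = coerce_gates(sample)
print(json.dumps(fixed[0][0]["gates"]))
```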
experiments/example1_20260428T172250Z_phi_best.npy DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:660ada98c4dfce4cdf016cac4f3432f7e589a0c758e0a74a97f5719f4972caee
- size 776
 
experiments/example1_20260428T172250Z_result.json DELETED
@@ -1,23 +0,0 @@
- {
-   "task_name": "example1",
-   "params": {
-     "beam_width": 6,
-     "max_depth": 3,
-     "lock_coeff": 0.0,
-     "max_fraction": 1.0,
-     "enable_layer_minus_one": true,
-     "boundary_source": "target",
-     "wandb_project": "itt_solver",
-     "wandb_anonymous": "allow"
-   },
-   "final_sigma": 98.0,
-   "sigma_trace": [
-     98.0,
-     98.0,
-     98.0,
-     98.0
-   ],
-   "time_s": 0.008741617202758789,
-   "transform": "<Transform Id\u2218tile_to_target\u2218tile_to_target\u2218tile_to_target>",
-   "states_count": 4
- }
 
experiments/example1_20260428T172311Z_logs.json DELETED
@@ -1 +0,0 @@
- [[{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}], [{"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform tile_to_target>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}, {"atomic": "<Transform FillEnclosedHarmonic>", "score": 98.0, "residue": 98.0, "energy": 2352.0, "gates": {"A_boundary": true, "B_localization": "True", "C_quantization": "True", "passed": "True"}, "accepted": true, "shape": [9, 9]}]]
 
experiments/example1_20260428T172311Z_phi_best.npy DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:660ada98c4dfce4cdf016cac4f3432f7e589a0c758e0a74a97f5719f4972caee
- size 776
 
experiments/example1_20260428T172311Z_result.json DELETED
@@ -1,21 +0,0 @@
- {
-   "task_name": "example1",
-   "params": {
-     "beam_width": 4,
-     "max_depth": 2,
-     "lock_coeff": 0.0,
-     "max_fraction": 0.5,
-     "enable_layer_minus_one": true,
-     "boundary_source": "target",
-     "use_symmetry": false
-   },
-   "final_sigma": 98.0,
-   "sigma_trace": [
-     98.0,
-     98.0,
-     98.0
-   ],
-   "time_s": 0.0020961761474609375,
-   "transform": "<Transform Id\u2218tile_to_target\u2218tile_to_target>",
-   "states_count": 3
- }
 
experiments/results.csv DELETED
@@ -1,5 +0,0 @@
- task_name,params,final_sigma,time_s,transform,sigma_trace
- example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": false, ""boundary_source"": ""target"", ""use_symmetry"": true}",98.0,0.003506183624267578,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
- example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": false, ""boundary_source"": ""target"", ""use_symmetry"": false}",98.0,0.0017173290252685547,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
- example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": true, ""boundary_source"": ""target"", ""use_symmetry"": true}",98.0,0.0046575069427490234,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
- example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": true, ""boundary_source"": ""target"", ""use_symmetry"": false}",98.0,0.0020961761474609375,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
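Every row of the deleted `results.csv` stores `params` and `sigma_trace` as JSON inside quoted CSV fields, and every trace stays flat at 98.0. A short sketch for reading such rows and flagging runs whose residue never dropped could look like this. The `load_rows` and `stalled` helpers are hypothetical, and `SAMPLE` is the header plus the first data row copied from the file above.

```python
import csv
import io
import json

SAMPLE = '''task_name,params,final_sigma,time_s,transform,sigma_trace
example1,"{""beam_width"": 4, ""max_depth"": 2, ""lock_coeff"": 0.0, ""max_fraction"": 0.5, ""enable_layer_minus_one"": false, ""boundary_source"": ""target"", ""use_symmetry"": true}",98.0,0.003506183624267578,<Transform Id∘tile_to_target∘tile_to_target>,"[98.0, 98.0, 98.0]"
'''

def load_rows(text):
    """Parse the CSV and decode the JSON-valued columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["params"] = json.loads(row["params"])
        row["sigma_trace"] = json.loads(row["sigma_trace"])
        row["final_sigma"] = float(row["final_sigma"])
        rows.append(row)
    return rows

def stalled(trace):
    """True when the residue never fell below its starting value."""
    return min(trace) >= trace[0]

rows = load_rows(SAMPLE)
for row in rows:
    print(row["task_name"], row["params"]["beam_width"], stalled(row["sigma_trace"]))
```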
 
experiments_analysis.py DELETED
@@ -1,154 +0,0 @@
- """
- Quick diagnostics for itt_solver experiments.
-
- Usage (from notebook or shell):
-     python experiments_analysis.py
-
- It will:
-   - list recent files in experiments/
-   - print the latest result.json
-   - print depth-0 logs (candidates, gates, residues)
-   - load the latest phi_best and compute L1 vs a provided target (if you set TARGET_GRID below)
-   - test atomic transforms from default_atomic_factory to see if they change the input
- """
-
- import os
- import glob
- import json
- import numpy as np
- from pprint import pprint
-
- # === Corrected target from real ARC task 007bbfb7 (Kronecker self-similar) ===
- TARGET_GRID = [
-     [0,0,0,0,7,7,0,7,7],
-     [0,0,0,7,7,7,7,7,7],
-     [0,0,0,0,7,7,0,7,7],
-     [0,7,7,0,7,7,0,7,7],
-     [7,7,7,7,7,7,7,7,7],
-     [0,7,7,0,7,7,0,7,7],
-     [0,0,0,0,7,7,0,7,7],
-     [0,0,0,7,7,7,7,7,7],
-     [0,0,0,0,7,7,0,7,7],
- ]
-
- EXPERIMENTS_DIR = "experiments"
-
- def list_recent_files(n=20):
-     files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*")))
-     print(f"Recent files (last {n}):")
-     for f in files[-n:]:
-         print(" ", f)
-     return files
-
- def load_latest_result():
-     res_files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*_result.json")))
-     if not res_files:
-         print("No result.json files found in experiments/")
-         return None, None
-     latest = res_files[-1]
-     print("\nLatest result file:", latest)
-     with open(latest) as fh:
-         data = json.load(fh)
-     pprint(data)
-     return latest, data
-
- def load_latest_logs():
-     logs_files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*_logs.json")))
-     if not logs_files:
-         print("No logs.json files found in experiments/")
-         return None, None
-     latest = logs_files[-1]
-     print("\nLatest logs file:", latest)
-     with open(latest) as fh:
-         logs = json.load(fh)
-     if logs and isinstance(logs, list) and len(logs) > 0:
-         print("\nDepth 0 log entries (summary):")
-         for i, entry in enumerate(logs[0]):
-             atomic = entry.get('atomic')
-             accepted = entry.get('accepted')
-             residue = entry.get('residue')
-             energy = entry.get('energy')
-             gates = entry.get('gates')
-             print(f"{i}: {atomic} | accepted={accepted} | residue={residue} | energy={energy} | gates={gates}")
-     else:
-         print("Logs format unexpected or empty.")
-     return latest, logs
-
- def load_latest_phi():
-     phi_files = sorted(glob.glob(os.path.join(EXPERIMENTS_DIR, "*_phi_best.npy")))
-     if not phi_files:
-         print("No phi_best.npy files found in experiments/")
-         return None, None
-     latest = phi_files[-1]
-     print("\nLatest phi_best file:", latest)
-     phi = np.load(latest)
-     print("phi_best shape:", phi.shape, "unique values:", np.unique(phi))
-     return latest, phi
-
- def l1_residue_check(phi, target_grid):
-     if phi is None:
-         print("No phi provided for residue check.")
-         return
-     target = np.array(target_grid, dtype=phi.dtype)
-     if phi.shape != target.shape:
-         print("phi and target shapes differ:", phi.shape, target.shape)
-         try:
-             from itt_solver.solver_core import tile_transform
-             target_resized = tile_transform(target, phi.shape)
-             print("Resized target to phi shape for comparison.")
-         except Exception:
-             print("Could not resize target automatically.")
-             return
-     else:
-         target_resized = target
-     l1 = float(np.sum(np.abs(phi - target_resized)))
-     print("L1 residue between phi_best and target:", l1)
-     return l1
-
- def test_atomic_effects():
-     print("\nTesting atomic transforms from default_atomic_factory...")
-     try:
-         from itt_solver.experiment_driver import default_atomic_factory
-         from itt_solver.solver_core import initialize_potential, tile_transform
-     except Exception as e:
-         print("Could not import default_atomic_factory or solver_core:", e)
-         return
-     params = {'beam_width':6,'max_depth':3,'lock_coeff':0.0,'max_fraction':1.0,'enable_layer_minus_one':True,'boundary_source':'target'}
-     task_stub = {'target_shape': (9,9)}
-     atomic_library = default_atomic_factory(params, task_stub)
-     phi_in = initialize_potential([[0,7,7],[7,7,7],[0,7,7]])
-     print("Input shape:", phi_in.shape, "unique:", np.unique(phi_in))
-     for T in atomic_library:
-         try:
-             out = T.apply(phi_in.copy())
-         except Exception as e:
-             print(repr(T), "apply() raised:", e)
-             continue
-         out_resized = out
-         if out.shape != phi_in.shape:
-             try:
-                 out_resized = tile_transform(out, phi_in.shape)
-             except Exception:
-                 try:
-                     out_resized = np.broadcast_to(out, phi_in.shape)
-                 except Exception:
-                     out_resized = None
-         if out_resized is None:
-             changed = None
-         else:
-             changed = int(np.sum(out_resized != phi_in))
-         print(repr(T), "-> out shape", out.shape, "changed cells (compared to input):", changed)
-
- def main():
-     print("=== experiments_analysis.py diagnostics ===")
-     list_recent_files()
-     load_latest_result()
-     load_latest_logs()
-     _, phi = load_latest_phi()
-     if phi is not None:
-         l1_residue_check(phi, TARGET_GRID)
-     test_atomic_effects()
-     print("\nDone.")
-
- if __name__ == "__main__":
-     main()
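The script above calls `TARGET_GRID` a "Kronecker self-similar" target and feeds the 3×3 seed `[[0,7,7],[7,7,7],[0,7,7]]` to the atomic transforms. The short check below, written with plain NumPy rather than the `itt_solver` package, rebuilds that target as the Kronecker product of the seed's nonzero mask with the seed itself. It also shows that simply tiling the seed 3×3 leaves an L1 residue of 98, which matches the `residue` and `final_sigma` values recorded in the deleted logs. The variable names are illustrative.

```python
import numpy as np

seed = np.array([[0, 7, 7],
                 [7, 7, 7],
                 [0, 7, 7]])

# Each nonzero seed cell is replaced by a copy of the seed, each zero cell by a zero block.
target = np.kron((seed != 0).astype(int), seed)

# Plain tiling puts a copy of the seed in every block, including the two that should be empty.
tiled = np.tile(seed, (3, 3))
sigma = float(np.abs(tiled - target).sum())

print(target.shape, sigma)  # (9, 9) 98.0
```

The two mismatched blocks each contain seven cells of value 7, so the residue is 2 × 49 = 98. The flat `sigma_trace` in the result files therefore suggests the search never got past the plain tiling.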
 
notebooks/pemf_llm_lightning.ipynb DELETED
@@ -1,303 +0,0 @@
- {
-  "cells": [
-   {
-    "cell_type": "markdown",
-    "metadata": {},
-    "source": [
-     "# PEMF ARC-AGI — LLM Solver (Lightning.ai / Multi-GPU)\n",
-     "\n",
-     "Runs Ollama with auto multi-GPU sharding for local inference.\n",
-     "\n",
-     "| GPU Config | Model | VRAM | Quality |\n",
-     "|---|---|---|---|\n",
-     "| 2xA10G (48GB) | qwen2.5-coder:32b | ~20GB q4 | Best |\n",
-     "| 2xL4 (48GB) | qwen2.5-coder:32b | ~20GB q4 | Best |\n",
-     "| 2xT4 (32GB) | qwen2.5-coder:14b | ~10GB q4 | Good |\n",
-     "| 1xA10G (24GB) | qwen2.5-coder:14b | ~10GB | Good |\n",
-     "| 4xA10G (96GB) | qwen2.5-coder:32b fp16 | ~65GB | Best+fast |"
-    ]
-   },
20
- {
21
- "cell_type": "code",
22
- "execution_count": null,
23
- "metadata": {},
24
- "outputs": [],
25
- "source": [
26
- "# ============ CONFIGURATION ============\n",
27
- "MODEL = 'qwen2.5-coder:32b'\n",
28
- "# MODEL = 'qwen2.5-coder:14b' # fallback for less VRAM\n",
29
- "N_CANDIDATES = 8"
30
- ]
31
- },
32
- {
33
- "cell_type": "code",
34
- "execution_count": null,
35
- "metadata": {},
36
- "outputs": [],
37
- "source": [
38
- "import subprocess, os, time, json, re, glob\n",
39
- "import numpy as np, urllib.request\n",
40
- "from collections import Counter\n",
41
- "\n",
42
- "# Check GPUs\n",
43
- "!nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader\n",
44
- "gpu_count = len(subprocess.run(['nvidia-smi','-L'], capture_output=True, text=True).stdout.strip().split('\\n'))\n",
45
- "print(f'GPUs: {gpu_count}')"
46
- ]
47
- },
48
- {
49
- "cell_type": "code",
50
- "execution_count": null,
51
- "metadata": {},
52
- "outputs": [],
53
- "source": [
54
- "# Install Ollama\n",
55
- "try:\n",
56
- " subprocess.run(['ollama','--version'], capture_output=True, check=True)\n",
57
- " print('Ollama installed')\n",
58
- "except: \n",
59
- " !curl -fsSL https://ollama.com/install.sh | sh\n",
60
- "\n",
61
- "# Start server (auto-detects all GPUs)\n",
62
- "subprocess.run(['pkill','-f','ollama'], capture_output=True)\n",
63
- "time.sleep(2)\n",
64
- "env = os.environ.copy()\n",
65
- "env['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in range(gpu_count))\n",
66
- "server = subprocess.Popen(['ollama','serve'],\n",
67
- " stdout=open('/tmp/ollama.log','w'), stderr=subprocess.STDOUT, env=env)\n",
68
- "time.sleep(5)\n",
69
- "print(f'Server PID {server.pid}, GPUs: {env[\"CUDA_VISIBLE_DEVICES\"]}')\n",
70
- "\n",
71
- "# Pull model\n",
72
- "print(f'Pulling {MODEL}...')\n",
73
- "r = subprocess.run(['ollama','pull',MODEL], capture_output=True, text=True, timeout=3600)\n",
74
- "if r.returncode != 0:\n",
75
- " print(f'Failed, trying 14b...'); MODEL='qwen2.5-coder:14b'\n",
76
- " subprocess.run(['ollama','pull',MODEL], capture_output=True, text=True, timeout=3600)\n",
77
- "print(f'{MODEL} ready')\n",
78
- "\n",
79
- "# Test\n",
80
- "r = subprocess.run(['ollama','run',MODEL,'Say hello'], capture_output=True, text=True, timeout=60)\n",
81
- "print(f'Test: {r.stdout.strip()[:80]}')\n",
82
- "!nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader"
83
- ]
84
- },
85
- {
86
- "cell_type": "code",
87
- "execution_count": null,
88
- "metadata": {},
89
- "outputs": [],
90
- "source": [
91
- "# Download ARC data\n",
92
- "if not os.path.exists('arc_data/training'):\n",
93
- " !git clone --depth 1 https://github.com/fchollet/ARC-AGI.git /tmp/arc\n",
94
- " os.makedirs('arc_data', exist_ok=True)\n",
95
- " !cp -r /tmp/arc/data/training arc_data/training\n",
96
- "print(f'Tasks: {len(glob.glob(\"arc_data/training/*.json\"))}')\n",
97
- "\n",
98
- "ALREADY_SOLVED = {\n",
99
- " '007bbfb7','00d62c1b','0d3d703e','1190e5a7','1cf80156','1e0a9b12','1f85a75f',\n",
100
- " '2013d3e2','22168020','22eb0ac0','239be575','23b5c85d','28bf18c6','2dee498d',\n",
101
- " '3618c87e','3906de3d','3aa6fb7a','3af2c5a8','3c9b0459','42a50994','4347f46a',\n",
102
- " '50cb2852','6150a2bd','62c24649','67385a82','67a3c6ac','67e8384a','68b16354',\n",
103
- " '6d0aefbc','6f8cd79b','6fa7a44f','746b3537','74dd1130','7b7f7511','7e0986d6',\n",
104
- " '7f4411dc','868de0fa','8be77c9e','8d5021e8','91714a58','9172f3a0','9565186b',\n",
105
- " '9dfd6313','a416b8f3','a5313dff','a699fb00','aabf363d','aedd82e4','b1948b0a',\n",
106
- " 'b6afb2da','ba97ae07','bb43febb','bda2d7a6','be94b721','c0f76784','c59eb873',\n",
107
- " 'c8f0f002','c9e6f938','d10ecb37','d23f8c26','d511f180','d631b094','d90796e8',\n",
108
- " 'd9fac9be','de1cd16c','ded97339','e26a3af2','eb5a1d5d','ed36ccf7','f76d97a5',\n",
109
- "}\n",
110
- "task_files = sorted(glob.glob('arc_data/training/*.json'))\n",
111
- "unsolved = [(os.path.basename(f).replace('.json',''),f) for f in task_files\n",
112
- " if os.path.basename(f).replace('.json','') not in ALREADY_SOLVED]\n",
113
- "print(f'Symbolic: {len(ALREADY_SOLVED)}, LLM to try: {len(unsolved)}')"
114
- ]
115
- },
116
- {
117
- "cell_type": "code",
118
- "execution_count": null,
119
- "metadata": {},
120
- "outputs": [],
121
- "source": [
122
- "# LLM Engine\n",
123
- "def call_ollama(prompt, model, temperature=0.7):\n",
124
- " payload = {'model':model,'prompt':prompt,'stream':False,\n",
125
- " 'options':{'temperature':temperature,'num_predict':2048}}\n",
126
- " req = urllib.request.Request('http://localhost:11434/api/generate',\n",
127
- " data=json.dumps(payload).encode(), headers={'Content-Type':'application/json'}, method='POST')\n",
128
- " try:\n",
129
- " with urllib.request.urlopen(req, timeout=180) as resp:\n",
130
- " return json.loads(resp.read().decode()).get('response','')\n",
131
- " except Exception as e: return f'ERROR: {e}'\n",
132
- "\n",
133
- "def build_prompt(task):\n",
134
- " pairs = task.get('train',[])\n",
135
- " ex = '\\n'.join(f\"Example {i+1}:\\n Input: {json.dumps(p['input'])}\\n Output: {json.dumps(p['output'])}\"\n",
136
- " for i,p in enumerate(pairs))\n",
137
- " inps = [np.array(p['input']) for p in pairs]\n",
138
- " outs = [np.array(p['output']) for p in pairs]\n",
139
- " same = all(i.shape==o.shape for i,o in zip(inps,outs))\n",
140
- " ic = sorted(set(c for i in inps for c in np.unique(i).tolist()))\n",
141
- " oc = sorted(set(c for o in outs for c in np.unique(o).tolist()))\n",
142
- " a = f\" Same shape: {same}\\n Colors in: {ic}, out: {oc}\\n\"\n",
143
- " if not same: a += f\" Shape: {inps[0].shape} -> {outs[0].shape}\\n\"\n",
144
- " return f\"\"\"Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.\n",
145
- "\n",
146
- "{ex}\n",
147
- "\n",
148
- "Analysis:\n",
149
- "{a}\n",
150
- "```python\n",
151
- "import numpy as np\n",
152
- "from collections import Counter, deque\n",
153
- "\n",
154
- "def transform(grid: list[list[int]]) -> list[list[int]]:\n",
155
- " grid = np.array(grid)\n",
156
- "\"\"\"\n",
157
- "\n",
158
- "def extract_code(resp):\n",
159
- " for pat in [r'```python\\s*(.*?)```', r'```\\s*(.*?)```']:\n",
160
- " for m in re.findall(pat, resp, re.DOTALL):\n",
161
- " if 'def transform' in m: return m.strip()\n",
162
- " idx = resp.find('def transform')\n",
163
- " if idx >= 0:\n",
164
- " before = resp[:idx]\n",
165
- " s = max(before.rfind('import '), before.rfind('from '))\n",
166
- " code = resp[s if s>=0 else idx:]\n",
167
- " end = code.find('```')\n",
168
- " if end>0: code=code[:end]\n",
169
- " return code.strip()\n",
170
- " s = resp.strip()\n",
171
- " if s.startswith(('import','def transform','from')): return s\n",
172
- " return None\n",
173
- "\n",
174
- "def verify(code, pairs):\n",
175
- " ns = {'np':np,'numpy':np,'Counter':Counter,'deque':__import__('collections').deque}\n",
176
- " try:\n",
177
- " import scipy.ndimage; ns['scipy']=__import__('scipy')\n",
178
- " except: pass\n",
179
- " try: exec(code, ns)\n",
180
- " except: return False\n",
181
- " if 'transform' not in ns: return False\n",
182
- " fn = ns['transform']\n",
183
- " for p in pairs:\n",
184
- " try:\n",
185
- " r = np.array(fn([row[:] for row in p['input']]), dtype=int)\n",
186
- " e = np.array(p['output'], dtype=int)\n",
187
- " if r.shape!=e.shape or not np.array_equal(r,e): return False\n",
188
- " except: return False\n",
189
- " return True\n",
190
- "\n",
191
- "def apply_prog(code, inp):\n",
192
- " ns = {'np':np,'numpy':np,'Counter':Counter,'deque':__import__('collections').deque}\n",
193
- " try:\n",
194
- " import scipy.ndimage; ns['scipy']=__import__('scipy')\n",
195
- " except: pass\n",
196
- " try:\n",
197
- " exec(code, ns)\n",
198
- " r = ns['transform']([row[:] for row in inp])\n",
199
- " if r is not None: return np.array(r,dtype=int).tolist()\n",
200
- " except: pass\n",
201
- " return None\n",
202
- "\n",
203
- "print('Engine ready')"
204
- ]
205
- },
206
- {
207
- "cell_type": "code",
208
- "execution_count": null,
209
- "metadata": {},
210
- "outputs": [],
211
- "source": [
212
- "# Quick test\n",
213
- "with open(f'arc_data/training/{unsolved[0][0]}.json') as f: t=json.load(f)\n",
214
- "print(f'Test on {unsolved[0][0]}...')\n",
215
- "s=time.time(); r=call_ollama(build_prompt(t),MODEL,0.1); e=time.time()-s\n",
216
- "code=extract_code(r)\n",
217
- "if code: print(f'{e:.1f}s, {len(code)}ch, verified: {\"Y\" if verify(code,t[\"train\"]) else \"N\"}')\n",
218
- "else: print(f'{e:.1f}s, no code')\n",
219
- "est = e*N_CANDIDATES*len(unsolved)/3600\n",
220
- "print(f'Est total: {est:.1f}h for {len(unsolved)} tasks x {N_CANDIDATES} candidates')"
221
- ]
222
- },
223
- {
224
- "cell_type": "code",
225
- "execution_count": null,
226
- "metadata": {},
227
- "outputs": [],
228
- "source": [
229
- "# === MAIN LOOP (crash-safe, resumable) ===\n",
230
- "results = {}\n",
231
- "solved = 0\n",
232
- "total_time = 0\n",
233
- "\n",
234
- "if os.path.exists('llm_results.json'):\n",
235
- " with open('llm_results.json') as f: prev=json.load(f)\n",
236
- " results=prev.get('results',{})\n",
237
- " solved=sum(1 for r in results.values() if r['status']=='solved')\n",
238
- " total_time=prev.get('total_time_s',0)\n",
239
- " print(f'Resuming: {solved} LLM-solved, {len(results)} attempted')\n",
240
- "\n",
241
- "for idx,(tid,tf) in enumerate(unsolved):\n",
242
- " if tid in results: continue\n",
243
- " with open(tf) as f: task=json.load(f)\n",
244
- " print(f'[{idx+1:3d}/{len(unsolved)}] {tid}:',end=' ',flush=True)\n",
245
- " s=time.time(); prompt=build_prompt(task); ok=False\n",
246
- " for i in range(N_CANDIDATES):\n",
247
- " temp=0.1 if i==0 else min(0.4+0.15*i,1.2)\n",
248
- " resp=call_ollama(prompt,MODEL,temp)\n",
249
- " if resp.startswith('ERROR:'): continue\n",
250
- " code=extract_code(resp)\n",
251
- " if code and verify(code,task['train']):\n",
252
- " e=time.time()-s; total_time+=e; solved+=1\n",
253
- " to=[apply_prog(code,t['input']) for t in task.get('test',[])]\n",
254
- " results[tid]={'status':'solved','rule':f'llm_c{i+1}','code':code,\n",
255
- " 'test_outputs':to,'time_s':round(e,2)}\n",
256
- " print(f'✅ c{i+1} ({e:.1f}s) [{len(ALREADY_SOLVED)+solved}/{len(task_files)}]')\n",
257
- " ok=True; break\n",
258
- " if not ok:\n",
259
- " e=time.time()-s; total_time+=e\n",
260
- " results[tid]={'status':'failed','time_s':round(e,2)}\n",
261
- " print(f'❌ ({e:.1f}s)')\n",
262
- " if (idx+1)%5==0 or ok:\n",
263
- " with open('llm_results.json','w') as f:\n",
264
- " json.dump({'model':MODEL,'n_candidates':N_CANDIDATES,'llm_solved':solved,\n",
265
- " 'attempted':len(results),'symbolic_solved':len(ALREADY_SOLVED),\n",
266
- " 'total_solved':len(ALREADY_SOLVED)+solved,'total_tasks':len(task_files),\n",
267
- " 'solve_rate':round(100*(len(ALREADY_SOLVED)+solved)/len(task_files),2),\n",
268
- " 'total_time_s':round(total_time,1),'results':results},f,indent=2)"
269
- ]
270
- },
271
- {
272
- "cell_type": "code",
273
- "execution_count": null,
274
- "metadata": {},
275
- "outputs": [],
276
- "source": [
277
- "# Final save + summary\n",
278
- "with open('llm_results.json','w') as f:\n",
279
- " json.dump({'model':MODEL,'n_candidates':N_CANDIDATES,'llm_solved':solved,\n",
280
- " 'attempted':len(results),'symbolic_solved':len(ALREADY_SOLVED),\n",
281
- " 'total_solved':len(ALREADY_SOLVED)+solved,'total_tasks':len(task_files),\n",
282
- " 'solve_rate':round(100*(len(ALREADY_SOLVED)+solved)/len(task_files),2),\n",
283
- " 'total_time_s':round(total_time,1),'results':results},f,indent=2)\n",
284
- "\n",
285
- "print(f'\\n{\"=\"*60}')\n",
286
- "print(f'LLM solved: {solved}')\n",
287
- "print(f'Symbolic: {len(ALREADY_SOLVED)}')\n",
288
- "print(f'TOTAL: {len(ALREADY_SOLVED)+solved}/{len(task_files)} ({100*(len(ALREADY_SOLVED)+solved)/len(task_files):.1f}%)')\n",
289
- "print(f'Time: {total_time/3600:.1f}h')\n",
290
- "print(f'\\nDownload llm_results.json, then run:')\n",
291
- "print(f' python scripts/merge_results.py arc_results/summary_v4.json llm_results.json')\n",
292
- "\n",
293
- "subprocess.run(['pkill','-f','ollama'], capture_output=True)"
294
- ]
295
- }
296
- ],
297
- "metadata": {
298
- "kernelspec": {"display_name":"Python 3","language":"python","name":"python3"},
299
- "language_info": {"name":"python","version":"3.10.0"}
300
- },
301
- "nbformat": 4,
302
- "nbformat_minor": 4
303
- }
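The main loop in this notebook samples its first candidate at a low temperature and raises the temperature for later candidates. The sketch below reproduces that schedule on its own, assuming the same expression as the loop (`0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)`). The helper name `candidate_temperatures` is hypothetical and does not appear in the deleted notebook.

```python
def candidate_temperatures(n_candidates: int) -> list:
    """Return the sampling temperature for each candidate index.

    Hypothetical helper: mirrors the expression used in the notebook's loop,
    rounded to two decimals to hide floating-point noise.
    """
    temps = []
    for i in range(n_candidates):
        # First attempt is near-greedy; later attempts ramp up and cap at 1.2.
        temp = 0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)
        temps.append(round(temp, 2))
    return temps


if __name__ == "__main__":
    print(candidate_temperatures(8))
```

With eight candidates the ramp reaches its cap at the seventh attempt, so the last two candidates are both sampled at 1.2.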
notebooks/pemf_llm_solver.ipynb DELETED
@@ -1,490 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "metadata": {},
6
- "source": [
7
- "# PEMF ARC-AGI — LLM Program Synthesis\n",
8
- "\n",
9
- "Uses NVIDIA NIM (free) with GLM 4.7 / DeepSeek V4 to solve ARC tasks.\n",
10
- "\n",
11
- "**Pipeline:** For each unsolved task → build prompt → LLM generates Python `transform()` → verify against ALL training pairs → apply to test.\n",
12
- "\n",
13
- "**Prerequisites:**\n",
14
- "- NVIDIA NIM API key from https://build.nvidia.com/settings/api-keys\n",
15
- "- Internet access enabled"
16
- ]
17
- },
18
- {
19
- "cell_type": "markdown",
20
- "metadata": {},
21
- "source": [
22
- "## 1. Setup"
23
- ]
24
- },
25
- {
26
- "cell_type": "code",
27
- "execution_count": null,
28
- "metadata": {},
29
- "outputs": [],
30
- "source": [
31
- "# ============================================================\n",
32
- "# CONFIGURATION — EDIT THESE\n",
33
- "# ============================================================\n",
34
- "\n",
35
- "NVIDIA_API_KEY = \"nvapi-YOUR-KEY-HERE\" # Get from https://build.nvidia.com/settings/api-keys\n",
36
- "\n",
37
- "MODEL = \"z-ai/glm4.7\" # Default: GLM 4.7\n",
38
- "# MODEL = \"deepseek-ai/deepseek-v4-pro\" # Alternative: DeepSeek V4\n",
39
- "\n",
40
- "N_CANDIDATES = 8 # Candidates per task (more = better but slower)\n",
41
- "RATE_LIMIT_SLEEP = 2 # Seconds between API calls"
42
- ]
43
- },
44
- {
45
- "cell_type": "code",
46
- "execution_count": null,
47
- "metadata": {},
48
- "outputs": [],
49
- "source": [
50
- "# Download ARC dataset\n",
51
- "import os, subprocess\n",
52
- "\n",
53
- "if not os.path.exists('arc_data/training'):\n",
54
- " print('Downloading ARC dataset...')\n",
55
- " subprocess.run(['git', 'clone', '--depth', '1', 'https://github.com/fchollet/ARC-AGI.git', '/tmp/arc'], \n",
56
- " capture_output=True)\n",
57
- " os.makedirs('arc_data', exist_ok=True)\n",
58
- " subprocess.run(['cp', '-r', '/tmp/arc/data/training', 'arc_data/training'], capture_output=True)\n",
59
- " print(f'Downloaded {len(os.listdir(\"arc_data/training\"))} tasks')\n",
60
- "else:\n",
61
- " print(f'ARC data already present: {len(os.listdir(\"arc_data/training\"))} tasks')"
62
- ]
63
- },
64
- {
65
- "cell_type": "code",
66
- "execution_count": null,
67
- "metadata": {},
68
- "outputs": [],
69
- "source": [
70
- "# Already solved by symbolic pipeline (70 tasks)\n",
71
- "ALREADY_SOLVED = {\n",
72
- " \"007bbfb7\",\"00d62c1b\",\"0d3d703e\",\"1190e5a7\",\"1cf80156\",\"1e0a9b12\",\"1f85a75f\",\n",
73
- " \"2013d3e2\",\"22168020\",\"22eb0ac0\",\"239be575\",\"23b5c85d\",\"28bf18c6\",\"2dee498d\",\n",
74
- " \"3618c87e\",\"3906de3d\",\"3aa6fb7a\",\"3af2c5a8\",\"3c9b0459\",\"42a50994\",\"4347f46a\",\n",
75
- " \"50cb2852\",\"6150a2bd\",\"62c24649\",\"67385a82\",\"67a3c6ac\",\"67e8384a\",\"68b16354\",\n",
76
- " \"6d0aefbc\",\"6f8cd79b\",\"6fa7a44f\",\"746b3537\",\"74dd1130\",\"7b7f7511\",\"7e0986d6\",\n",
77
- " \"7f4411dc\",\"868de0fa\",\"8be77c9e\",\"8d5021e8\",\"91714a58\",\"9172f3a0\",\"9565186b\",\n",
78
- " \"9dfd6313\",\"a416b8f3\",\"a5313dff\",\"a699fb00\",\"aabf363d\",\"aedd82e4\",\"b1948b0a\",\n",
79
- " \"b6afb2da\",\"ba97ae07\",\"bb43febb\",\"bda2d7a6\",\"be94b721\",\"c0f76784\",\"c59eb873\",\n",
80
- " \"c8f0f002\",\"c9e6f938\",\"d10ecb37\",\"d23f8c26\",\"d511f180\",\"d631b094\",\"d90796e8\",\n",
81
- " \"d9fac9be\",\"de1cd16c\",\"ded97339\",\"e26a3af2\",\"eb5a1d5d\",\"ed36ccf7\",\"f76d97a5\",\n",
82
- "}\n",
83
- "print(f'Already solved by symbolic pipeline: {len(ALREADY_SOLVED)} tasks')"
84
- ]
85
- },
86
- {
87
- "cell_type": "markdown",
88
- "metadata": {},
89
- "source": [
90
- "## 2. LLM Engine"
91
- ]
92
- },
93
- {
94
- "cell_type": "code",
95
- "execution_count": null,
96
- "metadata": {},
97
- "outputs": [],
98
- "source": [
99
- "import json\n",
100
- "import time\n",
101
- "import re\n",
102
- "import glob\n",
103
- "import numpy as np\n",
104
- "import urllib.request\n",
105
- "from collections import Counter\n",
106
- "\n",
107
- "\n",
108
- "def call_nvidia(prompt, api_key, model=\"z-ai/glm4.7\", temperature=0.7):\n",
109
- " \"\"\"Call NVIDIA NIM API.\"\"\"\n",
110
- " url = \"https://integrate.api.nvidia.com/v1/chat/completions\"\n",
111
- " payload = {\n",
112
- " \"model\": model,\n",
113
- " \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n",
114
- " \"max_tokens\": 2048,\n",
115
- " \"temperature\": temperature,\n",
116
- " }\n",
117
- " data = json.dumps(payload).encode('utf-8')\n",
118
- " req = urllib.request.Request(url, data=data,\n",
119
- " headers={\"Content-Type\": \"application/json\",\n",
120
- " \"Authorization\": f\"Bearer {api_key}\"},\n",
121
- " method='POST')\n",
122
- " try:\n",
123
- " with urllib.request.urlopen(req, timeout=120) as resp:\n",
124
- " result = json.loads(resp.read().decode())\n",
125
- " return result['choices'][0]['message']['content']\n",
126
- " except Exception as e:\n",
127
- " return f\"ERROR: {e}\"\n",
128
- "\n",
129
- "\n",
130
- "def build_prompt(task):\n",
131
- " \"\"\"Build prompt for ARC task.\"\"\"\n",
132
- " train_pairs = task.get('train', [])\n",
133
- " examples = []\n",
134
- " for i, pair in enumerate(train_pairs):\n",
135
- " examples.append(\n",
136
- " f\"Example {i+1}:\\n\"\n",
137
- " f\" Input: {json.dumps(pair['input'])}\\n\"\n",
138
- " f\" Output: {json.dumps(pair['output'])}\"\n",
139
- " )\n",
140
- " examples_str = \"\\n\".join(examples)\n",
141
- "\n",
142
- " inputs = [np.array(p['input']) for p in train_pairs]\n",
143
- " outputs = [np.array(p['output']) for p in train_pairs]\n",
144
- " same_shape = all(i.shape == o.shape for i, o in zip(inputs, outputs))\n",
145
- " in_colors = sorted(set(c for i in inputs for c in np.unique(i).tolist()))\n",
146
- " out_colors = sorted(set(c for o in outputs for c in np.unique(o).tolist()))\n",
147
- "\n",
148
- " analysis = f\" Same input/output shape: {same_shape}\\n\"\n",
149
- " analysis += f\" Input colors: {in_colors}, Output colors: {out_colors}\\n\"\n",
150
- " if not same_shape:\n",
151
- " for i, o in zip(inputs[:1], outputs[:1]):\n",
152
- " analysis += f\" Shape: {i.shape} -> {o.shape}\\n\"\n",
153
- "\n",
154
- " return f\"\"\"Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.\n",
155
- "\n",
156
- "{examples_str}\n",
157
- "\n",
158
- "Analysis:\n",
159
- "{analysis}\n",
160
- "```python\n",
161
- "import numpy as np\n",
162
- "from collections import Counter, deque\n",
163
- "\n",
164
- "def transform(grid: list[list[int]]) -> list[list[int]]:\n",
165
- " grid = np.array(grid)\n",
166
- "\"\"\"\n",
167
- "\n",
168
- "\n",
169
- "def extract_code(response):\n",
170
- " \"\"\"Extract Python function from LLM response.\"\"\"\n",
171
- " for pattern in [r'```python\\s*(.*?)```', r'```\\s*(.*?)```']:\n",
172
- " matches = re.findall(pattern, response, re.DOTALL)\n",
173
- " for match in matches:\n",
174
- " if 'def transform' in match:\n",
175
- " return match.strip()\n",
176
- " idx = response.find('def transform')\n",
177
- " if idx >= 0:\n",
178
- " before = response[:idx]\n",
179
- " import_start = max(before.rfind('import '), before.rfind('from '))\n",
180
- " start = import_start if import_start >= 0 else idx\n",
181
- " code = response[start:]\n",
182
- " end = code.find('```')\n",
183
- " if end > 0:\n",
184
- " code = code[:end]\n",
185
- " return code.strip()\n",
186
- " stripped = response.strip()\n",
187
- " if stripped.startswith(('import', 'def transform', 'from')):\n",
188
- " return stripped\n",
189
- " return None\n",
190
- "\n",
191
- "\n",
192
- "def verify_program(code, train_pairs):\n",
193
- " \"\"\"Execute program and verify against all training pairs.\"\"\"\n",
194
- " namespace = {'np': np, 'numpy': np, 'Counter': Counter,\n",
195
- " 'deque': __import__('collections').deque}\n",
196
- " try:\n",
197
- " import scipy.ndimage\n",
198
- " namespace['scipy'] = __import__('scipy')\n",
199
- " except ImportError:\n",
200
- " pass\n",
201
- " try:\n",
202
- " exec(code, namespace)\n",
203
- " except Exception:\n",
204
- " return False\n",
205
- " if 'transform' not in namespace:\n",
206
- " return False\n",
207
- " fn = namespace['transform']\n",
208
- " for pair in train_pairs:\n",
209
- " try:\n",
210
- " result = fn([row[:] for row in pair['input']])\n",
211
- " if result is None:\n",
212
- " return False\n",
213
- " r = np.array(result, dtype=int)\n",
214
- " e = np.array(pair['output'], dtype=int)\n",
215
- " if r.shape != e.shape or not np.array_equal(r, e):\n",
216
- " return False\n",
217
- " except Exception:\n",
218
- " return False\n",
219
- " return True\n",
220
- "\n",
221
- "\n",
222
- "def apply_program(code, test_input):\n",
223
- " \"\"\"Apply verified program to test input.\"\"\"\n",
224
- " namespace = {'np': np, 'numpy': np, 'Counter': Counter,\n",
225
- " 'deque': __import__('collections').deque}\n",
226
- " try:\n",
227
- " import scipy.ndimage\n",
228
- " namespace['scipy'] = __import__('scipy')\n",
229
- " except ImportError:\n",
230
- " pass\n",
231
- " try:\n",
232
- " exec(code, namespace)\n",
233
- " result = namespace['transform']([row[:] for row in test_input])\n",
234
- " if result is not None:\n",
235
- " return np.array(result, dtype=int).tolist()\n",
236
- " except Exception:\n",
237
- " pass\n",
238
- " return None\n",
239
- "\n",
240
- "\n",
241
- "print('LLM engine ready.')"
242
- ]
243
- },
244
- {
245
- "cell_type": "markdown",
246
- "metadata": {},
247
- "source": [
248
- "## 3. Quick Test (1 task)"
249
- ]
250
- },
251
- {
252
- "cell_type": "code",
253
- "execution_count": null,
254
- "metadata": {},
255
- "outputs": [],
256
- "source": [
257
- "# Quick test — verify API works before running all 330 tasks\n",
258
- "test_tid = '0520fde7'\n",
259
- "with open(f'arc_data/training/{test_tid}.json') as f:\n",
260
- " test_task = json.load(f)\n",
261
- "\n",
262
- "print(f'Testing on {test_tid}...')\n",
263
- "for i, p in enumerate(test_task['train']):\n",
264
- " inp = np.array(p['input']); out = np.array(p['output'])\n",
265
- " print(f' Pair {i}: {inp.shape} -> {out.shape}')\n",
266
- "\n",
267
- "prompt = build_prompt(test_task)\n",
268
- "print(f'Prompt: {len(prompt)} chars')\n",
269
- "\n",
270
- "response = call_nvidia(prompt, NVIDIA_API_KEY, MODEL, temperature=0.1)\n",
271
- "if response.startswith('ERROR:'):\n",
272
- " print(f'\\n❌ API Error: {response}')\n",
273
- " print('Check your NVIDIA_API_KEY and MODEL settings above.')\n",
274
- "else:\n",
275
- " code = extract_code(response)\n",
276
- " if code:\n",
277
- " ok = verify_program(code, test_task['train'])\n",
278
- " print(f'\\nCode extracted: {len(code)} chars')\n",
279
- " print(f'Verified: {\"✅\" if ok else \"❌\"}')\n",
280
- " if ok:\n",
281
- " print('API working and generating correct code!')\n",
282
- " else:\n",
283
- " print('API working but code failed verification (normal — will try more candidates in full run)')\n",
284
- " else:\n",
285
- " print(f'\\nNo code extracted from response ({len(response)} chars)')\n",
286
- " print('API working but response format unexpected. Will retry with different temperatures in full run.')"
287
- ]
288
- },
289
- {
290
- "cell_type": "markdown",
291
- "metadata": {},
292
- "source": [
293
- "## 4. Run on All Unsolved Tasks"
294
- ]
295
- },
296
- {
297
- "cell_type": "code",
298
- "execution_count": null,
299
- "metadata": {},
300
- "outputs": [],
301
- "source": [
302
- "# Load all unsolved tasks\n",
303
- "task_files = sorted(glob.glob('arc_data/training/*.json'))\n",
304
- "unsolved = []\n",
305
- "for tf in task_files:\n",
306
- " tid = os.path.basename(tf).replace('.json', '')\n",
307
- " if tid not in ALREADY_SOLVED:\n",
308
- " unsolved.append((tid, tf))\n",
309
- "\n",
310
- "print(f'Total tasks: {len(task_files)}')\n",
311
- "print(f'Already solved (symbolic): {len(ALREADY_SOLVED)}')\n",
312
- "print(f'To attempt with LLM: {len(unsolved)}')\n",
313
- "print(f'Model: {MODEL}')\n",
314
- "print(f'Candidates per task: {N_CANDIDATES}')\n",
315
- "print(f'\\nStarting...')"
316
- ]
317
- },
318
- {
319
- "cell_type": "code",
320
- "execution_count": null,
321
- "metadata": {},
322
- "outputs": [],
323
- "source": [
324
- "# Main loop\n",
325
- "results = {}\n",
326
- "solved = 0\n",
327
- "total_time = 0\n",
328
- "\n",
329
- "# Resume from previous run if exists\n",
330
- "if os.path.exists('llm_results.json'):\n",
331
- " with open('llm_results.json') as f:\n",
332
- " prev = json.load(f)\n",
333
- " results = prev.get('results', {})\n",
334
- " solved = sum(1 for r in results.values() if r['status'] == 'solved')\n",
335
- " print(f'Resuming from previous run: {solved} already solved by LLM')\n",
336
- "\n",
337
- "for idx, (tid, tf) in enumerate(unsolved):\n",
338
- " # Skip if already attempted\n",
339
- " if tid in results:\n",
340
- " continue\n",
341
- " \n",
342
- " with open(tf) as f:\n",
343
- " task = json.load(f)\n",
344
- " \n",
345
- " print(f'[{idx+1:3d}/{len(unsolved)}] {tid}:', end=' ', flush=True)\n",
346
- " start = time.time()\n",
347
- " \n",
348
- " prompt = build_prompt(task)\n",
349
- " task_solved = False\n",
350
- " \n",
351
- " for i in range(N_CANDIDATES):\n",
352
- " temp = 0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)\n",
353
- " response = call_nvidia(prompt, NVIDIA_API_KEY, MODEL, temp)\n",
354
- " \n",
355
- " if response.startswith('ERROR:'):\n",
356
- " if '429' in response or 'rate' in response.lower():\n",
357
- " time.sleep(10) # Rate limit — wait longer\n",
358
- " continue\n",
359
- " \n",
360
- " code = extract_code(response)\n",
361
- " if code is None:\n",
362
- " continue\n",
363
- " \n",
364
- " if verify_program(code, task['train']):\n",
365
- " elapsed = time.time() - start\n",
366
- " total_time += elapsed\n",
367
- " solved += 1\n",
368
- " \n",
369
- " test_outputs = [apply_program(code, t['input']) for t in task.get('test', [])]\n",
370
- " results[tid] = {\n",
371
- " 'status': 'solved', 'rule': f'llm_c{i+1}_t{temp:.1f}',\n",
372
- " 'code': code, 'test_outputs': test_outputs,\n",
373
- " 'time_s': round(elapsed, 2),\n",
374
- " }\n",
375
- " print(f'✅ c{i+1} ({elapsed:.1f}s) [total: {len(ALREADY_SOLVED)+solved}/{len(task_files)}]')\n",
376
- " task_solved = True\n",
377
- " break\n",
378
- " \n",
379
- " time.sleep(RATE_LIMIT_SLEEP)\n",
380
- " \n",
381
- " if not task_solved:\n",
382
- " elapsed = time.time() - start\n",
383
- " total_time += elapsed\n",
384
- " results[tid] = {'status': 'failed', 'time_s': round(elapsed, 2)}\n",
385
- " print(f'❌ ({elapsed:.1f}s)')\n",
386
- " \n",
387
- " # Save progress every 10 tasks\n",
388
- " if (idx + 1) % 10 == 0:\n",
389
- " with open('llm_results.json', 'w') as f:\n",
390
- " json.dump({\n",
391
- " 'model': MODEL, 'n_candidates': N_CANDIDATES,\n",
392
- " 'llm_solved': solved, 'attempted': sum(1 for r in results.values()),\n",
393
- " 'symbolic_solved': len(ALREADY_SOLVED),\n",
394
- " 'total_solved': len(ALREADY_SOLVED) + solved,\n",
395
- " 'total_tasks': len(task_files),\n",
396
- " 'solve_rate': round(100 * (len(ALREADY_SOLVED) + solved) / len(task_files), 2),\n",
397
- " 'total_time_s': round(total_time, 1),\n",
398
- " 'results': results,\n",
399
- " }, f, indent=2)\n",
400
- " print(f' [Saved: {len(ALREADY_SOLVED)+solved}/{len(task_files)} total]')"
401
- ]
402
- },
403
- {
404
- "cell_type": "code",
405
- "execution_count": null,
406
- "metadata": {},
407
- "outputs": [],
408
- "source": [
409
- "# Final save\n",
410
- "with open('llm_results.json', 'w') as f:\n",
411
- " json.dump({\n",
412
- " 'model': MODEL, 'n_candidates': N_CANDIDATES,\n",
413
- " 'llm_solved': solved, 'attempted': len(results),\n",
414
- " 'symbolic_solved': len(ALREADY_SOLVED),\n",
415
- " 'total_solved': len(ALREADY_SOLVED) + solved,\n",
416
- " 'total_tasks': len(task_files),\n",
417
- " 'solve_rate': round(100 * (len(ALREADY_SOLVED) + solved) / len(task_files), 2),\n",
418
- " 'total_time_s': round(total_time, 1),\n",
419
- " 'results': results,\n",
420
- " }, f, indent=2)\n",
421
- "\n",
422
- "print(f'\\n{\"=\"*60}')\n",
423
- "print(f'FINAL RESULTS')\n",
424
- "print(f'{\"=\"*60}')\n",
425
- "print(f'LLM solved: {solved}')\n",
426
- "print(f'Symbolic solved: {len(ALREADY_SOLVED)}')\n",
427
- "print(f'TOTAL SOLVED: {len(ALREADY_SOLVED)+solved}/{len(task_files)} ({100*(len(ALREADY_SOLVED)+solved)/len(task_files):.1f}%)')\n",
428
- "print(f'Time: {total_time:.0f}s')\n",
429
- "print(f'\\nResults saved to: llm_results.json')"
430
- ]
431
- },
432
- {
433
- "cell_type": "markdown",
434
- "metadata": {},
435
- "source": [
436
- "## 5. Results Analysis"
437
- ]
438
- },
439
- {
440
- "cell_type": "code",
441
- "execution_count": null,
442
- "metadata": {},
443
- "outputs": [],
444
- "source": [
445
- "# Load and analyze results\n",
446
- "with open('llm_results.json') as f:\n",
447
- " data = json.load(f)\n",
448
- "\n",
449
- "print(f'Model: {data[\"model\"]}')\n",
450
- "print(f'Candidates per task: {data[\"n_candidates\"]}')\n",
451
- "print(f'\\nSymbolic solved: {data[\"symbolic_solved\"]}')\n",
452
- "print(f'LLM solved: {data[\"llm_solved\"]}')\n",
453
- "print(f'TOTAL: {data[\"total_solved\"]}/{data[\"total_tasks\"]} ({data[\"solve_rate\"]}%)')\n",
454
- "\n",
455
- "llm_solved_tasks = [tid for tid, r in data['results'].items() if r['status'] == 'solved']\n",
456
- "print(f'\\nLLM-solved tasks ({len(llm_solved_tasks)}):')\n",
457
- "for tid in sorted(llm_solved_tasks):\n",
458
- " rule = data['results'][tid].get('rule', '?')\n",
459
- " t = data['results'][tid].get('time_s', 0)\n",
460
- " print(f' {tid}: {rule} ({t}s)')"
461
- ]
462
- },
463
- {
464
- "cell_type": "markdown",
465
- "metadata": {},
466
- "source": [
467
- "## 6. Download Results\n",
468
- "\n",
469
- "Download `llm_results.json` from the notebook output, then merge with symbolic results:\n",
470
- "\n",
471
- "```bash\n",
472
- "python scripts/merge_results.py arc_results/summary_v4.json llm_results.json\n",
473
- "```"
474
- ]
475
- }
476
- ],
477
- "metadata": {
478
- "kernelspec": {
479
- "display_name": "Python 3",
480
- "language": "python",
481
- "name": "python3"
482
- },
483
- "language_info": {
484
- "name": "python",
485
- "version": "3.10.0"
486
- }
487
- },
488
- "nbformat": 4,
489
- "nbformat_minor": 4
490
- }
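The pipeline described in this notebook (extract a `transform()` from the model reply, then accept it only if it reproduces every training pair) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the notebook's code: the names `extract_fenced_code` and `passes_all_pairs` are hypothetical, the training pairs are made-up toy data rather than a real ARC task, and the fence marker is built programmatically only so the example nests cleanly here.

```python
import re
from typing import Optional

import numpy as np

FENCE = "`" * 3  # three backticks, built indirectly so this example nests cleanly


def extract_fenced_code(response: str) -> Optional[str]:
    """Return the first fenced block that defines transform(), if any."""
    pattern = FENCE + r"(?:python)?\s*(.*?)" + FENCE
    for block in re.findall(pattern, response, re.DOTALL):
        if "def transform" in block:
            return block.strip()
    return None


def passes_all_pairs(code: str, train_pairs: list) -> bool:
    """Run the candidate and require an exact match on every training pair."""
    namespace = {"np": np}
    try:
        # NOTE: this executes untrusted model output; sandbox it in practice.
        exec(code, namespace)
        fn = namespace["transform"]
        for pair in train_pairs:
            got = np.array(fn([row[:] for row in pair["input"]]), dtype=int)
            want = np.array(pair["output"], dtype=int)
            if got.shape != want.shape or not np.array_equal(got, want):
                return False
    except Exception:
        return False
    return True


# Toy data (made up for illustration): the rule is "mirror each row".
pairs = [
    {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
    {"input": [[4, 5, 6]], "output": [[6, 5, 4]]},
]
reply = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def transform(grid):\n"
    + "    return [row[::-1] for row in grid]\n"
    + FENCE
)
candidate = extract_fenced_code(reply)
solved = candidate is not None and passes_all_pairs(candidate, pairs)
print(solved)
```

Requiring an exact match on all training pairs is a strict filter: a candidate that fits only some pairs is rejected, which is why the notebook samples several candidates per task.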
pyproject.toml DELETED
@@ -1,42 +0,0 @@
- [project]
- name = "pemf-arc-agi"
- version = "0.4.0"
- description = "Pre-Emergence Mechanics Framework (PEMF) solver for ARC-AGI"
- requires-python = ">=3.10"
- license = {text = "MIT"}
-
- dependencies = [
-     "numpy>=1.24",
-     "scipy>=1.10",
- ]
-
- [project.optional-dependencies]
- viz = [
-     "matplotlib>=3.7",
- ]
- wandb = [
-     "wandb>=0.15",
-     "matplotlib>=3.7",
- ]
- llm = [
-     "huggingface-hub>=0.20",
- ]
- all = [
-     "numpy>=1.24",
-     "scipy>=1.10",
-     "matplotlib>=3.7",
-     "wandb>=0.15",
-     "huggingface-hub>=0.20",
- ]
-
- [build-system]
- requires = ["hatchling"]
- build-backend = "hatchling.build"
-
- [tool.hatch.build.targets.wheel]
- packages = ["itt_solver"]
-
- [dependency-groups]
- dev = [
-     "pytest>=7.0",
- ]
scripts/entrypoint.py DELETED
@@ -1,84 +0,0 @@
- """
- Headless entrypoint for running a single experiment or a sweep.
-
- Usage:
-     python scripts/entrypoint.py --task example1 --out_dir experiments
-     python scripts/entrypoint.py --task example1 --out_dir experiments --use_wandb
-
- By default Weights & Biases logging is disabled. Use --use_wandb to enable it.
- """
- import argparse
- import json
- import os
- import importlib
-
- def main():
-     parser = argparse.ArgumentParser(description="Run ARC-AGI experiment (headless).")
-     parser.add_argument("--task", type=str, required=True, help="Task name or path to task JSON")
-     parser.add_argument("--out_dir", type=str, default="experiments", help="Output directory")
-     parser.add_argument("--use_wandb", action="store_true", help="Enable Weights & Biases logging (default: off)")
-     parser.add_argument("--params", type=str, default=None, help="Optional JSON string of params")
-     args = parser.parse_args()
-
-     os.makedirs(args.out_dir, exist_ok=True)
-
-     # lazy imports to avoid heavy startup cost
-     import itt_solver.experiment_driver as ed
-     import itt_solver.solver_core as sc
-
-     # load task: if args.task is a JSON file path, load it; otherwise expect a built-in name
-     if os.path.exists(args.task):
-         with open(args.task) as fh:
-             task = json.load(fh)
-     else:
-         # minimal built-in example if user passed 'example1'
-         # Corrected target from real ARC task 007bbfb7 (Kronecker self-similar)
-         if args.task == "example1":
-             task = {
-                 'name': 'example1',
-                 'input': [[0,7,7],[7,7,7],[0,7,7]],
-                 'target': [
-                     [0,0,0,0,7,7,0,7,7],
-                     [0,0,0,7,7,7,7,7,7],
-                     [0,0,0,0,7,7,0,7,7],
-                     [0,7,7,0,7,7,0,7,7],
-                     [7,7,7,7,7,7,7,7,7],
-                     [0,7,7,0,7,7,0,7,7],
-                     [0,0,0,0,7,7,0,7,7],
-                     [0,0,0,7,7,7,7,7,7],
-                     [0,0,0,0,7,7,0,7,7],
-                 ],
-                 'target_shape': (9,9)
-             }
-         else:
-             raise SystemExit(f"Unknown task identifier: {args.task}")
-
-     # parse params if provided
-     params = {}
-     if args.params:
-         try:
-             params = json.loads(args.params)
-         except Exception:
-             print("Warning: could not parse --params JSON; ignoring.")
-
-     # build atomic library using default factory
-     atomic_library = ed.default_atomic_factory(params, task)
-
-     # run single experiment
-     result = ed.run_single(task, atomic_library, params, out_dir=args.out_dir)
-
-     # optionally run W&B logging externally (only if requested)
-     if args.use_wandb:
-         try:
-             from itt_solver.wandb_runner import run_and_log_wandb
-             run_and_log_wandb(task, atomic_library, params, out_dir=args.out_dir,
-                               wandb_project=params.get('wandb_project','itt_solver'),
-                               wandb_entity=None, resume="allow")
-         except Exception as e:
-             print("W&B logging failed or not configured:", e)
-
-     print("Run finished. Result summary:")
-     print(json.dumps(result, indent=2))
-
- if __name__ == "__main__":
-     main()
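The built-in `example1` task above describes its target as "Kronecker self-similar". One way to read that comment is sketched below: each nonzero input cell is replaced by a full copy of the input, and each zero cell by an all-zero block. Expressing this with `np.kron` is an assumption made for illustration; the deleted script does not show how the solver itself reaches the target.

```python
import numpy as np

# Input and target grids copied from the built-in 'example1' task.
grid = np.array([[0, 7, 7],
                 [7, 7, 7],
                 [0, 7, 7]])
target = np.array([
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
    [0, 0, 0, 7, 7, 7, 7, 7, 7],
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
    [0, 7, 7, 0, 7, 7, 0, 7, 7],
    [7, 7, 7, 7, 7, 7, 7, 7, 7],
    [0, 7, 7, 0, 7, 7, 0, 7, 7],
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
    [0, 0, 0, 7, 7, 7, 7, 7, 7],
    [0, 0, 0, 0, 7, 7, 0, 7, 7],
])

# Binary mask of occupied cells: 1 where the input is nonzero, 0 elsewhere.
mask = (grid != 0).astype(int)

# Kronecker product: block (i, j) of the result equals mask[i, j] * grid.
self_similar = np.kron(mask, grid)

matches = bool(np.array_equal(self_similar, target))
print(self_similar.shape, matches)
```

Using the mask rather than the raw grid as the left factor keeps the cell values at 7; `np.kron(grid, grid)` would multiply the values and produce 49 instead.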
scripts/fix_and_inspect_logs.py DELETED
@@ -1,104 +0,0 @@
1
- import glob, json, numpy as np, os
2
- from pprint import pprint
3
-
4
- def load_latest(pattern):
5
- files = sorted(glob.glob(pattern))
6
- return files[-1] if files else None
7
-
8
- logs_path = load_latest("experiments/*_logs.json")
9
- phi_path = load_latest("experiments/*_phi_best.npy")
10
- res_path = load_latest("experiments/*_result.json")
11
-
12
- print("logs:", logs_path)
13
- print("phi_best:", phi_path)
14
- print("result:", res_path)
15
-
16
- if not logs_path:
17
- raise SystemExit("No logs file found")
18
-
19
- logs = json.load(open(logs_path))
20
- res = json.load(open(res_path)) if res_path else {}
21
-
22
- # coerce gate values to booleans for all depth entries
23
- def coerce_gates(g):
24
- if not isinstance(g, dict):
25
- return g
26
- out = {}
27
- for k,v in g.items():
28
- if isinstance(v, str):
29
- lv = v.strip().lower()
30
- if lv in ("true","1","yes"):
31
- out[k] = True
32
- elif lv in ("false","0","no"):
33
- out[k] = False
34
- else:
35
- try:
36
- out[k] = bool(int(v))
37
- except Exception:
38
- out[k] = v
39
- else:
40
- out[k] = v
41
- return out
42
-
43
- for depth_idx, depth in enumerate(logs):
44
- for entry in depth:
45
- if 'gates' in entry:
46
- entry['gates'] = coerce_gates(entry['gates'])
47
-
48
- # attach phi_best to the first accepted entry at depth 0 (only if it has no candidate_array yet)
49
- accepted_entry = None
50
- for entry in (logs[0] if logs else []):
51
- if entry.get('accepted'):
52
- accepted_entry = entry
53
- break
54
-
55
- phi = np.load(phi_path) if phi_path else None
56
- if accepted_entry is not None:
57
- if 'candidate_array' not in accepted_entry:
58
- accepted_entry['candidate_array'] = phi.tolist() if phi is not None else None
59
-
60
- # Corrected target from real ARC task 007bbfb7 (Kronecker self-similar)
61
- TARGET_GRID = [
62
- [0,0,0,0,7,7,0,7,7],
63
- [0,0,0,7,7,7,7,7,7],
64
- [0,0,0,0,7,7,0,7,7],
65
- [0,7,7,0,7,7,0,7,7],
66
- [7,7,7,7,7,7,7,7,7],
67
- [0,7,7,0,7,7,0,7,7],
68
- [0,0,0,0,7,7,0,7,7],
69
- [0,0,0,7,7,7,7,7,7],
70
- [0,0,0,0,7,7,0,7,7],
71
- ]
72
- TARGET = np.array(TARGET_GRID, dtype=int)
73
-
74
- def tile_transform(phi, out_shape):
75
- a = np.array(phi)
76
- h_out, w_out = out_shape
77
- h_in, w_in = a.shape
78
- reps_h = (h_out + h_in - 1) // h_in
79
- reps_w = (w_out + w_in - 1) // w_in
80
- tiled = np.tile(a, (reps_h, reps_w))
81
- return tiled[:h_out, :w_out]
82
-
83
- if accepted_entry is not None and accepted_entry.get('candidate_array') is not None:
84
- cand = np.array(accepted_entry['candidate_array'], dtype=float)
85
- if cand.shape != TARGET.shape:
86
- cand_resized = tile_transform(cand, TARGET.shape)
87
- else:
88
- cand_resized = cand
89
- cand_q = np.rint(cand_resized).astype(int)
90
- l1 = float(np.sum(np.abs(cand_q - TARGET)))
91
- print("Recomputed L1 residue for first accepted candidate:", l1)
92
- print("Candidate unique values:", np.unique(cand_q))
93
- diff = (cand_q != TARGET).astype(int)
94
- print("Changed cells count:", int(diff.sum()))
95
- print("Diff map (1=diff):")
96
- print(diff)
97
- else:
98
- print("No candidate array available in logs or phi_best missing.")
99
-
100
- # write fixed logs copy
101
- fixed_path = logs_path.replace("_logs.json", "_logs.fixed.json")
102
- with open(fixed_path, "w") as fh:
103
- json.dump(logs, fh, indent=2)
104
- print("Wrote fixed logs to", fixed_path)
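The residue check in the script above tiles the candidate up to the target shape, rounds it to integers, and sums the absolute differences. A self-contained sketch of that computation follows. The function names `tile_to_shape` and `l1_residue` and the toy grids are assumptions made for illustration, not taken from the repository.

```python
import numpy as np


def tile_to_shape(phi, out_shape):
    """Repeat phi along both axes, then crop to exactly out_shape."""
    a = np.asarray(phi)
    h_out, w_out = out_shape
    h_in, w_in = a.shape
    reps = (-(-h_out // h_in), -(-w_out // w_in))  # ceiling division per axis
    return np.tile(a, reps)[:h_out, :w_out]


def l1_residue(candidate, target):
    """Tile the candidate if shapes differ, quantize to ints, sum |diff|."""
    target = np.asarray(target, dtype=int)
    cand = np.asarray(candidate, dtype=float)
    if cand.shape != target.shape:
        cand = tile_to_shape(cand, target.shape)
    cand_q = np.rint(cand).astype(int)  # round to the nearest colour index
    return float(np.abs(cand_q - target).sum())


# Toy data (not from the repository): a noisy 2x2 seed against a 4x4 target.
seed = [[0.1, 6.9], [7.2, 0.0]]
target = np.tile(np.array([[0, 7], [7, 0]]), (2, 2))
print(l1_residue(seed, target))              # seed rounds to the exact tile
print(l1_residue([[0, 0], [0, 0]], target))  # all-zero seed misses every 7
```

Because the residue sums colour-index differences, a wrong cell far from the target colour costs more than a near miss. The mismatch count printed by the script ("Changed cells count") is the complementary measure that ignores that magnitude.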
scripts/kaggle_llm_solver.py DELETED
@@ -1,452 +0,0 @@
1
- """
2
- PEMF ARC-AGI — LLM Program Synthesis via Ollama (Kaggle Edition)
3
- ================================================================
4
-
5
- Self-contained script for Kaggle GPU notebooks.
6
- Pulls a model via Ollama, runs LLM synthesis on unsolved ARC tasks.
7
-
8
- Usage on Kaggle:
9
- 1. Enable GPU (T4 x2 or P100)
10
- 2. Enable internet access
11
- 3. Upload this file + arc_data/ + already_solved.json
12
- 4. Run all cells
13
-
14
- The script:
15
- - Installs Ollama
16
- - Pulls the model (qwen2.5-coder:32b or smaller)
17
- - Loads ARC tasks
18
- - For each unsolved task: generates Python transform(), verifies against training pairs
19
- - Saves results to llm_results.json
20
- """
21
-
22
- import subprocess
23
- import sys
24
- import os
25
- import json
26
- import time
27
- import re
28
- import signal
29
- import numpy as np
30
- from typing import Dict, List, Optional, Tuple
31
- from collections import Counter
32
- from pathlib import Path
33
-
34
-
35
- # =============================================================================
36
- # 1. OLLAMA SETUP
37
- # =============================================================================
38
-
39
- def install_ollama():
40
- """Install Ollama on Kaggle/Linux."""
41
- print("Installing Ollama...")
42
- subprocess.run("curl -fsSL https://ollama.com/install.sh | sh",
43
- shell=True, check=True, capture_output=True)
44
- print("Ollama installed.")
45
-
46
-
47
- def start_ollama():
48
- """Start Ollama server in background."""
49
- print("Starting Ollama server...")
50
- proc = subprocess.Popen(
51
- ["ollama", "serve"],
52
- stdout=subprocess.DEVNULL,
53
- stderr=subprocess.DEVNULL,
54
- )
55
- time.sleep(3) # Wait for server to start
56
- print(f"Ollama server started (PID {proc.pid})")
57
- return proc
58
-
59
-
60
- def pull_model(model_name: str):
61
- """Pull a model via Ollama."""
62
- print(f"Pulling model {model_name}... (this may take several minutes)")
63
- result = subprocess.run(
64
- ["ollama", "pull", model_name],
65
- capture_output=True, text=True, timeout=1800
66
- )
67
- if result.returncode != 0:
68
- print(f"Pull failed: {result.stderr}")
69
- raise RuntimeError(f"Failed to pull {model_name}")
70
- print(f"Model {model_name} ready.")
71
-
72
-
73
- def call_ollama(prompt: str, model: str = "qwen2.5-coder:32b",
74
- temperature: float = 0.7, timeout_s: int = 120) -> str:
75
- """Call Ollama API and return response text."""
76
- import urllib.request
77
-
78
- payload = {
79
- "model": model,
80
- "prompt": prompt,
81
- "stream": False,
82
- "options": {
83
- "temperature": temperature,
84
- "num_predict": 2048,
85
- }
86
- }
87
-
88
- data = json.dumps(payload).encode('utf-8')
89
- req = urllib.request.Request(
90
- "http://localhost:11434/api/generate",
91
- data=data,
92
- headers={"Content-Type": "application/json"},
93
- method='POST'
94
- )
95
-
96
- try:
97
- with urllib.request.urlopen(req, timeout=timeout_s) as resp:
98
- result = json.loads(resp.read().decode())
99
- return result.get('response', '')
100
- except Exception as e:
101
- return f"ERROR: {e}"
102
-
103
-
104
- # =============================================================================
105
- # 2. PROMPT BUILDING
106
- # =============================================================================
107
-
108
- def build_prompt(task: Dict) -> str:
109
- """Build prompt for ARC task."""
110
- train_pairs = task.get('train', [])
111
-
112
- examples = []
113
- for i, pair in enumerate(train_pairs):
114
- examples.append(
115
- f"Example {i+1}:\n"
116
- f" Input: {json.dumps(pair['input'])}\n"
117
- f" Output: {json.dumps(pair['output'])}"
118
- )
119
- examples_str = "\n".join(examples)
120
-
121
- # Basic analysis
122
- inputs = [np.array(p['input']) for p in train_pairs]
123
- outputs = [np.array(p['output']) for p in train_pairs]
124
- same_shape = all(i.shape == o.shape for i, o in zip(inputs, outputs))
125
- in_colors = sorted(set(c for i in inputs for c in np.unique(i).tolist()))
126
- out_colors = sorted(set(c for o in outputs for c in np.unique(o).tolist()))
127
-
128
- analysis = f" Same input/output shape: {same_shape}\n"
129
- analysis += f" Input colors: {in_colors}\n"
130
- analysis += f" Output colors: {out_colors}\n"
131
- if not same_shape:
132
- ratios = [(o.shape[0]/i.shape[0], o.shape[1]/i.shape[1])
133
- for i, o in zip(inputs, outputs)]
134
- analysis += f" Shape ratios (h,w): {ratios}\n"
135
-
136
- prompt = f"""Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.
137
-
138
- {examples_str}
139
-
140
- Analysis:
141
- {analysis}
142
- Write a complete Python function that transforms any input grid to its output.
143
- The function MUST work correctly for ALL examples above.
144
-
145
- ```python
146
- import numpy as np
147
- from collections import Counter
148
-
149
- def transform(grid: list[list[int]]) -> list[list[int]]:
150
- grid = np.array(grid)
151
- """
152
- return prompt
153
-
154
-
155
- # =============================================================================
156
- # 3. CODE EXTRACTION AND VERIFICATION
157
- # =============================================================================
158
-
159
- def extract_code(response: str) -> Optional[str]:
160
- """Extract Python function from LLM response."""
161
- # Try ```python blocks
162
- for pattern in [r'```python\s*(.*?)```', r'```\s*(.*?)```']:
163
- matches = re.findall(pattern, response, re.DOTALL)
164
- for match in matches:
165
- if 'def transform' in match:
166
- return match.strip()
167
-
168
- # Try finding def transform directly
169
- idx = response.find('def transform')
170
- if idx >= 0:
171
- # Look backwards for imports
172
- before = response[:idx]
173
- import_start = before.rfind('import ')
174
- if import_start >= 0:
175
- code = response[import_start:]
176
- else:
177
- code = response[idx:]
178
- # Trim at the next ``` fence, if one follows the function
179
- end = code.find('```')
180
- if end > 0:
181
- code = code[:end]
182
- return code.strip()
183
-
184
- # If response itself looks like code (starts with import or def)
185
- stripped = response.strip()
186
- if stripped.startswith('import') or stripped.startswith('def transform'):
187
- return stripped
188
-
189
- return None
190
-
191
-
192
- def verify_program(code: str, train_pairs: List[Dict]) -> bool:
193
- """Execute program and verify against all training pairs."""
194
- namespace = {'np': np, 'numpy': np, 'Counter': Counter,
195
- 'collections': __import__('collections')}
196
-
197
- try:
198
- exec(code, namespace)
199
- except Exception:
200
- return False
201
-
202
- if 'transform' not in namespace:
203
- return False
204
-
205
- transform_fn = namespace['transform']
206
-
207
- for pair in train_pairs:
208
- try:
209
- inp = [row[:] for row in pair['input']] # deep copy
210
- result = transform_fn(inp)
211
- if result is None:
212
- return False
213
- result_arr = np.array(result, dtype=int)
214
- expected_arr = np.array(pair['output'], dtype=int)
215
- if result_arr.shape != expected_arr.shape:
216
- return False
217
- if not np.array_equal(result_arr, expected_arr):
218
- return False
219
- except Exception:
220
- return False
221
-
222
- return True
223
-
224
-
225
- def apply_program(code: str, test_input: List[List[int]]) -> Optional[List[List[int]]]:
226
- """Apply verified program to test input."""
227
- namespace = {'np': np, 'numpy': np, 'Counter': Counter,
228
- 'collections': __import__('collections')}
229
- try:
230
- exec(code, namespace)
231
- result = namespace['transform']([row[:] for row in test_input])
232
- if result is not None:
233
- return [list(row) for row in np.array(result, dtype=int).tolist()]
234
- except Exception:
235
- pass
236
- return None
237
-
238
-
239
- # =============================================================================
240
- # 4. SYNTHESIS ENGINE
241
- # =============================================================================
242
-
243
- def synthesize_task(task: Dict, model: str = "qwen2.5-coder:32b",
244
- n_candidates: int = 8, verbose: bool = False) -> Optional[Tuple[str, str]]:
245
- """
246
- Try to solve a task via LLM.
247
- Returns (rule_name, code) if successful, None otherwise.
248
- """
249
- train_pairs = task.get('train', [])
250
- if not train_pairs:
251
- return None
252
-
253
- prompt = build_prompt(task)
254
-
255
- for i in range(n_candidates):
256
- temp = 0.1 if i == 0 else min(0.5 + 0.1 * i, 1.0) # first try low temp, then ramp up; capped so the rule label matches the temperature actually used
257
- response = call_ollama(prompt, model=model, temperature=min(temp, 1.0))
258
-
259
- if response.startswith("ERROR:"):
260
- if verbose:
261
- print(f" Candidate {i+1}: API error")
262
- continue
263
-
264
- code = extract_code(response)
265
- if code is None:
266
- if verbose:
267
- print(f" Candidate {i+1}: No code extracted")
268
- continue
269
-
270
- if verbose:
271
- print(f" Candidate {i+1}: {len(code)} chars", end="")
272
-
273
- if verify_program(code, train_pairs):
274
- if verbose:
275
- print(f" ✅")
276
- return (f"llm_c{i+1}_t{temp:.1f}", code)
277
- else:
278
- if verbose:
279
- print(f" ❌")
280
-
281
- return None
282
-
283
-
284
- # =============================================================================
285
- # 5. MAIN RUNNER
286
- # =============================================================================
287
-
288
- def main():
289
- # --- Configuration ---
290
- MODEL = os.environ.get("OLLAMA_MODEL", "qwen2.5-coder:32b")
291
- # For smaller GPUs, use:
292
- # MODEL = "qwen2.5-coder:14b" (fits T4 16GB)
293
- # MODEL = "qwen2.5-coder:7b" (fits any GPU)
294
-
295
- N_CANDIDATES = int(os.environ.get("N_CANDIDATES", "8"))
296
- ARC_DIR = os.environ.get("ARC_DIR", "arc_data/training")
297
- ALREADY_SOLVED_FILE = os.environ.get("ALREADY_SOLVED", "already_solved.json")
298
- OUTPUT_FILE = os.environ.get("OUTPUT_FILE", "llm_results.json")
299
-
300
- print("=" * 60)
301
- print("PEMF ARC-AGI — LLM Program Synthesis (Kaggle/Ollama)")
302
- print("=" * 60)
303
- print(f"Model: {MODEL}")
304
- print(f"Candidates per task: {N_CANDIDATES}")
305
- print(f"ARC data: {ARC_DIR}")
306
- print()
307
-
308
- # --- Install & start Ollama ---
309
- try:
310
- subprocess.run(["ollama", "--version"], capture_output=True, check=True)
311
- print("Ollama already installed.")
312
- except (FileNotFoundError, subprocess.CalledProcessError):
313
- install_ollama()
314
-
315
- server = start_ollama()
316
-
317
- try:
318
- pull_model(MODEL)
319
- except Exception as e:
320
- print(f"Failed to pull {MODEL}: {e}")
321
- print("Trying smaller model...")
322
- MODEL = "qwen2.5-coder:7b"
323
- pull_model(MODEL)
324
-
325
- # --- Load already solved tasks ---
326
- already_solved = set()
327
- if os.path.exists(ALREADY_SOLVED_FILE):
328
- with open(ALREADY_SOLVED_FILE) as f:
329
- already_solved = set(json.load(f))
330
- print(f"Already solved (symbolic): {len(already_solved)} tasks")
331
-
332
- # --- Load ARC tasks ---
333
- import glob
334
- task_files = sorted(glob.glob(os.path.join(ARC_DIR, "*.json")))
335
- print(f"Total ARC tasks: {len(task_files)}")
336
-
337
- unsolved_files = []
338
- for tf in task_files:
339
- tid = os.path.basename(tf).replace('.json', '')
340
- if tid not in already_solved:
341
- unsolved_files.append((tid, tf))
342
- print(f"Unsolved tasks to try: {len(unsolved_files)}")
343
- print()
344
-
345
- # --- Run synthesis ---
346
- results = {}
347
- solved = 0
348
- total_time = 0
349
-
350
- for idx, (tid, tf) in enumerate(unsolved_files):
351
- with open(tf) as f:
352
- task = json.load(f)
353
-
354
- print(f"[{idx+1:3d}/{len(unsolved_files)}] {tid}:", end=" ", flush=True)
355
- start = time.time()
356
-
357
- result = synthesize_task(task, model=MODEL, n_candidates=N_CANDIDATES, verbose=False)
358
- elapsed = time.time() - start
359
- total_time += elapsed
360
-
361
- if result:
362
- rule_name, code = result
363
- solved += 1
364
-
365
- # Apply to test pairs
366
- test_outputs = []
367
- for test in task.get('test', []):
368
- out = apply_program(code, test['input'])
369
- test_outputs.append(out)
370
-
371
- results[tid] = {
372
- 'status': 'solved',
373
- 'rule': rule_name,
374
- 'code': code,
375
- 'test_outputs': test_outputs,
376
- 'time_s': round(elapsed, 2),
377
- }
378
- print(f"✅ {rule_name} ({elapsed:.1f}s)")
379
- else:
380
- results[tid] = {
381
- 'status': 'failed',
382
- 'time_s': round(elapsed, 2),
383
- }
384
- print(f"❌ ({elapsed:.1f}s)")
385
-
386
- # Save progress periodically
387
- if (idx + 1) % 10 == 0:
388
- with open(OUTPUT_FILE, 'w') as f:
389
- json.dump({
390
- 'model': MODEL,
391
- 'n_candidates': N_CANDIDATES,
392
- 'solved': solved,
393
- 'attempted': idx + 1,
394
- 'total_time_s': round(total_time, 1),
395
- 'results': results,
396
- }, f, indent=2)
397
- print(f" [Progress saved: {solved}/{idx+1} solved]")
398
-
399
- # --- Final save ---
400
- with open(OUTPUT_FILE, 'w') as f:
401
- json.dump({
402
- 'model': MODEL,
403
- 'n_candidates': N_CANDIDATES,
404
- 'solved': solved,
405
- 'attempted': len(unsolved_files),
406
- 'total_time_s': round(total_time, 1),
407
- 'already_solved_symbolic': len(already_solved),
408
- 'total_solved': len(already_solved) + solved,
409
- 'total_tasks': len(task_files),
410
- 'solve_rate': round(100 * (len(already_solved) + solved) / len(task_files), 2),
411
- 'results': results,
412
- }, f, indent=2)
413
-
414
- # --- Summary ---
415
- print()
416
- print("=" * 60)
417
- print("FINAL RESULTS")
418
- print("=" * 60)
419
- print(f"LLM solved: {solved}/{len(unsolved_files)} unsolved tasks")
420
- print(f"Symbolic solved: {len(already_solved)}")
421
- print(f"TOTAL SOLVED: {len(already_solved) + solved}/{len(task_files)} ({100*(len(already_solved)+solved)/len(task_files):.1f}%)")
422
- print(f"Total LLM time: {total_time:.0f}s ({total_time/max(1,len(unsolved_files)):.1f}s/task)")
423
- print(f"Results saved to: {OUTPUT_FILE}")
424
-
425
- # Cleanup
426
- server.terminate()
427
-
428
-
429
- # =============================================================================
430
- # 6. GENERATE already_solved.json FROM SYMBOLIC RESULTS
431
- # =============================================================================
432
-
433
- def generate_already_solved(summary_file: str, output_file: str = "already_solved.json"):
434
- """
435
- Generate already_solved.json from a v4 summary file.
436
- Run this BEFORE running on Kaggle.
437
- """
438
- with open(summary_file) as f:
439
- data = json.load(f)
440
- solved = [r['task_id'] for r in data['results'] if r.get('all_train_solved')]
441
- with open(output_file, 'w') as f:
442
- json.dump(solved, f)
443
- print(f"Wrote {len(solved)} solved task IDs to {output_file}")
444
-
445
-
446
- if __name__ == "__main__":
447
- # If run with --generate-solved, create the already_solved.json
448
- if len(sys.argv) > 1 and sys.argv[1] == "--generate-solved":
449
- summary = sys.argv[2] if len(sys.argv) > 2 else "arc_results/summary_v4.json"
450
- generate_already_solved(summary)
451
- else:
452
- main()
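The core acceptance test in this script is that a generated program counts as a solution only if its `transform()` reproduces every training output exactly. A compact sketch of that check is given below, assuming a hypothetical function name `passes_all_pairs` and a made-up toy task. Like the original, it executes model-written code with no sandbox or timeout, so it is only suitable for a disposable environment.

```python
import numpy as np
from typing import Dict, List


def passes_all_pairs(code: str, train_pairs: List[Dict]) -> bool:
    """Return True only if code defines transform() matching every output."""
    namespace = {"np": np}
    try:
        exec(code, namespace)  # NOTE: untrusted code, no sandbox or timeout here
        fn = namespace["transform"]
        for pair in train_pairs:
            # Copy the rows so a mutating transform cannot corrupt the task data.
            got = np.array(fn([row[:] for row in pair["input"]]), dtype=int)
            want = np.array(pair["output"], dtype=int)
            if got.shape != want.shape or not np.array_equal(got, want):
                return False
    except Exception:
        # Syntax errors, a missing transform, and runtime failures all reject.
        return False
    return True


# Toy task (made up for illustration): mirror each grid left-to-right.
pairs = [{"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]}]
good = "def transform(grid):\n    return [row[::-1] for row in grid]\n"
bad = "def transform(grid):\n    return grid\n"
print(passes_all_pairs(good, pairs), passes_all_pairs(bad, pairs))
```

Exact match on the training pairs is a necessary filter, not proof of generalization: a program can fit every example and still fail on the held-out test input.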
scripts/llm_solver_cloud.py DELETED
@@ -1,483 +0,0 @@
1
- """
2
- PEMF ARC-AGI — LLM Program Synthesis (Multi-Provider)
3
- =====================================================
4
-
5
- Supports:
6
- - NVIDIA NIM (free — DeepSeek V4 Pro, GLM-4, Qwen, Llama)
7
- - Google Gemini (free tier: 15 RPM)
8
- - DeepSeek direct API (very cheap)
9
- - GLM/Zhipu direct API (free tier)
10
- - Ollama local (any model)
11
-
12
- Usage:
13
- # NVIDIA NIM — FREE, best option (GLM 4.7 default)
14
- export LLM_PROVIDER=nvidia
15
- export NVIDIA_API_KEY=nvapi-xxxxx
16
- python llm_solver_cloud.py
17
- # Get key: https://build.nvidia.com/settings/api-keys
18
- # Default model: z-ai/glm4.7
19
-
20
- # NVIDIA NIM with DeepSeek V4
21
- export LLM_PROVIDER=nvidia
22
- export NVIDIA_API_KEY=nvapi-xxxxx
23
- export LLM_MODEL=deepseek-ai/deepseek-v4-pro
24
- python llm_solver_cloud.py
25
-
26
- # Gemini (free)
27
- export LLM_PROVIDER=gemini
28
- export GEMINI_API_KEY=your_key
29
- python llm_solver_cloud.py
30
-
31
- # Ollama local
32
- export LLM_PROVIDER=ollama
33
- export OLLAMA_MODEL=qwen2.5-coder:32b
34
- python llm_solver_cloud.py
35
- """
36
-
37
- import os
38
- import sys
39
- import json
40
- import time
41
- import re
42
- import glob
43
- import numpy as np
44
- from typing import Dict, List, Optional, Tuple
45
- from collections import Counter
46
- import urllib.request
47
-
48
-
49
- # =============================================================================
50
- # PROVIDER CONFIGS
51
- # =============================================================================
52
-
53
- PROVIDERS = {
54
- "nvidia": {
55
- "name": "NVIDIA NIM (free — DeepSeek V4, GLM 4.7, Qwen, Llama)",
56
- "base_url": "https://integrate.api.nvidia.com/v1/chat/completions",
57
- "default_model": "z-ai/glm4.7",
58
- "env_key": "NVIDIA_API_KEY",
59
- "free_tier": "Free for NVIDIA Developer Program members",
60
- "get_key_url": "https://build.nvidia.com/settings/api-keys",
61
- "models": {
62
- "glm4.7": "z-ai/glm4.7",
63
- "deepseek-v4": "deepseek-ai/deepseek-v4-pro",
64
- },
65
- },
66
- "gemini": {
67
- "name": "Google Gemini",
68
- "base_url": "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent",
69
- "default_model": "gemini-2.0-flash",
70
- "env_key": "GEMINI_API_KEY",
71
- "free_tier": "15 RPM, 1M tokens/day",
72
- "get_key_url": "https://aistudio.google.com/apikey",
73
- },
74
- "deepseek": {
75
- "name": "DeepSeek (direct API)",
76
- "base_url": "https://api.deepseek.com/v1/chat/completions",
77
- "default_model": "deepseek-chat",
78
- "env_key": "DEEPSEEK_API_KEY",
79
- "free_tier": "$0.07/M input, $0.27/M output",
80
- "get_key_url": "https://platform.deepseek.com/api_keys",
81
- },
82
- "glm": {
83
- "name": "GLM (Zhipu AI direct)",
84
- "base_url": "https://open.bigmodel.cn/api/paas/v4/chat/completions",
85
- "default_model": "glm-4-flash",
86
- "env_key": "GLM_API_KEY",
87
- "free_tier": "glm-4-flash is free",
88
- "get_key_url": "https://open.bigmodel.cn/usercenter/apikeys",
89
- },
90
- "ollama": {
91
- "name": "Ollama (local)",
92
- "base_url": "http://localhost:11434/api/generate",
93
- "default_model": "qwen2.5-coder:32b",
94
- "env_key": None,
95
- },
96
- }
97
-
98
-
99
- # =============================================================================
100
- # API CALLERS
101
- # =============================================================================
102
-
103
- def call_nvidia(prompt: str, api_key: str, model: str = "z-ai/glm4.7",
104
- temperature: float = 0.7) -> str:
105
- """Call NVIDIA NIM API (OpenAI-compatible). Hosts DeepSeek V4, GLM, Qwen, Llama."""
106
- url = "https://integrate.api.nvidia.com/v1/chat/completions"
107
- payload = {
108
- "model": model,
109
- "messages": [{"role": "user", "content": prompt}],
110
- "max_tokens": 2048,
111
- "temperature": temperature,
112
- }
113
- data = json.dumps(payload).encode('utf-8')
114
- req = urllib.request.Request(url, data=data,
115
- headers={"Content-Type": "application/json",
116
- "Authorization": f"Bearer {api_key}"},
117
- method='POST')
118
- try:
119
- with urllib.request.urlopen(req, timeout=120) as resp:
120
- result = json.loads(resp.read().decode())
121
- return result['choices'][0]['message']['content']
122
- except Exception as e:
123
- return f"ERROR: {e}"
124
-
125
-
126
- def call_gemini(prompt: str, api_key: str, model: str = "gemini-2.0-flash",
127
- temperature: float = 0.7) -> str:
128
- """Call Google Gemini API."""
129
- url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={api_key}"
130
- payload = {
131
- "contents": [{"parts": [{"text": prompt}]}],
132
- "generationConfig": {
133
- "temperature": temperature,
134
- "maxOutputTokens": 2048,
135
- }
136
- }
137
- data = json.dumps(payload).encode('utf-8')
138
- req = urllib.request.Request(url, data=data,
139
- headers={"Content-Type": "application/json"},
140
- method='POST')
141
- try:
142
- with urllib.request.urlopen(req, timeout=120) as resp:
143
- result = json.loads(resp.read().decode())
144
- candidates = result.get('candidates', [])
145
- if candidates:
146
- parts = candidates[0].get('content', {}).get('parts', [])
147
- if parts:
148
- return parts[0].get('text', '')
149
- return "ERROR: No response content"
150
- except Exception as e:
151
- return f"ERROR: {e}"
152
-
153
-
154
- def call_deepseek(prompt: str, api_key: str, model: str = "deepseek-chat",
155
- temperature: float = 0.7) -> str:
156
- """Call DeepSeek API (OpenAI-compatible)."""
157
- url = "https://api.deepseek.com/v1/chat/completions"
158
- payload = {
159
- "model": model,
160
- "messages": [{"role": "user", "content": prompt}],
161
- "max_tokens": 2048,
162
- "temperature": temperature,
163
- }
164
- data = json.dumps(payload).encode('utf-8')
165
- req = urllib.request.Request(url, data=data,
166
- headers={"Content-Type": "application/json",
167
- "Authorization": f"Bearer {api_key}"},
168
- method='POST')
169
- try:
170
- with urllib.request.urlopen(req, timeout=120) as resp:
171
- result = json.loads(resp.read().decode())
172
- return result['choices'][0]['message']['content']
173
- except Exception as e:
174
- return f"ERROR: {e}"
175
-
176
-
177
- def call_glm(prompt: str, api_key: str, model: str = "glm-4-flash",
178
- temperature: float = 0.7) -> str:
179
- """Call GLM/Zhipu API (OpenAI-compatible)."""
180
- url = "https://open.bigmodel.cn/api/paas/v4/chat/completions"
181
- payload = {
182
- "model": model,
183
- "messages": [{"role": "user", "content": prompt}],
184
- "max_tokens": 2048,
185
- "temperature": temperature,
186
- }
187
- data = json.dumps(payload).encode('utf-8')
188
- req = urllib.request.Request(url, data=data,
189
- headers={"Content-Type": "application/json",
190
- "Authorization": f"Bearer {api_key}"},
191
- method='POST')
192
- try:
193
- with urllib.request.urlopen(req, timeout=120) as resp:
194
- result = json.loads(resp.read().decode())
195
- return result['choices'][0]['message']['content']
196
- except Exception as e:
197
- return f"ERROR: {e}"
198
-
199
-
200
- def call_ollama(prompt: str, model: str = "qwen2.5-coder:32b",
201
- temperature: float = 0.7) -> str:
202
- """Call local Ollama."""
203
- url = "http://localhost:11434/api/generate"
204
- payload = {
205
- "model": model,
206
- "prompt": prompt,
207
- "stream": False,
208
- "options": {"temperature": temperature, "num_predict": 2048},
209
- }
210
- data = json.dumps(payload).encode('utf-8')
211
- req = urllib.request.Request(url, data=data,
212
- headers={"Content-Type": "application/json"},
213
- method='POST')
214
- try:
215
- with urllib.request.urlopen(req, timeout=180) as resp:
216
- result = json.loads(resp.read().decode())
217
- return result.get('response', '')
218
- except Exception as e:
219
- return f"ERROR: {e}"
220
-
221
-
222
- def call_llm(prompt: str, provider: str, api_key: str = "",
223
- model: str = "", temperature: float = 0.7) -> str:
224
- """Unified LLM caller."""
225
- if provider == "nvidia":
226
- return call_nvidia(prompt, api_key, model or PROVIDERS["nvidia"]["default_model"], temperature)
227
- elif provider == "gemini":
228
- return call_gemini(prompt, api_key, model or "gemini-2.0-flash", temperature)
229
- elif provider == "deepseek":
230
- return call_deepseek(prompt, api_key, model or "deepseek-chat", temperature)
231
- elif provider == "glm":
232
- return call_glm(prompt, api_key, model or "glm-4-flash", temperature)
233
- elif provider == "ollama":
234
- return call_ollama(prompt, model or "qwen2.5-coder:32b", temperature)
235
- else:
236
- return f"ERROR: Unknown provider {provider}"
237
-
238
-
239
- # =============================================================================
240
- # PROMPT, EXTRACTION, VERIFICATION (same approach as kaggle_llm_solver.py)
241
- # =============================================================================
242
-
243
- def build_prompt(task: Dict) -> str:
244
- train_pairs = task.get('train', [])
245
- examples = []
246
- for i, pair in enumerate(train_pairs):
247
- examples.append(
248
- f"Example {i+1}:\n"
249
- f" Input: {json.dumps(pair['input'])}\n"
250
- f" Output: {json.dumps(pair['output'])}"
251
- )
252
- examples_str = "\n".join(examples)
253
-
254
- inputs = [np.array(p['input']) for p in train_pairs]
255
- outputs = [np.array(p['output']) for p in train_pairs]
256
- same_shape = all(i.shape == o.shape for i, o in zip(inputs, outputs))
257
- in_colors = sorted(set(c for i in inputs for c in np.unique(i).tolist()))
258
- out_colors = sorted(set(c for o in outputs for c in np.unique(o).tolist()))
259
-
260
- analysis = f" Same input/output shape: {same_shape}\n"
261
- analysis += f" Input colors: {in_colors}, Output colors: {out_colors}\n"
262
- if not same_shape:
263
- for i, o in zip(inputs[:1], outputs[:1]):
264
- analysis += f" Shape: {i.shape} -> {o.shape}\n"
265
-
266
- return f"""Solve this ARC-AGI puzzle. Write ONLY a Python function, no explanations.
267
-
268
- {examples_str}
269
-
270
- Analysis:
271
- {analysis}
272
- ```python
273
- import numpy as np
274
- from collections import Counter, deque
275
- from scipy.ndimage import label
276
-
277
- def transform(grid: list[list[int]]) -> list[list[int]]:
278
- grid = np.array(grid)
279
- """
280
-
281
-
282
- def extract_code(response: str) -> Optional[str]:
283
- for pattern in [r'```python\s*(.*?)```', r'```\s*(.*?)```']:
284
- matches = re.findall(pattern, response, re.DOTALL)
285
- for match in matches:
286
- if 'def transform' in match:
287
- return match.strip()
288
- idx = response.find('def transform')
289
- if idx >= 0:
290
- before = response[:idx]
291
- import_start = max(before.rfind('import '), before.rfind('from '))
292
- start = import_start if import_start >= 0 else idx
293
- code = response[start:]
294
- end = code.find('```')
295
- if end > 0:
296
- code = code[:end]
297
- return code.strip()
298
- stripped = response.strip()
299
- if stripped.startswith(('import', 'def transform', 'from')):
300
- return stripped
301
- return None
302
-
303
-
304
- def verify_program(code: str, train_pairs: List[Dict]) -> bool:
305
- namespace = {'np': np, 'numpy': np, 'Counter': Counter,
306
- 'deque': __import__('collections').deque}
307
- try:
308
- # Allow scipy import in generated code
309
- try:
310
- import scipy.ndimage
311
- namespace['scipy'] = __import__('scipy')
312
- except ImportError:
313
- pass
314
- exec(code, namespace)
315
- except Exception:
316
- return False
317
- if 'transform' not in namespace:
318
- return False
319
- fn = namespace['transform']
320
- for pair in train_pairs:
321
- try:
322
- result = fn([row[:] for row in pair['input']])
323
- if result is None:
324
- return False
325
- r = np.array(result, dtype=int)
326
- e = np.array(pair['output'], dtype=int)
327
- if r.shape != e.shape or not np.array_equal(r, e):
328
- return False
329
- except Exception:
330
- return False
331
- return True
332
-
333
-
334
- def apply_program(code: str, test_input):
335
- namespace = {'np': np, 'numpy': np, 'Counter': Counter,
336
- 'deque': __import__('collections').deque}
337
- try:
338
- import scipy.ndimage
339
- namespace['scipy'] = __import__('scipy')
340
- except ImportError:
341
- pass
342
- try:
343
- exec(code, namespace)
344
- result = namespace['transform']([row[:] for row in test_input])
345
- if result is not None:
346
- return np.array(result, dtype=int).tolist()
347
- except Exception:
348
- pass
349
- return None
350
-
351
-
352
- # =============================================================================
353
- # SYNTHESIS + MAIN
354
- # =============================================================================
355
-
356
- def synthesize_task(task, provider, api_key, model, n_candidates=8, verbose=False):
357
- prompt = build_prompt(task)
358
- for i in range(n_candidates):
359
- temp = 0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2)
360
- response = call_llm(prompt, provider, api_key, model, temp)
361
- if response.startswith("ERROR:"):
362
- if verbose: print(f" C{i+1}: {response[:60]}")
363
- # Rate limited: back off briefly, then move on to the next candidate
364
- if "429" in response or "rate" in response.lower():
365
- time.sleep(5)
366
- continue
367
- code = extract_code(response)
368
- if code is None:
369
- if verbose: print(f" C{i+1}: no code")
370
- continue
371
- if verbose: print(f" C{i+1}: {len(code)}ch", end="")
372
- if verify_program(code, task['train']):
373
- if verbose: print(" ✅")
374
- return (f"llm_c{i+1}", code)
375
- else:
376
- if verbose: print(" ❌")
377
- return None
378
-
379
-
- def main():
-     PROVIDER = os.environ.get("LLM_PROVIDER", "gemini")
-     config = PROVIDERS.get(PROVIDER, {})
-     API_KEY = os.environ.get(config.get("env_key", ""), "") if config.get("env_key") else ""
-     MODEL = os.environ.get("LLM_MODEL", config.get("default_model", ""))
-     N_CANDIDATES = int(os.environ.get("N_CANDIDATES", "8"))
-     ARC_DIR = os.environ.get("ARC_DIR", "arc_data/training")
-     ALREADY_SOLVED = os.environ.get("ALREADY_SOLVED", "already_solved.json")
-     OUTPUT = os.environ.get("OUTPUT_FILE", "llm_results.json")
-
-     print("=" * 60)
-     print(f"PEMF ARC-AGI — LLM Synthesis ({config.get('name', PROVIDER)})")
-     print("=" * 60)
-     print(f"Provider: {PROVIDER}")
-     print(f"Model: {MODEL}")
-     print(f"Candidates/task: {N_CANDIDATES}")
-     if not API_KEY and PROVIDER != "ollama":
-         print(f"\n⚠️ No API key! Set {config.get('env_key', '???')}")
-         print(f" Get key: {config.get('get_key_url', '?')}")
-         return
-     print()
-
-     # Load already solved
-     already_solved = set()
-     if os.path.exists(ALREADY_SOLVED):
-         with open(ALREADY_SOLVED) as f:
-             already_solved = set(json.load(f))
-         print(f"Symbolic solved: {len(already_solved)}")
-
-     # Load tasks
-     task_files = sorted(glob.glob(os.path.join(ARC_DIR, "*.json")))
-     unsolved = [(os.path.basename(tf).replace('.json',''), tf)
-                 for tf in task_files
-                 if os.path.basename(tf).replace('.json','') not in already_solved]
-     print(f"Total tasks: {len(task_files)}, unsolved: {len(unsolved)}")
-     print()
-
-     # Run
-     results = {}
-     solved = 0
-     total_time = 0
-
-     for idx, (tid, tf) in enumerate(unsolved):
-         with open(tf) as f:
-             task = json.load(f)
-         print(f"[{idx+1:3d}/{len(unsolved)}] {tid}:", end=" ", flush=True)
-         start = time.time()
-         result = synthesize_task(task, PROVIDER, API_KEY, MODEL, N_CANDIDATES, verbose=False)
-         elapsed = time.time() - start
-         total_time += elapsed
-
-         if result:
-             rule, code = result
-             solved += 1
-             test_outputs = [apply_program(code, t['input']) for t in task.get('test', [])]
-             results[tid] = {'status': 'solved', 'rule': rule, 'code': code,
-                             'test_outputs': test_outputs, 'time_s': round(elapsed, 2)}
-             print(f"✅ ({elapsed:.1f}s)")
-         else:
-             results[tid] = {'status': 'failed', 'time_s': round(elapsed, 2)}
-             print(f"❌ ({elapsed:.1f}s)")
-
-         # Rate limit respect
-         if PROVIDER == "gemini":
-             time.sleep(4) # 15 RPM = 1 every 4s
-         elif PROVIDER == "nvidia":
-             time.sleep(2) # NIM free tier: ~30 RPM
-         elif PROVIDER in ("deepseek", "glm"):
-             time.sleep(1)
-
-         # Save every 10
-         if (idx + 1) % 10 == 0:
-             _save(OUTPUT, PROVIDER, MODEL, N_CANDIDATES, solved, idx+1,
-                   total_time, already_solved, len(task_files), results)
-             print(f" [Saved: {solved}/{idx+1}, total {len(already_solved)+solved}/{len(task_files)}]")
-
-     # Final save
-     _save(OUTPUT, PROVIDER, MODEL, N_CANDIDATES, solved, len(unsolved),
-           total_time, already_solved, len(task_files), results)
-
-     print(f"\n{'='*60}")
-     print(f"LLM solved: {solved}/{len(unsolved)}")
-     print(f"Symbolic: {len(already_solved)}")
-     print(f"TOTAL: {len(already_solved)+solved}/{len(task_files)} ({100*(len(already_solved)+solved)/len(task_files):.1f}%)")
-     print(f"Saved: {OUTPUT}")
-
-
- def _save(path, provider, model, n_cand, solved, attempted, total_time,
-           already_solved, total_tasks, results):
-     with open(path, 'w') as f:
-         json.dump({
-             'provider': provider, 'model': model, 'n_candidates': n_cand,
-             'llm_solved': solved, 'attempted': attempted,
-             'total_time_s': round(total_time, 1),
-             'symbolic_solved': len(already_solved),
-             'total_solved': len(already_solved) + solved,
-             'total_tasks': total_tasks,
-             'solve_rate': round(100*(len(already_solved)+solved)/total_tasks, 2),
-             'results': results,
-         }, f, indent=2)
-
-
- if __name__ == "__main__":
-     main()
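
The deleted script accepts an LLM-generated candidate only if its `transform` function reproduces every training output exactly, and it raises the sampling temperature for each new candidate. A minimal, self-contained sketch of those two ideas follows. The helper names `candidate_temperatures` and `passes_all_pairs` and the toy pairs are assumptions made for illustration; they are not part of the deleted file. Note that `exec` is not a sandbox, so real candidates should only be run in an isolated process.

```python
import numpy as np


def candidate_temperatures(n_candidates):
    """Schedule used above: 0.1 for the first candidate, then 0.4 + 0.15*i capped at 1.2."""
    return [0.1 if i == 0 else min(0.4 + 0.15 * i, 1.2) for i in range(n_candidates)]


def passes_all_pairs(code, train_pairs):
    """Return True only if the candidate's transform() matches every training output."""
    namespace = {"np": np}
    try:
        exec(code, namespace)  # hypothetical candidate source; not sandboxed
        fn = namespace["transform"]
        for pair in train_pairs:
            # Pass a copy so the candidate cannot mutate the task data.
            result = fn([row[:] for row in pair["input"]])
            if result is None:
                return False
            if not np.array_equal(np.array(result), np.array(pair["output"])):
                return False
    except Exception:
        # Missing transform, runtime error, or ragged output all count as failure.
        return False
    return True


# Toy task (illustrative only): each row is reversed.
TOY_PAIRS = [
    {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
    {"input": [[0, 2], [3, 0]], "output": [[2, 0], [0, 3]]},
]
GOOD = "def transform(grid):\n    return [row[::-1] for row in grid]\n"
BAD = "def transform(grid):\n    return grid\n"

if __name__ == "__main__":
    print(candidate_temperatures(8))
    print(passes_all_pairs(GOOD, TOY_PAIRS), passes_all_pairs(BAD, TOY_PAIRS))
```

Exact-match verification on the training pairs is a strict filter, but it does not guarantee that an accepted program generalizes to the test input.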
scripts/merge_results.py DELETED
@@ -1,53 +0,0 @@
- """
- Merge LLM results with symbolic results to get final solve count.
-
- Usage:
-     python merge_results.py arc_results/summary_v4.json llm_results.json
- """
- import json
- import sys
-
-
- def merge(symbolic_file: str, llm_file: str, output_file: str = "arc_results/summary_final.json"):
-     with open(symbolic_file) as f:
-         symbolic = json.load(f)
-     with open(llm_file) as f:
-         llm = json.load(f)
-
-     symbolic_solved = {r['task_id'] for r in symbolic['results'] if r.get('all_train_solved')}
-     llm_solved = {tid for tid, r in llm['results'].items() if r['status'] == 'solved'}
-
-     total_solved = symbolic_solved | llm_solved
-     new_from_llm = llm_solved - symbolic_solved
-
-     print(f"Symbolic solved: {len(symbolic_solved)}")
-     print(f"LLM solved: {len(llm_solved)}")
-     print(f"New from LLM: {len(new_from_llm)}")
-     print(f"TOTAL SOLVED: {len(total_solved)}/{symbolic['total_tasks']} ({100*len(total_solved)/symbolic['total_tasks']:.1f}%)")
-
-     print(f"\nNew tasks solved by LLM:")
-     for tid in sorted(new_from_llm):
-         rule = llm['results'][tid].get('rule', '?')
-         print(f" {tid}: {rule}")
-
-     # Save merged
-     merged = {
-         'total_tasks': symbolic['total_tasks'],
-         'symbolic_solved': len(symbolic_solved),
-         'llm_solved': len(llm_solved),
-         'new_from_llm': len(new_from_llm),
-         'total_solved': len(total_solved),
-         'solve_rate': round(100 * len(total_solved) / symbolic['total_tasks'], 2),
-         'symbolic_tasks': sorted(symbolic_solved),
-         'llm_tasks': sorted(llm_solved),
-         'new_llm_tasks': sorted(new_from_llm),
-     }
-     with open(output_file, 'w') as f:
-         json.dump(merged, f, indent=2)
-     print(f"\nMerged results saved to {output_file}")
-
-
- if __name__ == "__main__":
-     sym = sys.argv[1] if len(sys.argv) > 1 else "arc_results/summary_v4.json"
-     llm = sys.argv[2] if len(sys.argv) > 2 else "llm_results.json"
-     merge(sym, llm)
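
The merge in this script comes down to a set union, so a task solved by both the symbolic solver and the LLM is counted once. A minimal in-memory sketch, assuming the same two result layouts the script reads (a list of records with `task_id` and `all_train_solved`, and a dict keyed by task id with `status`); the function name `merge_solved` and the task ids `a` to `d` are made up for illustration:

```python
def merge_solved(symbolic, llm):
    """Combine two result sets without double-counting shared tasks."""
    symbolic_solved = {r["task_id"] for r in symbolic["results"] if r.get("all_train_solved")}
    llm_solved = {tid for tid, r in llm["results"].items() if r["status"] == "solved"}
    total = symbolic_solved | llm_solved  # union: each task id counted once
    return {
        "total_solved": len(total),
        "new_from_llm": sorted(llm_solved - symbolic_solved),
        "solve_rate": round(100 * len(total) / symbolic["total_tasks"], 2),
    }


# Toy inputs (hypothetical task ids).
SYMBOLIC = {
    "total_tasks": 4,
    "results": [
        {"task_id": "a", "all_train_solved": True},
        {"task_id": "b", "all_train_solved": False},
    ],
}
LLM = {
    "results": {
        "a": {"status": "solved"},
        "c": {"status": "solved"},
        "d": {"status": "failed"},
    }
}

if __name__ == "__main__":
    print(merge_solved(SYMBOLIC, LLM))
```

Here task `a` appears in both sets, so the merged total is 2 of 4 tasks and only `c` is reported as new from the LLM.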
scripts/run_all_arc.py DELETED
@@ -1,183 +0,0 @@
- """
- Run the PEMF solver on all ARC-AGI tasks and report solve rates.
-
- For each task, the solver tries every training pair. A task is "solved"
- if the solver achieves σ=0 on ALL training pairs.
-
- Usage:
-     1. Download the ARC dataset into arc_data/training/:
-        git clone https://github.com/fchollet/ARC-AGI.git /tmp/arc
-        cp -r /tmp/arc/data/training arc_data/training
-     2. Run:
-        python scripts/run_all_arc.py
-
- Outputs:
-     arc_results/summary.json — per-task results
-     arc_results/report.txt — human-readable report
- """
- import os, json, time, glob
-
- import numpy as np
- from itt_solver.solver_core import initialize_potential, sigma_l1
- from itt_solver.beam_logging import beam_minimize_with_log
- from itt_solver.experiment_driver import default_atomic_factory
-
- ARC_DIR = os.environ.get("ARC_DIR", "arc_data/training")
- OUT_DIR = os.environ.get("OUT_DIR", "arc_results")
- os.makedirs(OUT_DIR, exist_ok=True)
-
- PARAMS = {
-     'beam_width': 8,
-     'max_depth': 2,
-     'lock_coeff': 0.0,
-     'max_fraction': 1.0,
-     'use_symmetry': True,
-     'use_gravity': True,
-     'use_color_ops': True,
-     'boundary_source': 'target',
- }
-
- def solve_pair(inp, out, params):
-     """Run solver on one input→output pair. Returns (sigma, transform_name, time_s)."""
-     h, w = len(out), len(out[0])
-     task = {
-         'name': 'pair',
-         'input': inp,
-         'target': out,
-         'target_shape': (h, w),
-     }
-     atomic_lib = default_atomic_factory(params, task)
-     phi_in = initialize_potential(inp)
-     phi_target = initialize_potential(out)
-
-     start = time.time()
-     T_best, phi_best, states, sigmas, logs = beam_minimize_with_log(
-         phi_in, phi_target, atomic_lib,
-         beam_width=params['beam_width'],
-         max_depth=params['max_depth'],
-         lock_coeff=params['lock_coeff'],
-         max_fraction=params['max_fraction'],
-         allowed_symbols=list(range(10)),
-         enable_layer_minus_one=False,
-         boundary_source=params['boundary_source'],
-     )
-     elapsed = time.time() - start
-     final_sigma = float(sigmas[-1]) if sigmas else float('inf')
-     return final_sigma, repr(T_best), elapsed
-
- def run_all():
-     task_files = sorted(glob.glob(os.path.join(ARC_DIR, "*.json")))
-     print(f"Running solver on {len(task_files)} ARC training tasks...")
-     print(f"Params: beam_width={PARAMS['beam_width']}, max_depth={PARAMS['max_depth']}")
-     print()
-
-     results = []
-     solved_count = 0
-     partial_count = 0
-     total_time = 0
-
-     for ti, tf in enumerate(task_files):
-         task_id = os.path.basename(tf).replace('.json', '')
-         with open(tf) as fh:
-             task_data = json.load(fh)
-
-         train_pairs = task_data.get('train', [])
-         test_pairs = task_data.get('test', [])
-
-         pair_results = []
-         all_zero = True
-         best_sigma = float('inf')
-         best_transform = None
-
-         for pi, pair in enumerate(train_pairs):
-             sigma, transform, elapsed = solve_pair(pair['input'], pair['output'], PARAMS)
-             total_time += elapsed
-             pair_results.append({
-                 'pair': pi, 'sigma': sigma,
-                 'transform': transform, 'time_s': round(elapsed, 4),
-             })
-             if sigma > 0:
-                 all_zero = False
-             if sigma < best_sigma:
-                 best_sigma = sigma
-                 best_transform = transform
-
-         test_results = []
-         test_solved = None
-         for pi, pair in enumerate(test_pairs):
-             if 'output' in pair:
-                 sigma, transform, elapsed = solve_pair(pair['input'], pair['output'], PARAMS)
-                 total_time += elapsed
-                 test_results.append({
-                     'pair': pi, 'sigma': sigma,
-                     'transform': transform, 'time_s': round(elapsed, 4),
-                 })
-                 if test_solved is None:
-                     test_solved = True
-                 if sigma > 0:
-                     test_solved = False
-
-         status = "SOLVED" if all_zero else "PARTIAL" if best_sigma < float('inf') and best_sigma > 0 else "FAILED"
-         if all_zero:
-             solved_count += 1
-         elif best_sigma < float('inf'):
-             partial_count += 1
-
-         results.append({
-             'task_id': task_id, 'status': status,
-             'train_pairs': len(train_pairs), 'all_train_solved': all_zero,
-             'best_sigma': best_sigma, 'best_transform': best_transform,
-             'pair_results': pair_results,
-             'test_results': test_results, 'test_solved': test_solved,
-         })
-
-         if (ti + 1) % 20 == 0 or all_zero:
-             marker = "✅" if all_zero else " "
-             print(f"[{ti+1:3d}/{len(task_files)}] {task_id}: {status} (best σ={best_sigma:.1f}) {marker}")
-
-     failed_count = len(task_files) - solved_count - partial_count
-     print(f"\n{'='*60}")
-     print(f"RESULTS: {len(task_files)} tasks")
-     print(f" SOLVED (σ=0 all train pairs): {solved_count} ({100*solved_count/len(task_files):.1f}%)")
-     print(f" PARTIAL (σ>0 but finite): {partial_count}")
-     print(f" FAILED: {failed_count}")
-     print(f" Total time: {total_time:.1f}s ({total_time/len(task_files):.2f}s/task)")
-
-     summary = {
-         'total_tasks': len(task_files), 'solved': solved_count,
-         'partial': partial_count, 'failed': failed_count,
-         'solve_rate': round(100 * solved_count / len(task_files), 2),
-         'params': PARAMS, 'total_time_s': round(total_time, 2),
-         'results': results,
-     }
-     with open(os.path.join(OUT_DIR, 'summary.json'), 'w') as fh:
-         json.dump(summary, fh, indent=2)
-
-     solved_tasks = [r for r in results if r['all_train_solved']]
-     print(f"\nSolved tasks:")
-     for r in solved_tasks:
-         print(f" {r['task_id']}: {r['best_transform']}")
-
-     partial_tasks = sorted(
-         [r for r in results if not r['all_train_solved'] and r['best_sigma'] < float('inf')],
-         key=lambda r: r['best_sigma']
-     )
-     print(f"\nTop 20 closest-to-solving:")
-     for r in partial_tasks[:20]:
-         print(f" {r['task_id']}: σ={r['best_sigma']:.1f} ({r['best_transform']})")
-
-     with open(os.path.join(OUT_DIR, 'report.txt'), 'w') as fh:
-         fh.write(f"PEMF Solver — ARC-AGI Training Set Results\n{'='*60}\n")
-         fh.write(f"Total tasks: {len(task_files)}\n")
-         fh.write(f"Solved: {solved_count} ({100*solved_count/len(task_files):.1f}%)\n")
-         fh.write(f"Partial: {partial_count}\nFailed: {failed_count}\n")
-         fh.write(f"Time: {total_time:.1f}s\n\n")
-         fh.write(f"Params: {json.dumps(PARAMS, indent=2)}\n\n")
-         fh.write(f"Solved tasks:\n")
-         for r in solved_tasks:
-             fh.write(f" {r['task_id']}: {r['best_transform']}\n")
-
-     print(f"\nResults saved to {OUT_DIR}/")
-
- if __name__ == '__main__':
-     run_all()
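
The script counts a task as solved only when σ is zero on every training pair, and as partial when the best σ is finite. The sketch below shows that bookkeeping on plain numbers. `l1_residue` is an assumed stand-in for the solver's `sigma_l1`, whose real definition is not shown in this diff, and `classify` follows the script's counting rule. The script's status string additionally requires best σ > 0 for "PARTIAL", so a task with one exactly solved pair and one unsolved pair is labeled "FAILED" there while still being counted as partial.

```python
import numpy as np


def l1_residue(pred, target):
    """Assumed residue: sum of absolute cell differences, infinite on shape mismatch."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        return float("inf")
    return float(np.abs(pred - target).sum())


def classify(sigmas):
    """Task-level label from per-pair residues, mirroring the counters in run_all()."""
    if all(s == 0 for s in sigmas):
        return "SOLVED"
    if min(sigmas) < float("inf"):
        return "PARTIAL"
    return "FAILED"


if __name__ == "__main__":
    sigmas = [l1_residue([[1, 2]], [[1, 2]]), l1_residue([[1, 2]], [[1, 5]])]
    print(sigmas, classify(sigmas))
```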
tests/test_transforms.py DELETED
@@ -1,156 +0,0 @@
- """
- Unit tests for all transforms in itt_solver.transforms.
-
- Usage:
-     python tests/test_transforms.py
-
- 40 tests covering: Kronecker, mirror tiles, upscale, downscale, stack,
- rotate, reflect, color ops, gravity, crop, transpose, shifted tile,
- fill enclosed.
- """
- import numpy as np
- from itt_solver import transforms as tr
-
- INP = np.array([[0,7,7],[7,7,7],[0,7,7]], dtype=float)
-
- tests_passed = 0
- tests_failed = 0
-
- def check(name, condition):
-     global tests_passed, tests_failed
-     if condition:
-         print(f" ✅ {name}")
-         tests_passed += 1
-     else:
-         print(f" ❌ {name}")
-         tests_failed += 1
-
- print("=== Kronecker Self-Similar ===")
- T = tr.KroneckerSelfSimilar()
- out = T.apply(INP)
- check("Output shape is 9x9", out.shape == (9, 9))
- check("σ=0 vs known target", np.array_equal(out, np.kron((INP!=0).astype(float), INP)))
-
- print("\n=== KroneckerSelfSimilarInv ===")
- T = tr.KroneckerSelfSimilarInv()
- out = T.apply(INP)
- check("Output shape is 9x9", out.shape == (9, 9))
-
- print("\n=== MirrorTileH ===")
- T = tr.MirrorTileH()
- out = T.apply(INP)
- check("Shape is 3x6", out.shape == (3, 6))
- check("Left half is input", np.array_equal(out[:, :3], INP))
- check("Right half is fliplr(input)", np.array_equal(out[:, 3:], np.fliplr(INP)))
-
- print("\n=== MirrorTileV ===")
- T = tr.MirrorTileV()
- out = T.apply(INP)
- check("Shape is 6x3", out.shape == (6, 3))
- check("Top half is input", np.array_equal(out[:3, :], INP))
- check("Bottom half is flipud(input)", np.array_equal(out[3:, :], np.flipud(INP)))
-
- print("\n=== MirrorTile4Way ===")
- T = tr.MirrorTile4Way()
- out = T.apply(INP)
- check("Shape is 6x6", out.shape == (6, 6))
-
- print("\n=== Upscale 2x ===")
- T = tr.Upscale(2)
- out = T.apply(INP)
- check("Shape is 6x6", out.shape == (6, 6))
- check("Top-left 2x2 block is INP[0,0]", np.all(out[:2, :2] == INP[0, 0]))
-
- print("\n=== Upscale 3x ===")
- T = tr.Upscale(3)
- out = T.apply(INP)
- check("Shape is 9x9", out.shape == (9, 9))
- check("Top-left 3x3 block is INP[0,0]", np.all(out[:3, :3] == INP[0, 0]))
-
- print("\n=== Downscale 2x ===")
- T = tr.Downscale(2)
- big = np.kron(INP, np.ones((2, 2)))
- out = T.apply(big)
- check("Downscale of upscaled recovers original", np.array_equal(out, INP))
-
- print("\n=== StackH 3 ===")
- T = tr.StackH(3)
- out = T.apply(INP)
- check("Shape is 3x9", out.shape == (3, 9))
- check("First third is input", np.array_equal(out[:, :3], INP))
-
- print("\n=== StackV 3 ===")
- T = tr.StackV(3)
- out = T.apply(INP)
- check("Shape is 9x3", out.shape == (9, 3))
- check("First third is input", np.array_equal(out[:3, :], INP))
-
- print("\n=== Rotate 90/180/270 ===")
- for k in [1, 2, 3]:
-     T = tr.Rotate(k)
-     out = T.apply(INP)
-     check(f"Rotate_{90*k} matches np.rot90", np.array_equal(out, np.rot90(INP, k)))
-
- print("\n=== Reflect h/v ===")
- T = tr.Reflect('h')
- check("Reflect_h matches flipud", np.array_equal(T.apply(INP), np.flipud(INP)))
- T = tr.Reflect('v')
- check("Reflect_v matches fliplr", np.array_equal(T.apply(INP), np.fliplr(INP)))
-
- print("\n=== RetainColor ===")
- T = tr.RetainColor(7)
- out = T.apply(INP)
- check("Only 7s remain", np.all(out[INP == 7] == 7))
- check("Non-7 positions are 0", np.all(out[INP != 7] == 0))
-
- print("\n=== RemoveColor ===")
- T = tr.RemoveColor(7)
- out = T.apply(INP)
- check("7s are removed", np.all(out[INP == 7] == 0))
- check("0s stay 0", np.all(out[INP == 0] == 0))
-
- print("\n=== InvertColors ===")
- T = tr.InvertColors()
- out = T.apply(INP)
- check("0→7 swap", np.all(out[INP == 0] == 7))
- check("7→0 swap", np.all(out[INP == 7] == 0))
-
- print("\n=== GravityDown ===")
- T = tr.GravityDown()
- col_in = np.array([[0,7,0],[0,0,7],[7,0,0]], dtype=float)
- out = T.apply(col_in)
- check("Col 0: 7 at bottom", out[2, 0] == 7 and out[0, 0] == 0 and out[1, 0] == 0)
- check("Col 1: 7 at bottom", out[2, 1] == 7 and out[0, 1] == 0)
-
- print("\n=== GravityUp ===")
- T = tr.GravityUp()
- out = T.apply(col_in)
- check("Col 0: 7 at top", out[0, 0] == 7 and out[1, 0] == 0 and out[2, 0] == 0)
-
- print("\n=== CropToContent ===")
- T = tr.CropToContent()
- padded = np.array([[0,0,0,0],[0,7,7,0],[0,7,7,0],[0,0,0,0]], dtype=float)
- out = T.apply(padded)
- check("Crops to 2x2", out.shape == (2, 2))
- check("All 7s", np.all(out == 7))
-
- print("\n=== Transpose ===")
- T = tr.Transpose()
- out = T.apply(INP)
- check("Shape is transposed", out.shape == (3, 3))
- check("Values match transpose", np.array_equal(out, INP.T))
-
- print("\n=== ShiftedTile ===")
- T = tr.tile_to_target_shifted(shift=(1, 1), tile_factor=3)
- out = T.apply(INP)
- check("Shape is 9x9", out.shape == (9, 9))
- check("Differs from vanilla tile", not np.array_equal(out, np.tile(INP, (3, 3))))
-
- print("\n=== FillEnclosedHarmonic ===")
- T = tr.FillEnclosedHarmonic()
- enclosed = np.array([[7,7,7],[7,0,7],[7,7,7]], dtype=float)
- out = T.apply(enclosed)
- check("Center hole filled", out[1, 1] == 7)
-
- print(f"\n{'='*50}")
- print(f"Results: {tests_passed} passed, {tests_failed} failed")
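
The first test compares `KroneckerSelfSimilar` against `np.kron((INP != 0).astype(float), INP)`, which places a copy of the grid wherever the grid itself is non-zero. The sketch below reproduces only that reference expression with NumPy. The function name `kron_self_similar` is an assumption for illustration, and nothing here claims to match how `itt_solver.transforms` implements the transform internally.

```python
import numpy as np

# Same 3x3 input grid as in the test file.
INP = np.array([[0, 7, 7], [7, 7, 7], [0, 7, 7]], dtype=float)


def kron_self_similar(grid):
    """Reference construction from the test: a copy of grid in every non-zero cell."""
    mask = (grid != 0).astype(float)  # 1.0 where the grid has a color, else 0.0
    return np.kron(mask, grid)        # block (i, j) equals mask[i, j] * grid


out = kron_self_similar(INP)

if __name__ == "__main__":
    print(out.shape)
    print(out[:3, :6])
```

For a 3x3 input the result is 9x9: the top-left 3x3 block is all zeros because `INP[0, 0]` is 0, while the block to its right is a full copy of `INP`.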