rogermt committed on
Commit
1c98927
·
verified ·
1 Parent(s): e0a217d

Update results: 40/400 (10.0%) solved — +9 new tasks, 0 regressions, greedy stacker working

  1. arc_results/RESULTS.md +56 -51
arc_results/RESULTS.md CHANGED
@@ -1,17 +1,33 @@
1
  # PEMF Solver — ARC-AGI Training Set Evaluation
2
 
3
- ## Results
4
 
5
- | Metric | Value |
6
- |---|---|
7
- | **Total tasks** | 400 |
8
- | **Solved** (σ=0 all train pairs) | **31 (7.8%)** |
9
- | **≥1 pair solved** | 59 (14.8%) |
10
- | **Total pairs** | 1,302 |
11
- | **Pairs solved** | 146 (11.2%) |
12
- | **Total time** | 17.1s (0.04s/task) |
13
 
14
- ## Solved Tasks (31)
15
 
16
  | Task ID | Transform | Family |
17
  |---|---|---|
@@ -19,13 +35,18 @@
19
  | 1190e5a7 | KroneckerSelfSimilarInv | Self-similar |
20
  | 1cf80156 | CropToContent | Crop |
21
  | 1e0a9b12 | GravityDown | Gravity |
 
22
  | 2013d3e2 | CropToContent | Crop |
 
 
23
  | 239be575 | tile_to_target | Tiling |
 
24
  | 28bf18c6 | CropToContent | Crop |
25
  | 2dee498d | tile_to_target | Tiling |
26
  | 3906de3d | GravityUp | Gravity |
27
  | 3af2c5a8 | MirrorTileH | Mirror |
28
  | 3c9b0459 | Rotate_180 | Rotation |
 
29
  | 6150a2bd | Rotate_180 | Rotation |
30
  | 62c24649 | MirrorTile4Way | Mirror |
31
  | 67a3c6ac | Reflect_v | Reflection |
@@ -33,62 +54,46 @@
33
  | 68b16354 | Reflect_h | Reflection |
34
  | 6d0aefbc | MirrorTileH | Mirror |
35
  | 6fa7a44f | MirrorTileV | Mirror |
 
36
  | 74dd1130 | Transpose | Transpose |
37
  | 7b7f7511 | tile_to_target | Tiling |
38
  | 8be77c9e | MirrorTileV | Mirror |
39
  | 9172f3a0 | Upscale_3x | Upscale |
40
  | 9dfd6313 | Transpose | Transpose |
41
  | a416b8f3 | tile_to_target | Tiling |
 
42
  | c59eb873 | Upscale_2x | Upscale |
43
  | c9e6f938 | MirrorTileH | Mirror |
44
  | d10ecb37 | tile_to_target | Tiling |
45
- | d631b094 | GravityUp | Gravity |
46
  | d9fac9be | tile_to_target | Tiling |
47
- | de1cd16c | Rotate_270 | Rotation |
 
 
48
  | ed36ccf7 | Rotate_90 | Rotation |
49
 
50
- ## Transform Usage (across all 146 solved pairs)
51
-
52
- | Transform | Pairs |
53
- |---|---|
54
- | tile_to_target | 34 |
55
- | CropToContent | 19 |
56
- | MirrorTile4Way | 11 |
57
- | Rotate_90 | 9 |
58
- | MirrorTileH | 8 |
59
- | Rotate_180 | 7 |
60
- | MirrorTileV | 7 |
61
- | Transpose | 7 |
62
- | Upscale_2x | 6 |
63
- | ShiftedTile | 6 |
64
- | Upscale_3x | 6 |
65
- | KroneckerSelfSimilar | 5 |
66
- | GravityUp | 5 |
67
- | GravityDown | 4 |
68
- | Reflect_h | 4 |
69
- | KroneckerSelfSimilarInv | 3 |
70
- | Reflect_v | 3 |
71
- | InvertColors | 1 |
72
- | Rotate_270 | 1 |
73
-
74
- **Every single new transform contributed to at least one solve.**
75
-
76
- ## Unsolved σ Distribution
77
 
78
- | σ Range | Pairs |
79
- |---|---|
80
- | (0, 5] | 56 |
81
- | (5, 10] | 85 |
82
- | (10, 20] | 155 |
83
- | (20, 50] | 341 |
84
- | (50, 100] | 230 |
85
- | (100, 500] | 263 |
86
- | (500+) | 26 |
87
 
88
- Median σ for unsolved pairs: 44.0
 
89
 
90
- ## Almost-Solved Tasks (28)
91
- Tasks where ≥1 pair reaches σ=0 but not all. These likely need a **composition** of two transforms or a new primitive not yet in the library.
92
 
93
  ## Parameters
94
  ```json
 
1
  # PEMF Solver — ARC-AGI Training Set Evaluation
2
 
3
+ ## Results (v2 — object layer + greedy stacker)
4
 
5
+ | Metric | v1 | **v2** | Change |
6
+ |---|---|---|---|
7
+ | **Tasks solved** | 31 (7.8%) | **40 (10.0%)** | **+9 (+29%)** |
8
+ | ≥1 pair solved | 59 (14.8%) | — | — |
9
+ | Pairs solved | 146 (11.2%) | — | — |
10
+ | Transforms | 19 | **33** | +14 |
11
+ | Total time | 17.1s | **50.8s** | +33.7s (greedy stacker) |
12
+ | Regressions | — | **0** | |
13
 
14
+ ## 9 Newly Solved Tasks
15
+
16
+ | Task ID | Transform | Category |
17
+ |---|---|---|
18
+ | 1f85a75f | ExtractLargestObject | **Object extraction** |
19
+ | 22168020 | ConnectSameColorH | **Connect** |
20
+ | 22eb0ac0 | ConnectSameColorH | **Connect** |
21
+ | 23b5c85d | ExtractSmallestObject | **Object extraction** |
22
+ | 4347f46a | DrawBorder | **Border** |
23
+ | 746b3537 | tile_to_target | Tiling (new pair coverage) |
24
+ | be94b721 | CropToContent | Crop (new pair coverage) |
25
+ | ded97339 | overlay(ConnectSameColorV, ConnectSameColorH) | **Greedy stacker** |
26
+ | eb5a1d5d | CompressGrid | **Compress** |
27
+
28
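+ **6 of the 9 new solves use a single new transform.** 1 uses the greedy stacker's overlay composition of two new transforms, and the remaining 2 (746b3537, be94b721) come from existing transforms covering pairs they previously missed.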
29
+
30
+ ## All 40 Solved Tasks
31
 
32
  | Task ID | Transform | Family |
33
  |---|---|---|
 
35
  | 1190e5a7 | KroneckerSelfSimilarInv | Self-similar |
36
  | 1cf80156 | CropToContent | Crop |
37
  | 1e0a9b12 | GravityDown | Gravity |
38
+ | 1f85a75f | ExtractLargestObject | Object |
39
  | 2013d3e2 | CropToContent | Crop |
40
+ | 22168020 | ConnectSameColorH | Connect |
41
+ | 22eb0ac0 | ConnectSameColorH | Connect |
42
  | 239be575 | tile_to_target | Tiling |
43
+ | 23b5c85d | ExtractSmallestObject | Object |
44
  | 28bf18c6 | CropToContent | Crop |
45
  | 2dee498d | tile_to_target | Tiling |
46
  | 3906de3d | GravityUp | Gravity |
47
  | 3af2c5a8 | MirrorTileH | Mirror |
48
  | 3c9b0459 | Rotate_180 | Rotation |
49
+ | 4347f46a | DrawBorder | Border |
50
  | 6150a2bd | Rotate_180 | Rotation |
51
  | 62c24649 | MirrorTile4Way | Mirror |
52
  | 67a3c6ac | Reflect_v | Reflection |
 
54
  | 68b16354 | Reflect_h | Reflection |
55
  | 6d0aefbc | MirrorTileH | Mirror |
56
  | 6fa7a44f | MirrorTileV | Mirror |
57
+ | 746b3537 | tile_to_target | Tiling |
58
  | 74dd1130 | Transpose | Transpose |
59
  | 7b7f7511 | tile_to_target | Tiling |
60
  | 8be77c9e | MirrorTileV | Mirror |
61
  | 9172f3a0 | Upscale_3x | Upscale |
62
  | 9dfd6313 | Transpose | Transpose |
63
  | a416b8f3 | tile_to_target | Tiling |
64
+ | be94b721 | CropToContent | Crop |
65
  | c59eb873 | Upscale_2x | Upscale |
66
  | c9e6f938 | MirrorTileH | Mirror |
67
  | d10ecb37 | tile_to_target | Tiling |
68
+ | d631b094 | ExtractLargestObject | Object |
69
  | d9fac9be | tile_to_target | Tiling |
70
+ | de1cd16c | KeepSmallestObject | Object |
71
+ | ded97339 | overlay(ConnectSameColorV, ConnectSameColorH) | Stacker |
72
+ | eb5a1d5d | CompressGrid | Compress |
73
  | ed36ccf7 | Rotate_90 | Rotation |
74
 
75
+ ## Architecture
76
 
77
+ ### 33 Atomic Transforms
78
+ - **Tiling**: tile_to_target, ShiftedTile, FillEnclosedHarmonic
79
+ - **Self-similar**: KroneckerSelfSimilar, KroneckerSelfSimilarInv
80
+ - **Mirror**: MirrorTileH, MirrorTileV, MirrorTile4Way
81
+ - **Upscale**: Upscale_2x, Upscale_3x
82
+ - **Stack**: StackH_3, StackV_3
83
+ - **Structural**: Transpose, CropToContent
84
+ - **Object**: ExtractLargest/Smallest/Unique/MostCommon, KeepLargest/Smallest, SortBySize
85
+ - **Fill/Connect**: FillInterior, ConnectSameColorH/V
86
+ - **Compress**: CompressGrid, RemoveBlackLines
87
+ - **Spatial**: ColorByProximity, DrawBorder
88
+ - **Symmetry**: Rotate_90/180/270, Reflect_h/v
89
+ - **Gravity**: GravityDown, GravityUp (optional)
90
+ - **Color**: InvertColors (optional)
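A minimal sketch of what an atomic transform might look like, assuming each one is a pure function from grid to grid and that "σ=0 on all train pairs" amounts to an exact match on every training pair. The function names and the `solves_task` helper are illustrative and are not taken from the solver's code.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]


def rotate_90(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def reflect_h(grid: Grid) -> Grid:
    """Flip each row left-to-right (assumed meaning of Reflect_h)."""
    return [row[::-1] for row in grid]


def gravity_down(grid: Grid, background: int = 0) -> Grid:
    """Let every non-background cell fall to the bottom of its column."""
    height, width = len(grid), len(grid[0])
    out = [[background] * width for _ in range(height)]
    for c in range(width):
        column = [grid[r][c] for r in range(height) if grid[r][c] != background]
        start = height - len(column)
        for i, value in enumerate(column):
            out[start + i][c] = value
    return out


def solves_task(transform: Transform, pairs: Sequence[Tuple[Grid, Grid]]) -> bool:
    """A task counts as solved only if every training pair matches exactly."""
    return all(transform(inp) == out for inp, out in pairs)
```

Under this assumption, a task is credited to a transform only when `solves_task` holds for all of its training pairs, which is why partial pair coverage does not count toward the solved total.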
91
 
92
+ ### Greedy Stacker
93
+ After beam search, the stacker tries `overlay(T1(input), T2(input))` over pairs of the top-N depth-1 pieces. This composes two independent transforms of the same input, which is critical for tasks where the output is a combination of two views.
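The overlay search can be sketched roughly as follows. This is an illustration under stated assumptions, not the solver's implementation: `greedy_stack`, `connect_h`, and `connect_v` are hypothetical stand-ins, background is assumed to be color 0, and the candidate dict is assumed to be ordered best-first.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]


def overlay(base: Grid, top: Grid, background: int = 0) -> Optional[Grid]:
    """Paint non-background cells of `top` over `base`; None if shapes differ."""
    if len(base) != len(top) or any(len(b) != len(t) for b, t in zip(base, top)):
        return None
    return [[t if t != background else b for b, t in zip(brow, trow)]
            for brow, trow in zip(base, top)]


def connect_h(grid: Grid, background: int = 0) -> Grid:
    """Toy stand-in for ConnectSameColorH: fill gaps between equal colors in a row."""
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        cells = [(c, v) for c, v in enumerate(row) if v != background]
        for (c1, v1), (c2, v2) in zip(cells, cells[1:]):
            if v1 == v2:
                for c in range(c1 + 1, c2):
                    out[r][c] = v1
    return out


def connect_v(grid: Grid, background: int = 0) -> Grid:
    """Toy stand-in for ConnectSameColorV, built by transposing connect_h."""
    transposed = [list(col) for col in zip(*grid)]
    return [list(col) for col in zip(*connect_h(transposed, background))]


def greedy_stack(pairs: Sequence[Tuple[Grid, Grid]],
                 candidates: Dict[str, Transform],
                 top_n: int = 5) -> Optional[str]:
    """Try overlay(T1(x), T2(x)) for ordered pairs of the first top_n candidates."""
    names = list(candidates)[:top_n]  # assumes candidates are ranked best-first
    for a, b in permutations(names, 2):
        if all(overlay(candidates[a](inp), candidates[b](inp)) == out
               for inp, out in pairs):
            return f"overlay({a}, {b})"
    return None


inp = [[0, 3, 0, 0],
       [0, 0, 0, 0],
       [0, 3, 0, 3]]
out = [[0, 3, 0, 0],
       [0, 3, 0, 0],
       [0, 3, 3, 3]]
found = greedy_stack([(inp, out)],
                     {"ConnectSameColorV": connect_v, "ConnectSameColorH": connect_h})
```

In this toy example neither transform alone reproduces `out`, but their overlay does, which mirrors the kind of result reported for task ded97339.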
94
 
95
+ ### Object Layer (`object_layer.py`)
96
+ Connected-component extraction (4- and 8-connectivity), color splitting, list reducers (largest/smallest/most_common/unique), spatial queries (bbox, center, height, width), and composition operators (paint/underpaint/overlay/cover/canvas).
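A rough sketch of the extraction, spatial-query, and reducer steps, assuming objects are single-color regions on a background of color 0. The function names below are illustrative and may not match what `object_layer.py` exposes.

```python
from collections import deque
from typing import List, Set, Tuple

Grid = List[List[int]]
Cell = Tuple[int, int]


def connected_components(grid: Grid, connectivity: int = 4,
                         background: int = 0) -> List[Set[Cell]]:
    """Group same-colored, non-background cells using 4- or 8-connectivity."""
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    height, width = len(grid), len(grid[0])
    seen: Set[Cell] = set()
    components: List[Set[Cell]] = []
    for r in range(height):
        for c in range(width):
            if grid[r][c] == background or (r, c) in seen:
                continue
            color = grid[r][c]
            component = {(r, c)}
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed cell
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and (nr, nc) not in seen and grid[nr][nc] == color):
                        seen.add((nr, nc))
                        component.add((nr, nc))
                        queue.append((nr, nc))
            components.append(component)
    return components


def bbox(component: Set[Cell]) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), all inclusive."""
    rows = [r for r, _ in component]
    cols = [c for _, c in component]
    return min(rows), min(cols), max(rows), max(cols)


def extract_largest(grid: Grid, connectivity: int = 4, background: int = 0) -> Grid:
    """Crop the grid to the largest component, blanking every other cell."""
    components = connected_components(grid, connectivity, background)
    if not components:
        return []
    largest = max(components, key=len)  # the 'largest' list reducer
    top, left, bottom, right = bbox(largest)
    return [[grid[r][c] if (r, c) in largest else background
             for c in range(left, right + 1)]
            for r in range(top, bottom + 1)]
```

The choice of connectivity changes which cells count as one object: diagonal neighbors are merged only under 8-connectivity, so the same grid can yield different "largest" objects.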
97
 
98
  ## Parameters
99
  ```json