rogermt committed d7531b4 (verified; parent 75bfb61): Add ARC-AGI evaluation results: 31/400 (7.8%) solved, 17s total

Added file: `arc_results/RESULTS.md` (+112 −0)

# PEMF Solver — ARC-AGI Training Set Evaluation

## Results

| Metric | Value |
|---|---|
| **Total tasks** | 400 |
| **Solved** (σ=0 on all train pairs) | **31 (7.8%)** |
| **≥1 pair solved** | 59 (14.8%) |
| **Total pairs** | 1,302 |
| **Pairs solved** | 146 (11.2%) |
| **Total time** | 17.1 s (0.04 s/task) |

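
The summary metrics are simple aggregates over per-pair σ values. The sketch below assumes each task is represented as a list of σ scores, one per train pair, with σ = 0 meaning an exact match. The `summarize` helper and the toy data are hypothetical illustrations, not the solver's own code:

```python
from typing import Dict, List


def summarize(sigmas: Dict[str, List[float]]) -> Dict[str, float]:
    """Aggregate per-pair σ scores into the Results-table metrics.

    Assumption: `sigmas` maps task ID -> σ values (one per train pair),
    where σ == 0 means the predicted grid matches the target exactly.
    """
    n_tasks = len(sigmas)
    # A task counts as solved only if every train pair reaches σ == 0.
    solved = sum(1 for s in sigmas.values() if all(x == 0 for x in s))
    # "≥1 pair solved": at least one pair reaches σ == 0.
    at_least_one = sum(1 for s in sigmas.values() if any(x == 0 for x in s))
    n_pairs = sum(len(s) for s in sigmas.values())
    pairs_solved = sum(1 for s in sigmas.values() for x in s if x == 0)
    return {
        "tasks": n_tasks,
        "solved": solved,
        "solved_pct": 100 * solved / n_tasks,
        "at_least_one": at_least_one,
        "pairs": n_pairs,
        "pairs_solved": pairs_solved,
        "pairs_pct": 100 * pairs_solved / n_pairs,
    }


# Toy input with made-up σ values (not real results).
demo = {"a": [0, 0], "b": [0, 3.0], "c": [12.0, 7.0, 1.0]}
print(summarize(demo))

# The reported percentages follow from the counts in the table.
print(f"{100 * 31 / 400:.1f}%  {100 * 59 / 400:.1f}%  {100 * 146 / 1302:.1f}%")
```
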

## Solved Tasks (31)

| Task ID | Transform | Family |
|---|---|---|
| 007bbfb7 | KroneckerSelfSimilar | Self-similar |
| 1190e5a7 | KroneckerSelfSimilarInv | Self-similar |
| 1cf80156 | CropToContent | Crop |
| 1e0a9b12 | GravityDown | Gravity |
| 2013d3e2 | CropToContent | Crop |
| 239be575 | tile_to_target | Tiling |
| 28bf18c6 | CropToContent | Crop |
| 2dee498d | tile_to_target | Tiling |
| 3906de3d | GravityUp | Gravity |
| 3af2c5a8 | MirrorTileH | Mirror |
| 3c9b0459 | Rotate_180 | Rotation |
| 6150a2bd | Rotate_180 | Rotation |
| 62c24649 | MirrorTile4Way | Mirror |
| 67a3c6ac | Reflect_v | Reflection |
| 67e8384a | MirrorTile4Way | Mirror |
| 68b16354 | Reflect_h | Reflection |
| 6d0aefbc | MirrorTileH | Mirror |
| 6fa7a44f | MirrorTileV | Mirror |
| 74dd1130 | Transpose | Transpose |
| 7b7f7511 | tile_to_target | Tiling |
| 8be77c9e | MirrorTileV | Mirror |
| 9172f3a0 | Upscale_3x | Upscale |
| 9dfd6313 | Transpose | Transpose |
| a416b8f3 | tile_to_target | Tiling |
| c59eb873 | Upscale_2x | Upscale |
| c9e6f938 | MirrorTileH | Mirror |
| d10ecb37 | tile_to_target | Tiling |
| d631b094 | GravityUp | Gravity |
| d9fac9be | tile_to_target | Tiling |
| de1cd16c | Rotate_270 | Rotation |
| ed36ccf7 | Rotate_90 | Rotation |

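
The transform names in the table suggest simple grid operations. The sketch below gives one plausible reading of several of them, on grids stored as lists of integer rows with 0 assumed to be the background color. These are illustrative re-implementations with assumed semantics (for example, that `tile_to_target` repeats the input periodically and crops to the target shape), not the solver's own code:

```python
from typing import List, Tuple

Grid = List[List[int]]


def rotate_180(g: Grid) -> Grid:
    # Reverse the row order and reverse each row.
    return [row[::-1] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    return [list(r) for r in zip(*g)]


def mirror_tile_h(g: Grid) -> Grid:
    # Assumed semantics: the grid followed by its left-right mirror image.
    return [row + row[::-1] for row in g]


def upscale(g: Grid, k: int) -> Grid:
    # Each cell becomes a k x k block (Upscale_2x: k=2, Upscale_3x: k=3).
    return [[v for v in row for _ in range(k)] for row in g for _ in range(k)]


def gravity_down(g: Grid, bg: int = 0) -> Grid:
    # Non-background cells in each column fall to the bottom, keeping order.
    h, w = len(g), len(g[0])
    out = [[bg] * w for _ in range(h)]
    for c in range(w):
        col = [g[r][c] for r in range(h) if g[r][c] != bg]
        for i, v in enumerate(col):
            out[h - len(col) + i][c] = v
    return out


def crop_to_content(g: Grid, bg: int = 0) -> Grid:
    # Crop to the bounding box of all non-background cells.
    rows = [r for r, row in enumerate(g) if any(v != bg for v in row)]
    cols = [c for c in range(len(g[0])) if any(row[c] != bg for row in g)]
    if not rows:
        return g
    return [row[cols[0]:cols[-1] + 1] for row in g[rows[0]:rows[-1] + 1]]


def kronecker_self_similar(g: Grid, bg: int = 0) -> Grid:
    # Each non-background cell is replaced by a copy of the whole grid and
    # each background cell by an all-background block of the same size.
    h, w = len(g), len(g[0])
    return [[g[i][j] if g[r][c] != bg else bg
             for c in range(w) for j in range(w)]
            for r in range(h) for i in range(h)]


def tile_to_target(g: Grid, shape: Tuple[int, int]) -> Grid:
    # Assumed semantics: extend the input periodically and keep the
    # top-left window of the target shape (this tiles or crops as needed).
    th, tw = shape
    h, w = len(g), len(g[0])
    return [[g[r % h][c % w] for c in range(tw)] for r in range(th)]


print(kronecker_self_similar([[1, 0], [0, 2]]))
print(tile_to_target([[1, 2]], (2, 3)))
```

Under this reading, a single `tile_to_target` covers both enlarging and cropping, because the target shape alone decides how much of the periodic extension is kept.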

## Transform Usage (across all 146 solved pairs)

| Transform | Solved pairs |
|---|---|
| tile_to_target | 34 |
| CropToContent | 19 |
| MirrorTile4Way | 11 |
| Rotate_90 | 9 |
| MirrorTileH | 8 |
| Rotate_180 | 7 |
| MirrorTileV | 7 |
| Transpose | 7 |
| Upscale_2x | 6 |
| ShiftedTile | 6 |
| Upscale_3x | 6 |
| KroneckerSelfSimilar | 5 |
| GravityUp | 5 |
| GravityDown | 4 |
| Reflect_h | 4 |
| KroneckerSelfSimilarInv | 3 |
| Reflect_v | 3 |
| InvertColors | 1 |
| Rotate_270 | 1 |

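
**Every new transform solved at least one pair.** ShiftedTile (6 pairs) and InvertColors (1 pair) contributed only at the pair level: neither appears in the Solved Tasks table.
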
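
The two tables can be cross-checked. With the counts copied from the Transform Usage table, the per-transform totals add up to the 146 solved pairs, and a set difference against the transforms listed in the Solved Tasks table shows which ones only ever solved individual pairs:

```python
from collections import Counter

# Solved-pair counts per transform, copied from the Transform Usage table.
usage = Counter({
    "tile_to_target": 34, "CropToContent": 19, "MirrorTile4Way": 11,
    "Rotate_90": 9, "MirrorTileH": 8, "Rotate_180": 7, "MirrorTileV": 7,
    "Transpose": 7, "Upscale_2x": 6, "ShiftedTile": 6, "Upscale_3x": 6,
    "KroneckerSelfSimilar": 5, "GravityUp": 5, "GravityDown": 4,
    "Reflect_h": 4, "KroneckerSelfSimilarInv": 3, "Reflect_v": 3,
    "InvertColors": 1, "Rotate_270": 1,
})

# Transforms that appear in the Solved Tasks table (full-task solves).
task_level = {
    "KroneckerSelfSimilar", "KroneckerSelfSimilarInv", "CropToContent",
    "GravityDown", "GravityUp", "tile_to_target", "MirrorTileH",
    "MirrorTile4Way", "MirrorTileV", "Rotate_180", "Rotate_90", "Rotate_270",
    "Reflect_h", "Reflect_v", "Transpose", "Upscale_2x", "Upscale_3x",
}

print(sum(usage.values()))               # → 146
print(sorted(set(usage) - task_level))   # → ['InvertColors', 'ShiftedTile']
```
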

## Unsolved σ Distribution

| σ range | Unsolved pairs |
|---|---|
| (0, 5] | 56 |
| (5, 10] | 85 |
| (10, 20] | 155 |
| (20, 50] | 341 |
| (50, 100] | 230 |
| (100, 500] | 263 |
| (500, ∞) | 26 |


Median σ for unsolved pairs: 44.0

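
The histogram agrees with the other figures: its bins sum to the 1,302 − 146 = 1,156 unsolved pairs, and the cumulative count first passes half of that total in the (20, 50] bin, which contains the reported median of 44.0. A small check, with the bin counts copied from the table and the `median_bin` helper written only for illustration:

```python
from typing import List, Tuple

# (lo, hi] bins with unsolved-pair counts copied from the σ distribution
# table; the last bin is open-ended.
BINS: List[Tuple[Tuple[float, float], int]] = [
    ((0, 5), 56), ((5, 10), 85), ((10, 20), 155), ((20, 50), 341),
    ((50, 100), 230), ((100, 500), 263), ((500, float("inf")), 26),
]


def median_bin(bins):
    """Return the (lo, hi] edges of the bin that contains the median."""
    total = sum(n for _, n in bins)
    running = 0
    for edges, n in bins:
        running += n
        if running >= total / 2:
            return edges
    raise ValueError("empty histogram")


total = sum(n for _, n in BINS)
print(total, 1302 - 146)    # → 1156 1156
print(median_bin(BINS))     # → (20, 50)
```
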

## Almost-Solved Tasks (28)

Tasks in which at least one train pair reaches σ=0 but not all of them (59 − 31 = 28). These likely need a **composition** of two transforms or a new primitive not yet in the library.

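
To make "composition" concrete: two single-step primitives applied in sequence can express a transform that neither expresses alone. A minimal sketch with hypothetical helper names, showing that a transpose followed by a left-right flip equals a 90° clockwise rotation:

```python
from typing import Callable, List

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right: compose(f, g)(x) == g(f(x))."""
    def chained(grid: Grid) -> Grid:
        for step in steps:
            grid = step(grid)
        return grid
    return chained


def transpose(g: Grid) -> Grid:
    return [list(r) for r in zip(*g)]


def flip_lr(g: Grid) -> Grid:
    # Reverse each row (left-right mirror).
    return [row[::-1] for row in g]


def rotate_90_cw(g: Grid) -> Grid:
    # Reverse row order, then transpose.
    return [list(r) for r in zip(*g[::-1])]


g = [[1, 2, 3],
     [4, 5, 6]]
print(compose(transpose, flip_lr)(g))                     # → [[4, 1], [5, 2], [6, 3]]
print(compose(transpose, flip_lr)(g) == rotate_90_cw(g))  # → True
```
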

## Parameters

```json
{
  "beam_width": 8,
  "max_depth": 2,
  "lock_coeff": 0.0,
  "max_fraction": 1.0,
  "use_symmetry": true,
  "use_gravity": true,
  "use_color_ops": true,
  "boundary_source": "target"
}
```

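
`beam_width` and `max_depth` suggest a beam search over chains of transforms. The sketch below is a generic beam search under stated assumptions, not the PEMF solver's algorithm: σ is assumed to count mismatched cells (with a fixed penalty for a shape mismatch), and at each depth only the `beam_width` chains with the lowest total σ over the train pairs are kept. The other parameters (`lock_coeff`, `max_fraction`, the `use_*` flags, `boundary_source`) are not modeled because their semantics are not described here:

```python
from typing import Callable, Dict, List, Sequence, Tuple

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]
Chain = Tuple[str, ...]


def sigma(pred: Grid, target: Grid) -> float:
    # Hypothetical σ: number of mismatched cells; a shape mismatch gets a
    # fixed large penalty instead (the real definition is not given here).
    if len(pred) != len(target) or any(len(a) != len(b) for a, b in zip(pred, target)):
        return 1000.0
    return float(sum(a != b for ra, rb in zip(pred, target) for a, b in zip(ra, rb)))


def beam_search(pairs: Sequence[Tuple[Grid, Grid]],
                library: Dict[str, Transform],
                beam_width: int = 8,
                max_depth: int = 2) -> Tuple[Chain, float]:
    """Return the lowest-σ chain of library transforms within max_depth steps."""

    def score(chain: Chain) -> float:
        # Total σ over all train pairs after applying the chain left to right.
        total = 0.0
        for x, y in pairs:
            for name in chain:
                x = library[name](x)
            total += sigma(x, y)
        return total

    best: Tuple[Chain, float] = ((), score(()))  # depth 0: identity
    beam = [best]
    for _ in range(max_depth):
        # Extend every chain in the beam by one transform, keep the best few.
        scored = [(c, score(c)) for c in (chain + (n,) for chain, _ in beam for n in library)]
        scored.sort(key=lambda cs: cs[1])
        beam = scored[:beam_width]
        if beam[0][1] < best[1]:
            best = beam[0]
        if best[1] == 0:
            break  # exact match on every train pair
    return best


library = {
    "transpose": lambda g: [list(r) for r in zip(*g)],
    "flip_lr": lambda g: [row[::-1] for row in g],
}
pairs = [([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
         ([[1, 2, 3], [4, 5, 6]], [[4, 1], [5, 2], [6, 3]])]
print(beam_search(pairs, library))  # → (('transpose', 'flip_lr'), 0.0)
```

With `max_depth=1` the same call can only return a single transform, so a task whose pairs need the two-step chain would stay unsolved.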

## How to reproduce

```bash
git clone https://github.com/fchollet/ARC-AGI.git /tmp/arc
mkdir -p arc_data
cp -r /tmp/arc/data/training arc_data/
python scripts/run_all_arc.py
```
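
Before running the script, it can help to confirm that the copied data loads. The sketch below does not reproduce `scripts/run_all_arc.py`; it only reads the task files, each of which in the ARC-AGI repository is a JSON object with `train` and `test` lists of `{"input": grid, "output": grid}` items. The helper names are hypothetical:

```python
import json
from pathlib import Path
from typing import Dict


def load_arc_tasks(directory: str) -> Dict[str, dict]:
    """Load every ARC task JSON in `directory` as {task_id: task_dict}."""
    return {p.stem: json.loads(p.read_text())
            for p in sorted(Path(directory).glob("*.json"))}


def count_train_pairs(tasks: Dict[str, dict]) -> int:
    # Each task's "train" list holds its demonstration input/output pairs.
    return sum(len(t["train"]) for t in tasks.values())


if __name__ == "__main__":
    tasks = load_arc_tasks("arc_data/training")
    print(len(tasks), "tasks,", count_train_pairs(tasks), "train pairs")
```

The printed totals can then be compared with the task and pair counts in the Results table.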