Add ARC-AGI evaluation results: 31/400 (7.8%) solved, 17s total
arc_results/RESULTS.md (added, +112 -0)
# PEMF Solver — ARC-AGI Training Set Evaluation

## Results

| Metric | Value |
|---|---|
| **Total tasks** | 400 |
| **Solved** (σ=0 on all train pairs) | **31 (7.8%)** |
| **≥1 pair solved** | 59 (14.8%) |
| **Total pairs** | 1,302 |
| **Pairs solved** | 146 (11.2%) |
| **Total time** | 17.1s (0.04s/task) |
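
The percentages and the per-task time in the table can be recomputed from the raw counts. The following is a small sanity-check sketch, not part of the evaluation script; the variable names are illustrative and only the numbers come from the table.

```python
# Counts copied from the Results table; names are illustrative only.
total_tasks = 400
solved_tasks = 31
tasks_with_any_pair = 59
total_pairs = 1302
solved_pairs = 146
total_seconds = 17.1


def pct(part: int, whole: int) -> float:
    """Return part/whole as an unrounded percentage."""
    return 100.0 * part / whole


task_rate = pct(solved_tasks, total_tasks)
any_pair_rate = pct(tasks_with_any_pair, total_tasks)
pair_rate = pct(solved_pairs, total_pairs)
seconds_per_task = total_seconds / total_tasks

print(f"solved tasks:      {task_rate:.2f}%")
print(f"tasks with a pair: {any_pair_rate:.2f}%")
print(f"solved pairs:      {pair_rate:.2f}%")
print(f"time per task:     {seconds_per_task:.4f}s")
```

The unrounded values agree with the table once rounded to the precision shown there.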

## Solved Tasks (31)

| Task ID | Transform | Family |
|---|---|---|
| 007bbfb7 | KroneckerSelfSimilar | Self-similar |
| 1190e5a7 | KroneckerSelfSimilarInv | Self-similar |
| 1cf80156 | CropToContent | Crop |
| 1e0a9b12 | GravityDown | Gravity |
| 2013d3e2 | CropToContent | Crop |
| 239be575 | tile_to_target | Tiling |
| 28bf18c6 | CropToContent | Crop |
| 2dee498d | tile_to_target | Tiling |
| 3906de3d | GravityUp | Gravity |
| 3af2c5a8 | MirrorTileH | Mirror |
| 3c9b0459 | Rotate_180 | Rotation |
| 6150a2bd | Rotate_180 | Rotation |
| 62c24649 | MirrorTile4Way | Mirror |
| 67a3c6ac | Reflect_v | Reflection |
| 67e8384a | MirrorTile4Way | Mirror |
| 68b16354 | Reflect_h | Reflection |
| 6d0aefbc | MirrorTileH | Mirror |
| 6fa7a44f | MirrorTileV | Mirror |
| 74dd1130 | Transpose | Transpose |
| 7b7f7511 | tile_to_target | Tiling |
| 8be77c9e | MirrorTileV | Mirror |
| 9172f3a0 | Upscale_3x | Upscale |
| 9dfd6313 | Transpose | Transpose |
| a416b8f3 | tile_to_target | Tiling |
| c59eb873 | Upscale_2x | Upscale |
| c9e6f938 | MirrorTileH | Mirror |
| d10ecb37 | tile_to_target | Tiling |
| d631b094 | GravityUp | Gravity |
| d9fac9be | tile_to_target | Tiling |
| de1cd16c | Rotate_270 | Rotation |
| ed36ccf7 | Rotate_90 | Rotation |

## Transform Usage (across all 146 solved pairs)

| Transform | Pairs |
|---|---|
| tile_to_target | 34 |
| CropToContent | 19 |
| MirrorTile4Way | 11 |
| Rotate_90 | 9 |
| MirrorTileH | 8 |
| Rotate_180 | 7 |
| MirrorTileV | 7 |
| Transpose | 7 |
| Upscale_2x | 6 |
| ShiftedTile | 6 |
| Upscale_3x | 6 |
| KroneckerSelfSimilar | 5 |
| GravityUp | 5 |
| GravityDown | 4 |
| Reflect_h | 4 |
| KroneckerSelfSimilarInv | 3 |
| Reflect_v | 3 |
| InvertColors | 1 |
| Rotate_270 | 1 |

**Every new transform contributed to at least one solved pair.**
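
As a consistency check, the per-transform counts should add up to the 146 solved pairs reported in the Results table. The sketch below also lists transforms that appear in the usage table but not in the Solved Tasks table, meaning they solved individual pairs without closing out a whole task. The dictionary and set are transcribed by hand from the two tables; nothing here is read from the solver's output.

```python
# Pair counts copied from the Transform Usage table.
usage = {
    "tile_to_target": 34, "CropToContent": 19, "MirrorTile4Way": 11,
    "Rotate_90": 9, "MirrorTileH": 8, "Rotate_180": 7, "MirrorTileV": 7,
    "Transpose": 7, "Upscale_2x": 6, "ShiftedTile": 6, "Upscale_3x": 6,
    "KroneckerSelfSimilar": 5, "GravityUp": 5, "GravityDown": 4,
    "Reflect_h": 4, "KroneckerSelfSimilarInv": 3, "Reflect_v": 3,
    "InvertColors": 1, "Rotate_270": 1,
}

# Transforms named in the Solved Tasks table (fully solved tasks only).
solved_task_transforms = {
    "KroneckerSelfSimilar", "KroneckerSelfSimilarInv", "CropToContent",
    "GravityDown", "GravityUp", "tile_to_target", "MirrorTileH",
    "MirrorTileV", "MirrorTile4Way", "Rotate_90", "Rotate_180",
    "Rotate_270", "Reflect_h", "Reflect_v", "Transpose",
    "Upscale_2x", "Upscale_3x",
}

total_solved_pairs = sum(usage.values())
pair_only = sorted(set(usage) - solved_task_transforms)

print("pairs accounted for:", total_solved_pairs)
print("used on pairs but in no fully solved task:", pair_only)
```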

## Unsolved σ Distribution

| σ Range | Pairs |
|---|---|
| (0, 5] | 56 |
| (5, 10] | 85 |
| (10, 20] | 155 |
| (20, 50] | 341 |
| (50, 100] | 230 |
| (100, 500] | 263 |
| (500, ∞) | 26 |

Median σ for unsolved pairs: 44.0
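
The histogram can be cross-checked against the headline numbers: the bins should sum to the number of unsolved pairs (total pairs minus solved pairs), and the reported median should fall in the bin containing the middle observation. The following is a sketch under the assumption that every unsolved pair appears in exactly one bin; the helper `median_bin` is illustrative and not part of the solver.

```python
import math

# (lower, upper] bin edges and pair counts copied from the table.
sigma_bins = [
    ((0, 5), 56),
    ((5, 10), 85),
    ((10, 20), 155),
    ((20, 50), 341),
    ((50, 100), 230),
    ((100, 500), 263),
    ((500, math.inf), 26),
]
total_pairs = 1302
solved_pairs = 146
reported_median = 44.0

unsolved_pairs = sum(count for _, count in sigma_bins)


def median_bin(bins):
    """Return the edges of the bin holding the median observation.

    For an even number of observations this uses the midpoint position,
    which is only unambiguous when both middle items share a bin.
    """
    n = sum(count for _, count in bins)
    middle = (n + 1) / 2  # 1-indexed position of the median
    running = 0
    for edges, count in bins:
        running += count
        if running >= middle:
            return edges
    raise ValueError("empty histogram")


low, high = median_bin(sigma_bins)
median_is_consistent = low < reported_median <= high

print("unsolved pairs:", unsolved_pairs)
print("median bin:", (low, high), "consistent:", median_is_consistent)
```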

## Almost-Solved Tasks (28)

Tasks where at least one train pair, but not every pair, reaches σ=0 (59 tasks with a solved pair minus 31 fully solved). These likely need a **composition** of two transforms or a new primitive not yet in the library.
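
To illustrate what a composition of two transforms means here, the sketch below searches all pipelines of up to two primitives for one that maps every train input exactly onto its output. It is a toy example and not the PEMF solver: the four primitives, the `find_pipeline` helper, and the example grids are hypothetical, and it assumes that σ=0 corresponds to an exact grid match.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]


def rot90(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def flip_lr(g: Grid) -> Grid:
    """Mirror the grid left to right."""
    return [row[::-1] for row in g]


def flip_ud(g: Grid) -> Grid:
    """Mirror the grid top to bottom."""
    return [list(row) for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(row) for row in zip(*g)]


# Hypothetical primitive library, standing in for the solver's real one.
PRIMITIVES: Dict[str, Transform] = {
    "rot90": rot90,
    "flip_lr": flip_lr,
    "flip_ud": flip_ud,
    "transpose": transpose,
}


def apply_pipeline(names: Sequence[str], g: Grid) -> Grid:
    """Apply the named primitives left to right."""
    for name in names:
        g = PRIMITIVES[name](g)
    return g


def find_pipeline(
    pairs: Sequence[Tuple[Grid, Grid]], max_depth: int = 2
) -> Optional[Tuple[str, ...]]:
    """Return the shortest pipeline that matches every pair exactly."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(apply_pipeline(names, i) == o for i, o in pairs):
                return names
    return None


# Both pairs are 180-degree rotations, which no single primitive above
# produces on its own, so a depth-2 pipeline is required.
train_pairs = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
    ([[1, 0], [0, 0]], [[0, 0], [0, 1]]),
]
pipeline = find_pipeline(train_pairs)
print(pipeline)
```

Requiring the same pipeline to match all pairs is what separates a solved task from an almost-solved one: a pipeline that fits only some pairs would be rejected by the `all(...)` check.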

## Parameters
```json
{
  "beam_width": 8,
  "max_depth": 2,
  "lock_coeff": 0.0,
  "max_fraction": 1.0,
  "use_symmetry": true,
  "use_gravity": true,
  "use_color_ops": true,
  "boundary_source": "target"
}
```

## How to reproduce
```bash
# Fetch the ARC-AGI training tasks and copy them into arc_data/
git clone https://github.com/fchollet/ARC-AGI.git /tmp/arc
mkdir -p arc_data
cp -r /tmp/arc/data/training arc_data/training
# Run the solver over all 400 training tasks
python scripts/run_all_arc.py
```