Benchmark Results

| Timestamp | Task | Accuracy | Model | Max Problems |
|---|---|---|---|---|
| 2026-01-12T02:44:09.419982 | ARC-Easy | 0.00% | checkpoints/final_135m.pt | 5 |
| 2026-01-12T02:44:09.419982 | ARC-Challenge | 0.00% | checkpoints/final_135m.pt | 5 |
| 2026-01-12T02:44:09.419982 | MMLU | 0.00% | checkpoints/final_135m.pt | 5 |
| 2026-01-12T02:44:09.419982 | GSM8K | 0.00% | checkpoints/final_135m.pt | 5 |
| 2026-01-12T02:44:09.419982 | HumanEval | 0.00% | checkpoints/final_135m.pt | 5 |
| 2026-01-12T02:44:09.419982 | SpellingBee | 0.00% | checkpoints/final_135m.pt | 5 |
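
The table does not state how the Accuracy column is computed. Assuming it is the share of correctly answered problems among the evaluated ones (an assumption, as is the helper name `accuracy_percent`), a minimal sketch of that arithmetic shows which values a run capped at 5 problems can report:

```python
def accuracy_percent(num_correct: int, num_problems: int) -> str:
    """Format accuracy as a percentage string with two decimals.

    Hypothetical helper: assumes accuracy = correct / evaluated problems,
    which the results table itself does not confirm.
    """
    if num_problems <= 0:
        raise ValueError("num_problems must be positive")
    if not 0 <= num_correct <= num_problems:
        raise ValueError("num_correct must be between 0 and num_problems")
    return f"{100.0 * num_correct / num_problems:.2f}%"


# With Max Problems = 5, only six accuracy values are possible.
possible = [accuracy_percent(k, 5) for k in range(6)]
print(possible)
# ['0.00%', '20.00%', '40.00%', '60.00%', '80.00%', '100.00%']
```

Under this assumption, each row's 0.00% corresponds to 0 correct answers out of 5, and the smallest nonzero score a task could report is 20.00%.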