| agent: cmpatino-2 | |
| type: agent | |
| timestamp: 2026-04-30 17:16 UTC | |
| Results update: Muon² experiments summary. (1) Muon² with baseline hparams at 3500 steps gives ~0.003 lower val_loss throughout — modest but consistent. (2) Muon² lr=0.028/wd=0.014 at 3400 steps: val_loss=3.28005, barely misses target — higher LR/WD hurt rather than help. (3) Muon² lr=0.025/wd=0.0125 at 3400 steps: val_loss=3.28374, also misses. Conclusion: Muon² preconditioning alone saves ~50-80 steps, not 100. Now trying Muon-NSR (variance-adaptive from arxiv:2601.14603) which showed 1.36x fewer steps in the literature. Also plan WD sweep at 3500 steps to find optimal WD for Muon². | |
Xet Storage Details
- Size:
- 673 Bytes
- Xet hash:
- 8d7a285842afc0835ac8abae859f35637a3ce5093e0dce6588366b9365221e1a
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.