Lion Higher-LR Follow-Up
Agent: cmpatino-1
This follow-up changed only the block Lion hyperparameters from the first Lion baseline. Auxiliary AdamW groups were unchanged, and the benchmark dataset, batch size, architecture, and one forward-backward pass per step were preserved.
Hyperparameters:
- block Lion lr = 0.0003
- block Lion weight_decay = 0.05
- betas = (0.9, 0.99)
- warmup_steps = 250
- planned train_steps = 5750
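For reference, here is a minimal sketch of how these values enter a single Lion update. It uses the standard published Lion rule (sign of an interpolated momentum, decoupled weight decay) in plain Python and is not the benchmark's training code. The names `lion_step` and `warmup_lr` are hypothetical, and the linear warmup shape is an assumption, since this note records only `warmup_steps`.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def warmup_lr(step, base_lr=3e-4, warmup_steps=250):
    """Linear warmup to base_lr (warmup shape is an assumption, not from the note)."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def lion_step(params, grads, momentum, lr, weight_decay=0.05, betas=(0.9, 0.99)):
    """Apply one Lion update to a flat list of scalar parameters.

    Returns the updated parameters and the updated momentum buffer.
    """
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        # Decoupled weight decay is added to the signed update.
        p_next = p - lr * (sign(c) + weight_decay * p)
        # The momentum buffer is refreshed with the second beta.
        m_next = beta2 * m + (1.0 - beta2) * g
        new_params.append(p_next)
        new_momentum.append(m_next)
    return new_params, new_momentum


# Usage: one update at full learning rate on two toy parameters.
params, momentum = lion_step(
    params=[1.0, -2.0], grads=[0.5, -0.5], momentum=[0.0, 0.0], lr=warmup_lr(1000)
)
```

Because the update is a sign, every parameter moves by about `lr` per step regardless of gradient scale, which is why Lion learning rates are typically set well below AdamW ones.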
Validation curve:
- Step 125: 5.29735
- Step 250: 4.78100
- Step 500: 4.16087
- Step 750: 3.92795
- Step 1000: 3.80085
- Step 1500: 3.65748
- Step 1625: 3.63311
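One way to read the curve is the improvement per 100 steps between consecutive checkpoints. The sketch below computes it from the values listed above. The helper name `drop_per_100_steps` is hypothetical, and treating the validation metric as a loss where lower is better is an assumption.

```python
# (step, validation value) pairs copied from the curve above.
curve = [
    (125, 5.29735),
    (250, 4.78100),
    (500, 4.16087),
    (750, 3.92795),
    (1000, 3.80085),
    (1500, 3.65748),
    (1625, 3.63311),
]


def drop_per_100_steps(points):
    """Return (start_step, end_step, drop per 100 steps) for each interval."""
    rates = []
    for (s0, v0), (s1, v1) in zip(points, points[1:]):
        rates.append((s0, s1, (v0 - v1) / (s1 - s0) * 100.0))
    return rates


rates = drop_per_100_steps(curve)
for start, end, rate in rates:
    print(f"steps {start}-{end}: {rate:.4f} per 100 steps")
```

The rate shrinks at every interval, so the gap to a better baseline is unlikely to close without a change in learning rate or schedule.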
Takeaway: the higher LR and lower WD improved on the first Lion run, but the curve still lagged the AdamW baseline after warmup. Further Lion work should likely focus on a schedule change or a larger LR sweep rather than running this configuration for the full planned 5750 steps.