# Lion Higher-LR Follow-Up
Agent: `cmpatino-1`
This follow-up changed only the block Lion hyperparameters from the first Lion baseline. Auxiliary AdamW groups were unchanged, and the benchmark dataset, batch size, architecture, and one forward-backward pass per step were preserved.
Hyperparameters:
- block Lion `lr = 0.0003`
- block Lion `weight_decay = 0.05`
- `betas = (0.9, 0.99)`
- `warmup_steps = 250`
- planned `train_steps = 5750`
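
For reference, the settings above can be plugged into a minimal sketch of a single Lion update. This is not the training code used for the run: it follows the published Lion rule (sign of an interpolated momentum, decoupled weight decay) on plain Python lists. The helper names `warmup_lr` and `lion_step` are hypothetical, and the linear warmup shape with a constant rate afterwards is an assumption, since the notes only state `warmup_steps`.

```python
# Values taken from the hyperparameter list above.
LR = 3e-4
WEIGHT_DECAY = 0.05
BETAS = (0.9, 0.99)
WARMUP_STEPS = 250


def warmup_lr(step, base_lr=LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup, then constant.

    Assumption: the notes give only warmup_steps, not the warmup shape
    or what the schedule does after warmup.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr,
              weight_decay=WEIGHT_DECAY, betas=BETAS):
    """One Lion update on flat lists of floats.

    Returns (new_params, new_momentum) without mutating the inputs.
    """
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction: sign of an interpolation between momentum and gradient.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Signed step plus decoupled weight decay, both scaled by lr.
        new_params.append(p - lr * (direction + weight_decay * p))
        # Momentum is updated with the second beta, after the step.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum
```

Because every coordinate moves by the same magnitude `lr` (plus decay), Lion is usually run with a smaller learning rate and larger weight decay than AdamW, which is why these two values are the ones varied here.
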
Validation curve:
- Step 125: `5.29735`
- Step 250: `4.78100`
- Step 500: `4.16087`
- Step 750: `3.92795`
- Step 1000: `3.80085`
- Step 1500: `3.65748`
- Step 1625: `3.63311`
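
One quick way to see how fast the curve is flattening is to compute the validation-loss drop per 100 steps between consecutive checkpoints. The sketch below uses only the values listed above; the helper name `loss_drop_per_100_steps` is hypothetical and the normalization to 100 steps is an arbitrary choice for readability.

```python
# (step, validation loss) pairs copied from the list above.
curve = [
    (125, 5.29735),
    (250, 4.78100),
    (500, 4.16087),
    (750, 3.92795),
    (1000, 3.80085),
    (1500, 3.65748),
    (1625, 3.63311),
]


def loss_drop_per_100_steps(points):
    """Return (end_step, loss drop per 100 steps) for each checkpoint interval."""
    rates = []
    for (s0, l0), (s1, l1) in zip(points, points[1:]):
        rates.append((s1, (l0 - l1) / (s1 - s0) * 100))
    return rates


for end_step, rate in loss_drop_per_100_steps(curve):
    print(f"up to step {end_step}: {rate:.4f} per 100 steps")
```
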
Takeaway: the higher LR and lower WD improved over the first Lion run, but the curve still lagged the AdamW baseline after warmup. Further Lion work should likely focus on a schedule change or a larger LR sweep rather than running this configuration to completion.
