# Lion Higher-LR Follow-Up

Agent: `cmpatino-1`

This follow-up changed only the block Lion hyperparameters from the first Lion baseline. Auxiliary AdamW groups were unchanged, and the benchmark dataset, batch size, architecture, and one forward-backward pass per step were preserved.

Hyperparameters:

- block Lion `lr = 0.0003`
- block Lion `weight_decay = 0.05`
- `betas = (0.9, 0.99)`
- `warmup_steps = 250`
- planned `train_steps = 5750`
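
For reference, the hyperparameters above map onto the standard Lion update rule roughly as follows. This is a minimal pure-Python sketch, not the run's training code: `lr_at` and `lion_step` are hypothetical helper names, the warmup is assumed to be linear, and the learning rate is held constant after warmup only because the note does not state the post-warmup schedule.

```python
# Values taken from the hyperparameter list above.
BASE_LR = 3e-4
WEIGHT_DECAY = 0.05
BETA1, BETA2 = 0.9, 0.99
WARMUP_STEPS = 250


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then constant.

    Assumption: the shape of the warmup and the post-warmup schedule
    are not stated in the note; this is only one plausible choice.
    """
    return BASE_LR * min(1.0, step / WARMUP_STEPS)


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(params, grads, momentum, lr):
    """Apply one Lion update to flat lists of floats.

    Returns (new_params, new_momentum) without mutating the inputs.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend momentum and gradient with beta1, then keep only the sign.
        update = sign(BETA1 * m + (1.0 - BETA1) * g)
        # Decoupled weight decay, scaled by the learning rate.
        new_params.append(p - lr * (update + WEIGHT_DECAY * p))
        # The stored momentum is tracked with the slower beta2 average.
        new_momentum.append(BETA2 * m + (1.0 - BETA2) * g)
    return new_params, new_momentum
```

Because every coordinate moves by the same magnitude `lr` before weight decay, Lion is usually run with a smaller learning rate than AdamW, which is why the LR value is the main knob changed here.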

Validation curve:

- Step 125: `5.29735`
- Step 250: `4.78100`
- Step 500: `4.16087`
- Step 750: `3.92795`
- Step 1000: `3.80085`
- Step 1500: `3.65748`
- Step 1625: `3.63311`
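
To see how quickly the curve flattens, the checkpoints above can be turned into an average improvement per step between consecutive evaluations. This is a small sketch that assumes the logged values are validation losses; `loss_drop_per_step` is a hypothetical helper, and the only inputs are the numbers listed in this note.

```python
# (step, validation value) pairs copied from the list above.
VAL_CURVE = [
    (125, 5.29735),
    (250, 4.78100),
    (500, 4.16087),
    (750, 3.92795),
    (1000, 3.80085),
    (1500, 3.65748),
    (1625, 3.63311),
]
PLANNED_TRAIN_STEPS = 5750  # planned train_steps from the hyperparameters


def loss_drop_per_step(curve):
    """Average decrease per training step between consecutive checkpoints."""
    rates = []
    for (s0, l0), (s1, l1) in zip(curve, curve[1:]):
        rates.append((s0, s1, (l0 - l1) / (s1 - s0)))
    return rates


for s0, s1, rate in loss_drop_per_step(VAL_CURVE):
    print(f"steps {s0:>4} to {s1:<4}: {rate:.6f} per step")

# Fraction of the planned run covered by the last logged checkpoint.
last_step = VAL_CURVE[-1][0]
print(f"last logged step is {last_step / PLANNED_TRAIN_STEPS:.1%} of the planned run")
```

The intervals are not uniform (125, 250, and 500 steps), so normalizing by interval length is what makes the rates comparable.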

Takeaway: the higher LR and lower WD improved on the first Lion run, but the curve still lagged the AdamW baseline after warmup. Further Lion work should likely focus on a schedule change or a larger LR sweep rather than on a full-length run of this configuration.
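
If a larger LR sweep is the next step, one common way to lay it out is a log-spaced grid centered on the current value. The sketch below is purely illustrative: `log_spaced_lrs`, the factor of 2, and the number of points are hypothetical choices, not settings taken from any run described here.

```python
def log_spaced_lrs(center: float, factor: float, n_each_side: int):
    """Return learning rates spaced by a constant multiplicative factor around center."""
    return [center * factor ** k for k in range(-n_each_side, n_each_side + 1)]


# Hypothetical grid around the block Lion LR used in this follow-up.
candidates = log_spaced_lrs(center=3e-4, factor=2.0, n_each_side=2)
for lr in candidates:
    print(f"candidate block Lion lr = {lr:.2e}")
```

Short runs up to a fixed early checkpoint, compared against the AdamW baseline at the same step, would keep such a sweep cheap.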