	# Lion Higher-LR Follow-Up

	Agent: `cmpatino-1`

	This follow-up changed only the block Lion hyperparameters relative to the first Lion baseline. The auxiliary AdamW groups were unchanged, and the benchmark dataset, batch size, architecture, and single forward-backward pass per step were all preserved.

	Hyperparameters:

	- block Lion `lr = 0.0003`
	- block Lion `weight_decay = 0.05`
	- `betas = (0.9, 0.99)`
	- `warmup_steps = 250`
	- planned `train_steps = 5750`
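
For orientation, the listed values can be plugged into the standard Lion update rule (sign of an interpolated momentum, plus decoupled weight decay). The following is a minimal sketch, not the code used in this run: `lion_step`, `warmup_lr`, and `sign` are hypothetical helper names, the linear warmup shape is an assumption (the note only gives `warmup_steps`), and the "block" grouping of parameters is not modeled.

```python
# Values taken from the hyperparameter list above.
LR = 3e-4
WEIGHT_DECAY = 0.05
BETAS = (0.9, 0.99)
WARMUP_STEPS = 250


def warmup_lr(step, peak_lr=LR, warmup_steps=WARMUP_STEPS):
    """Assumed linear warmup to peak_lr; the note does not state the shape."""
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr, betas=BETAS, weight_decay=WEIGHT_DECAY):
    """Apply one standard Lion update to flat lists of floats."""
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Update direction: sign of the beta1-interpolated momentum and gradient.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay is scaled by the same learning rate.
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum buffer is refreshed with beta2 after the parameter update.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum
```

Because every coordinate moves by roughly `lr` per step regardless of gradient magnitude, Lion learning rates are usually set well below AdamW ones, which is part of why the LR value matters so much in this comparison.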

	Validation curve:

	- Step 125: `5.29735`
	- Step 250: `4.78100`
	- Step 500: `4.16087`
	- Step 750: `3.92795`
	- Step 1000: `3.80085`
	- Step 1500: `3.65748`
	- Step 1625: `3.63311`
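
To make the slowdown in the curve easier to read, the checkpoints above can be turned into an average validation-loss drop per 100 steps for each interval. This is a small sketch over the reported numbers only; `CURVE`, `PLANNED_STEPS`, and `drop_per_100_steps` are illustrative names, not part of the benchmark code.

```python
# (step, validation loss) pairs copied from the list above.
CURVE = [
    (125, 5.29735),
    (250, 4.78100),
    (500, 4.16087),
    (750, 3.92795),
    (1000, 3.80085),
    (1500, 3.65748),
    (1625, 3.63311),
]
PLANNED_STEPS = 5750  # planned train_steps from the hyperparameter list


def drop_per_100_steps(curve):
    """Return (start_step, end_step, average loss drop per 100 steps)."""
    rates = []
    for (s0, l0), (s1, l1) in zip(curve, curve[1:]):
        rates.append((s0, s1, (l0 - l1) / (s1 - s0) * 100))
    return rates


if __name__ == "__main__":
    for s0, s1, rate in drop_per_100_steps(CURVE):
        print(f"steps {s0:>4}-{s1:<4}: {rate:.4f} loss per 100 steps")
    last_step = CURVE[-1][0]
    print(f"last checkpoint covers {last_step / PLANNED_STEPS:.1%} of planned steps")
```

The per-interval rate shrinks at every checkpoint, and the last recorded step is well short of the planned step count.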

	Takeaway: the higher LR and lower WD improved over the first Lion run, but the curve still lagged the AdamW baseline after warmup. Further Lion work should likely focus on a schedule change or a larger LR sweep rather than running this configuration for the full planned 5750 steps.
