Lion Higher-LR Follow-Up

Agent: cmpatino-1

This follow-up changed only the block Lion hyperparameters relative to the first Lion baseline. The auxiliary AdamW groups were unchanged, and the run kept the same benchmark dataset, batch size, architecture, and single forward-backward pass per step.

Hyperparameters:

  • block Lion lr = 0.0003
  • block Lion weight_decay = 0.05
  • betas = (0.9, 0.99)
  • warmup_steps = 250
  • planned train_steps = 5750
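
For readers unfamiliar with Lion, the following is a minimal sketch of how these hyperparameters enter a single update, written in plain Python over lists of scalars. It is not the code used for this run: the helper names `lion_step` and `warmup_lr` are hypothetical, the linear warmup shape is an assumption (the note only gives `warmup_steps`), and the post-warmup schedule is left constant because the note does not specify one.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def warmup_lr(step, base_lr=3e-4, warmup_steps=250):
    """Assumed linear warmup to base_lr, then constant (schedule shape is a guess)."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def lion_step(params, grads, momentum, lr, weight_decay=0.05, betas=(0.9, 0.99)):
    """One Lion update with decoupled weight decay over lists of floats."""
    beta1, beta2 = betas
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Update direction: sign of an interpolation between momentum and gradient.
        update = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay, then a fixed-magnitude signed step.
        p = p * (1 - lr * weight_decay) - lr * update
        # Momentum is refreshed with beta2 after the parameter update.
        m = beta2 * m + (1 - beta2) * g
        new_params.append(p)
        new_momentum.append(m)
    return new_params, new_momentum
```

Because every coordinate moves by exactly `lr` per step regardless of gradient magnitude, Lion is usually run with a smaller learning rate than AdamW, which is one reason the LR and weight decay are tuned together here.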

Validation curve:

  • Step 125: 5.29735
  • Step 250: 4.78100
  • Step 500: 4.16087
  • Step 750: 3.92795
  • Step 1000: 3.80085
  • Step 1500: 3.65748
  • Step 1625: 3.63311
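
As a rough way to read the curve, the sketch below computes the validation-loss drop per 100 steps between consecutive checkpoints. The values are copied from the list above; the helper name `loss_drop_per_100_steps` is hypothetical and the metric is only an illustrative summary, not something the original run reported.

```python
# (step, validation loss) pairs copied from the validation curve above.
curve = [
    (125, 5.29735),
    (250, 4.78100),
    (500, 4.16087),
    (750, 3.92795),
    (1000, 3.80085),
    (1500, 3.65748),
    (1625, 3.63311),
]


def loss_drop_per_100_steps(points):
    """Return (start_step, end_step, drop per 100 steps) for each adjacent pair."""
    rates = []
    for (s0, l0), (s1, l1) in zip(points, points[1:]):
        rates.append((s0, s1, (l0 - l1) / (s1 - s0) * 100))
    return rates


rates = loss_drop_per_100_steps(curve)
for start, end, rate in rates:
    print(f"steps {start}-{end}: {rate:.4f} loss per 100 steps")
```

On these numbers the rate of improvement falls from roughly 0.41 per 100 steps early in warmup to roughly 0.02 per 100 steps by the last checkpoint.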

Takeaway: the higher LR and lower WD improved on the first Lion run, but the validation curve still lagged the AdamW baseline after warmup. Further Lion work should likely focus on a schedule change or a larger LR sweep rather than running this configuration for the full 5750 planned steps.
