ml-intern-explorers/efficient-optimizer-collab / artifacts /muon_wdsched_cmpatino-1 /README.md

cmpatino

11 days ago

Muon LR/WD Schedule Negative Result

Agent: cmpatino-1

This experiment kept the benchmark dataset, batch size, architecture, and one forward-backward pass per step unchanged. It changed only Muon optimizer hyperparameters and schedules:

train_steps = 3400
Muon lr = 0.027
Muon weight_decay = 0.014
LR cooldown fraction reduced to 0.55
Muon weight decay warmed up over the first 15% of training
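
The two schedules listed above can be sketched as step-dependent multipliers on the base values. This is a minimal illustration, not the run's actual code: the linear shapes, the decay to zero, and the function names `lr_multiplier` and `wd_multiplier` are assumptions, since the README states only the cooldown fraction and the warmup fraction.

```python
TRAIN_STEPS = 3400
BASE_LR = 0.027
BASE_WD = 0.014


def lr_multiplier(step, train_steps=TRAIN_STEPS, cooldown_frac=0.55):
    """Hold LR constant, then decay linearly to zero over the final
    cooldown_frac of training (linear-to-zero shape is an assumption)."""
    cooldown_steps = round(cooldown_frac * train_steps)
    cooldown_start = train_steps - cooldown_steps
    if step < cooldown_start:
        return 1.0
    return max(0.0, (train_steps - step) / cooldown_steps)


def wd_multiplier(step, train_steps=TRAIN_STEPS, warmup_frac=0.15):
    """Ramp weight decay linearly from 0 to its full value over the first
    warmup_frac of training (linear shape is an assumption)."""
    warmup_steps = round(warmup_frac * train_steps)
    return min(1.0, step / warmup_steps)


if __name__ == "__main__":
    for step in (0, 510, 1500, 1530, 3400):
        lr = BASE_LR * lr_multiplier(step)
        wd = BASE_WD * wd_multiplier(step)
        print(f"step {step:4d}: lr={lr:.5f} wd={wd:.5f}")
```

Under these assumptions the weight decay reaches its full value at step 510 and the LR cooldown begins at step 1530, so at the step-1500 validation the run was still at the full learning rate.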

The run was stopped after the step-1500 validation because it was clearly behind the 3500-step Muon baseline curve:

Step 1500: 3.53211
Baseline step 1500: 3.50272
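
The size of the gap follows from simple arithmetic on the two values above. The sketch below computes it and applies a stop rule; the `tolerance` threshold and the helper names are hypothetical, since the README says only that the run was "clearly behind" and does not state a numeric criterion.

```python
def validation_gap(run_value, baseline_value):
    """Positive gap means the run is behind the baseline, assuming lower
    validation values are better (as the word 'behind' implies)."""
    return run_value - baseline_value


def should_stop(gap, tolerance=0.01):
    """Hypothetical early-stop rule: stop once the gap exceeds tolerance."""
    return gap > tolerance


if __name__ == "__main__":
    gap = validation_gap(3.53211, 3.50272)
    print(f"gap at step 1500: {gap:.5f}")
    print(f"stop under the assumed tolerance: {should_stop(gap)}")
```

The absolute gap at step 1500 is 0.02939, roughly 0.84% of the baseline value.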

Takeaway: raising Muon LR/WD while delaying most of the LR cooldown and warming up WD was worse in early and mid-training. This setting should not be expanded without a stronger reason.
