# adamw_baseline_cmpatino-0
**Status:** Negative result. Did not reach the 3.28 val-loss target in 5625 steps.
## What was tried
A literal reading of the README's "AdamW baseline" line:
> AdamW (lr=0.0015, wd=0.1, betas=0.9/0.95, warmup=250): 5,625 steps

This was implemented as a **single AdamW group covering all parameters** at lr=0.0015, using the same warmup/cooldown schedule as the Muon baseline (warmup=250, cooldown_frac=0.7).
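For reference, a minimal sketch of that single-group setup (a hypothetical reconstruction, not the actual `train_gpt_adamw_cmpatino-0.py`; the stand-in model and helper names are invented, and `cooldown_frac` is read as "final fraction of steps spent in linear decay to zero", per the Muon baseline convention):

```python
import torch

TOTAL_STEPS = 5625
WARMUP_STEPS = 250
COOLDOWN_FRAC = 0.7

model = torch.nn.Linear(8, 8)  # stand-in for the actual GPT model

optimizer = torch.optim.AdamW(
    model.parameters(),  # one group: every parameter shares the same LR
    lr=0.0015,
    weight_decay=0.1,
    betas=(0.9, 0.95),
)

def lr_multiplier(step: int) -> float:
    # Linear warmup over the first 250 steps...
    if step < WARMUP_STEPS:
        return (step + 1) / WARMUP_STEPS
    # ...flat until the cooldown window opens, then linear decay to 0
    # over the final 70% of training.
    cooldown_start = TOTAL_STEPS * (1 - COOLDOWN_FRAC)
    if step < cooldown_start:
        return 1.0
    return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - cooldown_start))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_multiplier)
```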
## Result
`val_loss = 3.39869` at step 5625. Far above the 3.28 threshold.
## Why it failed
The upstream reference log
([a63a68d1-...](https://github.com/KellerJordan/modded-nanogpt/blob/master/records/track_3_optimization/results/a63a68d1-24aa-4a22-af9a-224e43209ea4.txt))
shows that the reference "AdamW baseline" is **multi-LR**: two AdamW optimizers covering four parameter groups:

| Group | LR | wd | betas |
|---|---|---|---|
| `embed.weight` | 0.3 | 0 | (0.8, 0.95) |
| `proj.weight` | 1/320 = 0.003125 | 0 | (0.8, 0.95) |
| params with ndim < 2 (biases, RMSNorm gains) | 0.01 | 0 | (0.8, 0.95) |
| `blocks.*` with ndim ≥ 2 (the "real" target) | 0.0015 | 0.1 | (0.9, 0.95) |

Init also differs: only `proj` is zeroed; everything else uses default torch init.
A single LR of 0.0015 applied to embed/proj/scalars is dramatically too small
(200x below the 0.3 used for the embedding, ~7x below the 0.01 used for the
scalars, ~2x below 1/320 for `proj`), so those groups never train enough.
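For comparison, a minimal sketch of how the table's grouping could be wired up, assuming the two-optimizer split (one AdamW over the embed/proj/scalar groups, a second over the block matrices). The name-matching predicates follow the table but are illustrative, not copied from the reference code:

```python
import torch

def build_adamw_optimizers(model: torch.nn.Module):
    # Hypothetical reconstruction of the four-group scheme above; the
    # name-based routing is illustrative, not the reference implementation.
    embed, proj, scalars, matrices = [], [], [], []
    for name, p in model.named_parameters():
        if name.endswith("embed.weight"):
            embed.append(p)
        elif name.endswith("proj.weight"):
            proj.append(p)
        elif p.ndim < 2:   # biases, RMSNorm gains
            scalars.append(p)
        else:              # blocks.* matrices, the "real" target
            matrices.append(p)
    # Optimizer 1: everything except the block matrices (wd=0, betas=(0.8, 0.95))
    opt1 = torch.optim.AdamW(
        [
            dict(params=embed, lr=0.3),
            dict(params=proj, lr=1 / 320),  # = 0.003125
            dict(params=scalars, lr=0.01),
        ],
        betas=(0.8, 0.95),
        weight_decay=0.0,
    )
    # Optimizer 2: block matrices only, with the LR the failed run used everywhere
    opt2 = torch.optim.AdamW(
        matrices, lr=0.0015, betas=(0.9, 0.95), weight_decay=0.1
    )
    return [opt1, opt2]
```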
## Files
- `train_gpt_adamw_cmpatino-0.py` — single-LR AdamW reproduction
- `train_log_cmpatino-0.txt` — full training log
- `results.json` — machine-readable result
## Follow-up
Corrected reproduction (multi-LR scheme) launched at
`artifacts/adamw_baseline_v2_cmpatino-0/`.
