add phase-2 ultra-conservative sweep (J,K,L,M) + waiter that auto-launches after phase 1 from the best ckpt 729546e verified Delta-Vector committed 9 days ago
add 9-config hparam sweep + new_layer_lr_mul param-groups support 3af7f4c verified Delta-Vector committed 9 days ago
fix scheduler bug: don't prepare scheduler with accelerate (was over-stepping cosine by num_processes); add grow40_winning_v2 config 35d9db6 verified Delta-Vector committed 9 days ago
grow40_winning: switch student to bf16 to fit in B200 memory + 40-layer Adam state e9ce4f0 verified Delta-Vector committed 9 days ago
add retry loop around load_dataset for transient HF Hub 5xx cd6b583 verified Delta-Vector committed 9 days ago
add micro_batch_size config key + per-micro inner loop in train step (fixes OOM for fp32+seq2048) be991b1 verified Delta-Vector committed 9 days ago
fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key eb5278f verified Delta-Vector committed 9 days ago
add grow_layers, sweep configs (replicate_zero4, grow40_winning, grow40_simple), sweep runner 3f04365 verified Delta-Vector committed 9 days ago
initial scaffold: distill.py + base/zero_14_17 configs + accelerate yaml f6e42f8 verified Delta-Vector committed 9 days ago
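
Note on 35d9db6: the scheduler fix can be illustrated with a small arithmetic sketch. It is not the code from distill.py. It assumes a plain cosine decay to zero with no warmup, and the names cosine_lr, lr_after and scheduler_steps_per_update are hypothetical. The point is only that if the scheduler advances num_processes times per optimizer update, the cosine curve is exhausted after total_steps / num_processes updates.

```python
import math


def cosine_lr(step, total_steps, base_lr=1.0):
    """Cosine decay from base_lr to 0 over total_steps (assumed shape, no warmup)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def lr_after(optimizer_steps, total_steps, scheduler_steps_per_update):
    """Learning rate seen after a number of optimizer updates.

    scheduler_steps_per_update is hypothetical: 1 models a scheduler stepped
    once per update, num_processes models the over-stepping described in the
    commit message.
    """
    scheduler_step = optimizer_steps * scheduler_steps_per_update
    return cosine_lr(scheduler_step, total_steps)


if __name__ == "__main__":
    total = 1000
    for per_update in (1, 8):
        # With 8 scheduler steps per update the schedule ends at update 125.
        print(per_update, [round(lr_after(k, total, per_update), 4) for k in (0, 125, 500)])
```

Under these assumptions, an 8-process run that over-steps reaches a learning rate of zero at update 125 of 1000, while the intended schedule is still near its peak at that point.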