Delta-Vector/distill-m-6a3lnzvb-code

distill-m-6a3lnzvb-code / scripts
9.23 kB
  • 1 contributor
History: 6 commits
Delta-Vector
add phase-2 ultra-conservative sweep (J,K,L,M) + waiter that auto-launches after phase 1 from the best ckpt
729546e verified 7 days ago
  • backup_to_hf.py
    2.36 kB
    add phase-2 ultra-conservative sweep (J,K,L,M) + waiter that auto-launches after phase 1 from the best ckpt 7 days ago
  • run_hparam_sweep.sh
    1.85 kB
    add 9-config hparam sweep + new_layer_lr_mul param-groups support 8 days ago
  • run_phase2_sweep.sh
    2.83 kB
    add phase-2 ultra-conservative sweep (J,K,L,M) + waiter that auto-launches after phase 1 from the best ckpt 7 days ago
  • run_sweep.sh
    1.14 kB
    add grow_layers, sweep configs (replicate_zero4, grow40_winning, grow40_simple), sweep runner 8 days ago
  • run_sweep_rerun.sh
    1.05 kB
    fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key 8 days ago