Qwen3-8B DFlash Draft Model (Distillation)

  • Base model: Qwen/Qwen3-8B
  • Training method: DFlash with teacher-student distillation loss
  • Checkpoint: epoch_1_step_50000
  • Dataset: nemotron + codealpaca (responses regenerated with greedy decoding)
  • Hyperparameters:
    • batch_size: 2
    • learning_rate: 3e-4
    • loss_type: distill
    • loss_decay_gamma: 7.0
    • block_size: 16
    • num_epochs: 2
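The card does not say how `loss_decay_gamma` and the `distill` loss are computed. Below is a minimal sketch under two assumptions that the card does not confirm. First, each position k (1-indexed) inside a block of `block_size` predicted tokens is weighted by exp(-(k - 1) / gamma), so that later positions in the block count less. Second, the distillation term is a KL divergence from the teacher's next-token distribution to the student's. All function names are hypothetical and are not part of any DFlash codebase.

```python
import math

def decay_weights(block_size: int, gamma: float) -> list[float]:
    """Assumed position weights w_k = exp(-(k - 1) / gamma) for k = 1..block_size."""
    return [math.exp(-(k - 1) / gamma) for k in range(1, block_size + 1)]

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) for discrete distributions; p is the teacher, q is the student."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def weighted_distill_loss(teacher_logits: list[list[float]],
                          student_logits: list[list[float]],
                          gamma: float) -> float:
    """Decay-weighted mean of per-position KL(teacher || student) over one block.

    Hypothetical helper: the exponential weighting is an assumption about
    what loss_decay_gamma controls, not the confirmed training objective.
    """
    weights = decay_weights(len(teacher_logits), gamma)
    total = 0.0
    for w, t, s in zip(weights, teacher_logits, student_logits):
        total += w * kl_divergence(softmax(t), softmax(s))
    return total / sum(weights)

if __name__ == "__main__":
    # With the card's block_size=16 and loss_decay_gamma=7.0, the last position
    # of a block gets roughly 12% of the weight of the first one.
    w = decay_weights(16, 7.0)
    print(round(w[0], 3), round(w[-1], 3))  # 1.0 0.117
```

Under this assumed form, a larger `loss_decay_gamma` flattens the weights across the block, while a smaller one concentrates training on the first few drafted positions.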
Weights: Safetensors, 1B params, BF16.