Qwen3-8B DFlash Draft Model (Distillation)
- Base model: Qwen/Qwen3-8B
- Training method: DFlash with teacher-student distillation loss
- Checkpoint: epoch_1_step_50000
- Dataset: nemotron + codealpaca (greedy regen)
- Hyperparameters:
  - batch_size: 2
  - learning_rate: 3e-4
  - loss_type: distill
  - loss_decay_gamma: 7.0
  - block_size: 16
  - num_epochs: 2
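The `loss_decay_gamma` and `block_size` entries suggest that the loss is weighted per position across each 16-token draft block. The card does not give the formula, so the following is only a minimal sketch under stated assumptions. It assumes an exponential decay w_k = exp(-(k-1)/γ) over block positions k = 1..16 and a KL(teacher || student) distillation term. The weighting formula and all function names (`position_weights`, `weighted_distill_loss`, etc.) are hypothetical and are not taken from the DFlash training code.

```python
import math
from typing import List, Sequence

# Values copied from the hyperparameter list above.
BLOCK_SIZE = 16
LOSS_DECAY_GAMMA = 7.0


def position_weights(block_size: int = BLOCK_SIZE, gamma: float = LOSS_DECAY_GAMMA) -> List[float]:
    """ASSUMED weighting: w_k = exp(-(k - 1) / gamma) for block positions k = 1..block_size.

    Earlier draft positions receive larger weights. This exact form is an
    assumption, not confirmed by the model card.
    """
    return [math.exp(-k / gamma) for k in range(block_size)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)


def weighted_distill_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    gamma: float = LOSS_DECAY_GAMMA,
) -> float:
    """Hypothetical block-level distillation loss.

    Each argument holds one list of vocabulary logits per block position.
    Returns the weight-normalized average of per-position KL(teacher || student).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same block positions")
    weights = position_weights(len(teacher_logits), gamma)
    total = 0.0
    for w, t, s in zip(weights, teacher_logits, student_logits):
        total += w * kl_divergence(softmax(t), softmax(s))
    return total / sum(weights)
```

With γ = 7.0, this assumed weight falls to about 0.37 (e^-1) at the 8th position and to about 0.12 at the 16th, so early draft positions dominate the loss. Down-weighting later positions is a plausible choice for speculative decoding, where a drafted block is accepted only up to its first rejected token.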