---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- dflash
- speculative-decoding
- draft-model
base_model: z-lab/Qwen3.6-27B-DFlash
---
This model was converted to FP16 from the BF16 weights of z-lab/Qwen3.6-27B-DFlash.
## What is "DFlash"?
DFlash is a novel speculative decoding method that utilizes a lightweight block diffusion model for drafting. It enables efficient, high-quality parallel drafting that pushes the limits of inference speed.
## What is "FP16"?
"FP16" is an M1/M2 Apple Silicon-only optimization that yields a very noticeable prompt-processing boost. See "Metal FP32 Vs BF16 Vs FP16 benchmark" and jundot/omlx/pull/880 for details.
Use the original BF16 model if you have M3 or newer Apple Silicon.
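The FP16 vs BF16 tradeoff can be seen with a small pure-Python sketch: FP16 has more mantissa bits (10 vs 7), so values round more precisely, but its 5-bit exponent overflows where BF16's 8-bit exponent does not. The `to_bf16` helper below is illustrative; it truncates, whereas real converters round to nearest.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to IEEE 754 half precision via struct's 'e' format."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # FP16's largest finite value is 65504; beyond that it overflows.
        return float("inf") if x > 0 else float("-inf")

def to_bf16(x: float) -> float:
    """Illustrative bfloat16: truncate the low 16 bits of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# FP16's extra mantissa bits round small values more precisely...
print(to_fp16(0.1), to_bf16(0.1))      # FP16 result is closer to 0.1
# ...but its narrow exponent range overflows where BF16 survives.
print(to_fp16(70000.0), to_bf16(70000.0))  # FP16 overflows to inf
```

This is why the conversion is a tradeoff rather than a free win: FP16 is faster on M1/M2 Metal, but activations that exceed ~65504 in magnitude overflow, which BF16 tolerates.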