---
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- dflash
- speculative-decoding
- draft-model
base_model: z-lab/Qwen3.6-27B-DFlash
---

This model was [converted](https://github.com/deepsweet/bf16-to-fp16) to FP16 from the BF16 original, [z-lab/Qwen3.6-27B-DFlash](https://huggingface.co/z-lab/Qwen3.6-27B-DFlash).

## What is "DFlash"?

> DFlash is a novel speculative decoding method that utilizes a lightweight block diffusion model for drafting. It enables efficient, high-quality parallel drafting that pushes the limits of inference speed.

## What is "FP16"?

"FP16" is an optimization for M1/M2 Apple Silicon only that gives a very noticeable prompt-processing boost. See ["Metal FP32 Vs BF16 Vs FP16 benchmark"](https://github.com/deepsweet/metal-fp32-bf16-fp16) and [jundot/omlx/pull/880](https://github.com/jundot/omlx/pull/880) for details.

Use the original model if you have M3 or newer Apple Silicon.
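As a rough, self-contained illustration of what the BF16 → FP16 conversion trades (a generic sketch, not the conversion tool linked above): FP16 carries 10 mantissa bits but a narrow exponent range (max finite value 65504), while BF16 keeps float32's 8 exponent bits with only 7 mantissa bits, so it has float32's range but coarser precision.

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE-754 half precision (struct format 'e'),
    # which is what the FP16 conversion stores.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_bf16(x: float) -> float:
    # bfloat16 is the top 16 bits of a float32: same 8 exponent bits,
    # but only 7 mantissa bits. Emulate it by truncating the low bits.
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# Near 1.0, FP16 resolves steps of 2**-10, BF16 only 2**-7:
print(to_fp16(1.001))  # → 1.0009765625 (nearest FP16 value)
print(to_bf16(1.001))  # → 1.0 (the 0.001 falls below BF16's step size)
```

The flip side is range: values above 65504 are unrepresentable in FP16, which is why a BF16 checkpoint with large weight magnitudes cannot always be converted losslessly.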