This model was converted to FP16 from the BF16 weights of z-lab/Qwen3.6-35B-A3B-DFlash.

What is "DFlash"?

DFlash is a novel speculative decoding method that uses a lightweight block diffusion model as the drafter. Because the drafter proposes whole blocks of tokens in parallel rather than one token at a time, it enables efficient, high-quality drafting that pushes the limits of inference speed. A toy sketch of the general idea follows below.
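
To make the idea concrete, here is a hedged, toy sketch of block-wise speculative decoding in general: a fast drafter proposes a block of tokens, the target model verifies them, and the longest agreeing prefix is accepted. This is not DFlash's actual implementation; all model functions are simple stand-ins.

```python
from typing import List

def draft_block(prefix: List[int], block_size: int) -> List[int]:
    """Stand-in for the block-diffusion drafter: proposes block_size tokens
    at once. This toy drafter follows the target's rule but errs periodically."""
    out: List[int] = []
    last = prefix[-1]
    for i in range(block_size):
        nxt = (last + 1) % 100
        if (len(prefix) + i) % 3 == 2:  # inject an occasional draft error
            nxt = (nxt + 7) % 100
        out.append(nxt)
        last = nxt
    return out

def target_next_token(prefix: List[int]) -> int:
    """Stand-in for the (slow) target model's greedy next-token choice."""
    return (prefix[-1] + 1) % 100

def speculative_step(prefix: List[int], block_size: int = 4) -> List[int]:
    """One draft-and-verify round: accept the longest draft prefix the target
    agrees with, then take the target's token at the first mismatch."""
    draft = draft_block(prefix, block_size)
    accepted: List[int] = []
    context = list(prefix)
    for proposed in draft:
        expected = target_next_token(context)
        if proposed != expected:
            accepted.append(expected)  # target overrides the bad draft token
            break
        accepted.append(proposed)
        context.append(proposed)
    return accepted

if __name__ == "__main__":
    tokens = [0]
    for _ in range(3):
        tokens += speculative_step(tokens)
    print(tokens)  # several tokens are accepted per verification pass
```

The speedup comes from the fact that one forward pass of the target model can verify a whole drafted block, instead of one pass per generated token.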

What is "FP16"?

"FP16" is M1/M2 Apple Silicon only optimization that leads to a very noticeable prompt processing boost. See jundot/omlx/issues/604 and jundot/omlx/pull/880 for details.

Use the original model if you have M3+ Apple Silicon.
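
The conversion itself is a plain dtype cast. Below is a minimal sketch of what such a BF16-to-FP16 cast looks like, assuming safetensors weight shards; it is not the exact script used for this repository, and the file names are hypothetical placeholders.

```python
import torch
from safetensors.torch import load_file, save_file

def convert_shard(src: str, dst: str) -> None:
    """Cast every BF16 tensor in one safetensors shard to FP16."""
    weights = load_file(src)  # dict[str, torch.Tensor]
    casted = {
        name: t.to(torch.float16) if t.dtype == torch.bfloat16 else t
        for name, t in weights.items()  # leave non-BF16 tensors untouched
    }
    save_file(casted, dst)

if __name__ == "__main__":
    # Hypothetical shard names; real checkpoints may have many shards.
    convert_shard("model-00001-of-00002.safetensors",
                  "model-fp16-00001-of-00002.safetensors")
```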

Changelog

  • 28.04.2026: the original model was updated