Qwen3.6-35B-A3B-FP8

Summary

Reference wrapper around Qwen/Qwen3.6-35B-A3B-FP8 — the official FP8 release. This repository carries no weights; it exists only to anchor the FP8 variant inside the majentik/* family navigation.

Why this variant

Pick this variant on Hopper, Ada, or Blackwell GPUs, which support FP8 natively, when you want the fidelity closest to bf16 at roughly half the weight memory (1 byte per parameter instead of 2). For further compression, pick one of the 4-bit variants below.
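The "roughly half" figure follows from bits per parameter. The sketch below is a rough, weight-only estimate. It assumes a flat 35B parameters, taken from the model name, and ignores KV cache, activations, and any tensors the checkpoint keeps in higher precision, so treat its numbers as lower bounds rather than measured sizes.

```shell
#!/bin/sh
# Rough, weight-only memory estimate per precision (lower bound).
# Assumption: a flat 35B parameters, taken from the model name. KV cache,
# activations and tensors kept in higher precision are NOT counted.

# weight_mb PARAMS_MILLIONS BITS_PER_PARAM -> megabytes (decimal)
weight_mb() {
  # params_m * 1e6 params * (bits / 8) bytes / 1e6 bytes-per-MB
  echo $(( $1 * $2 / 8 ))
}

# format_gb MEGABYTES -> "GB.d" with one decimal (truncated)
format_gb() {
  printf '%d.%d' $(( $1 / 1000 )) $(( ($1 % 1000) / 100 ))
}

PARAMS_M=35000
for precision in bf16:16 FP8:8 4-bit:4; do
  name=${precision%%:*}   # text before the colon
  bits=${precision##*:}   # text after the colon
  printf '%-6s ~%s GB\n' "$name" "$(format_gb "$(weight_mb "$PARAMS_M" "$bits")")"
done
```

Under this assumption the script reports about 70.0 GB for bf16, 35.0 GB for FP8, and 17.5 GB for 4-bit.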

Hardware compatibility

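Device	VRAM	Recommendation
H100 / H200	80–141 GB	native FP8
RTX 4090	24 GB	too small for the ~35 GB of FP8 weights; use a 4-bit variant
RTX 5090	32 GB	FP8 is supported natively, but ~35 GB of FP8 weights exceed 32 GB; use a 4-bit variant or offload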
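These recommendations come from comparing weight size to VRAM. The sketch below makes that comparison under stated assumptions. It takes about 35 GB of FP8 weights and 17.5 GB at 4 bits (35B parameters times bytes per parameter). It treats advertised VRAM as decimal GB, which slightly understates it, and adds a hypothetical 10% headroom for KV cache and activations. The `fits` helper and the 10% margin are illustrative, not measured.

```shell
#!/bin/sh
# Illustrative fit check: weights plus a hypothetical 10% headroom vs. VRAM.
# Assumptions: 35B parameters, advertised VRAM treated as decimal GB, and a
# made-up 10% margin standing in for KV cache and activations.

# fits VRAM_GB WEIGHT_MB -> prints "yes" or "no"
fits() {
  vram_mb=$(( $1 * 1000 ))
  need_mb=$(( $2 * 110 / 100 ))   # weights + 10% headroom
  if [ "$need_mb" -le "$vram_mb" ]; then echo yes; else echo no; fi
}

FP8_MB=35000    # 35B params x 1 byte
INT4_MB=17500   # 35B params x 0.5 byte

for card in H100:80 RTX4090:24 RTX5090:32; do
  device=${card%%:*}
  vram=${card##*:}
  printf '%-8s %3s GB  FP8: %-3s  4-bit: %s\n' \
    "$device" "$vram" "$(fits "$vram" "$FP8_MB")" "$(fits "$vram" "$INT4_MB")"
done
```

With these numbers, only the 80 GB card holds the FP8 weights. The 24 GB and 32 GB cards fit only the 4-bit size. Real usage depends on context length and runtime overhead, so the margin is a placeholder.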

Reproduce

# No re-quantization needed — use the upstream weights directly.
huggingface-cli download Qwen/Qwen3.6-35B-A3B-FP8

Evaluation

Benchmarks pending; results will be added once the evaluation-harness work lands.

Family

bf16 — Qwen/Qwen3.6-35B-A3B
FP8 (this) — Qwen/Qwen3.6-35B-A3B-FP8
RotorQuant family — majentik/Qwen3.6-35B-A3B-RotorQuant
TurboQuant family — majentik/Qwen3.6-35B-A3B-TurboQuant

Provenance

Card-only. No weights stored.

License

Released under apache-2.0. Upstream license of the base model applies.
