Qwen3.6-35B-A3B-FP8
Summary
Reference wrapper around Qwen/Qwen3.6-35B-A3B-FP8, the official FP8 release. This repository carries no weights; it exists only to anchor the FP8 variant inside the majentik/* family navigation.
Why this variant
Pick this variant for Hopper, Ada, or Blackwell GPUs, which support FP8 natively, when you want fidelity closest to bf16 at roughly half the weight memory (~50% savings). For additional compression, pick one of the 4-bit variants below.
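The ~50% figure follows from storing each weight in 8 bits instead of 16. A rough sketch of that arithmetic is below. It assumes the "35B" in the model name means about 35 billion total parameters and counts weights only; the KV cache, activations, and any layers kept in higher precision are ignored, so real usage will be higher. The function name `weight_memory_gb` is illustrative and not part of any library.

```python
# Back-of-the-envelope weight memory for different storage precisions.
# Assumption: ~35e9 total parameters (read off the model name, not measured).
TOTAL_PARAMS = 35_000_000_000


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)
int4_gb = weight_memory_gb(TOTAL_PARAMS, 4)

# Fraction of weight memory saved by FP8 relative to bf16.
fp8_saving = 1 - fp8_gb / bf16_gb

print(f"bf16: {bf16_gb:.1f} GB, FP8: {fp8_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
print(f"FP8 saving vs bf16: {fp8_saving:.0%}")
```

Because this is a mixture-of-experts model, all expert weights normally need to be resident even though only a fraction is active per token, so the total parameter count (not the active count) is what drives the estimate.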
Hardware compatibility
| Device | VRAM | Recommendation |
|---|---|---|
| H100 / H200 | 80–141 GB | native |
| RTX 4090 | 24 GB | does not fit at full precision; use 4-bit |
| RTX 5090 | 32 GB | native |
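Whether a GPU has native FP8 support can be checked from its CUDA compute capability. The sketch below assumes the usual rule of thumb that FP8 tensor cores arrive with Ada (8.9) and Hopper (9.0) and remain available on later architectures; the helper name `supports_native_fp8` is hypothetical. It only answers the hardware-support question and says nothing about whether the weights fit in VRAM.

```python
def supports_native_fp8(major: int, minor: int) -> bool:
    """True if the compute capability is Ada (8.9), Hopper (9.0), or newer."""
    return (major, minor) >= (8, 9)


# Compute capabilities of a few well-known devices, for illustration.
EXAMPLES = {
    "A100": (8, 0),
    "RTX 3090": (8, 6),
    "RTX 4090": (8, 9),
    "H100": (9, 0),
    "RTX 5090": (12, 0),
}

for name, (major, minor) in EXAMPLES.items():
    status = "native FP8" if supports_native_fp8(major, minor) else "no native FP8"
    print(f"{name} (sm_{major}{minor}): {status}")

# On a machine with PyTorch and a CUDA device, the capability can be read with:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
```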
Reproduce
# No re-quantization needed; use the upstream weights directly.
huggingface-cli download Qwen/Qwen3.6-35B-A3B-FP8
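The same download can be done from Python. This is a minimal sketch that assumes the `huggingface_hub` package is installed and that the upstream repository is publicly accessible; the wrapper name `download_upstream` is illustrative. Since this repository stores no weights, the upstream repo ID is the one to fetch.

```python
from typing import Optional

# Upstream repository that actually holds the FP8 weights.
UPSTREAM_REPO = "Qwen/Qwen3.6-35B-A3B-FP8"


def download_upstream(local_dir: Optional[str] = None) -> str:
    """Download the upstream snapshot and return the local path.

    With local_dir=None the files land in the default Hugging Face cache.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=UPSTREAM_REPO, local_dir=local_dir)


# Example (performs a large network download):
#   path = download_upstream()
#   print(path)
```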
Evaluation
Benchmarks pending; to be populated after the eval-harness workstream lands.
Family
- bf16: Qwen/Qwen3.6-35B-A3B
- FP8 (this): Qwen/Qwen3.6-35B-A3B-FP8
- RotorQuant family: majentik/Qwen3.6-35B-A3B-RotorQuant
- TurboQuant family: majentik/Qwen3.6-35B-A3B-TurboQuant
Provenance
Card-only. No weights stored.
License
Released under Apache-2.0. The upstream license of the base model applies.