How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="mudler/Qwopus3.6-35B-A3B-v1-APEX-GGUF",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Qwopus 3.6 35B-A3B v1 APEX GGUF

APEX (Adaptive Precision for EXpert Models) quantizations of Jackrong/Qwopus3.6-35B-A3B-v1.

Brought to you by the LocalAI team | APEX Project

Available Files

File Profile Size Best For
Qwopus3.6-35B-A3B-v1-APEX-I-Quality.gguf I-Quality 23 GB Highest quality with imatrix
Qwopus3.6-35B-A3B-v1-APEX-Quality.gguf Quality 23 GB Highest quality standard
Qwopus3.6-35B-A3B-v1-APEX-I-Balanced.gguf I-Balanced 25 GB Best overall quality/size ratio
Qwopus3.6-35B-A3B-v1-APEX-Balanced.gguf Balanced 25 GB General purpose
Qwopus3.6-35B-A3B-v1-APEX-I-Compact.gguf I-Compact 17 GB Consumer GPUs, best quality/size
Qwopus3.6-35B-A3B-v1-APEX-Compact.gguf Compact 17 GB Consumer GPUs
Qwopus3.6-35B-A3B-v1-APEX-I-Mini.gguf I-Mini 14 GB Smallest viable, fastest inference

What is APEX?

APEX is a quantization strategy for Mixture-of-Experts (MoE) models. It classifies tensors by role (routed expert, shared expert, attention) and applies a layer-wise precision gradient — edge layers get higher precision, middle layers get more aggressive compression. I-variants use diverse imatrix calibration (chat, code, reasoning, tool-calling, agentic traces, Wikipedia).

See the APEX project for full details.

Architecture

  • Base Model: Jackrong/Qwopus3.6-35B-A3B-v1
  • Architecture: Qwen3.5-MoE 35B-A3B
  • Layers: 40
  • Experts: 256 routed (8 active per token)
  • Total Parameters: ~35B
  • Active Parameters: ~3B per token
  • APEX Config: 6+6 symmetric edge gradient across 40 layers
  • Calibration: v1.3 diverse dataset (chat, code, reasoning, tool-calling, multilingual)

Run with LocalAI

local-ai run mudler/Qwopus3.6-35B-A3B-v1-APEX-GGUF@Qwopus3.6-35B-A3B-v1-APEX-I-Balanced.gguf

Credits

APEX is brought to you by the LocalAI team. Developed through human-driven, AI-assisted research. Built on llama.cpp.

Downloads last month
7,750
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mudler/Qwopus3.6-35B-A3B-v1-APEX-GGUF

Quantized
(15)
this model

Collection including mudler/Qwopus3.6-35B-A3B-v1-APEX-GGUF