CRITICAL FIX (2026-03-19): Fixed eos_token_id — previous versions caused infinite thinking loops. You MUST re-download this model if you downloaded before today.

Update (2026-03-18): Models have been updated to v2.1.0 with VLM support, proper tokenizer, and fixed configs. If you downloaded before this date, please re-download for full MLX Studio compatibility.

MLX Studio — the only app that natively supports JANG models


Early Adoption: LM Studio, Ollama, oMLX, Inferencer do not support JANG yet. Use MLX Studio or pip install "jang[mlx]". Ask your favorite app's creators to add JANG support!


JANG

Qwen3.5-35B-A3B — JANG_2S (MoE, 2-bit) — VLM

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

GitHub  PyPI  Website  X/Twitter

JANG is fully open-source. Quantization engine, research, and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.

Results (200-question MMLU)

| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_2S | 65.5% | 9.0 GB | |
| JANG_4K | 77.5% | 16.4 GB | |
| JANG_4S | 76.5% | 16.7 GB | 76 tok/s |
| MLX 4-bit | 77.0% | 18 GB | |
| MLX 5-bit | 80.5% | 22 GB | 62 tok/s |
| MLX 2-bit | ~20% | 10 GB | |

JANG_2S at 9 GB scores 65.5%, while MLX 2-bit at 10 GB scores only ~20%. On MoE models, JANG dominates at 2-bit because the expert MLPs absorb the compression while the attention layers stay protected at higher precision.
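The grading idea can be sketched as a per-layer bit assignment, using the JANG_2S profile from the Specs below. This is an illustrative sketch only: the layer-name patterns and the `assign_bits` helper are assumptions for illustration, not JANG's actual selection rules.

```python
# Illustrative sketch of JANG-style adaptive n-bit grading (JANG_2S profile).
# The layer-name patterns below are assumptions, not the real JANG heuristics.
PROFILE_2S = {"CRITICAL": 6, "IMPORTANT": 4, "COMPRESS": 2}

def assign_bits(layer_name: str) -> int:
    # Attention and embeddings are quality-critical, so they stay protected.
    if "attn" in layer_name or "embed" in layer_name:
        return PROFILE_2S["CRITICAL"]
    # Norms and routing are small but sensitive.
    if "router" in layer_name or "norm" in layer_name:
        return PROFILE_2S["IMPORTANT"]
    # Expert MLPs hold most of the parameters and absorb the compression.
    return PROFILE_2S["COMPRESS"]

for name in ["layers.0.self_attn.q_proj", "layers.0.mlp.experts.7.up_proj"]:
    print(name, "->", assign_bits(name), "bits")
```

Because the expert MLPs dominate the parameter count in a 256-expert MoE, pushing only them to 2-bit drives the average bit-width close to 2 without touching the layers that matter most.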

Per-Subject Reference (JANG_4S vs MLX 4-bit, Qwen3.5-35B family)

| Subject | JANG_4S | MLX 4-bit |
|---|---|---|
| Abstract Algebra | 10/20 | 10/20 |
| Anatomy | 17/20 | 17/20 |
| Astronomy | 18/20 | 18/20 |
| College CS | 16/20 | 15/20 |
| College Physics | 13/20 | 14/20 |
| HS Biology | 18/20 | 18/20 |
| HS Chemistry | 17/20 | 18/20 |
| HS Mathematics | 11/20 | 9/20 |
| Logical Fallacies | 16/20 | 18/20 |
| World Religions | 17/20 | 17/20 |
| Total (/200) | 153 | 154 |

Specs

| Metric | Value |
|---|---|
| Source | Qwen3.5-35B-A3B |
| Architecture | MoE (256 experts, 8 active) + GatedDeltaNet SSM |
| Profile | JANG_2S (CRITICAL=6, IMPORTANT=4, COMPRESS=2) |
| VLM | Yes |
| Format | v2 (MLX-native, instant load) |
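A back-of-envelope check of the 9 GB figure: 35B weights at an effective average of roughly 2 bits/weight lands right at the reported size. The 2.05 blended average below is an assumption for illustration, not an official JANG number.

```python
# Rough size estimate: most weights at 2-bit, a minority at 4/6-bit,
# giving an assumed effective average of ~2.05 bits/weight.
params = 35e9
avg_bits = 2.05  # assumed blended average across the JANG_2S profile
size_gb = params * avg_bits / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # → ~9.0 GB
```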

Install

```bash
pip install "jang[mlx]"
```

For Vision-Language models:

```bash
pip install "jang[vlm]"
```

Quick Start

```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-35B-A3B-JANG_2S")
sampler = make_sampler(temp=0.7)

tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, 'item') else int(tok)
    print(tokenizer.decode([t]), end="", flush=True)
    # Stop on the (fixed) eos token instead of running to max_tokens.
    if t == tokenizer.eos_token_id:
        break
```

VLM Inference

```python
from jang_tools.loader import load_jang_vlm_model
from mlx_vlm import generate

model, processor = load_jang_vlm_model("JANGQ-AI/Qwen3.5-35B-A3B-JANG_2S")

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe this image."}
    ]}], add_generation_prompt=True, tokenize=False, enable_thinking=False)

result = generate(model, processor, prompt, ["photo.jpg"], max_tokens=200)
print(result.text)
```


한국어 (Korean)

Qwen3.5-35B (MoE) — JANG 2S

JANG is a mixed-precision quantization format for Apple Silicon. It plays the same role for MLX that GGUF plays elsewhere.

| Model | MMLU | Size |
|---|---|---|
| JANG_2S | 65.5% | 9 GB |
| MLX 2-bit | ~20% | 10 GB |

Install

```bash
pip install "jang[mlx]"
```

Compatibility

Currently only **MLX Studio** natively supports the JANG format. LM Studio, Ollama, and others do not support it yet.

GitHub · HuggingFace · MLX Studio · PyPI


Created by Jinho Jang — jangq.ai · @dealignai
