ZAYA1-8B-JANGTQ4
Quantized Zyphra/ZAYA1-8B for Apple Silicon runtimes.
| Field | Value |
|---|---|
| Source | Zyphra/ZAYA1-8B |
| License | Apache-2.0, inherited from upstream |
| Format | JANGTQ4 |
| Modality | text |
| Bundle size | 4.65 GiB |
| Tensor keys | 1965 |
| Expert layout | Pre-stacked zaya_block.experts.switch_mlp |
Important Runtime Note
ZAYA is not a stock mlx_lm architecture. It alternates CCA attention layers and top-1 MoE layers. Use this bundle only with a runtime that implements the ZAYA CCA state contract and the converted pre-stacked expert layout.
Runtime Pin Required
Use a vmlx-swift-lm build that includes the ZAYA Swift runtime (Libraries/MLXLLM/Models/Zaya.swift, MLXLMCommon/Cache/ZayaCCACache.swift, and BatchEngine/BatchZayaCCACache.swift). The first verified commit is b9da180; pin to that commit or a newer one.
Architecture Summary
- 80 decoder layers: alternating CCA attention and top-1 MoE
- Hidden size 2048, 16 query heads, 2 KV heads, head dim 128
- CCA state per attention layer: standard KV plus `conv_state [B,1280,2]` and `prev_hs [B,2048]`
- 16 routed experts per MoE layer, top-1 routing with MOD skip route
- Context length 131072, `rope_theta=5000000`
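To make the state shapes above concrete, the following sketch counts the elements a runtime would hold per attention layer and estimates total cache memory. It is back-of-the-envelope arithmetic, not the runtime's allocator: the count of 40 attention layers (half of the 80 alternating layers) and the 2-byte element size are assumptions, and `ZayaDims`, `cca_state_elements`, and `total_state_bytes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZayaDims:
    hidden_size: int = 2048      # prev_hs is [B, 2048]
    kv_heads: int = 2
    head_dim: int = 128
    conv_channels: int = 1280    # conv_state is [B, 1280, 2]
    conv_taps: int = 2
    attention_layers: int = 40   # ASSUMPTION: half of the 80 alternating layers


def cca_state_elements(dims: ZayaDims, batch: int, seq_len: int) -> dict:
    """Element counts for one attention layer's cache state."""
    # Standard KV cache: keys and values, each [B, kv_heads, seq_len, head_dim].
    kv = 2 * batch * dims.kv_heads * seq_len * dims.head_dim
    conv_state = batch * dims.conv_channels * dims.conv_taps
    prev_hs = batch * dims.hidden_size
    return {"kv": kv, "conv_state": conv_state, "prev_hs": prev_hs}


def total_state_bytes(dims: ZayaDims, batch: int, seq_len: int,
                      bytes_per_element: int = 2) -> int:
    """Total cache bytes across all attention layers (ASSUMES 16-bit elements)."""
    per_layer = sum(cca_state_elements(dims, batch, seq_len).values())
    return per_layer * dims.attention_layers * bytes_per_element


if __name__ == "__main__":
    dims = ZayaDims()
    print(cca_state_elements(dims, batch=1, seq_len=131072))
    print(total_state_bytes(dims, batch=1, seq_len=131072) / 2**30, "GiB")
```

Under these assumptions, the KV cache dominates at full context (roughly 5 GiB for one sequence), while `conv_state` and `prev_hs` are fixed-size and do not grow with sequence length.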
Quantization
4-bit MXTQ routed experts + 8-bit affine non-routed tensors.
Passthrough floor for first release prep:
- `conv_qk.*`, `temp`, norms, residual scaling, router path, biases, and balancing biases are preserved as float tensors.
- Embeddings and `lm_head` use 8-bit affine in the prepared bundles.
- `jangtq_runtime.safetensors` is included: true.
mxtq_bits:
{
"routed_expert": 4,
"attention": 8,
"router": 16,
"embed_tokens": 8,
"lm_head": 8,
"cca_conv": 16,
"norms_residual": 16
}
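For readers unfamiliar with the "8-bit affine" scheme named above, here is a minimal sketch of generic affine (scale plus bias) quantization over one group of weights. It illustrates the general technique only: the bundle's actual group size, packing, and rounding are not documented here, the 4-bit MXTQ format for routed experts is not covered, and `affine_quantize` and `affine_dequantize` are hypothetical helper names.

```python
def affine_quantize(values, bits=8):
    """Map floats in one group to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # A constant group has no range; any nonzero scale works, so use 1.0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # lo acts as the bias (zero offset)


def affine_dequantize(codes, scale, bias):
    """Reconstruct approximate floats from codes, scale, and bias."""
    return [c * scale + bias for c in codes]


if __name__ == "__main__":
    group = [-1.0, -0.25, 0.0, 0.5, 1.0]
    codes, scale, bias = affine_quantize(group, bits=8)
    restored = affine_dequantize(codes, scale, bias)
    worst = max(abs(a - b) for a, b in zip(group, restored))
    print(codes, scale, worst)
```

The reconstruction error of each value is bounded by about half of `scale`, which is why higher bit widths are reserved for sensitive tensors and the router, CCA convolution, and norm paths stay in floating point.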
Bundle Verification
- Safetensor headers scanned.
- Source tensor coverage checked.
- Converted bundles checked for `local_experts` removal.
- Converted expert tensors checked for pre-stacked `switch_mlp` layout.
- JANGTQ sidecars checked for the Swift runtime contract.
- Capabilities verified: `family=zaya`, `supports_thinking=False`, `tool_parser=zaya_xml`.
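The header scan and expert-layout checks listed above can be approximated with the standard library alone, since a safetensors file begins with an 8-byte little-endian header length followed by a JSON header. This is a sketch of a comparable check, not the script used to verify this bundle. The substring matches on `local_experts` and `switch_mlp` are assumptions about how the tensor keys are named, and the function names are hypothetical.

```python
import json
import struct
from pathlib import Path


def read_safetensors_keys(path):
    """Return tensor names from a safetensors header without loading weights."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # u64, little-endian
        header = json.loads(fh.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return sorted(header)


def check_expert_layout(keys):
    """Flag leftover per-expert tensors and confirm stacked ones exist."""
    leftover = [k for k in keys if "local_experts" in k]
    stacked = [k for k in keys if "switch_mlp" in k]
    return {"leftover_local_experts": leftover, "has_switch_mlp": bool(stacked)}


def scan_bundle(bundle_dir):
    """Collect keys from every *.safetensors file in a bundle directory."""
    keys = []
    for shard in sorted(Path(bundle_dir).glob("*.safetensors")):
        keys.extend(read_safetensors_keys(shard))
    return check_expert_layout(keys)
```

Calling `scan_bundle` on a local copy of the bundle should report an empty `leftover_local_experts` list and `has_switch_mlp` set to true if the conversion matches the description above. The glob also picks up sidecar files such as `jangtq_runtime.safetensors`, which only adds keys to the scan.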
Summary (translated from Korean)
This bundle is a quantized build of Zyphra/ZAYA1-8B for Apple Silicon MLX/JANG runtimes. Use it only with a runtime that correctly implements ZAYA's CCA attention state and MoE routing.
Files
- `config.json` carries `weight_format=mxtq`, `zaya_expert_layout=split_switch_mlp`.
- `jang_config.json` carries `cache_subtype=zaya_cca`.
- Tokenizer files and `chat_template.jinja` are preserved from the upstream source snapshot.
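A loader could run a small preflight on these files before attempting to load weights. The sketch below checks only the fields named above. It assumes they are top-level keys in each JSON file, which this card does not confirm, and `EXPECTED` and `preflight` are hypothetical names.

```python
import json
from pathlib import Path

# Fields named in this card. ASSUMPTION: they are top-level JSON keys.
EXPECTED = {
    "config.json": {
        "weight_format": "mxtq",
        "zaya_expert_layout": "split_switch_mlp",
    },
    "jang_config.json": {"cache_subtype": "zaya_cca"},
}


def preflight(bundle_dir):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for filename, fields in EXPECTED.items():
        path = Path(bundle_dir) / filename
        if not path.is_file():
            problems.append(f"{filename}: missing")
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        for key, want in fields.items():
            got = data.get(key)
            if got != want:
                problems.append(f"{filename}: {key}={got!r}, expected {want!r}")
    return problems
```

Returning a list of problems rather than raising on the first mismatch lets a caller report every inconsistency in one pass.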