Kimi K2.6 GGUF - Quantized by BatiAI


IQ3_XXS / IQ4_XS / Q5_K_M quantizations of moonshotai/Kimi-K2.6 (1T total / 32B active MoE), quantized directly from the official Moonshot FP8 weights by BatiAI.

Why Kimi K2.6?

  • 1T parameters (32B active): frontier-class open-weight model
  • SWE-Bench Pro 58.6: beats GPT-5.4 xhigh (57.7), Claude Opus 4.6 max (53.4), and Gemini 3.1 Pro (54.2)
  • HLE 36.4% (no tools) / 55.5% (with tools): frontier tier on Humanity's Last Exam
  • Agent swarm architecture: 300 sub-agents, 4,000 coordinated steps
  • 256K native context (262,144 tokens) via YaRN scaling
  • Native tool calling: search, code interpreter, web browsing
  • Modified-MIT license: redistribution and fine-tuning allowed
  • Released 2026-04-20 by Moonshot AI

Quick Start

# IQ4_XS (recommended balance, 546GB, M3 Ultra 512GB+)
ollama pull batiai/kimi-k2.6:iq4

# IQ3_XXS (smaller, 394GB, 384GB+ RAM)
ollama pull batiai/kimi-k2.6:iq3

# Q5_K_M (highest quality, 728GB, needs 768GB+ RAM)
ollama pull batiai/kimi-k2.6:q5
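
After pulling, it is worth sanity-checking the local copy before a first run; ollama show output fields vary by version:

# list local models and confirm the on-disk size
ollama list

# inspect metadata (context length, quantization) for the pulled tag
ollama show batiai/kimi-k2.6:iq4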

Available Quantizations

| Quant | Size | Min RAM | Target Hardware | Notes |
|---------|-------|---------|------------------------------------------|---------------------------------------------|
| IQ3_XXS | 394GB | 384GB | M3 Ultra 512GB / H100 node | aggressive compression, imatrix-calibrated |
| IQ4_XS | 546GB | 512GB | M3 Ultra 512GB / 8× A100 80GB | recommended balance |
| Q5_K_M | 728GB | 768GB | 2× M3 Ultra / 8× A100 80GB / H100 node | highest quality, near-original |

โš ๏ธ Not for consumer Mac โ€” this is a workstation / server / frontier research model. 16-128GB Macs should use batiai/qwen3.6-35b or batiai/minimax-m2.7 instead (see comparison table below).

Hardware Reality Check

| Your System | IQ3 (394GB) | IQ4 (546GB) | Q5 (728GB) |
|-----------------------------|------------------------|-------------------|------------|
| Mac 128GB | ❌ Won't fit | ❌ | ❌ |
| Mac 192GB | ❌ Won't fit | ❌ | ❌ |
| Mac 256GB | ⚠️ Heavy swap (unusable) | ❌ | ❌ |
| Mac 384GB | ⚠️ Tight | ❌ | ❌ |
| Mac M3 Ultra 512GB | ✅ Comfortable | ✅ Usable (tight) | ❌ |
| 2× M3 Ultra (cluster) | ✅ | ✅ | ✅ |
| 8× A100 80GB (640GB total) | ✅ | ✅ Fast | ✅ |
| H100 node (640GB+) | ✅ Fast | ✅ Fast | ✅ Fast |

Numbers are based on MoE activation patterns: 32B active params × 4 bytes of buffer ≈ 130GB at runtime even after quantization, plus shard headers and KV cache (at 256K context, the cache alone is 30-80GB).
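
To see where your machine lands in the table above, here is a minimal RAM check; the thresholds are the Min RAM values from the quantization table, and the macOS/Linux probes are the standard ones:

# print physical RAM in GB and match it against the Min RAM column above
if [ "$(uname)" = "Darwin" ]; then
  bytes=$(sysctl -n hw.memsize)                      # macOS: total memory in bytes
else
  kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
  bytes=$((kb * 1024))                               # Linux: MemTotal is in kB
fi
gb=$((bytes / 1024 / 1024 / 1024))
echo "Physical RAM: ${gb}GB"
[ "$gb" -ge 768 ] && echo "Q5_K_M fits (768GB min)"
[ "$gb" -ge 512 ] && echo "IQ4_XS fits (512GB min)"
[ "$gb" -ge 384 ] && echo "IQ3_XXS fits (384GB min, tight)"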

What BatiAI's Quantization Delivers

| | BatiAI | unsloth / ubergarm |
|---------------------------|---------------------------------------------------------------------------|--------------------|
| Source | Direct from official Moonshot FP8 weights | Same (major providers) |
| Quantization flow | FP8 → Q8_0 → IQ3_XXS/IQ4_XS with imatrix (wikitext-2 calibration, 200 chunks) | Similar |
| imatrix | ✅ 200 chunks (quality saturation point) | Varies |
| Tool-calling preservation | ✅ Native template preserved | ✅ |
| Korean validation | ✅ (benchmark on target hardware pending) | ✗ |
| BatiAI signature | ✅ general.author=BatiAI, general.url=https://flow.bati.ai | ✗ |
| Pipeline | Open source: docs/202604-large-moe-quantization.md | Internal |
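
For reference, that flow maps onto standard llama.cpp tooling roughly as follows. This is a sketch rather than the exact BatiAI pipeline: file names are placeholders, and the FP8 checkpoint is assumed to already be in a dtype the converter accepts:

# 1. convert the HF checkpoint to a high-precision GGUF baseline (Q8_0)
python convert_hf_to_gguf.py Kimi-K2.6/ --outtype q8_0 --outfile kimi-k2.6-Q8_0.gguf

# 2. build an importance matrix from wikitext-2, 200 chunks (per the table above)
./llama-imatrix -m kimi-k2.6-Q8_0.gguf -f wikitext-2-raw.txt --chunks 200 -o kimi.imatrix

# 3. requantize down to IQ4_XS (or IQ3_XXS) guided by the imatrix
./llama-quantize --imatrix kimi.imatrix kimi-k2.6-Q8_0.gguf kimi-k2.6-IQ4_XS.gguf IQ4_XS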

Model Comparison: BatiAI Model Lineup

Kimi K2.6 is for frontier workstation users. For everyone else:

| Your Hardware | Best BatiAI Model | Size |
|------------------------|---------------------------|--------|
| 16GB Mac | batiai/gemma4-e4b:q4 | 4.9GB |
| 24GB Mac | batiai/gemma4-26b:iq4 | 15GB |
| 48GB Mac | batiai/qwen3.5-35b:iq4 | 22GB |
| 96GB Mac | batiai/qwen3.6-35b:iq4 | 22GB |
| 128GB Mac | batiai/minimax-m2.7:iq3 | 82GB |
| M3 Ultra 512GB / H100 | batiai/kimi-k2.6:iq4 | 509GB |

Benchmarks (source model)

Benchmark numbers are from Moonshot AI's official report. Validating that aggressive quantization preserves these capabilities is still pending on our end (bench.sh on M3 Ultra / H100 target hardware).

| Benchmark | Kimi K2.6 | Comparison |
|-----------------|-----------|------------------------------------------|
| SWE-Bench Pro | 58.6 | GPT-5.4 xhigh 57.7, Opus 4.6 max 53.4 |
| HLE (no tools) | 36.4% | frontier tier |
| HLE (w/ tools) | 55.5% | frontier tier |
| Context | 256K | YaRN scaling |
| Native tool use | ✅ | search, code, web |

Technical Details

  • Original Model: moonshotai/Kimi-K2.6
  • Architecture: Mixture of Experts, 1T total / 32B active; 61 layers; 384 experts (8 selected + 1 shared); MLA attention
  • Original storage: FP8 / INT4 hybrid QAT (555GB)
  • License: Modified-MIT
  • Quantized with: llama.cpp
  • Calibration: wikitext-2-raw, 200 chunks (quality saturation)
  • Quantized by: BatiAI
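
To verify the provenance fields called out above (general.author, general.url), you can inspect the GGUF header with the gguf-dump tool from the gguf Python package that ships with llama.cpp; the file name here is illustrative:

# install the GGUF tooling, then dump header metadata and filter the provenance keys
pip install gguf
gguf-dump Kimi-K2.6-IQ4_XS.gguf | grep -E "general\.(author|url)"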

Usage

llama.cpp

./llama-cli -m Kimi-K2.6-IQ4_XS.gguf \
  -p "Your prompt" \
  --ctx-size 65536 \
  --n-gpu-layers 99
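
The same GGUF can also be served over HTTP with llama.cpp's llama-server, which exposes an OpenAI-compatible endpoint; the port and context size below are illustrative:

./llama-server -m Kimi-K2.6-IQ4_XS.gguf \
  --ctx-size 65536 \
  --n-gpu-layers 99 \
  --port 8080

# then point any OpenAI-compatible client at http://localhost:8080/v1/chat/completions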

Ollama

ollama run batiai/kimi-k2.6:iq4
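
For scripting, Ollama also exposes a local HTTP API on port 11434; a minimal non-streaming request (the prompt is just an example):

curl http://localhost:11434/api/generate -d '{
  "model": "batiai/kimi-k2.6:iq4",
  "prompt": "Explain MoE expert routing in two sentences.",
  "stream": false
}'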

vLLM / TGI

Not directly compatible: vLLM and TGI serve FP8/BF16 safetensors, not GGUF. Use the original moonshotai/Kimi-K2.6 for vLLM.
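
If you need vLLM, serve the original safetensors release instead. A minimal sketch, assuming an 8-GPU node; parallelism and memory flags will need tuning for a 1T-parameter MoE:

vllm serve moonshotai/Kimi-K2.6 \
  --tensor-parallel-size 8 \
  --trust-remote-code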

About BatiAI

BatiAI quantizes frontier open-weight models with validated quality and transparent provenance. We built BatiFlow, free on-device AI automation for Mac, and we open-source our full quantization pipeline.

The Kimi K2.6 release demonstrates that our pipeline handles 1T+ MoE models (most quantization providers stop at 70B). See our Kimi K2.6 quantization notes for the engineering trade-offs.

License

Quantized from moonshotai/Kimi-K2.6. License: Modified-MIT; commercial use and redistribution are allowed.
