Instructions to use Deepdive404/Kimi-K2.6-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Deepdive404/Kimi-K2.6-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Deepdive404/Kimi-K2.6-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Deepdive404/Kimi-K2.6-GGUF", dtype="auto")

llama-cpp-python

How to use Deepdive404/Kimi-K2.6-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Deepdive404/Kimi-K2.6-GGUF",
	filename="BF16/Kimi-K2.6-BF16-00001-of-00046.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)
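
The `filename` above points at the first of 46 BF16 shards. As a small sketch, assuming the remaining shards follow the same `-NNNNN-of-00046.gguf` pattern visible in that file name (the helper `shard_names` is hypothetical and not part of llama-cpp-python), the expected shard list can be generated to check that a download is complete:

```python
import re


def shard_names(first_shard: str) -> list[str]:
    """Expand 'prefix-00001-of-000NN.gguf' into every shard name of the split."""
    match = re.fullmatch(r"(.*)-(\d{5})-of-(\d{5})\.gguf", first_shard)
    if match is None:
        raise ValueError(f"not a split GGUF file name: {first_shard}")
    prefix, _, total = match.groups()
    # Shard indices are 1-based and zero-padded to five digits.
    return [f"{prefix}-{i:05d}-of-{total}.gguf" for i in range(1, int(total) + 1)]


names = shard_names("BF16/Kimi-K2.6-BF16-00001-of-00046.gguf")
print(len(names))  # 46
print(names[-1])   # BF16/Kimi-K2.6-BF16-00046-of-00046.gguf
```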

Local Apps

llama.cpp

How to use Deepdive404/Kimi-K2.6-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./llama-cli -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
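
Once `llama-server` is running, its OpenAI-compatible endpoint can be called from Python. The following is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8080/v1` (the base URL used in the Pi configuration on this page) and that the model id `Kimi-K2.6-GGUF` is accepted; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: llama-server on its default port


def build_chat_request(prompt: str, model: str = "Kimi-K2.6-GGUF") -> dict:
    """Build the JSON body for a single-turn /chat/completions call."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def send_chat(payload: dict, base_url: str = BASE_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Requires a running llama-server:
# print(send_chat(build_chat_request("Say hello in one sentence.")))
```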

Use Docker

docker model run hf.co/Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

vLLM

How to use Deepdive404/Kimi-K2.6-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Deepdive404/Kimi-K2.6-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Deepdive404/Kimi-K2.6-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
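
The request body sent by the curl command above can also be built in Python. This sketch mirrors that JSON and adds a hypothetical helper, `to_data_url`, for sending a local image as a base64 data URL instead of a public link. It is an illustration under those assumptions, not part of vLLM.

```python
import base64
import json


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL usable in an image_url entry."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same message structure as the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


payload = build_image_chat_payload(
    model="Deepdive404/Kimi-K2.6-GGUF",
    prompt="Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be posted to `http://localhost:8000/v1/chat/completions` with any HTTP client.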

Use Docker

docker model run hf.co/Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

SGLang

How to use Deepdive404/Kimi-K2.6-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Deepdive404/Kimi-K2.6-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Deepdive404/Kimi-K2.6-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Deepdive404/Kimi-K2.6-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Deepdive404/Kimi-K2.6-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use Deepdive404/Kimi-K2.6-GGUF with Ollama:
```
ollama run hf.co/Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
```

Unsloth Studio

How to use Deepdive404/Kimi-K2.6-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Deepdive404/Kimi-K2.6-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Deepdive404/Kimi-K2.6-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Deepdive404/Kimi-K2.6-GGUF to start chatting

Pi

How to use Deepdive404/Kimi-K2.6-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Kimi-K2.6-GGUF"
        }
      ]
    }
  }
}
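
If `~/.pi/agent/models.json` already exists, the provider entry has to be merged into it rather than pasted over it. The sketch below shows one way to do that, assuming the file keeps the top-level `providers` object shown above. The function names are hypothetical, and how Pi treats any other keys in the file is not covered here.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration shown above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Kimi-K2.6-GGUF"}],
}


def merge_provider(config: dict, name: str = "llama-cpp") -> dict:
    """Return a copy of config with the llama.cpp provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = LLAMA_CPP_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_file(path: Path) -> None:
    """Read models.json if present, merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n")


# update_models_file(Path.home() / ".pi" / "agent" / "models.json")
```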

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner
How to use Deepdive404/Kimi-K2.6-GGUF with Docker Model Runner:
```
docker model run hf.co/Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL
```

Lemonade

How to use Deepdive404/Kimi-K2.6-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Deepdive404/Kimi-K2.6-GGUF:UD-Q4_K_XL

Run and chat with the model

lemonade run user.Kimi-K2.6-GGUF-UD-Q4_K_XL

List all available models

lemonade list

Kimi-K2.6-GGUF / README.md

Deepdive404

Duplicate from unsloth/Kimi-K2.6-GGUF

0d1195e 13 days ago

	---
	base_model:
	- moonshotai/Kimi-K2.6
	tags:
	- compressed-tensors
	- unsloth
	- kimi_k25
	license: other
	license_name: modified-mit
	library_name: transformers
	pipeline_tag: image-text-to-text
	---
	# Read our [How to Run Kimi K2.6 Guide](https://unsloth.ai/docs/models/kimi-k2.6)!
	<div>
	<p style="margin: 0 0 0px 0; margin-top: 0px;">
	<em>See <a href="https://unsloth.ai/docs/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>
	</p>
	<div style="display: flex; gap: 5px; align-items: center; margin-bottom: 0px;">
	<a href="https://github.com/unslothai/unsloth/">
	<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
	</a>
	<a href="https://discord.gg/unsloth">
	<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
	</a>
	<a href="https://unsloth.ai/docs/models/kimi-k2.6">
	<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
	</a>
	</div>
	<ul style="margin: 0;">
	<li>To run Kimi K2.6 losslessly at full precision, use Q8 (UD-Q8_K_XL), which is 595GB and only 10GB larger than Q4 (UD-Q4_K_XL).</li>
	<li>See our <a href="https://unsloth.ai/docs/models/kimi-k2.6">Kimi K2.6 guide</a> for quantization analysis and instructions.</li>
	</ul>
	</div>

	<br>

	# Kimi-K2.6

	![kimi k2.6](https://unsloth.ai/docs/~gitbook/image?url=https%3A%2F%2F3215535692-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FxhOjnexMCB3dmuQFQ2Zq%252Fuploads%252FdBDLDaRXybr9JMCs33bC%252Fkimibench.jpg%3Falt%3Dmedia%26token%3D040ea87d-09e8-452c-bfb2-4231305a20d2&width=768&dpr=3&quality=100&sign=fb360710&sv=2)

	Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

	### Key Features
	- Long-Horizon Coding: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization.
	- Coding-Driven Design: K2.6 is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
	- Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.
	- Proactive & Open Orchestration: For autonomous tasks, K2.6 demonstrates strong performance in powering persistent, 24/7 background agents that proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.

	## 2. Model Summary

	<div align="center">


	| | |
	|:---:|:---:|
	| Architecture | Mixture-of-Experts (MoE) |
	| Total Parameters | 1T |
	| Activated Parameters | 32B |
	| Number of Layers (Dense layer included) | 61 |
	| Number of Dense Layers | 1 |
	| Attention Hidden Dimension | 7168 |
	| MoE Hidden Dimension (per Expert) | 2048 |
	| Number of Attention Heads | 64 |
	| Number of Experts | 384 |
	| Selected Experts per Token | 8 |
	| Number of Shared Experts | 1 |
	| Vocabulary Size | 160K |
	| Context Length | 256K |
	| Attention Mechanism | MLA |
	| Activation Function | SwiGLU |
	| Vision Encoder | MoonViT |
	| Parameters of Vision Encoder | 400M |
	</div>
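
To make the sparsity in the table concrete, the short calculation below derives a few ratios from the listed figures. It assumes that every non-dense layer is an MoE layer and that the shared expert runs for every token in addition to the routed ones. Both are common for this kind of architecture but are not stated in the table.

```python
# Figures taken from the Model Summary table.
TOTAL_PARAMS = 1e12          # 1T
ACTIVE_PARAMS = 32e9         # 32B
NUM_LAYERS = 61
NUM_DENSE_LAYERS = 1
NUM_EXPERTS = 384
EXPERTS_PER_TOKEN = 8
SHARED_EXPERTS = 1

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS                  # assumption: all others are MoE
experts_run_per_token = EXPERTS_PER_TOKEN + SHARED_EXPERTS  # assumption: shared expert always runs
routed_fraction = EXPERTS_PER_TOKEN / NUM_EXPERTS

print(f"active parameters per token: {active_fraction:.1%}")     # 3.2%
print(f"MoE layers: {moe_layers}")                                # 60
print(f"experts run per MoE layer: {experts_run_per_token}")      # 9
print(f"routed experts selected: {routed_fraction:.1%}")          # 2.1%
```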

	## 3. Evaluation Results

	<div align="center">
	<table>
	<thead>
	<tr>
	<th align="center">Benchmark</th>
	<th align="center"><sup>Kimi K2.6</sup></th>
	<th align="center"><sup>GPT-5.4 <br><sup>(xhigh)</sup></sup></th>
	<th align="center"><sup>Claude Opus 4.6 <br><sup>(max effort)</sup></sup></th>
	<th align="center"><sup>Gemini 3.1 Pro<br><sup>(thinking high)</sup></sup></th>
	<th align="center"><sup>Kimi K2.5</sup></th>
	</tr>
	</thead>
	<tbody>
	<tr>
	<td align="center" colspan=6><strong>Agentic</strong></td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">HLE-Full<br>(w/ tools)</td>
	<td align="center" style="vertical-align: middle">54.0</td>
	<td align="center" style="vertical-align: middle">52.1</td>
	<td align="center" style="vertical-align: middle">53.0</td>
	<td align="center" style="vertical-align: middle">51.4</td>
	<td align="center" style="vertical-align: middle">50.2</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">BrowseComp</td>
	<td align="center" style="vertical-align: middle">83.2</td>
	<td align="center" style="vertical-align: middle" rowspan="2">82.7</td>
	<td align="center" style="vertical-align: middle" rowspan="2">83.7</td>
	<td align="center" style="vertical-align: middle" rowspan="2">85.9</td>
	<td align="center" style="vertical-align: middle">74.9</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">BrowseComp<br>(Agent Swarm)</td>
	<td align="center" style="vertical-align: middle">86.3</td>
	<td align="center" style="vertical-align: middle">78.4</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">DeepSearchQA<br>(f1-score)</td>
	<td align="center" style="vertical-align: middle">92.5</td>
	<td align="center" style="vertical-align: middle">78.6</td>
	<td align="center" style="vertical-align: middle">91.3</td>
	<td align="center" style="vertical-align: middle">81.9</td>
	<td align="center" style="vertical-align: middle">89.0</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">DeepSearchQA<br>(accuracy)</td>
	<td align="center" style="vertical-align: middle">83.0</td>
	<td align="center" style="vertical-align: middle">63.7</td>
	<td align="center" style="vertical-align: middle">80.6</td>
	<td align="center" style="vertical-align: middle">60.2</td>
	<td align="center" style="vertical-align: middle">77.1</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">WideSearch<br> (item-f1)</td>
	<td align="center" style="vertical-align: middle">80.8</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">72.7</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">Toolathlon</td>
	<td align="center" style="vertical-align: middle">50.0</td>
	<td align="center" style="vertical-align: middle">54.6</td>
	<td align="center" style="vertical-align: middle">47.2</td>
	<td align="center" style="vertical-align: middle">48.8</td>
	<td align="center" style="vertical-align: middle">27.8</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MCPMark</td>
	<td align="center" style="vertical-align: middle">55.9</td>
	<td align="center" style="vertical-align: middle">62.5*</td>
	<td align="center" style="vertical-align: middle">56.7*</td>
	<td align="center" style="vertical-align: middle">55.9*</td>
	<td align="center" style="vertical-align: middle">29.5</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">Claw Eval (pass^3)</td>
	<td align="center" style="vertical-align: middle">62.3</td>
	<td align="center" style="vertical-align: middle">60.3</td>
	<td align="center" style="vertical-align: middle">70.4</td>
	<td align="center" style="vertical-align: middle">57.8</td>
	<td align="center" style="vertical-align: middle">52.3</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">Claw Eval (pass@3)</td>
	<td align="center" style="vertical-align: middle">80.9</td>
	<td align="center" style="vertical-align: middle">78.4</td>
	<td align="center" style="vertical-align: middle">82.4</td>
	<td align="center" style="vertical-align: middle">82.9</td>
	<td align="center" style="vertical-align: middle">75.4</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">APEX-Agents</td>
	<td align="center" style="vertical-align: middle">27.9</td>
	<td align="center" style="vertical-align: middle">33.3</td>
	<td align="center" style="vertical-align: middle">33.0</td>
	<td align="center" style="vertical-align: middle">32.0</td>
	<td align="center" style="vertical-align: middle">11.5</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">OSWorld-Verified</td>
	<td align="center" style="vertical-align: middle">73.1</td>
	<td align="center" style="vertical-align: middle">75.0</td>
	<td align="center" style="vertical-align: middle">72.7</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">63.3</td>
	</tr>
	<tr>
	<td align="center" colspan=6><strong>Coding</strong></td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">Terminal-Bench 2.0<br>(Terminus-2)</td>
	<td align="center" style="vertical-align: middle">66.7</td>
	<td align="center" style="vertical-align: middle">65.4*</td>
	<td align="center" style="vertical-align: middle">65.4</td>
	<td align="center" style="vertical-align: middle">68.5</td>
	<td align="center" style="vertical-align: middle">50.8</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">SWE-Bench Pro</td>
	<td align="center" style="vertical-align: middle">58.6</td>
	<td align="center" style="vertical-align: middle">57.7</td>
	<td align="center" style="vertical-align: middle">53.4</td>
	<td align="center" style="vertical-align: middle">54.2</td>
	<td align="center" style="vertical-align: middle">50.7</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">SWE-Bench Multilingual</td>
	<td align="center" style="vertical-align: middle">76.7</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">77.8</td>
	<td align="center" style="vertical-align: middle">76.9*</td>
	<td align="center" style="vertical-align: middle">73.0</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">SWE-Bench Verified</td>
	<td align="center" style="vertical-align: middle">80.2</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">80.8</td>
	<td align="center" style="vertical-align: middle">80.6</td>
	<td align="center" style="vertical-align: middle">76.8</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">SciCode</td>
	<td align="center" style="vertical-align: middle">52.2</td>
	<td align="center" style="vertical-align: middle">56.6</td>
	<td align="center" style="vertical-align: middle">51.9</td>
	<td align="center" style="vertical-align: middle">58.9</td>
	<td align="center" style="vertical-align: middle">48.7</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">OJBench (python)</td>
	<td align="center" style="vertical-align: middle">60.6</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">60.3</td>
	<td align="center" style="vertical-align: middle">70.7</td>
	<td align="center" style="vertical-align: middle">54.7</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">LiveCodeBench (v6)</td>
	<td align="center" style="vertical-align: middle">89.6</td>
	<td align="center" style="vertical-align: middle">-</td>
	<td align="center" style="vertical-align: middle">88.8</td>
	<td align="center" style="vertical-align: middle">91.7</td>
	<td align="center" style="vertical-align: middle">85.0</td>
	</tr>
	<tr>
	<td align="center" colspan=6><strong>Reasoning & Knowledge</strong></td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">HLE-Full</td>
	<td align="center" style="vertical-align: middle">34.7</td>
	<td align="center" style="vertical-align: middle">39.8</td>
	<td align="center" style="vertical-align: middle">40.0</td>
	<td align="center" style="vertical-align: middle">44.4</td>
	<td align="center" style="vertical-align: middle">30.1</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">AIME 2026</td>
	<td align="center" style="vertical-align: middle">96.4</td>
	<td align="center" style="vertical-align: middle">99.2</td>
	<td align="center" style="vertical-align: middle">96.7</td>
	<td align="center" style="vertical-align: middle">98.3</td>
	<td align="center" style="vertical-align: middle">95.8</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">HMMT 2026 (Feb)</td>
	<td align="center" style="vertical-align: middle">92.7</td>
	<td align="center" style="vertical-align: middle">97.7</td>
	<td align="center" style="vertical-align: middle">96.2</td>
	<td align="center" style="vertical-align: middle">94.7</td>
	<td align="center" style="vertical-align: middle">87.1</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">IMO-AnswerBench</td>
	<td align="center" style="vertical-align: middle">86.0</td>
	<td align="center" style="vertical-align: middle">91.4</td>
	<td align="center" style="vertical-align: middle">75.3</td>
	<td align="center" style="vertical-align: middle">91.0*</td>
	<td align="center" style="vertical-align: middle">81.8</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">GPQA-Diamond</td>
	<td align="center" style="vertical-align: middle">90.5</td>
	<td align="center" style="vertical-align: middle">92.8</td>
	<td align="center" style="vertical-align: middle">91.3</td>
	<td align="center" style="vertical-align: middle">94.3</td>
	<td align="center" style="vertical-align: middle">87.6</td>
	</tr>
	<tr>
	<td align="center" colspan=6><strong>Vision</strong></td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MMMU-Pro</td>
	<td align="center" style="vertical-align: middle">79.4</td>
	<td align="center" style="vertical-align: middle">81.2</td>
	<td align="center" style="vertical-align: middle">73.9</td>
	<td align="center" style="vertical-align: middle">83.0*</td>
	<td align="center" style="vertical-align: middle">78.5</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MMMU-Pro (w/ python)</td>
	<td align="center" style="vertical-align: middle">80.1</td>
	<td align="center" style="vertical-align: middle">82.1</td>
	<td align="center" style="vertical-align: middle">77.3</td>
	<td align="center" style="vertical-align: middle">85.3*</td>
	<td align="center" style="vertical-align: middle">77.7</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">CharXiv (RQ)</td>
	<td align="center" style="vertical-align: middle">80.4</td>
	<td align="center" style="vertical-align: middle">82.8*</td>
	<td align="center" style="vertical-align: middle">69.1</td>
	<td align="center" style="vertical-align: middle">80.2*</td>
	<td align="center" style="vertical-align: middle">77.5</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">CharXiv (RQ) (w/ python)</td>
	<td align="center" style="vertical-align: middle">86.7</td>
	<td align="center" style="vertical-align: middle">90.0*</td>
	<td align="center" style="vertical-align: middle">84.7</td>
	<td align="center" style="vertical-align: middle">89.9*</td>
	<td align="center" style="vertical-align: middle">78.7</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MathVision</td>
	<td align="center" style="vertical-align: middle">87.4</td>
	<td align="center" style="vertical-align: middle">92.0*</td>
	<td align="center" style="vertical-align: middle">71.2*</td>
	<td align="center" style="vertical-align: middle">89.8*</td>
	<td align="center" style="vertical-align: middle">84.2</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MathVision (w/ python)</td>
	<td align="center" style="vertical-align: middle">93.2</td>
	<td align="center" style="vertical-align: middle">96.1*</td>
	<td align="center" style="vertical-align: middle">84.6*</td>
	<td align="center" style="vertical-align: middle">95.7*</td>
	<td align="center" style="vertical-align: middle">85.0</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">BabyVision</td>
	<td align="center" style="vertical-align: middle">39.8</td>
	<td align="center" style="vertical-align: middle">49.7</td>
	<td align="center" style="vertical-align: middle">14.8</td>
	<td align="center" style="vertical-align: middle">51.6</td>
	<td align="center" style="vertical-align: middle">36.5</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">BabyVision (w/ python)</td>
	<td align="center" style="vertical-align: middle">68.5</td>
	<td align="center" style="vertical-align: middle">80.2*</td>
	<td align="center" style="vertical-align: middle">38.4*</td>
	<td align="center" style="vertical-align: middle">68.3*</td>
	<td align="center" style="vertical-align: middle">40.5</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">V* (w/ python)</td>
	<td align="center" style="vertical-align: middle">96.9</td>
	<td align="center" style="vertical-align: middle">98.4*</td>
	<td align="center" style="vertical-align: middle">86.4*</td>
	<td align="center" style="vertical-align: middle">96.9*</td>
	<td align="center" style="vertical-align: middle">86.9</td>
	</tr>
	</tbody>
	</table>
	</div>
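
The two Claw Eval rows differ only in how three trials per task are aggregated. The sketch below uses the commonly used definitions: pass@3 counts a task as solved if any of its three trials succeeds, and pass^3 only if all three succeed. These definitions are an assumption here, since Claw Eval's exact scoring is not described in this document, and the outcome data is a toy example.

```python
def pass_at_k(trials: list[list[bool]]) -> float:
    """Share of tasks where at least one trial succeeded."""
    return sum(any(task) for task in trials) / len(trials)


def pass_hat_k(trials: list[list[bool]]) -> float:
    """Share of tasks where every trial succeeded."""
    return sum(all(task) for task in trials) / len(trials)


# Toy outcomes: four tasks, three trials each (not real benchmark data).
outcomes = [
    [True, True, True],
    [True, False, True],
    [False, False, False],
    [False, True, False],
]
print(pass_at_k(outcomes))   # 0.75
print(pass_hat_k(outcomes))  # 0.25
```

Under these definitions pass^3 can never exceed pass@3, which matches the ordering of the two rows for every model in the table.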

	<details>
	<summary><b>Footnotes</b></summary>

	1. General Testing Details
	- We report results for Kimi K2.6 and Kimi K2.5 with thinking mode enabled, Claude Opus 4.6 with max effort, GPT-5.4 with xhigh reasoning effort, and Gemini 3.1 Pro with a high thinking level.
	- Unless otherwise specified, all Kimi K2.6 experiments were conducted with temperature = 1.0, top-p = 1.0, and a context length of 262,144 tokens.
	- Benchmarks without publicly available scores were re-evaluated under the same conditions used for Kimi K2.6 and are marked with an asterisk (`*`). Except where noted with an asterisk, all other results are cited from official reports.
	2. Reasoning Benchmarks
	- IMO-AnswerBench scores for GPT-5.4 and Claude Opus 4.6 were obtained from [z.ai/blog/glm-5.1](https://z.ai/blog/glm-5.1).
	- Humanity's Last Exam (HLE) and other reasoning tasks were evaluated with a maximum generation length of 98,304 tokens. By default, we report results on the HLE full set. For the text-only subset, Kimi K2.6 achieves 36.4% accuracy without tools and 55.5% with tools.
	3. Tool-Augmented / Agentic Tasks
	- Kimi K2.6 was equipped with search, code-interpreter, and web-browsing tools for HLE with tools, BrowseComp, DeepSearchQA, and WideSearch.
	- For HLE-Full with tools, the maximum generation length is 262,144 tokens with a per-step limit of 49,152 tokens. We employ a simple context management strategy: once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
	- For BrowseComp, we report scores obtained with context management using the same discard-all strategy as Kimi K2.5 and DeepSeek-V3.2.
	- For DeepSearchQA, no context management was applied to Kimi K2.6 tests, and tasks exceeding the supported context length were directly counted as failed. Scores for Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on DeepSearchQA are cited from the [Claude Opus 4.7 System Card](https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf).
	- For WideSearch, we report results under the "hide tool result" context management setting. Once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
	- The test system prompts are identical to those used in the [Kimi K2.5 technical report](https://arxiv.org/pdf/2602.02276).
	- Claw Eval was conducted using version 1.1 with max-tokens-per-step = 16384.
	- For APEX-Agents, we evaluate 452 tasks from the public 480-task release, as done by [Artificial Analysis](https://artificialanalysis.ai/evaluations/apex-agents-aa) (excluding Investment Banking Worlds 244 and 246, which have external runtime dependencies).
	4. Coding Tasks
	- Terminal-Bench 2.0 scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser, operating in preserve thinking mode.
	- For the SWE-Bench series of evaluations (including Verified, Multilingual, and Pro), we used an in-house evaluation framework adapted from SWE-agent. This framework includes a minimal set of tools: bash tool, createfile tool, insert tool, view tool, strreplace tool, and submit tool.
	- All reported scores for coding tasks are averaged over 10 independent runs.
	5. Vision Benchmarks
	- Max-tokens = 98,304, averaged over three runs (avg@3).
	- Settings with Python tool use max-tokens-per-step = 65,536 and max-steps = 50 for multi-step reasoning.
	- MMMU-Pro follows the official protocol, preserving input order and prepending images.

	</details>
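
The context management strategy described in the footnotes for HLE-Full with tools and WideSearch (keep only the most recent round of tool-related messages once the context exceeds a threshold) can be sketched as follows. This is an illustration rather than the evaluation code: the message layout follows the OpenAI chat format, a "round" is taken to be the last assistant message with `tool_calls` plus the tool results after it, and `count_tokens` is a crude whitespace stand-in for a real tokenizer.

```python
def is_tool_related(message: dict) -> bool:
    """A tool result, or an assistant message that issued tool calls."""
    return message["role"] == "tool" or bool(message.get("tool_calls"))


def count_tokens(messages: list[dict]) -> int:
    """Hypothetical token counter: whitespace-separated words in the content."""
    return sum(len(str(m.get("content") or "").split()) for m in messages)


def trim_tool_history(messages: list[dict], threshold: int) -> list[dict]:
    """Drop all but the latest round of tool messages when over the threshold."""
    if count_tokens(messages) <= threshold:
        return list(messages)
    # The most recent round starts at the last assistant message with tool calls.
    last_round_start = None
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].get("tool_calls"):
            last_round_start = index
            break
    kept = []
    for index, message in enumerate(messages):
        in_last_round = last_round_start is not None and index >= last_round_start
        if not is_tool_related(message) or in_last_round:
            kept.append(message)
    return kept


history = [
    {"role": "system", "content": "You are a research agent."},
    {"role": "user", "content": "Find the answer."},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "a"}]},
    {"role": "tool", "content": "first long search result " * 20},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "b"}]},
    {"role": "tool", "content": "second search result"},
]
trimmed = trim_tool_history(history, threshold=50)
print([m["role"] for m in trimmed])  # ['system', 'user', 'assistant', 'tool']
```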


	## 4. Native INT4 Quantization
	Kimi-K2.6 adopts the same native int4 quantization method as [Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking#4-native-int4-quantization).

	## 5. Deployment

	> [!Note]
	> You can access Kimi-K2.6's API at https://platform.moonshot.ai, which provides OpenAI/Anthropic-compatible APIs. To verify that a deployment is correct, we also provide the [Kimi Vendor Verifier](https://kimi.com/blog/kimi-vendor-verifier.html).

	We currently recommend running Kimi-K2.6 on the following inference engines:
	* vLLM
	* SGLang
	* KTransformers

	Kimi-K2.6 has the same architecture as Kimi-K2.5, and the deployment method can be directly reused.

	The version requirement for `transformers` is `>=4.57.1, <5.0.0`.
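
A quick way to check an environment against this range is sketched below, using only the standard library. The helper names are hypothetical, and the parser is deliberately simple: it compares only the numeric `major.minor.patch` part, so pre-release suffixes such as `rc1` are ignored rather than ordered as PEP 440 would order them.

```python
import re
from importlib import metadata

MIN_VERSION = (4, 57, 1)            # inclusive lower bound
MAX_VERSION_EXCLUSIVE = (5, 0, 0)   # exclusive upper bound


def parse_release(version: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from a version string, padding with zeros."""
    parts = []
    for component in version.split(".")[:3]:
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(version: str) -> bool:
    """True if the version falls in >=4.57.1, <5.0.0."""
    return MIN_VERSION <= parse_release(version) < MAX_VERSION_EXCLUSIVE


def installed_transformers_supported() -> bool:
    """Check the installed transformers package, if any."""
    try:
        return is_supported(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False


print(is_supported("4.57.1"))  # True
print(is_supported("5.0.0"))   # False
```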

	Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).


	---
	## 6. Model Usage

	The usage demos below demonstrate how to call our official API.

	For third-party APIs deployed with vLLM or SGLang, please note that:
	> [!Note]
	> - Chat with video content is an experimental feature and is only supported in our official API for now.
	>
	> - The recommended `temperature` is `1.0` for Thinking mode and `0.6` for Instant mode.
	>
	> - The recommended `top_p` is `0.95`.
	>
	> - To use instant mode, you need to pass `{'chat_template_kwargs': {"thinking": False}}` in `extra_body`.
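
The recommendations in the note above can be collected into one helper that returns keyword arguments for `client.chat.completions.create`. This is a sketch with a hypothetical function name. The two `extra_body` shapes are the ones used in this README for vLLM/SGLang deployments and for the official API.

```python
def sampling_kwargs(thinking: bool, official_api: bool = False) -> dict:
    """Recommended sampling settings for Thinking or Instant mode."""
    kwargs = {
        "temperature": 1.0 if thinking else 0.6,
        "top_p": 0.95,
    }
    if not thinking:
        if official_api:
            # Official API: disable thinking through the `thinking` field.
            kwargs["extra_body"] = {"thinking": {"type": "disabled"}}
        else:
            # vLLM / SGLang: disable thinking through the chat template.
            kwargs["extra_body"] = {"chat_template_kwargs": {"thinking": False}}
    return kwargs


print(sampling_kwargs(thinking=True))
# {'temperature': 1.0, 'top_p': 0.95}
print(sampling_kwargs(thinking=False))
# {'temperature': 0.6, 'top_p': 0.95, 'extra_body': {'chat_template_kwargs': {'thinking': False}}}
```

The result can be unpacked into a request, for example `client.chat.completions.create(model=model_name, messages=messages, **sampling_kwargs(thinking=False))`.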

	### Chat Completion

	The following script shows how to call the K2.6 API in Thinking and Instant modes.

	```python
	import openai


	def simple_chat(client: openai.OpenAI, model_name: str):
	    """Ask the same question in Thinking mode, then in Instant mode."""
	    messages = [
	        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
	        {
	            'role': 'user',
	            'content': [
	                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
	            ],
	        },
	    ]
	    # Thinking mode is the default, so no extra parameters are needed.
	    response = client.chat.completions.create(
	        model=model_name, messages=messages, stream=False, max_tokens=4096
	    )
	    print('====== Below is reasoning content in Thinking Mode ======')
	    print(f'reasoning content: {response.choices[0].message.reasoning}')
	    print('====== Below is response in Thinking Mode ======')
	    print(f'response: {response.choices[0].message.content}')

	    # To use instant mode, pass {"thinking": {"type": "disabled"}}
	    response = client.chat.completions.create(
	        model=model_name,
	        messages=messages,
	        stream=False,
	        max_tokens=4096,
	        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
	        # extra_body={'chat_template_kwargs': {"thinking": False}},  # this is for vLLM/SGLang
	    )
	    print('====== Below is response in Instant Mode ======')
	    print(f'response: {response.choices[0].message.content}')
	```


	### Chat Completion with visual content

	K2.6 supports image and video input.

	The following example demonstrates how to call the K2.6 API with image input:

	```python
	import openai
	import base64
	import requests


	def chat_with_image(client: openai.OpenAI, model_name: str):
	    """Send one image as a base64 data URL in Thinking mode, then in Instant mode."""
	    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/kimi-logo.png'
	    # Download the image and embed it in the request as a data URL.
	    image_base64 = base64.b64encode(requests.get(url).content).decode()
	    messages = [
	        {
	            'role': 'user',
	            'content': [
	                {'type': 'text', 'text': 'Describe this image in detail.'},
	                {
	                    'type': 'image_url',
	                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
	                },
	            ],
	        }
	    ]

	    response = client.chat.completions.create(
	        model=model_name, messages=messages, stream=False, max_tokens=8192
	    )
	    print('====== Below is reasoning content in Thinking Mode ======')
	    print(f'reasoning content: {response.choices[0].message.reasoning}')
	    print('====== Below is response in Thinking Mode ======')
	    print(f'response: {response.choices[0].message.content}')

	    # Instant mode is also supported: pass {"thinking": {"type": "disabled"}}
	    response = client.chat.completions.create(
	        model=model_name,
	        messages=messages,
	        stream=False,
	        max_tokens=4096,
	        extra_body={'thinking': {'type': 'disabled'}},  # this is for official API
	        # extra_body={'chat_template_kwargs': {"thinking": False}},  # this is for vLLM/SGLang
	    )
	    print('====== Below is response in Instant Mode ======')
	    print(f'response: {response.choices[0].message.content}')

	    return response.choices[0].message.content
	```

	The following example demonstrates how to call the K2.6 API with video input:

	```python
	import openai
	import base64
	import requests

	def chat_with_video(client: openai.OpenAI, model_name: str):
	    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/demo_video.mp4'
	    # Download the video and embed it in the request as a base64 data URL.
	    video_base64 = base64.b64encode(requests.get(url).content).decode()
	    messages = [
	        {
	            'role': 'user',
	            'content': [
	                {'type': 'text', 'text': 'Describe the video in detail.'},
	                {
	                    'type': 'video_url',
	                    'video_url': {'url': f'data:video/mp4;base64,{video_base64}'},
	                },
	            ],
	        }
	    ]

	    response = client.chat.completions.create(model=model_name, messages=messages)
	    print('====== Below is reasoning content in Thinking Mode ======')
	    print(f'reasoning content: {response.choices[0].message.reasoning}')
	    print('====== Below is response in Thinking Mode ======')
	    print(f'response: {response.choices[0].message.content}')

	    # Instant mode is also supported: pass {'thinking': {'type': 'disabled'}}
	    response = client.chat.completions.create(
	        model=model_name,
	        messages=messages,
	        stream=False,
	        max_tokens=4096,
	        extra_body={'thinking': {'type': 'disabled'}},  # for the official API
	        # extra_body={'chat_template_kwargs': {'thinking': False}},  # for vLLM/SGLang
	    )
	    print('====== Below is response in Instant Mode ======')
	    print(f'response: {response.choices[0].message.content}')
	    return response.choices[0].message.content
	```
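
The image and video examples build the same kind of content part and differ only in the `image_url` / `video_url` key and the MIME type. A minimal sketch of a shared builder follows. `media_part` is a hypothetical helper name introduced here, and the caller is assumed to know the MIME type of the bytes; the part layout is copied from the two examples above.

```python
import base64
from typing import Any, Dict


def media_part(data: bytes, mime_type: str) -> Dict[str, Any]:
    """Wrap raw image or video bytes as a base64 data-URL content part.

    Hypothetical helper: the returned dict mirrors the 'image_url' and
    'video_url' parts used in the examples in this README.
    """
    kind = mime_type.split('/', 1)[0]  # 'image' or 'video'
    if kind not in ('image', 'video'):
        raise ValueError(f'unsupported MIME type: {mime_type!r}')
    encoded = base64.b64encode(data).decode('ascii')
    key = f'{kind}_url'
    # Data URL format: data:<mime>;base64,<payload> with no whitespace.
    return {'type': key, key: {'url': f'data:{mime_type};base64,{encoded}'}}
```

A user message could then be assembled as `{'role': 'user', 'content': [{'type': 'text', 'text': '...'}, media_part(raw_bytes, 'image/png')]}`.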

	### Preserve Thinking
	Kimi K2.6 supports a `preserve_thinking` mode, which retains the full reasoning content across multi-turn interactions and improves performance in coding agent scenarios.

	This feature is disabled by default. The following example demonstrates how to call the K2.6 API in `preserve_thinking` mode:

	```python
	import openai

	def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
	    # The prior assistant turn carries its reasoning in 'reasoning_content'.
	    messages = [
	        {
	            'role': 'user',
	            'content': 'Tell me three random numbers.'
	        },
	        {
	            'role': 'assistant',
	            'reasoning_content': "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
	            'content': '473, 921, 235'
	        },
	        {
	            'role': 'user',
	            'content': 'What are the other two numbers you have in mind?'
	        }
	    ]

	    response = client.chat.completions.create(
	        model=model_name,
	        messages=messages,
	        stream=False,
	        max_tokens=4096,
	        extra_body={'thinking': {'type': 'enabled', 'keep': 'all'}},  # for the official API
	        # extra_body={'chat_template_kwargs': {'thinking': True, 'preserve_thinking': True}},  # for vLLM/SGLang
	        # We recommend enabling preserve_thinking only in thinking mode.
	    )
	    # The assistant should mention 215 and 222, which appear only in the prior reasoning content.
	    print(f'reasoning content: {response.choices[0].message.reasoning}')
	    print(f'response: {response.choices[0].message.content}')
	    return response.choices[0].message.content
	```
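
In a real multi-turn loop, the history passed to the next request has to include each assistant turn together with its reasoning, in the same shape as the hand-written message above. A minimal sketch of that bookkeeping follows. `append_assistant_turn` is a hypothetical helper name; the `reasoning_content` field name is taken from the example above, and whether the server actually uses it is assumed to depend on `preserve_thinking` being enabled.

```python
from typing import Any, Dict, List, Optional


def append_assistant_turn(
    messages: List[Dict[str, Any]],
    content: str,
    reasoning: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return a new history with the assistant turn (and its reasoning) appended.

    Hypothetical helper: it only shapes the message list and makes no API call.
    """
    turn: Dict[str, Any] = {'role': 'assistant', 'content': content}
    if reasoning:
        # Keep the reasoning alongside the answer so later turns can refer to it.
        turn['reasoning_content'] = reasoning
    # Build a new list so the caller's history is not mutated.
    return messages + [turn]
```

After each response, the returned list plus the next user message would form the `messages` argument of the following request.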

	### Interleaved Thinking and Multi-Step Tool Call

	K2.6 uses the same Interleaved Thinking and Multi-Step Tool Call design as K2 Thinking. For a usage example, see the [K2 Thinking documentation](https://platform.moonshot.ai/docs/guide/use-kimi-k2-thinking-model#complete-example).

	### Coding Agent Framework

	Kimi K2.6 works best with Kimi Code CLI as its agent framework. Give it a try at https://www.kimi.com/code.


	---

	## 7. License

	Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).

	---

	## 8. Third Party Notices

	See [THIRD PARTY NOTICES](THIRD_PARTY_NOTICES.md).

	---

	## 9. Contact Us

	If you have any questions, please reach out at [support@moonshot.ai](mailto:support@moonshot.ai).