---
tags:
- compressed-tensors
license: other
license_name: modified-mit
library_name: transformers
pipeline_tag: image-text-to-text
---
# Kimi K2.6

## 1. Model Introduction

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

### Key Features

- **Long-Horizon Coding**: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization.
- **Coding-Driven Design**: K2.6 transforms simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
- **Elevated Agent Swarm**: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.
- **Proactive & Open Orchestration**: K2.6 excels at powering persistent, 24/7 background agents that proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.

## 2. Model Summary
| | | |:---:|:---:| | **Architecture** | Mixture-of-Experts (MoE) | | **Total Parameters** | 1T | | **Activated Parameters** | 32B | | **Number of Layers** (Dense layer included) | 61 | | **Number of Dense Layers** | 1 | | **Attention Hidden Dimension** | 7168 | | **MoE Hidden Dimension** (per Expert) | 2048 | | **Number of Attention Heads** | 64 | | **Number of Experts** | 384 | | **Selected Experts per Token** | 8 | | **Number of Shared Experts** | 1 | | **Vocabulary Size** | 160K | | **Context Length** | 256K | | **Attention Mechanism** | MLA | | **Activation Function** | SwiGLU | | **Vision Encoder** | MoonViT | | **Parameters of Vision Encoder** | 400M |
## 3. Evaluation Results
| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max effort) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 |
|:---|:---:|:---:|:---:|:---:|:---:|
| **Agentic** | | | | | |
| HLE-Full (w/ tools) | 54.0 | 52.1 | 53.0 | 51.4 | 50.2 |
| BrowseComp | 83.2 | 82.7 | 83.7 | 85.9 | 74.9 |
| BrowseComp (Agent Swarm) | 86.3 | - | - | - | 78.4 |
| DeepSearchQA (f1-score) | 92.5 | 78.6 | 91.3 | 81.9 | 89.0 |
| DeepSearchQA (accuracy) | 83.0 | 63.7 | 80.6 | 60.2 | 77.1 |
| WideSearch (item-f1) | 80.8 | - | - | - | 72.7 |
| Toolathlon | 50.0 | 54.6 | 47.2 | 48.8 | 27.8 |
| MCPMark | 55.9 | 62.5* | 56.7* | 55.9* | 29.5 |
| Claw Eval (pass^3) | 62.3 | 60.3 | 70.4 | 57.8 | 52.3 |
| Claw Eval (pass@3) | 80.9 | 78.4 | 82.4 | 82.9 | 75.4 |
| APEX-Agents | 27.9 | 33.3 | 33.0 | 32.0 | 11.5 |
| OSWorld-Verified | 73.1 | 75.0 | 72.7 | - | 63.3 |
| **Coding** | | | | | |
| Terminal-Bench 2.0 (Terminus-2) | 66.7 | 65.4* | 65.4 | 68.5 | 50.8 |
| SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 | 50.7 |
| SWE-Bench Multilingual | 76.7 | - | 77.8 | 76.9* | 73.0 |
| SWE-Bench Verified | 80.2 | - | 80.8 | 80.6 | 76.8 |
| SciCode | 52.2 | 56.6 | 51.9 | 58.9 | 48.7 |
| OJBench (python) | 60.6 | - | 60.3 | 70.7 | 54.7 |
| LiveCodeBench (v6) | 89.6 | - | 88.8 | 91.7 | 85.0 |
| **Reasoning & Knowledge** | | | | | |
| HLE-Full | 34.7 | 39.8 | 40.0 | 44.4 | 30.1 |
| AIME 2026 | 96.4 | 99.2 | 96.7 | 98.3 | 95.8 |
| HMMT 2026 (Feb) | 92.7 | 97.7 | 96.2 | 94.7 | 87.1 |
| IMO-AnswerBench | 86.0 | 91.4 | 75.3 | 91.0* | 81.8 |
| GPQA-Diamond | 90.5 | 92.8 | 91.3 | 94.3 | 87.6 |
| **Vision** | | | | | |
| MMMU-Pro | 79.4 | 81.2 | 73.9 | 83.0* | 78.5 |
| MMMU-Pro (w/ python) | 80.1 | 82.1 | 77.3 | 85.3* | 77.7 |
| CharXiv (RQ) | 80.4 | 82.8* | 69.1 | 80.2* | 77.5 |
| CharXiv (RQ) (w/ python) | 86.7 | 90.0* | 84.7 | 89.9* | 78.7 |
| MathVision | 87.4 | 92.0* | 71.2* | 89.8* | 84.2 |
| MathVision (w/ python) | 93.2 | 96.1* | 84.6* | 95.7* | 85.0 |
| BabyVision | 39.8 | 49.7 | 14.8 | 51.6 | 36.5 |
| BabyVision (w/ python) | 68.5 | 80.2* | 38.4* | 68.3* | 40.5 |
| V* (w/ python) | 96.9 | 98.4* | 86.4* | 96.9* | 86.9 |
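Claw Eval is reported under two aggregations: pass@3 (a task counts as solved if at least one of three runs succeeds) and pass^3 (all three runs must succeed, so it measures reliability and is always the lower of the two). A minimal sketch of the distinction, assuming those standard definitions; the exact harness details are not specified here:

```python
# Hedged sketch of the two Claw Eval aggregation modes, assuming
# pass@3 = "any of 3 runs succeeds" and pass^3 = "all 3 runs succeed".

def pass_at_k(runs_per_task: list[list[bool]]) -> float:
    """Fraction of tasks where at least one run succeeded."""
    return sum(any(runs) for runs in runs_per_task) / len(runs_per_task)

def pass_hat_k(runs_per_task: list[list[bool]]) -> float:
    """Fraction of tasks where every run succeeded (stricter)."""
    return sum(all(runs) for runs in runs_per_task) / len(runs_per_task)

# Four hypothetical tasks, three runs each:
outcomes = [
    [True, True, True],     # solved reliably
    [True, False, True],    # solved, but flaky
    [False, False, True],   # solved once
    [False, False, False],  # never solved
]
print(pass_at_k(outcomes))   # 0.75: three tasks had at least one success
print(pass_hat_k(outcomes))  # 0.25: only one task succeeded in every run
```

The gap between the two columns in the table (e.g. 80.9 vs 62.3 for Kimi K2.6) reflects run-to-run variance on tasks the model can solve but not consistently.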
### Footnotes

1. **General Testing Details**
   - We report results for Kimi K2.6 and Kimi K2.5 with thinking mode enabled, Claude Opus 4.6 with max effort, GPT-5.4 with xhigh reasoning effort, and Gemini 3.1 Pro with a high thinking level.
   - Unless otherwise specified, all Kimi K2.6 experiments were conducted with temperature = 1.0, top-p = 1.0, and a context length of 262,144 tokens.
   - Benchmarks without publicly available scores were re-evaluated under the same conditions used for Kimi K2.6 and are marked with an asterisk (`*`). All other results are cited from official reports.
2. **Reasoning Benchmarks**
   - IMO-AnswerBench scores for GPT-5.4 and Claude Opus 4.6 were obtained from [z.ai/blog/glm-5.1](https://z.ai/blog/glm-5.1).
   - Humanity's Last Exam (HLE) and other reasoning tasks were evaluated with a maximum generation length of 98,304 tokens. By default, we report results on the HLE full set. On the text-only subset, Kimi K2.6 achieves 36.4% accuracy without tools and 55.5% with tools.
3. **Tool-Augmented / Agentic Tasks**
   - Kimi K2.6 was equipped with search, code-interpreter, and web-browsing tools for HLE (w/ tools), BrowseComp, DeepSearchQA, and WideSearch.
   - For HLE-Full with tools, the maximum generation length is 262,144 tokens with a per-step limit of 49,152 tokens. We employ a simple context-management strategy: once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
   - For BrowseComp, we report scores obtained with context management using the same discard-all strategy as Kimi K2.5 and DeepSeek-V3.2.
   - For DeepSearchQA, no context management was applied to Kimi K2.6 tests; tasks exceeding the supported context length were counted as failed. Scores for Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on DeepSearchQA are cited from the [Claude Opus 4.7 System Card](https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf).
   - For WideSearch, we report results under the "hide tool result" context-management setting: once the context window exceeds the threshold, only the most recent round of tool-related messages is retained.
   - The test system prompts are identical to those used in the [Kimi K2.5 technical report](https://arxiv.org/pdf/2602.02276).
   - Claw Eval was conducted using version 1.1 with max-tokens-per-step = 16,384.
   - For APEX-Agents, we evaluate 452 tasks from the public 480-task release, as done by [Artificial Analysis](https://artificialanalysis.ai/evaluations/apex-agents-aa) (excluding Investment Banking Worlds 244 and 246, which have external runtime dependencies).
4. **Coding Tasks**
   - Terminal-Bench 2.0 scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser, operating in preserve-thinking mode.
   - For the SWE-Bench series of evaluations (Verified, Multilingual, and Pro), we used an in-house evaluation framework adapted from SWE-agent. It includes a minimal set of tools: bash, createfile, insert, view, strreplace, and submit.
   - All reported scores for coding tasks are averaged over 10 independent runs.
5. **Vision Benchmarks**
   - All vision benchmarks use max-tokens = 98,304 and report the average of three runs (avg@3).
   - Settings with the Python tool use max-tokens-per-step = 65,536 and max-steps = 50 for multi-step reasoning.
   - MMMU-Pro follows the official protocol, preserving input order and prepending images.
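The "retain only the most recent round of tool-related messages" strategy mentioned in the footnotes can be sketched as follows. This is purely illustrative: the message schema, the character-based token estimate, and the threshold are all assumptions for the demo, not the production logic.

```python
# Illustrative sketch of the context-management strategy described above:
# once the estimated context exceeds a threshold, only the most recent
# round of tool-related messages is retained.

def rough_tokens(messages: list[dict]) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per 4 characters.
    return sum(len(str(m.get('content') or '')) // 4 for m in messages)

def is_tool_related(msg: dict) -> bool:
    return msg['role'] == 'tool' or bool(msg.get('tool_calls'))

def trim_context(messages: list[dict], threshold: int) -> list[dict]:
    if rough_tokens(messages) <= threshold:
        return messages  # still fits: keep the full history
    # The most recent round starts at the last assistant message with tool calls.
    starts = [i for i, m in enumerate(messages) if m.get('tool_calls')]
    last_round = starts[-1] if starts else len(messages)
    # Keep all non-tool messages, and tool-related messages only from that round on.
    return [m for i, m in enumerate(messages)
            if not is_tool_related(m) or i >= last_round]

history = [
    {'role': 'user', 'content': 'Find X.'},
    {'role': 'assistant', 'content': 'Searching.', 'tool_calls': [{'id': '1'}]},
    {'role': 'tool', 'content': 'old result ' * 200},  # large stale tool output
    {'role': 'assistant', 'content': 'Refining.', 'tool_calls': [{'id': '2'}]},
    {'role': 'tool', 'content': 'fresh result'},
]
trimmed = trim_context(history, threshold=100)
print([m['role'] for m in trimmed])  # ['user', 'assistant', 'tool']
```

The user turn and the final tool round survive, while the earlier, stale tool call and its large result are dropped, which is the behavior the footnotes describe for long agentic runs.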
## 4. Native INT4 Quantization

Kimi-K2.6 adopts the same native INT4 quantization method as [Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking#4-native-int4-quantization).

## 5. Deployment

> [!Note]
> You can access Kimi-K2.6's API at https://platform.moonshot.ai, where we provide OpenAI/Anthropic-compatible APIs. To verify that a deployment is correct, we also provide the [Kimi Vendor Verifier](https://kimi.com/blog/kimi-vendor-verifier.html).

Currently, Kimi-K2.6 is recommended to run on the following inference engines:

* vLLM
* SGLang
* KTransformers

Kimi-K2.6 has the same architecture as Kimi-K2.5, so the deployment method can be reused directly. The version requirement for `transformers` is `>=4.57.1, <5.0.0`.

Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).

---

## 6. Model Usage

The demos below show how to call our official API. For third-party APIs deployed with vLLM or SGLang, please note that:

> [!Note]
> - Chat with video content is an experimental feature and is currently supported only in our official API.
>
> - The recommended `temperature` is `1.0` for Thinking mode and `0.6` for Instant mode.
>
> - The recommended `top_p` is `0.95`.
>
> - To use Instant mode, pass `{'chat_template_kwargs': {"thinking": False}}` in `extra_body`.

### Chat Completion

This simple chat completion script shows how to call the K2.6 API in Thinking and Instant modes.

```python
import openai


def simple_chat(client: openai.OpenAI, model_name: str):
    messages = [
        {'role': 'system', 'content': 'You are Kimi, an AI assistant created by Moonshot AI.'},
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'which one is bigger, 9.11 or 9.9? think carefully.'}
            ],
        },
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # To use Instant mode, pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {'thinking': False}},  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
```

### Chat Completion with visual content

K2.6 supports image and video input. The following example demonstrates how to call the K2.6 API with image input:

```python
import base64

import openai
import requests


def chat_with_image(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/kimi-logo.png'
    image_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe this image in detail.'},
                {
                    'type': 'image_url',
                    'image_url': {'url': f'data:image/png;base64,{image_base64}'},
                },
            ],
        }
    ]
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=8192
    )
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {'thinking': False}},  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
```

The following example demonstrates how to call the K2.6 API with video input:

```python
import base64

import openai
import requests


def chat_with_video(client: openai.OpenAI, model_name: str):
    url = 'https://huggingface.co/moonshotai/Kimi-K2.6/resolve/main/figures/demo_video.mp4'
    video_base64 = base64.b64encode(requests.get(url).content).decode()
    messages = [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Describe the video in detail.'},
                {
                    'type': 'video_url',
                    'video_url': {'url': f'data:video/mp4;base64,{video_base64}'},
                },
            ],
        }
    ]
    response = client.chat.completions.create(model=model_name, messages=messages)
    print('====== Below is reasoning content in Thinking Mode ======')
    print(f'reasoning content: {response.choices[0].message.reasoning}')
    print('====== Below is response in Thinking Mode ======')
    print(f'response: {response.choices[0].message.content}')

    # Instant mode is also supported if you pass {"thinking": {"type": "disabled"}}
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'disabled'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {'thinking': False}},  # this is for vLLM/SGLang
    )
    print('====== Below is response in Instant Mode ======')
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
```

### Preserve Thinking

Kimi K2.6 supports `preserve_thinking` mode, which retains full reasoning content across multi-turn interactions and enhances performance in coding agent scenarios. This feature is disabled by default.

The following example demonstrates how to call the K2.6 API in `preserve_thinking` mode:

```python
import openai


def chat_with_preserve_thinking(client: openai.OpenAI, model_name: str):
    messages = [
        {
            'role': 'user',
            'content': 'Tell me three random numbers.'
        },
        {
            'role': 'assistant',
            'reasoning_content': "I'll start by listing five numbers: 473, 921, 235, 215, 222, and I'll tell you the first three.",
            'content': '473, 921, 235'
        },
        {
            'role': 'user',
            'content': 'What are the other two numbers you have in mind?'
        }
    ]
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        stream=False,
        max_tokens=4096,
        extra_body={'thinking': {'type': 'enabled', 'keep': 'all'}},  # this is for the official API
        # extra_body={'chat_template_kwargs': {'thinking': True, 'preserve_thinking': True}},  # this is for vLLM/SGLang
        # We recommend enabling preserve_thinking only in Thinking mode.
    )
    # The assistant should mention 215 and 222, which appear in the prior reasoning content.
    print(f'response: {response.choices[0].message.content}')
    return response.choices[0].message.content
```

### Interleaved Thinking and Multi-Step Tool Call

K2.6 shares the same design of interleaved thinking and multi-step tool calls as K2 Thinking. For a usage example, please refer to the [K2 Thinking documentation](https://platform.moonshot.ai/docs/guide/use-kimi-k2-thinking-model#complete-example).

### Coding Agent Framework

Kimi K2.6 works best with Kimi Code CLI as its agent framework; give it a try at https://www.kimi.com/code.

---

## 7. License

Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).

---

## 8. Third Party Notices

See [THIRD PARTY NOTICES](THIRD_PARTY_NOTICES.md).

---

## 9. Contact Us

If you have any questions, please reach out at [support@moonshot.ai](mailto:support@moonshot.ai).