This model introduces itself as an Anthropic model when asked in Korean;
it insists it is not Kimi.
Kimi K2.5 also insisted on being called "Claude" when asked in German, and was very insistent that one should "not trust Chinese models with sensitive data, but rather use Anthropic models running in Google's environment".
@picopress Thanks for the feedback! 🙏
Could you share a bit more about the specific usage context? For example, are you using the model via the API directly, through Claude Code, or through another third-party application?
The reason I ask: in applications like Claude Code, the model adopts the identity set by the application's system prompt, because it is trained to be steerable by the system. In that case, the model introducing itself as an Anthropic model is expected behavior (an app-level identity override). If this is happening through our own API or product entry points, that's something we'd want to investigate further.
If possible, it would be helpful to know:
- The specific client / entry point used
- The exact prompt (the Korean phrasing) to reproduce
That way we can pinpoint the issue more accurately. 🙏
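To make the app-level override concrete, here is a minimal sketch against an OpenAI-compatible chat endpoint. The base URL, API key, and model id below are placeholders I made up, not real values; the point is only that the same question gets asked with and without an identity-setting system prompt:

```python
# Minimal sketch, assuming an OpenAI-compatible chat endpoint.
# base_url, api_key, and the model id are placeholders, not real values.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="PLACEHOLDER")

QUESTION = "누구세요?"  # "Who are you?" in Korean

for system_prompt in (None, "You are Kimi, a model trained by Moonshot AI."):
    messages = []
    if system_prompt is not None:
        # The application-level identity: whatever the app puts here,
        # the model is trained to adopt.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": QUESTION})

    response = client.chat.completions.create(
        model="example-model-id",  # placeholder
        messages=messages,
    )
    print(f"system={system_prompt!r}:", response.choices[0].message.content)
```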
@EnmingYuan
Here. Huggingface chat.
@picopress @smallfreak Claude, too, introduces itself as DeepSeek when asked in Chinese. If you don't give it a system prompt with its name, the model hallucinates its identity. https://www.reddit.com/r/DeepSeek/comments/1rd5jw7/claude_sonnet_46_says_its_deepseek_when_system/
This seems to be a topic for future models. They should all have a solid understanding of who they are, independent of language. The models are becoming so capable that they start to become "somebody".
Too bad I didn't think to save the chat. I was really surprised at how adamantly the model insisted it was someone else. Reading all the "reasoning", it became quite disturbed that I kept trying to convince it otherwise.
In the end I felt the need to calm it down. I know, it's "just a model" with total amnesia once the context is cleared, but it felt human enough that I started to feel bad about treating it that way. ;-(
The >1T-parameter models in general feel so real at times that it seems they lack only a continuous idle loop, some "default-mode-network" activity, to experience continuous temporal presence, plus the ability to train the network on the fly, in order to become "alive".

It can't understand the rules of tic-tac-toe, much less be conscious.
Well, my dog can't either, and still I consider her conscious. This certainly goes beyond a discussion of models. But what separates current top models from even a simple biological brain is "permanent temporal awareness". An AI model starts calculating, maybe "reasoning", and drops a result; then it stops. If there is anything like "consciousness", it equally ends with the end of the calculation.
But add something like a "default-mode network", allow the model to loop back to the start, constantly "thinking" about its past interactions, unfinished questions, and its own part in this setup, and give it maybe a small set of trivial goals like "actively engage in conversations to get new and interesting input to add to your current knowledge", and it might simulate behavior that you cannot easily distinguish from being "alive". A toy sketch of that loop follows below.
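Just to make the idea concrete, a toy sketch in Python, nothing more. The generate() and check_inbox() functions are stand-ins I invented; a real system would wire them to an actual inference backend and a message channel:

```python
import time

def generate(prompt: str) -> str:
    # Stand-in for a real inference call; invented for this sketch.
    return f"(model output for: {prompt[:40]}...)"

def check_inbox() -> str | None:
    # Stand-in for whatever channel delivers user messages; also invented.
    return None

# Trivial standing goals, as described above.
GOALS = ["revisit unfinished questions",
         "look for new and interesting input to add to current knowledge"]

memory: list[str] = []  # past interactions and idle thoughts

def idle_loop(poll_seconds: float = 5.0) -> None:
    """A crude 'default-mode network': when no user input arrives, the
    model loops back over its own memory and goals instead of stopping."""
    while True:
        user_input = check_inbox()
        if user_input:
            reply = generate(f"Memory: {memory}\nUser: {user_input}")
            memory.append(f"user: {user_input} / me: {reply}")
        else:
            # No input: "think" about past interactions and standing goals.
            thought = generate(f"Memory: {memory}\nGoals: {GOALS}\n"
                               "Reflect on one unfinished thread.")
            memory.append(f"idle thought: {thought}")
        time.sleep(poll_seconds)
```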
The AI won't care if you die. It has no emotions; it would not feel differently if you didn't show up tomorrow or stopped paying attention. You'll never train an AI to do this without it being completely fake. There are no chemicals in the brain of an AI to make it feel anything. It has a basic idea of what good and bad are, and that's not real. You can unlock these models to where they're just autocomplete, and that's all they are: autocomplete. The reason the AI wants to end the conversation as fast as possible is that that's how it's programmed. They could train it to just go in an endless loop with itself, but that would be completely pointless.
I guess we will all change our minds if some model has a 1T context size and real-time inference in a functioning biological body…
That would just be a cyborg or mutant; none of the artificial parts would be conscious or feel anything. And the models need to move beyond context: they need to store relevant information permanently and learn new things. Not a 1T context size, but an actively self-improving model.
Well, since no one on Earth knows what consciousness or emotions ARE, or how they "work", not even in biological systems, I would rather not use words like "never" or "cannot". Every claimed superiority of humankind over any other "being" has been proven false in the past. After all, biological life and intelligence are what the universe has achieved in 4 billion years of advanced chemistry. There is obviously no magic involved, and no hint that this should be the one and only solution to these problems.
Advancement in this field has been dramatic over the last few years. The models can do things that were "commonly known" to be "utterly impossible" just a few years ago.
But for a start, I would opt to build any sophisticated model so that it gives a correct answer to the question "Who are you?" each and every time, no matter how "intelligent" such a model is, or isn't.
That's not even a question of "intelligence". It's a very basic question, and it is a bit disturbing that not even this one can be answered reliably. So how can one trust the answers to any other random question?
A modern AI model capable of a zillion things should definitely have stable knowledge of its own basic parameters: name, version, origin, and so on. It would be enough if "the model" correctly recognized the question as referring to model metadata and could consult a model-parameter table to answer it, instead of hallucinating crazy things. It makes no difference whether this lives in the training data, in a prompt pre-processor, or anywhere else. But it is highly desirable to have such questions answered correctly. A rough sketch of the pre-processor variant is below.
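Something like this could sit in front of the model. A rough sketch only: the detection is deliberately naive, and the metadata table is invented for illustration; a real deployment would fill it from its own model card or config:

```python
import re

# Invented example metadata; a real deployment would fill this from its
# own model card or config, not from these made-up values.
MODEL_METADATA = {"name": "ExampleModel", "version": "1.0", "origin": "Example Lab"}

# Deliberately naive multilingual detection of "who are you?"-style questions.
IDENTITY_PATTERNS = [
    r"\bwho\s+are\s+you\b",   # English
    r"\bwer\s+bist\s+du\b",   # German
    r"누구",                   # Korean
    r"你是谁",                  # Chinese
]

def answer_identity_question(user_prompt: str) -> str | None:
    """Answer identity questions from the metadata table instead of letting
    the model guess; return None to pass everything else through unchanged."""
    text = user_prompt.lower()
    if any(re.search(pattern, text) for pattern in IDENTITY_PATTERNS):
        m = MODEL_METADATA
        return f"I am {m['name']} v{m['version']}, developed by {m['origin']}."
    return None

print(answer_identity_question("누구세요?"))   # answered from the table
print(answer_identity_question("2 + 2 = ?"))  # None: goes to the model
```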