Regression on benches

#2
by selimaktas - opened

Hello!
I benchmarked this model on SWE-Bench Verified (no tools, single shot) and BFCL v4 (multi-turn subset), and it shows regression on both. This may have been caused by unstable/high-LR training.
Hope to see better versions in the future!

Hi @selimaktas, appreciate the bench report. One adjacent data point that might help triangulate:

We rebuilt this checkpoint into an NVFP4 + MTP variant (sakamakismile/Carnice-V2-27b-NVFP4-TEXT-MTP) on the prefix-fixed BF16 source, and all five tasks of an agent capability suite (single tool call, multi-turn continuation, final synthesis, 3-way parallel tool calls, reasoning chain) pass on RTX PRO 6000 + vLLM 0.19.1rc1:

  • Tool-call JSON arguments parse cleanly
  • <think> reasoning is correctly separated from final answer
  • Per-position MTP acceptance 0.934 (mean accept length 1.93/2.0 at n=1, ~3.0/4.0 at n=3)
  • Long-form decode lands ~103 tok/s at n=3
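
For context on how the acceptance rate maps to tokens per step, here is a rough sketch; treating per-position acceptance as independent is my simplification for illustration, not something measured:

```python
# Back-of-the-envelope check of the MTP numbers, assuming each of the n draft
# positions is accepted independently with probability p (a simplification).
# Each step emits one verified token plus the accepted prefix of the draft,
# so the expected accept length is 1 + p + p^2 + ... + p^n.
p = 0.934            # reported per-position acceptance
for n in (1, 3):     # number of MTP draft tokens per step
    expected = 1 + sum(p ** k for k in range(1, n + 1))
    print(f"n={n}: expected accept length ~ {expected:.2f} / {n + 1}")
# n=1 lands at ~1.93/2.0, matching the report; the independence model gives
# ~3.62/4.0 at n=3, so the observed ~3.0/4.0 just means acceptance drops at
# deeper draft positions, which is the usual behaviour for MTP heads.
```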

That doesn't refute the SWE-Bench Verified / BFCL v4 regressions (they exercise much more agentic surface than five-prompt smoke tests), but it suggests the BF16 weights themselves are at least functionally healthy, and the bug Discussion #1 reported (triple language_model. prefix) is fully resolved in the current upload; we re-checked with safe_open after kai-os's fix (sketch below).
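
For reference, the prefix check is nothing more than enumerating tensor names in each shard; roughly like this, with the local path being wherever you downloaded the snapshot:

```python
from pathlib import Path
from safetensors import safe_open

# Flag any tensor whose name still carries a duplicated "language_model." prefix
# (the Discussion #1 bug showed names like "language_model.language_model.language_model...").
snapshot_dir = Path("./model-snapshot")  # local download of the repo; adjust to your path
for shard in sorted(snapshot_dir.glob("*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        bad = [k for k in f.keys() if k.count("language_model.") > 1]
    print(f"{shard.name}: {len(bad)} duplicated-prefix tensors")
    for k in bad[:5]:  # show a few offenders, if any
        print("  ", k)
```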

Two things bit us during evaluation and are worth ruling out on your side, since they look exactly like accuracy regressions if you don't know to set them (a quick launch-and-probe sketch follows the list):

  1. --reasoning-parser qwen3: without this, <think>...</think> chains land in the chat content the harness reads as the "final answer", which destroys agentic eval scores even on a perfectly tuned model.
  2. --tool-call-parser qwen3_xml (not hermes): Carnice emits OpenAI-style function XML (<tool_call><function=name>…</function></tool_call>), not canonical Hermes JSON-in-tags, so the hermes parser leaves tool_calls=[] and the harness scores 0 on every tool turn.
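
For concreteness, this is roughly the launch-plus-probe pattern we use before any long run. The serve flags mirror our setup, the get_weather tool is a toy just for the probe, and reasoning_content is a vLLM response extension rather than a standard OpenAI field, so treat that attribute access as an assumption:

```python
# Serving side, roughly:
#   vllm serve <model> --reasoning-parser qwen3 \
#       --tool-call-parser qwen3_xml --enable-auto-tool-choice
# Client-side probe: one tool-enabled request, then check that both parsers did their job.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # toy tool, probe only
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What's the weather in Paris? Use the tool."}],
    tools=tools,
)
msg = resp.choices[0].message
# Wrong --tool-call-parser -> tool_calls stays empty and the XML leaks into content.
assert msg.tool_calls, "no parsed tool_calls: tool parser likely misconfigured"
# Missing --reasoning-parser qwen3 -> <think>...</think> leaks into the final answer.
assert "<think>" not in (msg.content or ""), "reasoning leaked into content"
# vLLM returns the separated chain-of-thought in an extra field when the parser is on.
print("reasoning separated:", getattr(msg, "reasoning_content", None) is not None)
```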

If either of those was off in your eval setup, that alone would explain a huge gap. Worth double-checking before chalking it up to LR instability.

- Tonoken3 / Lna-Lab (sakamakismile)

Hello,
Thank you for your feedback! I might re-bench it if anything changes, but the benchmark was run properly: with vLLM, the qwen3 reasoning parser, and the correct tool parser (not hermes), on the same nightly vLLM wheel you are using. With the wrong parser the results would simply have been 0.

Additional notes:
The model is regressed, not incapacitated, and that distinction isn't easy to make in a 5-prompt test.
Testing was done on EvalScope, with vLLM as the inference backend, using the weights available as of the date this comment was posted. No MTP or any other speculative decoding was used.
