Instructions to use CohereLabs/command-a-plus-05-2026-fp8 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use CohereLabs/command-a-plus-05-2026-fp8 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CohereLabs/command-a-plus-05-2026-fp8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CohereLabs/command-a-plus-05-2026-fp8")
# device_map="auto" (requires `accelerate`) spreads the weights over the available devices
model = AutoModelForImageTextToText.from_pretrained(
    "CohereLabs/command-a-plus-05-2026-fp8", device_map="auto"
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
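The README in the commit below calls `model.generate` with `do_sample=True`, `temperature=0.6` and `top_p=0.95` instead of relying on the defaults, as the call above does. The following pure-Python sketch illustrates what those two settings do to one decoding step. It shows standard temperature scaling followed by nucleus (top-p) filtering. It is not Transformers' internal implementation, and the helper names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Return indices of the smallest most-likely set whose probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, temperature=0.6, top_p=0.95, rng=random):
    """Draw one token id: temperature scaling, nucleus filtering, then sampling."""
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_filter(probs, top_p)
    # random.choices normalizes the weights itself, so the kept probabilities
    # are effectively renormalized over the nucleus.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

A temperature below 1 sharpens the distribution before the 95% nucleus is taken, so tokens in the low-probability tail are never sampled.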

Inference: HuggingChat
Notebooks: Google Colab, Kaggle

Local Apps

vLLM

How to use CohereLabs/command-a-plus-05-2026-fp8 with vLLM:

Install from pip and serve model

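```bash
# Install vLLM and Cohere's melody parser from pip. Command A+ needs vllm>=0.21.0, and
# melody (cohere_melody>=0.9.0) is needed for accurate response parsing. Quote the
# version specifiers so the shell does not treat ">=" as an output redirect.
pip install "vllm>=0.21.0" "cohere_melody>=0.9.0"
# Start the vLLM server (it stays in the foreground):
vllm serve "CohereLabs/command-a-plus-05-2026-fp8"
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CohereLabs/command-a-plus-05-2026-fp8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```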
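The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library (`json`, `urllib.request`) and builds the same body as the curl call. It assumes the vLLM server started above is listening on `http://localhost:8000` and that the reply follows the usual OpenAI chat-completions shape (`choices[0].message.content`). The helper names `build_chat_payload` and `post_chat_completion` are hypothetical.

```python
import json
import urllib.request

MODEL = "CohereLabs/command-a-plus-05-2026-fp8"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_chat_payload(model, prompt, image_url=None):
    """Build an OpenAI-style chat-completions body with a text part and an optional image part."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat_completion(base_url, payload, timeout=300.0):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the running server from the previous step):
# payload = build_chat_payload(MODEL, "Describe this image in one sentence.", IMAGE_URL)
# print(post_chat_completion("http://localhost:8000", payload))
```

Pointing `base_url` at `http://localhost:30000` should work the same way for the SGLang server described below, since it exposes the same OpenAI-compatible route.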

Use Docker

```bash
docker model run hf.co/CohereLabs/command-a-plus-05-2026-fp8
```

SGLang

How to use CohereLabs/command-a-plus-05-2026-fp8 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it stays in the foreground):
python3 -m sglang.launch_server \
    --model-path "CohereLabs/command-a-plus-05-2026-fp8" \
    --host 0.0.0.0 \
    --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CohereLabs/command-a-plus-05-2026-fp8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```bash
# Start the SGLang server in a container (it stays in the foreground):
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CohereLabs/command-a-plus-05-2026-fp8" \
        --host 0.0.0.0 \
        --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "CohereLabs/command-a-plus-05-2026-fp8",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
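After the server process starts, loading the weights of a 218B-parameter model can take a while, and requests sent before loading finishes will fail. The following is a minimal polling sketch that uses only the standard library. It assumes the server answers `GET /v1/models` (part of the OpenAI-compatible API) with HTTP 200 once it is ready. `wait_for_server` is a hypothetical helper name, and the timeout and interval defaults are arbitrary.

```python
import time
import urllib.request


def wait_for_server(base_url, timeout=600.0, interval=5.0):
    """Poll <base_url>/v1/models until it returns HTTP 200 or `timeout` seconds pass.

    Returns True if the server became ready, False on timeout.
    """
    url = f"{base_url.rstrip('/')}/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts all
            # derive from OSError: the server is not listening or still loading.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

For example, `wait_for_server("http://localhost:30000")` can run before the first request. The same call with port 8000 covers the vLLM server above.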

Docker Model Runner
How to use CohereLabs/command-a-plus-05-2026-fp8 with Docker Model Runner:
```
docker model run hf.co/CohereLabs/command-a-plus-05-2026-fp8
```

alexrs committed about 20 hours ago

Commit 780fa22 (verified) · 1 parent: f2360bf

Update README via Huggy

Files changed (1): README.md +13 -17

README.md CHANGED

@@ -67,17 +67,16 @@ Command A+ is an open source model with 25 billion active parameters and 218B to
 Developed by: [Cohere](https://cohere.com/) and [Cohere Labs](https://cohere.com/research)
-* Point of Contact: [**Cohere Labs**](https://cohere.com/research)
-* License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
-* Model: command-a-plus-05-2026
-* Model Size: 25B active parameters, 218B total parameters
 * Context length: 128K input
-For more details about this model, please check out our [blog post](http://cohere.com/blog/command-a-plus).
 You can try out Command A+ before downloading the weights in our hosted [Hugging Face Space](https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026).
 **Available quantizations**
 The following quantizations are available with example minimum GPU requirements
@@ -90,8 +89,7 @@ The following quantizations are available with example minimum GPU requirements
 All three quantizations show negligible differences in benchmark quality and performance. **Our recommended quantization for most uses is [W4A4](https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4) which boasts superior speed and latency characteristics alongside a smaller hardware footprint.**
-For more details, please check out our [blog post](http://cohere.com/blog/command-a-plus).
 **Usage**
@@ -117,9 +115,9 @@ input_ids = tokenizer.apply_chat_template(
 )
 gen_tokens = model.generate(
-    input_ids,
-    max_new_tokens=4096,
-    do_sample=True,
     temperature=0.6,
     top_p=0.95
 )
@@ -171,10 +169,10 @@ print(outputs[0]["generated_text"][-1])
 **vLLM**
-You can also run the model in vLLM. `vllm>=0.21.0` is required for Command A+ and accurate response parsing also requires installing [Cohere’s `melody` library](https://pypi.org/project/cohere-melody/).
 ```
-uv pip install vllm>=0.21.0
 uv pip install transformers uv pip install cohere_melody>=0.9.0
 ```
@@ -188,9 +186,9 @@ Then the vllm server can be started with the following command:
 **Input**: Text and images.
-**Output**: Model generates text.
-**Model Architecture**: Command A+ is a decoder-only Sparse Mixture-of-Experts Transformer Model. With 25B active parameters and 218B total parameters, it has 128 experts, out of which 8 are active per token, and a single shared expert is applied to all tokens. The attention layers interleave sliding-window attention layers with Rotational Positional Embeddings and global attention layers without positional embeddings in a 3:1 ratio, as first introduced in Command A. The sparse MoE layer is trained in a fully dropless manner and uses a token-choice router. We use additive-bias-based load balancing to encourage balanced token load across all experts, and swap out the softmax router activation function with a normalized sigmoid over the topk expert logits per token.
 **Languages covered:** The model has been trained on 48 languages: English, Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Greek, Spanish, Estonian, Persian, Finnish, Filipino, French, Irish, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Icelandic, Italian, Japanese, Korean, Lithuanian, Latvian, Malay, Maltese, Dutch, Norwegian, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Chinese.
@@ -289,5 +287,3 @@ For errors or additional questions about details in this model card, contact \[[
 **Try it now:**
 You can try Command A+ in the [playground](https://dashboard.cohere.com/playground/chat?model=command-a-plus-05-2026). You can also use it in our dedicated [Hugging Face Space](https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026).

 Developed by: [Cohere](https://cohere.com/) and [Cohere Labs](https://cohere.com/research)
+* Point of Contact: [**Cohere Labs**](https://cohere.com/research)
+* License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+* Model: command-a-plus-05-2026
+* Model Size: 25B active parameters, 218B total parameters
 * Context length: 128K input
+For more details about this model, please check out our [blog post](http://cohere.com/blog/command-a-plus).
 You can try out Command A+ before downloading the weights in our hosted [Hugging Face Space](https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026).
 **Available quantizations**
 The following quantizations are available with example minimum GPU requirements
 All three quantizations show negligible differences in benchmark quality and performance. **Our recommended quantization for most uses is [W4A4](https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4) which boasts superior speed and latency characteristics alongside a smaller hardware footprint.**
+For more details, please check out our [blog post](http://cohere.com/blog/command-a-plus).
 **Usage**
 )
 gen_tokens = model.generate(
+    input_ids,
+    max_new_tokens=4096,
+    do_sample=True,
     temperature=0.6,
     top_p=0.95
 )
 **vLLM**
+You can also run the model in vLLM. `vllm>=0.21.0` is required for Command A+ and accurate response parsing also requires installing [Cohere’s `melody` library](https://pypi.org/project/cohere-melody/).
 ```
+uv pip install vllm>=0.21.0
 uv pip install transformers uv pip install cohere_melody>=0.9.0
 ```
 **Input**: Text and images.
+**Output**: Model generates text.
+**Model Architecture**: Command A+ is a decoder-only Sparse Mixture-of-Experts Transformer Model. With 25B active parameters and 218B total parameters, it has 128 experts, out of which 8 are active per token, and a single shared expert is applied to all tokens. The attention layers interleave sliding-window attention layers with Rotational Positional Embeddings and global attention layers without positional embeddings in a 3:1 ratio, as first introduced in Command A. The sparse MoE layer is trained in a fully dropless manner and uses a token-choice router. We use additive-bias-based load balancing to encourage balanced token load across all experts, and swap out the softmax router activation function with a normalized sigmoid over the topk expert logits per token.
 **Languages covered:** The model has been trained on 48 languages: English, Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Greek, Spanish, Estonian, Persian, Finnish, Filipino, French, Irish, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Icelandic, Italian, Japanese, Korean, Lithuanian, Latvian, Malay, Maltese, Dutch, Norwegian, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Chinese.
 **Try it now:**
 You can try Command A+ in the [playground](https://dashboard.cohere.com/playground/chat?model=command-a-plus-05-2026). You can also use it in our dedicated [Hugging Face Space](https://huggingface.co/spaces/CohereLabs/command-a-plus-05-2026).
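The **Model Architecture** paragraph in the README diff above describes two mechanisms that a short sketch can make concrete. The first is the 3:1 interleaving of sliding-window attention layers with rotary position embeddings (RoPE) and global attention layers without positional embeddings. The second is top-8-of-128 expert routing with a normalized sigmoid and additive load-balancing biases. The sketch below relies on three assumptions that the README does not state:

- The global layer closes each group of four.
- The bias affects only which experts are selected, while gate weights come from the unbiased logits. This follows the auxiliary-loss-free scheme popularized by DeepSeek-V3.
- Biases are nudged by a fixed step after each batch.

All function names and the step size are hypothetical. The expert networks themselves are not modeled, and neither is the shared expert that the README says is applied to every token, since it needs no routing.

```python
import math
import random


def layer_types(num_layers, pattern=4):
    """3:1 interleave: three sliding-window (RoPE) layers, then one global (no positional embedding) layer.

    Assumption: the global layer is the last one in each group of `pattern` layers.
    """
    return ["global" if (i + 1) % pattern == 0 else "sliding" for i in range(num_layers)]


def route_token(logits, bias, k=8):
    """Pick k experts for one token and return (expert_ids, gate_weights).

    Assumption: the additive bias only changes which experts are chosen;
    gate weights are sigmoid(logit) renormalized over the chosen experts.
    """
    scores = [logit + b for logit, b in zip(logits, bias)]
    chosen = sorted(range(len(logits)), key=lambda e: scores[e], reverse=True)[:k]
    sig = [1.0 / (1.0 + math.exp(-logits[e])) for e in chosen]
    total = sum(sig)
    return chosen, [s / total for s in sig]


def update_bias(bias, tokens_per_expert, step=1e-3):
    """Nudge biases toward balanced load after a batch."""
    mean_load = sum(tokens_per_expert) / len(tokens_per_expert)
    new_bias = []
    for b, load in zip(bias, tokens_per_expert):
        if load > mean_load:
            b -= step  # overloaded: make this expert less likely to be picked
        elif load < mean_load:
            b += step  # underloaded: make this expert more likely to be picked
        new_bias.append(b)
    return new_bias


# Toy run: 128 experts, 8 active per token, random router logits for 256 tokens.
NUM_EXPERTS, TOP_K = 128, 8
rng = random.Random(0)
bias = [0.0] * NUM_EXPERTS
counts = [0] * NUM_EXPERTS
for _ in range(256):
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    experts, gates = route_token(logits, bias, TOP_K)
    for e in experts:
        counts[e] += 1
bias = update_bias(bias, counts)
```

Keeping the bias out of the gate weights means load balancing steers which experts are chosen without rescaling the chosen experts' contributions. This is the property that lets such schemes drop an auxiliary balancing loss.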