---
base_model:
- nvidia/Cosmos-Reason2-8B
tags:
- nvidia
- cosmos
- cosmos-reason2
- multimodal
- vlm
- quantized
- flashhead
- qwen3_vl
pipeline_tag: image-text-to-text
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
extra_gated_prompt: >-
  The information you provide will be collected, stored, processed and shared in accordance
  with the [Embedl Privacy Policy](https://www.embedl.com/privacy-policy).
extra_gated_fields:
  Company: text
---
<!-- embedl-banner:start -->
<style>
.embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; }
.embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; }
.embedl-btn-secondary { transition: background 160ms ease; }
.embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; }
.embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; }
.embedl-btn-primary, .embedl-btn-secondary {
  font-size: clamp(11px, 1.65vw, 13px) !important;
  padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important;
}
</style>
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
  <table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
    <tr style="background:transparent;">
      <td style="vertical-align:middle;border:0;padding:0;background:transparent;">
        <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div>
        <div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div>
        <div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div>
      </td>
      <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
        <a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a>
        <a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a>
      </td>
    </tr>
  </table>
</div>
<!-- embedl-banner:end -->

# Cosmos-Reason2-8B-W4A16-FlashHead

[![GitHub](https://img.shields.io/badge/GitHub-flash--head-black?logo=github)](https://github.com/embedl/flash-head)

**Optimized version of [nvidia/Cosmos-Reason2-8B](https://huggingface.co/nvidia/Cosmos-Reason2-8B) using quantization and FlashHead, Embedl's efficient replacement for the language model head.**

Designed for **low-latency inference** on **NVIDIA GPUs**, leveraging:

- FlashHead
- Quantization (W4A16: 4-bit weights, 16-bit activations)
- vLLM plugin via [`flash-head`](https://github.com/embedl/flash-head)

---

## Model Details

| **Field** | **Value** |
|---|---|
| **Base Model** | [nvidia/Cosmos-Reason2-8B](https://huggingface.co/nvidia/Cosmos-Reason2-8B) |
| **Input / Output** | Text + Image / Video -> Text |
| **Optimizations** | FlashHead LM Head + Quantization (W4A16) |
| **Developers** | Embedl |
| **Licenses** | Upstream: [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). <br>Optimized components: Embedl Models Community Licence v1.0 *(no redistribution)* |

---

## Benchmarks

Accuracy and on-device latency benchmarks can be explored on [embedl/Edge-Inference-Benchmarks](https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks).

<a href="https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks" target="_blank" rel="noopener">
  <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/Edge-Inference-Benchmarks/screenshot.png" alt="Screenshot Edge Inference Benchmarks" width="75%">
</a>

---

## Installation

```bash
pip install flash-head
```

The [`flash-head`](https://github.com/embedl/flash-head) vLLM plugin is required. Once installed, it activates automatically when vLLM starts.
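
Before launching vLLM, it can help to confirm that the plugin is installed in the same Python environment. The following is a minimal sketch using only the standard library. It assumes the distribution name matches the `pip install` name above (`flash-head`); `installed_version` is a hypothetical helper, not part of the plugin.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the plugin is published under the distribution name "flash-head".
for name in ("flash-head", "vllm"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line reports `not installed`, install the missing package into the active environment before starting the server.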

---

## Usage Examples

### vLLM Serve

```bash
vllm serve embedl/Cosmos-Reason2-8B-W4A16-FlashHead \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75
```
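
Once the server is running, it can be queried through vLLM's OpenAI-compatible HTTP API. The sketch below is a minimal client using only the Python standard library. It assumes the server listens on vLLM's default address (`http://localhost:8000`); the helper names `build_chat_request` and `send_chat_request`, and any image URL you pass in, are illustrative placeholders rather than part of this model card.

```python
import json
import urllib.request

MODEL = "embedl/Cosmos-Reason2-8B-W4A16-FlashHead"
# Assumption: vLLM's default host and port; adjust if you pass --host/--port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(image_url: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def send_chat_request(payload: dict, base_url: str = BASE_URL) -> str:
    """POST the payload to /chat/completions and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `send_chat_request(build_chat_request("https://example.com/frame.jpg", "Describe this image."))` returns the model's reply as a string; the example URL is a placeholder.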

### vLLM Video Inference

```python
from vllm import LLM, SamplingParams

if __name__ == "__main__":
    model = "embedl/Cosmos-Reason2-8B-W4A16-FlashHead"
    video_url = "https://nvidia-cosmos.github.io/cosmos-cookbook/gallery/vs_assets/clip_1_short.mp4"

    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": video_url, "fps": 4}},
                {"type": "text", "text": "Describe this video in detail."},
            ],
        },
    ]

    llm = LLM(
        model=model,
        # Allow one video per prompt; image and audio inputs are disabled.
        limit_mm_per_prompt={
            "video": {"count": 1, "num_frames": 12, "width": 1280, "height": 720},
            "image": 0,
            "audio": 0,
        },
        media_io_kwargs={"video": {"num_frames": -1}},
        max_model_len=8192,
        mm_processor_kwargs={"truncation": False},
        gpu_memory_utilization=0.75,
        trust_remote_code=True,
    )

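    # Greedy decoding (temperature 0.0), capped at 256 generated tokens.
    sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
    output = llm.chat(messages, sampling_params=sampling_params)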
    print(output[0].outputs[0].text)
```

---

## License

- **Upstream:** [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)
- **Optimized Components:** Embedl Models Community Licence v1.0 *(no redistribution)*

---

## Contact

- Enterprise and Commercial Inquiries: `models@embedl.com`
- Technical Issues and Early Access: [`https://github.com/embedl/flash-head`](https://github.com/embedl/flash-head)
- More Information and Model Releases: `https://embedl.com`

<!-- embedl-discord-banner:start -->
<style>
.embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; }
.embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; }
</style>
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
  <table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
    <tr style="background:transparent;">
      <td style="vertical-align:middle;border:0;padding:0;background:transparent;">
        <div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community &amp; support</div>
        <div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div>
        <div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips — bring them all.</div>
      </td>
      <td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
        <a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a>
      </td>
    </tr>
  </table>
</div>
<!-- embedl-discord-banner:end -->