---
base_model:
- nvidia/Cosmos-Reason2-8B
tags:
- nvidia
- cosmos
- cosmos-reason2
- multimodal
- vlm
- quantized
- flashhead
- qwen3_vl
pipeline_tag: image-text-to-text
license: other
license_name: embedl-models-community-licence-1.0
license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
extra_gated_prompt: >-
  The information you provide will be collected, stored, processed and shared in accordance
  with the [Embedl Privacy Policy](https://www.embedl.com/privacy-policy).
extra_gated_fields:
  Company: text
---
<!-- embedl-banner:start -->
<style>
.embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; }
.embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; }
.embedl-btn-secondary { transition: background 160ms ease; }
.embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; }
.embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; }
.embedl-btn-primary, .embedl-btn-secondary {
font-size: clamp(11px, 1.65vw, 13px) !important;
padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important;
}
</style>
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
<table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
<tr style="background:transparent;">
<td style="vertical-align:middle;border:0;padding:0;background:transparent;">
<div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div>
<div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div>
<div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div>
</td>
<td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
<a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a>
<a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a>
</td>
</tr>
</table>
</div>
<!-- embedl-banner:end -->
# Cosmos-Reason2-8B-W4A16-FlashHead

**Optimized version of [nvidia/Cosmos-Reason2-8B](https://huggingface.co/nvidia/Cosmos-Reason2-8B) using quantization and FlashHead, Embedl's efficient replacement for the language model head.**

Designed for **low-latency inference** on **NVIDIA GPUs**, leveraging:
- FlashHead
- Quantization (W4A16)
- vLLM plugin via [`flash-head`](https://github.com/embedl/flash-head)
---
## Model Details
| **Field** | **Value** |
|---|---|
| **Base Model** | [nvidia/Cosmos-Reason2-8B](https://huggingface.co/nvidia/Cosmos-Reason2-8B) |
| **Input / Output** | Text + Image / Video -> Text |
| **Optimizations** | FlashHead LM Head + Quantization (W4A16) |
| **Developers** | Embedl |
| **Licenses** | Upstream: [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). <br>Optimized components: Embedl Models Community Licence v1.0 *(no redistribution)* |
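
W4A16 conventionally denotes 4-bit weights with 16-bit activations. As a rough back-of-envelope sketch of what that means for weight storage, the snippet below compares 16-bit and 4-bit footprints. It assumes a nominal 8e9 parameters (taken from the "8B" in the model name) and that every weight is quantized; it ignores quantization scales and any layers kept in higher precision, so the real footprint of this checkpoint will differ. The helper name `weight_bytes` is hypothetical.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


# Nominal parameter count implied by the "8B" in the model name (assumption).
N_PARAMS = 8e9

fp16_gb = weight_bytes(N_PARAMS, 16) / 1e9
w4_gb = weight_bytes(N_PARAMS, 4) / 1e9

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{w4_gb:.0f} GB")
```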
---
## Benchmarks
Accuracy and on-device latency benchmarks can be explored on [embedl/Edge-Inference-Benchmarks](https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks).
<a href="https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks" target="_blank" rel="noopener">
<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/Edge-Inference-Benchmarks/screenshot.png" alt="Screenshot Edge Inference Benchmarks" width="75%">
</a>
---
## Installation
```bash
pip install flash-head
```
The [`flash-head`](https://github.com/embedl/flash-head) vLLM plugin is required. It activates automatically at startup.
---
## Usage Examples
### vLLM Serve
```bash
vllm serve embedl/Cosmos-Reason2-8B-W4A16-FlashHead \
--max-model-len 8192 \
--gpu-memory-utilization 0.75
```
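
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below builds a chat request body and defines a function to send it, using only the standard library. The base URL assumes the default `vllm serve` address (`http://localhost:8000`), the image URL is a placeholder, and the helper names `build_chat_payload` and `send_chat` are hypothetical.

```python
import json
import urllib.request

MODEL = "embedl/Cosmos-Reason2-8B-W4A16-FlashHead"


def build_chat_payload(prompt, image_url=None, max_tokens=256):
    """Build an OpenAI-style chat completion request body."""
    content = []
    if image_url is not None:
        # Media goes before the text, mirroring the offline example in this card.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": prompt})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def send_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Placeholder image URL; replace it with a real, reachable image.
payload = build_chat_payload(
    "Describe this image.", image_url="https://example.com/frame.jpg"
)
print(json.dumps(payload, indent=2))
```

With the server up, `print(send_chat(payload))` sends the request and prints the model's reply. The block itself only prints the request body, so it runs without a server.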
### vLLM Video Inference
```python
from vllm import LLM, SamplingParams

if __name__ == "__main__":
    model = "embedl/Cosmos-Reason2-8B-W4A16-FlashHead"
    video_url = "https://nvidia-cosmos.github.io/cosmos-cookbook/gallery/vs_assets/clip_1_short.mp4"

    # Chat-style prompt: a system message plus one user turn with a video and a question.
    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": video_url, "fps": 4}},
                {"type": "text", "text": "Describe this video in detail."},
            ],
        },
    ]

    llm = LLM(
        model=model,
        # Allow one video per prompt; image and audio inputs are disabled.
        limit_mm_per_prompt={
            "video": {"count": 1, "num_frames": 12, "width": 1280, "height": 720},
            "image": 0,
            "audio": 0,
        },
        media_io_kwargs={"video": {"num_frames": -1}},
        max_model_len=8192,
        mm_processor_kwargs={"truncation": False},
        gpu_memory_utilization=0.75,
        trust_remote_code=True,
    )

    # Greedy decoding (temperature 0.0), capped at 256 generated tokens.
    output = llm.chat(messages, sampling_params=SamplingParams(temperature=0.0, max_tokens=256))
    print(output[0].outputs[0].text)
```
---
## License
- **Upstream:** [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)
- **Optimized Components:** Embedl Models Community Licence v1.0 *(no redistribution)*
---
## Contact
- Enterprise and Commercial Inquiries: `models@embedl.com`
- Technical Issues and Early Access: [`https://github.com/embedl/flash-head`](https://github.com/embedl/flash-head)
- More Information and Model Releases: `https://embedl.com`
<!-- embedl-discord-banner:start -->
<style>
.embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; }
.embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; }
</style>
<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
<table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
<tr style="background:transparent;">
<td style="vertical-align:middle;border:0;padding:0;background:transparent;">
<div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community & support</div>
<div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div>
<div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips — bring them all.</div>
</td>
<td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
<a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a>
</td>
</tr>
</table>
</div>
<!-- embedl-discord-banner:end -->