	---
	base_model:
	- nvidia/Cosmos-Reason2-32B
	tags:
	- nvidia
	- cosmos
	- cosmos-reason2
	- multimodal
	- vlm
	- quantized
	- flashhead
	- qwen3_vl
	pipeline_tag: image-text-to-text
	license: other
	license_name: embedl-models-community-licence-1.0
	license_link: https://github.com/embedl/embedl-models/blob/main/LICENSE
	extra_gated_prompt: >-
	  The information you provide will be collected, stored, processed and shared in accordance
	  with the [Embedl Privacy Policy](https://www.embedl.com/privacy-policy).
	extra_gated_fields:
	  Company: text
	---
	<!-- embedl-banner:start -->
	<style>
	.embedl-btn-primary { transition: background 160ms ease, box-shadow 160ms ease; }
	.embedl-btn-primary:hover { background: #4FDCE4 !important; box-shadow: 0 8px 22px rgba(45,212,221,0.45) !important; }
	.embedl-btn-secondary { transition: background 160ms ease; }
	.embedl-btn-secondary:hover { background: rgba(45,212,221,0.15) !important; }
	.embedl-headline { font-size: clamp(11px, 2.15vw, 15px) !important; }
	.embedl-btn-primary, .embedl-btn-secondary {
	font-size: clamp(11px, 1.65vw, 13px) !important;
	padding: clamp(6px, 1.1vw, 9px) clamp(10px, 1.6vw, 14px) !important;
	}
	</style>
	<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(45,212,221,0.22) 0%,rgba(45,212,221,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(45,212,221,0.10) 0%,rgba(45,212,221,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(45,212,221,0.28);border-radius:12px;padding:22px 24px;margin:0 0 24px 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
	<table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
	<tr style="background:transparent;">
	<td style="vertical-align:middle;border:0;padding:0;background:transparent;">
	<div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#2DD4DD;background:rgba(45,212,221,0.15);border:1px solid rgba(45,212,221,0.35);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Optimized by Embedl</div>
	<div class="embedl-headline" style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need to <span style="color:#2DD4DD;white-space:nowrap;">fine-tune</span>, hit <span style="color:#2DD4DD;white-space:nowrap;">performance targets</span>, or deploy on <span style="color:#2DD4DD;white-space:nowrap;">specific hardware</span>?</div>
	<div style="font-size:13px;color:#9BA7B5;">We've got you covered.</div>
	</td>
	<td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
	<a href="https://www.embedl.com/models" class="embedl-btn-secondary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;color:#2DD4DD;text-decoration:none;margin-right:8px;">Learn more</a>
	<a href="https://www.embedl.com/contact" class="embedl-btn-primary" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #2DD4DD;background:#2DD4DD;color:#0B1626;text-decoration:none;box-shadow:0 6px 18px rgba(45,212,221,0.28);">Get in touch →</a>
	</td>
	</tr>
	</table>
	</div>
	<!-- embedl-banner:end -->

	# Cosmos-Reason2-32B-W4A16-FlashHead

	[![GitHub](https://img.shields.io/badge/GitHub-flash--head-black?logo=github)](https://github.com/embedl/flash-head)

	Optimized version of [nvidia/Cosmos-Reason2-32B](https://huggingface.co/nvidia/Cosmos-Reason2-32B) using quantization and FlashHead, Embedl's efficient replacement for the language model head.

	Designed for low-latency inference on NVIDIA GPUs, leveraging:

	- FlashHead
	- Quantization (W4A16)
	- vLLM plugin via [`flash-head`](https://github.com/embedl/flash-head)

	---

	## Model Details

	| Field | Value |
	|---|---|
	| Base Model | [nvidia/Cosmos-Reason2-32B](https://huggingface.co/nvidia/Cosmos-Reason2-32B) |
	| Input / Output | Text + Image / Video -> Text |
	| Optimizations | FlashHead LM Head + Quantization (W4A16) |
	| Developers | Embedl |
	| Licenses | Upstream: [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). <br>Optimized components: Embedl Models Community Licence v1.0 (no redistribution) |
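
As a rough guide to what W4A16 (4-bit weights, 16-bit activations) means for memory, the sketch below estimates weight storage only. It assumes roughly 32 billion parameters (inferred from the model name), that every weight is quantized, and a hypothetical group size of 128 with one 16-bit scale per group. None of these figures come from this card, and the real checkpoint may keep some layers at higher precision.

```python
def weight_memory_gb(num_params, weight_bits, group_size=None, scale_bits=16):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    If group_size is given, one scale of scale_bits is stored per group,
    which adds scale_bits / group_size bits to every weight.
    """
    bits_per_weight = weight_bits
    if group_size is not None:
        bits_per_weight += scale_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: ~32e9 parameters, taken from the "32B" in the model name.
NUM_PARAMS = 32e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
w4_gb = weight_memory_gb(NUM_PARAMS, 4)
# Assumption: group size 128 is illustrative, not confirmed by this card.
w4_grouped_gb = weight_memory_gb(NUM_PARAMS, 4, group_size=128)

print(f"16-bit weights:           {fp16_gb:.1f} GB")
print(f"4-bit weights:            {w4_gb:.1f} GB")
print(f"4-bit weights + scales:   {w4_grouped_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 64 GB to about 16 GB. The KV cache, activations, and any unquantized components come on top of that.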

	---

	## Benchmarks

	Accuracy and on-device latency benchmarks can be explored on [embedl/Edge-Inference-Benchmarks](https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks).

	<a href="https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks" target="_blank" rel="noopener">
	<img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/Edge-Inference-Benchmarks/screenshot.png" alt="Screenshot Edge Inference Benchmarks" width="75%">
	</a>

	---

	## Installation

	```bash
	pip install flash-head
	```

	The [`flash-head`](https://github.com/embedl/flash-head) vLLM plugin is required. It activates automatically when vLLM starts.

	---

	## Usage Examples

	### vLLM Serve

	```bash
	vllm serve embedl/Cosmos-Reason2-32B-W4A16-FlashHead \
	  --max-model-len 8192 \
	  --gpu-memory-utilization 0.9
	```
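
Once the server is running, it can be queried through vLLM's OpenAI-compatible chat endpoint. The sketch below uses only the standard library and assumes the server listens on vLLM's default address, `http://localhost:8000`. The helper names `build_chat_request` and `chat` are hypothetical and not part of any library.

```python
import json
import urllib.request

MODEL = "embedl/Cosmos-Reason2-32B-W4A16-FlashHead"


def build_chat_request(prompt, model=MODEL, base_url="http://localhost:8000", max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy decoding, matching the video example in this card
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Summarize what a vision-language model does."))` prints the reply. Pass `base_url=` if the server runs on a different host or port.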

	### vLLM Video Inference

	```python
	from vllm import LLM, SamplingParams

	if __name__ == "__main__":
	    model = "embedl/Cosmos-Reason2-32B-W4A16-FlashHead"
	    video_url = "https://nvidia-cosmos.github.io/cosmos-cookbook/gallery/vs_assets/clip_1_short.mp4"

	    # Chat-style prompt: a system message plus one user turn with a video and a question.
	    messages = [
	        {
	            "role": "system",
	            "content": [{"type": "text", "text": "You are a helpful assistant."}],
	        },
	        {
	            "role": "user",
	            "content": [
	                {"type": "video_url", "video_url": {"url": video_url, "fps": 4}},
	                {"type": "text", "text": "Describe this video in detail."},
	            ],
	        },
	    ]

	    llm = LLM(
	        model=model,
	        # Allow one video per prompt; image and audio inputs are disabled.
	        limit_mm_per_prompt={
	            "video": {"count": 1, "num_frames": 12, "width": 1280, "height": 720},
	            "image": 0,
	            "audio": 0,
	        },
	        media_io_kwargs={"video": {"num_frames": -1}},
	        max_model_len=8192,
	        mm_processor_kwargs={"truncation": False},
	        gpu_memory_utilization=0.9,
	        trust_remote_code=True,
	    )

	    # temperature=0.0 selects greedy decoding; the reply is capped at 256 tokens.
	    output = llm.chat(messages, sampling_params=SamplingParams(temperature=0.0, max_tokens=256))
	    print(output[0].outputs[0].text)
	```

	---

	## License

	- Upstream: [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)
	- Optimized Components: Embedl Models Community Licence v1.0 (no redistribution)

	---

	## Contact

	- Enterprise and Commercial Inquiries: `models@embedl.com`
	- Technical Issues and Early Access: [`https://github.com/embedl/flash-head`](https://github.com/embedl/flash-head)
	- More Information and Model Releases: `https://embedl.com`

	<!-- embedl-discord-banner:start -->
	<style>
	.embedl-discord-btn { transition: background 160ms ease, box-shadow 160ms ease; }
	.embedl-discord-btn:hover { background: #6C77F5 !important; box-shadow: 0 8px 22px rgba(88,101,242,0.55) !important; }
	</style>
	<div style="background:radial-gradient(600px 220px at 0% 50%,rgba(88,101,242,0.22) 0%,rgba(88,101,242,0) 60%),radial-gradient(400px 180px at 100% 100%,rgba(88,101,242,0.10) 0%,rgba(88,101,242,0) 55%),linear-gradient(135deg,#0B1626 0%,#142338 100%);border:1px solid rgba(88,101,242,0.35);border-radius:12px;padding:22px 24px;margin:24px 0 0 0;color:#F2F6FA;box-shadow:0 4px 16px rgba(11,22,38,0.18);overflow:hidden;box-sizing:border-box;max-width:100%;">
	<table style="width:100%;border-collapse:collapse;border:0;background:transparent;">
	<tr style="background:transparent;">
	<td style="vertical-align:middle;border:0;padding:0;background:transparent;">
	<div style="display:inline-block;font-size:10px;letter-spacing:0.08em;text-transform:uppercase;font-weight:700;color:#A5B4FC;background:rgba(88,101,242,0.18);border:1px solid rgba(88,101,242,0.45);padding:4px 10px;border-radius:999px;margin-bottom:10px;white-space:nowrap;">Community & support</div>
	<div style="font-size:15px;font-weight:700;line-height:1.35;color:#F2F6FA;margin-bottom:4px;">Need help with this model? Chat with the Embedl team and other engineers on <span style="color:#A5B4FC;white-space:nowrap;">Discord</span>.</div>
	<div style="font-size:13px;color:#9BA7B5;">Quantization gotchas, hardware questions, fine-tuning tips — bring them all.</div>
	</td>
	<td width="1%" style="vertical-align:middle;border:0;padding:0 0 0 18px;white-space:nowrap;text-align:right;background:transparent;">
	<a href="https://discord.gg/MTbMWdKqE" class="embedl-discord-btn" style="display:inline-block;font-size:13px;font-weight:600;padding:9px 14px;border-radius:6px;border:1px solid #5865F2;background:#5865F2;color:#FFFFFF;text-decoration:none;box-shadow:0 6px 18px rgba(88,101,242,0.35);">Join our Discord →</a>
	</td>
	</tr>
	</table>
	</div>
	<!-- embedl-discord-banner:end -->