---
base_model: nvidia/Cosmos-Reason2-32B
library_name: llama.cpp
pipeline_tag: image-text-to-text
tags:
- gguf
- qwen3-vl
- cosmos
- nvidia
- multimodal
- image-text-to-text
- bf16
- q4_k_m
- q5_k_m
- q8_0
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
---

# Cosmos-Reason2-32B GGUF

Pure GGUF conversion of `nvidia/Cosmos-Reason2-32B`. Built on NVIDIA Cosmos.

## Files

- `Cosmos-Reason2-32B-BF16.gguf`: BF16 text backbone GGUF.
- `Cosmos-Reason2-32B-Q4_K_M.gguf`: smaller 4-bit text backbone GGUF for lower memory use.
- `Cosmos-Reason2-32B-Q5_K_M.gguf`: balanced 5-bit text backbone GGUF with better quality than Q4.
- `Cosmos-Reason2-32B-Q8_0.gguf`: larger 8-bit text backbone GGUF for higher quality.
- `mmproj-Cosmos-Reason2-32B-F16.gguf`: F16 multimodal projector / vision GGUF.

Use one text backbone file together with the `mmproj` file for multimodal inference.

## Hardware estimates

These are rough inference estimates for `llama.cpp` with batch size 1. Actual memory use depends on context length, image/video inputs, backend, and how many layers are offloaded to the GPU.

| Text backbone | File size | Text + mmproj | Suggested system RAM | Suggested VRAM for mostly/full GPU offload | Notes |
| --- | ---: | ---: | ---: | ---: | --- |
| `Q4_K_M` | 19.8 GB | 21.0 GB | 32 GB minimum, 48 GB comfortable | 24 GB tight, 32 GB comfortable | Best first choice for local use. |
| `Q5_K_M` | 23.2 GB | 24.4 GB | 48 GB comfortable | 32 GB comfortable | Better quality than Q4 with moderate extra memory. |
| `Q8_0` | 34.8 GB | 36.0 GB | 64 GB comfortable | 48 GB+ recommended | Higher quality, much larger. |
| `BF16` | 65.5 GB | 66.7 GB | 96 GB+ recommended | 80 GB+ or multi-GPU | Original-precision GGUF; not a practical default for most local machines. |

KV cache adds roughly 2 GiB per 8k text tokens at fp16 cache precision, before additional image/video token overhead (a worked version of this estimate is in the Examples section below). Reduce `--ctx-size` or use partial CPU/GPU offload if memory is tight.

## Source

Original model: https://huggingface.co/nvidia/Cosmos-Reason2-32B

This GGUF conversion was produced with `llama.cpp` `convert_hf_to_gguf.py` from the original Hugging Face safetensors.

## Usage

Use one text backbone file together with the multimodal projector in `llama.cpp`:

```bash
llama-server \
  -m Cosmos-Reason2-32B-Q4_K_M.gguf \
  --mmproj mmproj-Cosmos-Reason2-32B-F16.gguf
```

BF16 and Q8_0 are large and may require CPU offload or a multi-GPU setup. An example request against the running server is shown in the Examples section below.

## License

Licensed by NVIDIA Corporation under the NVIDIA Open Model License. See `NOTICE` and the original model card for license terms and usage requirements.
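
## Examples

A worked version of the KV cache estimate from the hardware table. The hyperparameters here are assumptions based on the Qwen3 32B-class text backbone (64 layers, 8 KV heads via grouped-query attention, head dimension 128); `llama.cpp` prints the actual values from the GGUF metadata at load time, so treat this as a sketch rather than a guarantee.

```bash
# Back-of-the-envelope KV cache size at fp16 cache precision.
# Assumed hyperparameters -- verify against the GGUF metadata at load time:
layers=64 kv_heads=8 head_dim=128 bytes_per_val=2
per_token=$(( 2 * layers * kv_heads * head_dim * bytes_per_val ))  # x2 for K and V
echo "$per_token bytes per token"                                  # 262144 (256 KiB)
echo "$(( per_token * 8192 / 1024 / 1024 )) MiB per 8192 tokens"   # 2048 MiB (~2 GiB)
```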
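Once `llama-server` is running as shown in Usage (default port 8080), it exposes an OpenAI-compatible API. Below is a minimal sketch of a multimodal request; the prompt and image URL are placeholders, and depending on your `llama.cpp` build the image may need to be supplied as a base64 `data:` URI instead of a remote URL:

```bash
# Hypothetical multimodal request against a local llama-server instance.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe what is happening in this image." },
          { "type": "image_url", "image_url": { "url": "https://example.com/scene.jpg" } }
        ]
      }
    ]
  }'
```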