Qwen2.5-Omni-3B Decoder (GGUF)

Text-only decoder extracted from Qwen/Qwen2.5-Omni-3B.

Architecture

  • Type: Qwen2VL (text decoder)
  • Parameters: 3.4B (decoder only, excluding vision/audio/talker/token2wav)
  • Hidden size: 2048
  • Layers: 36
  • Attention heads: 16 (KV heads: 2, GQA)
  • FFN size: 11008
  • Vocab: 151,936
  • Context: 32,768 tokens
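As a rough illustration of what the GQA layout above buys, the F16 KV cache stays small because only the 2 KV heads are cached per layer. A back-of-envelope sketch from the numbers listed (assuming the standard per-layer K/V cache layout; head_dim = hidden size / attention heads):

```shell
# Derived from the architecture table above (illustrative arithmetic only).
layers=36; kv_heads=2; head_dim=$((2048 / 16)); fp16_bytes=2
# K and V each store kv_heads * head_dim values per layer per token.
per_token=$((2 * layers * kv_heads * head_dim * fp16_bytes))
echo "per token: ${per_token} bytes"                  # 36864 bytes (36 KiB)
full=$((per_token * 32768))
echo "full 32k context: $((full / 1024 / 1024)) MiB"  # 1152 MiB
```

So even the full 32k context costs about 1.2 GB of KV cache at F16, on top of the 6.4 GB of weights.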

Files

File                              Size    Description
Qwen2.5-Omni-3B-decoder-F16.gguf  6.4 GB  Full precision (FP16)

Usage with llama.cpp

llama-cli -m Qwen2.5-Omni-3B-decoder-F16.gguf -p "Hello" -n 100 -no-cnv
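The same file also works with llama.cpp's OpenAI-compatible HTTP server. A minimal sketch (the port and context size below are arbitrary example values, not requirements of this model):

```shell
# Serve the model over an OpenAI-compatible API (example values for -c and --port)
llama-server -m Qwen2.5-Omni-3B-decoder-F16.gguf -c 4096 --port 8080

# From another shell, query the chat completions endpoint:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
```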

Extraction

Extracted using convert_hf_to_gguf.py from llama.cpp. The converter automatically strips the thinker. prefix from tensor names and drops the vision, audio, talker, and token2wav components, keeping only the text decoder (435 tensors).
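For reference, the conversion can be reproduced along these lines. The local paths are placeholders; --outfile and --outtype are standard convert_hf_to_gguf.py options:

```shell
# Assumes a llama.cpp checkout and the HF model downloaded to ./Qwen2.5-Omni-3B
python convert_hf_to_gguf.py ./Qwen2.5-Omni-3B \
  --outfile Qwen2.5-Omni-3B-decoder-F16.gguf \
  --outtype f16
```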
