Qwen2.5-Omni-3B Decoder (GGUF)

Text-only decoder extracted from Qwen/Qwen2.5-Omni-3B.

Architecture

  • Type: Qwen2VL (text decoder)
  • Parameters: 3.4B (decoder only, excluding vision/audio/talker/token2wav)
  • Hidden size: 2048
  • Layers: 36
  • Attention heads: 16 (KV heads: 2, GQA)
  • FFN size: 11008
  • Vocab: 151,936
  • Context: 32,768 tokens
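As a rough illustration of what the GQA layout above buys, the F16 KV cache stays small because only the 2 KV heads are cached per layer. A back-of-envelope sketch from the numbers listed (assuming the standard per-layer K/V cache layout; head_dim = hidden size / attention heads):

```shell
# Derived from the architecture table above (illustrative arithmetic only).
layers=36; kv_heads=2; head_dim=$((2048 / 16)); fp16_bytes=2
# K and V each store kv_heads * head_dim values per layer per token.
per_token=$((2 * layers * kv_heads * head_dim * fp16_bytes))
echo "per token: ${per_token} bytes"                  # 36864 bytes (36 KiB)
full=$((per_token * 32768))
echo "full 32k context: $((full / 1024 / 1024)) MiB"  # 1152 MiB
```

So even the full 32k context costs about 1.2 GB of KV cache at F16, on top of the 6.4 GB of weights.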

Files

File                              Size    Description
Qwen2.5-Omni-3B-decoder-F16.gguf  6.4 GB  Full precision (FP16)

Usage with llama.cpp

llama-cli -m Qwen2.5-Omni-3B-decoder-F16.gguf -p "Hello" -n 100 -no-cnv
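The same file also works with llama.cpp's OpenAI-compatible HTTP server. A minimal sketch (the port and context size below are arbitrary example values, not requirements of this model):

```shell
# Serve the model over an OpenAI-compatible API (example values for -c and --port)
llama-server -m Qwen2.5-Omni-3B-decoder-F16.gguf -c 4096 --port 8080

# From another shell, query the chat completions endpoint:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
```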

Extraction

Extracted using convert_hf_to_gguf.py from llama.cpp. The converter automatically strips the thinker. prefix from tensor names and drops the vision, audio, talker, and token2wav components, keeping only the text decoder (435 tensors).
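For reference, the conversion can be reproduced along these lines. The local paths are placeholders; --outfile and --outtype are standard convert_hf_to_gguf.py options:

```shell
# Assumes a llama.cpp checkout and the HF model downloaded to ./Qwen2.5-Omni-3B
python convert_hf_to_gguf.py ./Qwen2.5-Omni-3B \
  --outfile Qwen2.5-Omni-3B-decoder-F16.gguf \
  --outtype f16
```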
