Qwen3-Omni-30B-A3B-Instruct GGUF

Quantized GGUF files for the Qwen3-Omni multimodal LLM (main model plus vision projector).

Files

File                                  Size   Description
qwen3-omni-30B-Q8_0.gguf              31GB   Main LLM (Q8_0 quantization)
mmproj-qwen3-omni-30B-F16-fixed.gguf  2.3GB  Vision projector (F16)
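
To fetch the files locally, a minimal sketch using the Hugging Face CLI; the repository id below is a placeholder, substitute this repo's actual path:

# <user>/<repo> is a placeholder -- replace with this repository's id
huggingface-cli download <user>/<repo> qwen3-omni-30B-Q8_0.gguf --local-dir .
huggingface-cli download <user>/<repo> mmproj-qwen3-omni-30B-F16-fixed.gguf --local-dir .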

Usage

Requires a custom llama.cpp build with Qwen3-Omni support:

# Clone the fork with Qwen3-Omni support
git clone https://github.com/phnxsystms/llama.cpp.git
cd llama.cpp
git checkout qwen3omni

# Build
mkdir build && cd build
cmake .. -DGGML_CUDA=ON
cmake --build . -j

# Run text inference
./bin/llama-cli -m qwen3-omni-30B-Q8_0.gguf -p "Hello!" -ngl 99

# Run multimodal inference  
./bin/llama-mtmd-cli -m qwen3-omni-30B-Q8_0.gguf --mmproj mmproj-qwen3-omni-30B-F16-fixed.gguf --image your_image.jpg -p "Describe this image"
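
The fork's llama-server can also expose the model over an OpenAI-compatible HTTP API; a minimal sketch, assuming the server and multimodal flags behave as in upstream llama.cpp:

# Start an OpenAI-compatible server (assumes upstream llama-server flags)
./bin/llama-server -m qwen3-omni-30B-Q8_0.gguf \
    --mmproj mmproj-qwen3-omni-30B-F16-fixed.gguf \
    -ngl 99 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello!"}]}'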

Model Info

  • Base model: Qwen3-Omni-30B-A3B-Instruct
  • Architecture: MoE-based multimodal (48 layers, 128 experts)
  • Capabilities: Text + Vision
  • Tested: distributed inference on a 5-GPU cluster (41-44 tok/s); see the multi-GPU sketch below
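
Splitting the 31GB Q8_0 weights across several GPUs can be done with upstream llama.cpp's split flags; a minimal sketch, assuming four visible CUDA devices (the exact setup behind the 5-GPU result above is not documented here):

# Spread layers evenly across GPUs 0-3 (assumed example, adjust to your hardware)
CUDA_VISIBLE_DEVICES=0,1,2,3 ./bin/llama-cli \
    -m qwen3-omni-30B-Q8_0.gguf \
    -ngl 99 --split-mode layer --tensor-split 1,1,1,1 \
    -p "Hello!"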

llama.cpp Fork

The Qwen3-Omni architecture support is available on the qwen3omni branch of https://github.com/phnxsystms/llama.cpp.

License

Apache 2.0 (same as the original Qwen3-Omni model)
