# Qwen3-Omni-30B-A3B-Instruct GGUF

Quantized GGUF models for the Qwen3-Omni multimodal LLM.
## Files

| File | Size | Description |
|---|---|---|
| qwen3-omni-30B-Q8_0.gguf | 31 GB | Main LLM (Q8_0 quantization) |
| mmproj-qwen3-omni-30B-F16-fixed.gguf | 2.3 GB | Vision projector (F16) |
## Usage

Requires a custom llama.cpp build with Qwen3-Omni support:

```bash
# Clone the fork with Qwen3-Omni support
git clone https://github.com/phnxsystms/llama.cpp.git
cd llama.cpp
git checkout qwen3omni

# Build
mkdir build && cd build
cmake .. -DGGML_CUDA=ON
cmake --build . -j

# Run text inference
./bin/llama-cli -m qwen3-omni-30B-Q8_0.gguf -p "Hello!" -ngl 99

# Run multimodal inference
./bin/llama-mtmd-cli -m qwen3-omni-30B-Q8_0.gguf --mmproj mmproj-qwen3-omni-30B-F16-fixed.gguf --image your_image.jpg -p "Describe this image"
```
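Beyond the CLI tools, llama.cpp builds also include `llama-server`, which exposes an OpenAI-compatible HTTP API. The sketch below is a minimal stdlib-only client, assuming this fork's `llama-server` is running locally with the model loaded and serving llama.cpp's usual `/v1/chat/completions` endpoint on port 8080 (the function names here are illustrative, not part of any release):

```python
import json
import urllib.request

def build_payload(prompt: str) -> bytes:
    """Assemble an OpenAI-style chat request body for llama-server."""
    return json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")

def ask(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """Send the prompt to a running llama-server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a running server): print(ask("Hello!"))
```

Start the server first with something like `./bin/llama-server -m qwen3-omni-30B-Q8_0.gguf -ngl 99`, then call `ask()` from any Python process.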
## Model Info
- Base model: Qwen3-Omni-30B-A3B-Instruct
- Architecture: MoE-based multimodal (48 layers, 128 experts)
- Capabilities: Text + Vision
- Tested: Distributed inference on a 5-GPU cluster (41-44 tok/s)
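As a sanity check on the file size listed above: ggml's Q8_0 format stores blocks of 32 int8 weights plus one fp16 scale, i.e. roughly 8.5 bits per weight on average. A back-of-the-envelope estimate for ~30B parameters:

```python
# Q8_0: blocks of 32 int8 weights + one fp16 scale per block
# -> (32 * 8 + 16) / 32 = 8.5 bits per weight on average
params = 30e9            # ~30B parameters (rough, excludes metadata)
bits_per_weight = 8.5
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB")  # in line with the 31 GB file above
```

The small gap from the actual 31 GB file comes from details this estimate ignores (exact parameter count, non-quantized tensors, GGUF metadata).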
## llama.cpp Fork

The Qwen3-Omni architecture support is available at:

- Repository: https://github.com/phnxsystms/llama.cpp
- Branch: `qwen3omni`
## License
Apache 2.0 (same as original Qwen3-Omni model)