How to use from vLLM
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "noctrex/Qwopus3.5-9B-Coder-MTP"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "noctrex/Qwopus3.5-9B-Coder-MTP",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
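Since this is a coding-oriented model, a text-only request may be the more typical call. The following is a minimal Python sketch of the same OpenAI-compatible request using only the standard library. The helper names `build_chat_request` and `ask`, the `max_tokens` default, and the timeout are illustrative assumptions, and the base URL assumes the server from the `vllm serve` command above is listening on port 8000.

```python
import json
import urllib.request

MODEL = "noctrex/Qwopus3.5-9B-Coder-MTP"
# Assumption: the server started above is reachable on localhost:8000.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = MODEL, max_tokens: int = 512) -> dict:
    """Build a text-only chat completion payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative default, not from the model card
    }


def ask(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the payload to /chat/completions and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Write a Python function that reverses a linked list.")` should return the model's reply as a string. Pass a different `base_url` if the server listens elsewhere.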
Use Docker
docker model run hf.co/noctrex/Qwopus3.5-9B-Coder-MTP:

These are quantizations of the model Jackrong/Qwopus3.5-9B-Coder, with the MTP (multi-token prediction) layer added.
On my 7900XTX with the Vulkan backend, generation speed improved from ~80 tps to ~120 tps (tokens per second), roughly a 1.5x speedup. An imatrix was calculated on coding tasks, so these quantizations are specialized for coding.

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from below.
Format: GGUF
Model size: 9B params
Architecture: qwen35

Quantizations: 3-bit, 4-bit, 6-bit, 8-bit


Model tree for noctrex/Qwopus3.5-9B-Coder-MTP

Finetuned: Qwen/Qwen3.5-9B
Quantized (2): this model