How to use from Hermes Agent
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf noctrex/Qwopus3.5-9B-Coder-MTP:
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default noctrex/Qwopus3.5-9B-Coder-MTP:
Run Hermes
hermes
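
Before launching Hermes, it can help to confirm that the local server is actually answering. The sketch below assumes the server listens at http://127.0.0.1:8080/v1 (the base URL configured above) and serves the OpenAI-style /models route under it. The function name wait_for_server and the retry defaults are illustrative and are not part of Hermes or llama.cpp.

```shell
# Hypothetical helper: poll an OpenAI-compatible endpoint until it answers.
# The default base URL mirrors the Hermes config above; adjust if yours differs.
wait_for_server() {
  local base_url="${1:-http://127.0.0.1:8080/v1}"
  local attempts="${2:-30}"   # illustrative default: number of tries
  local delay="${3:-1}"       # illustrative default: seconds between tries
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP error statuses as failure, -s: no progress output
    if curl -fs "${base_url}/models" >/dev/null 2>&1; then
      echo "server ready at ${base_url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server not reachable at ${base_url}" >&2
  return 1
}
```

With this defined, `wait_for_server && hermes` starts Hermes only once the check passes. A large model can take a while to load, so the retry count may need raising.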

These are quantizations of the model Jackrong/Qwopus3.5-9B-Coder.
I've added the MTP layer to it.
On my 7900XTX with the Vulkan backend, generation speed went from ~80 tps to ~120 tps. An imatrix has been calculated for coding tasks, so these quantizations are specialized for coding.
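
As a small worked example, the figures above come to roughly a 1.5x speedup. The sketch below computes that ratio and the matching per-token latency from two throughput numbers. The function name tps_speedup is illustrative, and the 80 and 120 tps inputs are the approximate measurements quoted above, not a benchmark.

```shell
# Illustrative helper: compare two throughput figures given in tokens per second.
tps_speedup() {
  # $1 = baseline tps, $2 = improved tps
  awk -v base="$1" -v fast="$2" 'BEGIN {
    printf "speedup: %.2fx\n", fast / base
    printf "ms per token: %.2f -> %.2f\n", 1000 / base, 1000 / fast
  }'
}

tps_speedup 80 120
# speedup: 1.50x
# ms per token: 12.50 -> 8.33
```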

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from below.

Downloads last month: 4,038
Format: GGUF
Model size: 9B params
Architecture: qwen35
Available quantizations: 3-bit, 4-bit, 6-bit, 8-bit

Model tree for noctrex/Qwopus3.5-9B-Coder-MTP

Finetuned: Qwen/Qwen3.5-9B
Quantized (2): this model