Instructions to use tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp") config = load_config("tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi new
How to use tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp
Run Hermes
hermes
MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp
oQ4 quantized MLX release of Qwopus3.5-9B-Coder optimized for Apple Silicon inference with Native MTP preserved.
Built with oMLX v0.3.9.dev2.
Quantization Details
Quantization method:
- oQ4
Non-quantized weight dtype:
- float16
Enabled options:
- Preserve MTP weights
This preserves:
mtp.*tensors- required config fields
allowing Native MTP to remain functional after quantization.
The resulting model includes the -mtp suffix accordingly.
Why float16?
float16 was selected instead of bfloat16 because Apple M1/M2 chips execute native fp16 especially efficiently during prefill workloads.
On Apple Silicon:
- fp16 generally provides faster prompt ingestion
- bf16 may offer slightly better numerical stability
- M3/M4 systems may benefit more from bf16
For this release, the priority was maximum real-world inference responsiveness on M1/M2 hardware.
Tested Hardware
Device:
- MacBook Pro M1
- 16GB unified memory
Runtime configuration:
- Native MTP: enabled
- Context window: 65536
- Temperature: 1
Integrated into:
- Hermes agent workflow
Observed performance:
- Prompt processing (excluding cached): ~219.3 tok/s
- Token generation: ~25.1 tok/s
Format
Format:
- MLX safetensors
Designed specifically for:
- Apple Silicon
- MLX runtimes
- Native MTP workflows
Compatibility
Tested with:
- oMLX
- LM Studio
Base Model
Base model by Jackrong:
All credit for the original architecture and training belongs to the upstream creators.
Notes
This release focuses on:
- Apple Silicon efficiency
- preserving Native MTP support
- practical local coding-agent workflows
- high context operation within 16GB unified memory constraints
- Downloads last month
- 970
4-bit
Model tree for tongrow/MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp
Base model
Qwen/Qwen3.5-9B-Base