MLX-Qwopus3.5-9B-Coder-oQ4-fp16-mtp

An oQ4-quantized MLX release of Qwopus3.5-9B-Coder, optimized for Apple Silicon inference, with Native MTP (multi-token prediction) preserved.

Built with oMLX v0.3.9.dev2.


Quantization Details

Quantization method:

  • oQ4

Non-quantized weight dtype:

  • float16

Enabled options:

  • Preserve MTP weights

This preserves:

  • mtp.* tensors
  • required config fields

allowing Native MTP to remain functional after quantization.

The model name carries the -mtp suffix to indicate this.
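To confirm that the MTP tensors survived quantization, the safetensors headers can be inspected without loading any weights. The following is a minimal sketch using only the Python standard library. It assumes the tensor names begin with the literal prefix `mtp.` as listed above; if the shards use a longer prefix, pass it explicitly. The function names are hypothetical helpers, not part of oMLX.

```python
import json
import struct
from pathlib import Path


def read_tensor_names(path):
    """Return the tensor names stored in one .safetensors file."""
    with open(path, "rb") as f:
        # A safetensors file starts with an unsigned 64-bit little-endian
        # header length, followed by that many bytes of JSON metadata.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def find_mtp_tensors(model_dir, prefix="mtp."):
    """Collect tensor names starting with `prefix` across all shards."""
    found = []
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        found.extend(n for n in read_tensor_names(shard) if n.startswith(prefix))
    return found
```

Calling `find_mtp_tensors("path/to/model")` on the downloaded model directory should return a non-empty list if the MTP weights were preserved under that prefix.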


Why float16?

float16 was chosen over bfloat16 because Apple M1/M2 chips handle fp16 natively and run it especially efficiently during prefill (prompt processing).

On Apple Silicon:

  • fp16 generally provides faster prompt ingestion
  • bf16 may offer slightly better numerical stability
  • M3/M4 systems may benefit more from bf16

For this release, the priority was maximum real-world inference responsiveness on M1/M2 hardware.
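The trade-off between the two formats can be illustrated numerically. fp16 has 10 mantissa bits and 5 exponent bits, while bf16 has 7 mantissa bits and 8 exponent bits, so fp16 is finer-grained but overflows much sooner; the wider range is what bf16's stability advantage is usually attributed to. The sketch below is a standard-library illustration only, not how MLX performs conversions, and it approximates bf16 by truncation rather than rounding.

```python
import math
import struct


def to_fp16(x):
    """Round x to IEEE half precision (fp16) and return it as a Python float."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the fp16 range (largest finite: 65504).
        return math.copysign(math.inf, x)


def to_bf16(x):
    """Approximate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Real converters usually round to nearest; truncation keeps the sketch short.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


for value in (1.001, 70000.0):
    print(value, "fp16:", to_fp16(value), "bf16:", to_bf16(value))
```

For 1.001, fp16 keeps the value to within about 0.00002 while bf16 collapses it to 1.0; for 70000.0, fp16 overflows to infinity while bf16 stays finite.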


Tested Hardware

Device:

  • MacBook Pro M1
  • 16GB unified memory

Runtime configuration:

  • Native MTP: enabled
  • Context window: 65536
  • Temperature: 1

Integrated into:

  • Hermes agent workflow

Observed performance:

  • Prompt processing (excluding cached tokens): ~219.3 tok/s
  • Token generation: ~25.1 tok/s
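These two rates give a rough way to budget response time for an agent step. The sketch below assumes both rates stay constant regardless of context length, which is an optimistic simplification: throughput typically drops as the context fills. The function name and defaults are illustrative only.

```python
PREFILL_TOK_S = 219.3  # prompt processing rate reported above (uncached tokens)
DECODE_TOK_S = 25.1    # token generation rate reported above


def estimate_seconds(prompt_tokens, output_tokens,
                     prefill_tok_s=PREFILL_TOK_S, decode_tok_s=DECODE_TOK_S):
    """Rough wall-clock estimate: prefill time plus generation time."""
    return prompt_tokens / prefill_tok_s + output_tokens / decode_tok_s


print(f"{estimate_seconds(8192, 512):.1f} s")
```

Under these assumptions, an 8192-token uncached prompt followed by a 512-token reply takes about 58 seconds, most of it spent in prefill.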


Format

Format:

  • MLX safetensors

Designed specifically for:

  • Apple Silicon
  • MLX runtimes
  • Native MTP workflows


Compatibility

Tested with:

  • oMLX
  • LM Studio


Base Model

Base model by Jackrong: Qwopus3.5-9B-Coder

All credit for the original architecture and training belongs to the upstream creators.


Notes

This release focuses on:

  • Apple Silicon efficiency
  • preserving Native MTP support
  • practical local coding-agent workflows
  • high context operation within 16GB unified memory constraints
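
As a back-of-envelope check on the 16GB constraint, the weight footprint can be estimated from the parameter count and the effective bits per weight. This sketch rests on assumptions that are not confirmed for oQ4: a nominal 9 billion parameters (taken from the model name), and a plain 4-bit grouped scheme with one fp16 scale and one fp16 bias per group of 64 weights. The actual oQ4 layout may differ, and the estimate ignores the tensors kept in float16, the MTP weights, and the KV cache.

```python
def effective_bits(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Bits stored per weight when each group carries one scale and one bias."""
    return bits + (scale_bits + bias_bits) / group_size


def weight_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


print(f"{weight_gib(9e9, effective_bits()):.2f} GiB")
```

Under these assumptions the quantized weights come to roughly 4.7 GiB, leaving the remainder of unified memory for the KV cache, the operating system, and other applications.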