Carnice-27b-MLX-oQ8

This repo is a straight MLX oQ8 quant of kai-os/Carnice-27b for local Apple Silicon inference.

No other edits, additions, merges, or behavioral changes have been made to the model beyond the quantization/export step.

M1 Ultra Mac Studio Throughput

Measured on a Mac Studio with Apple M1 Ultra and 128 GB unified memory.

Carnice-27b full weights: 10.982 tokens/sec average generation, 53.984 GB peak memory
Carnice-27b-MLX-oQ8: 25.002 tokens/sec average generation, 18.123 GB peak memory

This is a mixed 4/8 MLX quant.

The text below is carried over from kai-os/Carnice-27b, with the quant-specific notes above added for this MLX release.

Carnice-27b is the merged full-model release of the Trinity Hermes-Agent training run on top of Qwen/Qwen3.5-27B.

This repo contains the quantized MLX export of that model.

This work would not have been possible without Zachary Mueller, Lambda, Teknium, and Nous Research.

Trained using traces from lambda/hermes-agent-reasoning-traces

Carnice-27b is tuned for Hermes-Agent style terminal, file, browser, repo, debugging, and multi-step tool workflows.

Reproducible benchmark runs are not attached yet. They will be added only after the dedicated benchmark box run is complete.

python -m mlx_lm.generate \
  --model /path/to/Carnice-27b-MLX-oQ8 \
  --prompt "Write a bash command to list large files recursively."

Safetensors

Model size

27B params

Tensor type

BF16

U32

MLX

Hardware compatibility

4-bit

Base model

Finetuned

Quantized

(4)

this model