---
language:
- en
license: apache-2.0
library_name: mlx
tags:
- mlx
- qwen3.5
- reasoning
- chain-of-thought
- self-correction
- tool-calling
- agent
- hermes
- unsloth
- conversational
base_model: DJLougen/Harmonic-Hermes-9B
datasets:
- lambda/hermes-agent-reasoning-traces
---

> ## ☕ Support This Work
>
> I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
>
> **[☕ ko-fi.com/djlougen](https://ko-fi.com/djlougen)**

# Harmonic-Hermes-9B-MLX-8bit

![Harmonic-Hermes-9B](hhMLX.jpeg)

8-bit MLX conversion of [Harmonic-Hermes-9B](https://huggingface.co/DJLougen/Harmonic-Hermes-9B) for local inference on Apple Silicon with [mlx-lm](https://github.com/ml-explore/mlx-examples/tree/main/llms).

| Quantization | Size | Use Case |
|---|---|---|
| **8-bit** | **~8.9 GB** | Near-lossless quality, 16GB+ unified memory |

### Other formats

| Format | Repo |
|---|---|
| GGUF (all quants) | [Harmonic-Hermes-9B-GGUF](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-GGUF) |
| MLX 4-bit | [Harmonic-Hermes-9B-MLX-4bit](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-MLX-4bit) |
| MLX 8-bit | [Harmonic-Hermes-9B-MLX-8bit](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-MLX-8bit) |
| MLX BF16 | [Harmonic-Hermes-9B-MLX-bf16](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-MLX-bf16) |
| Full weights | [Harmonic-Hermes-9B](https://huggingface.co/DJLougen/Harmonic-Hermes-9B) |

---

Harmonic-Hermes-9B is the **Stage 2 agentic fine-tune** of [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) — a dedicated tool-calling and agent model built on top of a strong reasoning backbone.
Where Harmonic-9B teaches the model *how to think*, Harmonic-Hermes-9B teaches it *how to act* — structured tool use, multi-turn agent workflows, and function calling, all grounded in the reasoning depth from Stage 1.

> **Stage 1** — [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B): Heavy reasoning fine-tune on privately generated, structurally validated data. Every row passes strict quality gates. The thinking backbone.
>
> **Stage 2** (this model): Agentic fine-tune on [hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered) — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.

## Usage

```bash
pip install mlx-lm

# Generate
mlx_lm.generate --model DJLougen/Harmonic-Hermes-9B-MLX-8bit --prompt "Use the available tools to..."

# Chat
mlx_lm.chat --model DJLougen/Harmonic-Hermes-9B-MLX-8bit
```

### Python API

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Harmonic-Hermes-9B-MLX-8bit")

response = generate(
    model,
    tokenizer,
    prompt="Use the available tools to check the weather.",
    max_tokens=512,
)
print(response)
```

### Reasoning + Tool Use

The model uses `<think>` blocks for reasoning before acting:

```
<think>
The user wants to check the weather in Toronto. I have a get_weather tool available. Let me call it with the right parameters...
</think>
<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}
</tool_call>
```

## How Our Training Data Compares

### Quality Comparison

![Quality Comparison](quality_comparison.png)

### Metrics Summary

![Metrics Summary](metrics_summary.png)

| Metric | **Harmonic Traces** (ours) | **Carnice GLM-5** (kai-os) |
|---|---|---|
| **Rows** | 3,679 | 1,627 |
| **Source model** | Multiple frontier models | GLM-5 via OpenRouter |
| **Think block depth** | **581 words avg** | 40 words avg |
| **Self-correction** | **63.0%** | 29.7% |
| **Verification** | **95.9%** | 63.7% |
| **Alternative exploration** | **43.7%** | 51.3% |
| **Valid JSON (all tool calls)** | **100%** | 100% |
| **Tool calls per conversation** | **18.5** | 5.4 |
| **Messages per conversation** | **32.1** | 12.1 |
| **Multi-turn (>5 messages)** | **97.8%** | 89.6% |

### Reasoning Flow

![Reasoning Flow](reasoning_flow.png)

### Conversation Structure

![Conversation Structure](conversation_structure.png)

### Category Distribution

![Categories](categories.png)

Training data: [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered)

## What This Model Does

- **Tool calling / function calling** — structured JSON tool use in the Hermes agent format
- **Multi-turn agent workflows** — maintains coherent state across extended tool-use conversations
- **Reasoning-grounded decisions** — inherits Harmonic-9B's self-correction, verification, and exploration before committing to actions

## Architecture

- **Base**: [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) (Stage 1 reasoning fine-tune of Qwen 3.5 9B)
- **Parameters**: 9.65B
- **Training**: LoRA fine-tuning, merged into base weights
- **Context**: 8192 tokens

## License

Apache 2.0 — same as the base model. Full commercial use permitted.
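## Parsing Tool Call Output

Hermes-format models emit tool calls as JSON wrapped in `<tool_call>` tags, so an agent loop needs to extract and decode those blocks before dispatching to real tool implementations. A minimal sketch of such a parser — this is not part of mlx-lm, just plain stdlib Python, and `get_weather` here is a hypothetical tool name used for illustration:

```python
import json
import re

# Matches Hermes-style <tool_call>{...}</tool_call> blocks; DOTALL lets the
# JSON payload span multiple lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every parseable tool-call payload found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip malformed blocks rather than crashing the agent loop.
            continue
    return calls

# Example output in the format shown above (hypothetical tool/arguments).
output = (
    "<think>The user wants the weather in Toronto.</think>\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(output))
```

In a full agent loop you would feed each parsed payload to the matching tool, append the result as a tool message, and generate again until the model stops emitting `<tool_call>` blocks.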