---
language:
- en
license: apache-2.0
library_name: mlx
tags:
- mlx
- qwen3.5
- reasoning
- chain-of-thought
- self-correction
- tool-calling
- agent
- hermes
- unsloth
- conversational
base_model: DJLougen/Harmonic-Hermes-9B
datasets:
- lambda/hermes-agent-reasoning-traces
---

> ## ☕ Support This Work
>
> I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
>
> **[☕ ko-fi.com/djlougen](https://ko-fi.com/djlougen)**

# Harmonic-Hermes-9B-MLX-8bit

![Harmonic-Hermes-9B](hhMLX.jpeg)

8-bit MLX conversion of [Harmonic-Hermes-9B](https://huggingface.co/DJLougen/Harmonic-Hermes-9B) for local inference on Apple Silicon with [mlx-lm](https://github.com/ml-explore/mlx-examples/tree/main/llms).

| Quantization | Size | Use Case |
|---|---|---|
| **8-bit** | **~8.9 GB** | Near-lossless quality, 16GB+ unified memory |

### Other formats

| Format | Repo |
|---|---|
| GGUF (all quants) | [Harmonic-Hermes-9B-GGUF](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-GGUF) |
| MLX 4-bit | [Harmonic-Hermes-9B-MLX-4bit](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-MLX-4bit) |
| MLX 8-bit | [Harmonic-Hermes-9B-MLX-8bit](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-MLX-8bit) |
| MLX BF16 | [Harmonic-Hermes-9B-MLX-bf16](https://huggingface.co/DJLougen/Harmonic-Hermes-9B-MLX-bf16) |
| Full weights | [Harmonic-Hermes-9B](https://huggingface.co/DJLougen/Harmonic-Hermes-9B) |

---

Harmonic-Hermes-9B is the **Stage 2 agentic fine-tune** of [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) — a dedicated tool-calling and agent model built on top of a strong reasoning backbone.
Where Harmonic-9B teaches the model *how to think*, Harmonic-Hermes-9B teaches it *how to act* — structured tool use, multi-turn agent workflows, and function calling, all grounded in the reasoning depth from Stage 1.

> **Stage 1** — [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B): Heavy reasoning fine-tune on privately generated, structurally validated data. Every row passes strict quality gates. The thinking backbone.
>
> **Stage 2** (this model): Agentic fine-tune on [hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered) — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.

## Usage

```bash
pip install mlx-lm

# Generate
mlx_lm.generate --model DJLougen/Harmonic-Hermes-9B-MLX-8bit --prompt "Use the available tools to..."

# Chat
mlx_lm.chat --model DJLougen/Harmonic-Hermes-9B-MLX-8bit
```

### Python API

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Harmonic-Hermes-9B-MLX-8bit")

response = generate(
    model,
    tokenizer,
    prompt="Use the available tools to check the weather.",
    max_tokens=512,
)
print(response)
```

### Reasoning + Tool Use

The model uses `<think>` blocks for reasoning before acting:

```
<think>
The user wants to check the weather in Toronto. I have a get_weather tool available. Let me call it with the right parameters...
</think>
<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}
</tool_call>
```

## How Our Training Data Compares

### Quality Comparison

![Quality Comparison](quality_comparison.png)

### Metrics Summary

![Metrics Summary](metrics_summary.png)

| Metric | **Harmonic Traces** (ours) | **Carnice GLM-5** (kai-os) |
|---|---|---|
| **Rows** | 3,679 | 1,627 |
| **Source model** | Multiple frontier models | GLM-5 via OpenRouter |
| **Think block depth** | **581 words avg** | 40 words avg |
| **Self-correction** | **63.0%** | 29.7% |
| **Verification** | **95.9%** | 63.7% |
| **Alternative exploration** | **43.7%** | 51.3% |
| **Valid JSON (all tool calls)** | **100%** | 100% |
| **Tool calls per conversation** | **18.5** | 5.4 |
| **Messages per conversation** | **32.1** | 12.1 |
| **Multi-turn (>5 messages)** | **97.8%** | 89.6% |

### Reasoning Flow

![Reasoning Flow](reasoning_flow.png)

### Conversation Structure

![Conversation Structure](conversation_structure.png)

### Category Distribution

![Categories](categories.png)

Training data: [DJLougen/hermes-agent-traces-filtered](https://huggingface.co/datasets/DJLougen/hermes-agent-traces-filtered)

## What This Model Does

- **Tool calling / function calling** — structured JSON tool use in the Hermes agent format
- **Multi-turn agent workflows** — maintains coherent state across extended tool-use conversations
- **Reasoning-grounded decisions** — inherits Harmonic-9B's self-correction, verification, and exploration before committing to actions

## Architecture

- **Base**: [Harmonic-9B](https://huggingface.co/DJLougen/Harmonic-9B) (Stage 1 reasoning fine-tune of Qwen 3.5 9B)
- **Parameters**: 9.65B
- **Training**: LoRA fine-tuning, merged into base weights
- **Context**: 8192 tokens

## License

Apache 2.0 — same as the base model. Full commercial use permitted.
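## Parsing Tool Call Output

Hermes-format models emit tool calls as JSON wrapped in `<tool_call>` tags, so an agent loop needs to extract and decode those blocks before dispatching to real tool implementations. A minimal sketch of such a parser — this is not part of mlx-lm, just plain stdlib Python, and `get_weather` here is a hypothetical tool name used for illustration:

```python
import json
import re

# Matches Hermes-style <tool_call>{...}</tool_call> blocks; DOTALL lets the
# JSON payload span multiple lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every parseable tool-call payload found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip malformed blocks rather than crashing the agent loop.
            continue
    return calls

# Example output in the format shown above (hypothetical tool/arguments).
output = (
    "<think>The user wants the weather in Toronto.</think>\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(output))
```

In a full agent loop you would feed each parsed payload to the matching tool, append the result as a tool message, and generate again until the model stops emitting `<tool_call>` blocks.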