# Carnice MoE 35B-A3B — Hermes-Focused Agentic Model (GGUF)
QLoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters) optimized for agentic workflows and Hermes Agent runtime. Two-stage training adapted from kai-os/Carnice-9b.
## Credits
Training methodology adapted from kai-os/Carnice-9b — same two-stage approach and datasets, applied to the larger MoE architecture. Key inspiration: training on actual Hermes Agent execution traces for native agentic behavior.
## Available Quantizations
| Quantization | Size | BPW | Min VRAM |
|---|---|---|---|
| Q8_0 | 35 GB | 8.52 | 1x 48GB GPU |
| Q6_K | 27 GB | 6.58 | 1x 32GB GPU |
| Q5_K_M | 24 GB | 5.70 | 1x 32GB GPU |
| Q4_K_M | 20 GB | 4.87 | 1x 24GB GPU |
| MXFP4_MOE | 19 GB | 4.39 | 1x 24GB GPU |
For BF16 safetensors, see samuelcardillo/Carnice-MoE-35B-A3B.
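As a rough sanity check, the file sizes in the table follow directly from the bits-per-weight (BPW) figures. The sketch below assumes ~35B total parameters; real GGUF files deviate by a GiB or so because some tensors (embeddings, norms) are stored at higher precision than the headline quantization.

```python
# Approximate GGUF file size: total_params × bits-per-weight / 8 bytes.
# Assumes ~35e9 total parameters (a round number, not the exact count).
GIB = 1024 ** 3

def approx_size_gib(total_params: float, bpw: float) -> float:
    """Approximate quantized file size in GiB."""
    return total_params * bpw / 8 / GIB

for name, bpw in [("Q8_0", 8.52), ("Q6_K", 6.58), ("Q5_K_M", 5.70),
                  ("Q4_K_M", 4.87), ("MXFP4_MOE", 4.39)]:
    print(f"{name}: ~{approx_size_gib(35e9, bpw):.0f} GiB")
```

The estimates land within about 1 GiB of the table, which is expected given the mixed-precision tensors noted above.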
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
## What Makes This Different
Unlike generic reasoning distillation, this model was trained on actual Hermes Agent execution traces — real conversations where an AI agent:
- Executes terminal commands and processes output
- Performs file editing operations
- Chains multi-step tool calls with results feeding back
- Uses browser-assisted workflows
- Makes decisions based on environmental feedback
This teaches the model the exact conversation patterns Hermes expects, rather than just generic reasoning.
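To make the pattern concrete, here is an illustrative sketch of the kind of multi-turn tool-use conversation described above. The role names and tool schema are hypothetical (the actual Hermes trace format is not shown in this card); what matters is the shape: the assistant issues a tool call, the runtime feeds the result back as its own turn, and the assistant conditions its next message on that output.

```python
# Hypothetical trace shape (NOT the exact Hermes schema): a tool call,
# its real execution output fed back, and a grounded final answer.
trace = [
    {"role": "user", "content": "How many Python files are in this repo?"},
    {"role": "assistant",  # model decides to run a command
     "tool_call": {"name": "terminal",  # hypothetical tool name
                   "arguments": {"cmd": "find . -name '*.py' | wc -l"}}},
    {"role": "tool", "content": "42"},  # runtime feeds output back
    {"role": "assistant", "content": "There are 42 Python files in the repository."},
]

# The key training signal: tool results appear as separate turns, so the
# model learns to act on real execution output rather than hallucinate it.
for turn in trace:
    print(turn["role"])
```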
## Training Details

### Two-Stage Approach
#### Stage A — Reasoning Repair (1 epoch)

- Strengthens base model reasoning before agent-specific training
- Final loss: 0.4159
| Dataset | Examples |
|---|---|
| bespokelabs/Bespoke-Stratos-17k | 16,710 |
| AI-MO/NuminaMath-CoT | 17,000 (capped) |
#### Stage B — Hermes Traces (2 epochs)

- Agent-specific behavioral training on real execution traces
- Final loss: 0.3115
| Dataset | Examples |
|---|---|
| kai-os/carnice-glm5-hermes-traces | 1,627 (high quality) |
| open-thoughts/OpenThoughts-Agent-v1-SFT | 15,209 |
### Training Configuration
| Parameter | Stage A | Stage B |
|---|---|---|
| LoRA Rank | 64 | 64 |
| LoRA Alpha | 64 | 64 |
| LoRA Targets | q, k, v, o projections | q, k, v, o projections |
| Learning Rate | 2e-5 (linear) | 1e-5 (cosine) |
| Epochs | 1 | 2 |
| Effective Batch | 12 | 12 |
| Context Length | 4096 | 4096 |
| Precision | 4-bit QLoRA + BF16 adapters | Same |
| GPU | RTX PRO 6000 Blackwell (96GB) | Same |

Total training time: ~44 hours across both stages.
### Trainable Parameters
6,881,280 (0.02% of 35B total)
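The quoted percentage checks out arithmetically, as the short sketch below shows. (The per-matrix LoRA count itself, rank × (d_in + d_out) for each adapted projection, is not reproduced here because the model's exact projection dimensions are not given in this card.)

```python
# Verify the trainable-parameter fraction quoted above.
trainable = 6_881_280          # LoRA adapter parameters (rank 64, q/k/v/o)
total = 35e9                   # ~35B total parameters (approximate)

pct = trainable / total * 100
print(f"{pct:.3f}%")           # ≈ 0.020%, consistent with the quoted 0.02%
```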
## Usage with llama.cpp
```bash
llama-server \
  --model Carnice-MoE-35B-A3B-Q8_0.gguf \
  --n-gpu-layers -1 \
  --ctx-size 131072 \
  --host 0.0.0.0 --port 8082
```
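Once the server is up, it can be queried through llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal stdlib-only client sketch, assuming the host/port from the command above (the `model` field value is illustrative; llama-server serves whichever model it was launched with):

```python
# Minimal client for llama-server's OpenAI-compatible chat endpoint.
import json
import urllib.request

def build_chat_request(prompt: str,
                       url: str = "http://localhost:8082/v1/chat/completions"):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": "Carnice-MoE-35B-A3B",  # illustrative; server uses its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires the llama-server instance above to be running.
    with urllib.request.urlopen(build_chat_request("List the files in /tmp")) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
```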
## Acknowledgements
- kai-os — Carnice training methodology and Hermes traces dataset
- open-thoughts — Agent SFT dataset
- bespokelabs — Bespoke-Stratos reasoning dataset
- Unsloth — QLoRA training framework
- Qwen — Base model