---
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
library_name: llama.cpp
pipeline_tag: text-generation
---
# Hermes-Bonsai Karpathy Self-Improving Agent Loop
|
|
Stage 2 checkpoint for the Hermes/Bonsai Karpathy auto-research loop.

Last updated: 2026-04-05

This release is inspired by Andrej Karpathy's framing of self-improving training loops and auto-research. It contains the model artifact that worked, plus a concise model card explaining how it was produced and how to run it.
|
|
## Overview

- **Base model:** Qwen3-8B-Base
- **Training method:** supervised fine-tuning via the Hermes/Karpathy loop
- **Stage:** Stage 2 — the checkpoint that worked
- **Known limitation:** Stage 3 exposed a learned-helplessness pattern on some tasks; that behavior is documented in the GitHub methodology repo
- **License:** Apache-2.0 for this release; the underlying base model license also applies to the inherited Qwen3-8B-Base components
|
|
## What went into this checkpoint

- The loop-produced training curriculum and trace-distillation pipeline
- 140 verified raw passes used as positive reinforcement for curriculum rebalancing and trace selection
  - These are Bonsai's own unedited outputs that passed teacher evaluation
- 10 domains covered across the build
- Validation signal from a mixed-domain batch
|
|
## Domains covered

- memory_integration
- refusal_redirect
- self_correction
- agent_routing
- devops
- logic_puzzle
- code_debugging
- math
- architecture
- research_synthesis

## Strongest domains

Best performance concentrated in:
- memory_integration
- refusal_redirect
- self_correction
|
|
## Validation metrics

- Mixed-domain batch: **13/50 raw passes**
- Raw pass rate: **26%**
- This checkpoint is the Stage 2 model that produced those verified passes
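As a sanity check, the reported raw pass rate follows directly from the batch counts above:

```python
raw_passes = 13   # verified raw passes in the mixed-domain batch
batch_size = 50   # total tasks in the batch

raw_pass_rate = raw_passes / batch_size
print(f"{raw_pass_rate:.0%}")  # prints 26%
```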
|
|
## What's novel

Trained via a graduation protocol with teacher-guided validation, raw-pass reinforcement, and frontier failure analysis. The main contribution is the loop methodology; see the GitHub repo for the full curriculum and training workflow.
|
|
## GitHub methodology

The training loop, curriculum design, graduation protocol, and detailed methodology live here:

https://github.com/aurous37-lang/Hermes-Bonsai-Self-Improving-Agent-Loop
|
|
## Files in this Hugging Face repo

- `bonsai-8b-stage2-post-curriculum-q8.gguf` — the shipped Stage 2 checkpoint
- `README.md` — this model card
- `LICENSE` — Apache-2.0 license
|
|
## How to use

Recommended working config from the stable local run:
- `--ctx-size 40960`
- `--n-gpu-layers 37`
|
|
### llama.cpp

```bash
./llama-cli -m bonsai-8b-stage2-post-curriculum-q8.gguf \
  --ctx-size 40960 \
  --n-gpu-layers 37 \
  -p "Explain the CAP theorem for a backend engineer."
```
|
|
### llama-server

```bash
./llama-server -m bonsai-8b-stage2-post-curriculum-q8.gguf \
  --ctx-size 40960 \
  --n-gpu-layers 37 \
  --host 0.0.0.0 --port 8080
```
|
|
Then point your client at the local OpenAI-compatible endpoint exposed by `llama-server`.
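A minimal client sketch, assuming `llama-server` is running with the flags above: it builds an OpenAI-style chat-completions request against the local endpoint. The `model` value is illustrative (llama-server serves whatever GGUF it loaded), and the request itself is left commented so the snippet stands alone.

```python
import json
import urllib.request

# OpenAI-style chat-completions payload for the local llama-server.
# The model name is illustrative; llama-server uses the loaded GGUF.
payload = {
    "model": "bonsai-8b-stage2",
    "messages": [
        {"role": "user", "content": "Explain the CAP theorem for a backend engineer."}
    ],
    "temperature": 0.7,
}

# Host and port match the --host/--port flags passed to llama-server above.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once llama-server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```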
|
|
## Notes

- This is a release checkpoint, not the full training corpus.
- The GitHub repo contains the code and documentation needed to reproduce the loop.
- The Hugging Face repo contains the model artifact produced by that loop.
|
|