---
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
library_name: llama.cpp
pipeline_tag: text-generation
---

# Hermes-Bonsai Karpathy Self-Improving Agent Loop

Stage 2 checkpoint for the Hermes/Bonsai Karpathy auto-research loop.

Last updated: 2026-04-05

This release is inspired by Andrej Karpathy's framing of self-improving training loops and auto-research. It contains the model artifact that worked, plus a concise model card explaining how it was produced and how to run it.

## Overview

- **Base model:** Qwen3-8B-Base
- **Training method:** supervised fine-tuning via the Hermes/Karpathy loop
- **Stage:** Stage 2 (the checkpoint that worked)
- **Known limitation:** Stage 3 exposed a learned-helplessness pattern on some tasks; that behavior is documented in the GitHub methodology repo
- **License:** Apache-2.0 for this release; the Qwen3-8B-Base license also applies to the inherited base-model components

## What went into this checkpoint

- The loop-produced training curriculum and trace-distillation pipeline
- 140 verified raw passes used as positive reinforcement for curriculum rebalancing and trace selection
  - These are Bonsai's own unedited outputs that passed teacher evaluation
- 10 domains covered across the build
- Validation signal from a mixed-domain batch

## Domains covered

- memory_integration
- refusal_redirect
- self_correction
- agent_routing
- devops
- logic_puzzle
- code_debugging
- math
- architecture
- research_synthesis

## Strongest domains

Performance was strongest in:

- memory_integration
- refusal_redirect
- self_correction

## Validation metrics

- Mixed-domain batch: **13/50 raw passes**
- Raw pass rate: **26%**
- This checkpoint is the Stage 2 model that produced those verified passes

## What's novel

Trained via a graduation protocol with teacher-guided validation, raw-pass reinforcement, and frontier failure analysis.
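The raw-pass reinforcement step described above can be sketched in a few lines. This is a minimal illustration only, under the assumption that verified passes are used to reweight the next curriculum round; the names (`teacher_evaluates`, `collect_raw_passes`, `rebalance_curriculum`) and the rebalancing rule are hypothetical, not the actual Hermes/Bonsai code — see the GitHub repo for the real workflow.

```python
def teacher_evaluates(trace):
    # Stand-in for the real teacher model: here a trace "passes" if it
    # carries a score of at least 0.5. The actual loop uses model-based
    # evaluation; this predicate is purely illustrative.
    return trace["score"] >= 0.5

def collect_raw_passes(traces):
    # Keep only the student's own unedited outputs that the teacher
    # verified; these become positive examples for the next round.
    return [t for t in traces if teacher_evaluates(t)]

def rebalance_curriculum(domains, passes):
    # Weight each domain by how rarely it produced a verified pass, so
    # the next round spends more effort where the student is weakest.
    # (One plausible rebalancing rule, not the documented one.)
    counts = {d: 0 for d in domains}
    for t in passes:
        counts[t["domain"]] += 1
    total = len(passes) or 1
    return {d: 1.0 - counts[d] / total for d in domains}
```

With traces from two domains where only `math` produces verified passes, the sketch assigns `devops` the highest weight for the next round.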
The interesting contribution is the loop methodology; see the GitHub repo for the full curriculum and training workflow.

## GitHub methodology

The training loop, curriculum design, graduation protocol, and detailed methodology live here:

https://github.com/aurous37-lang/Hermes-Bonsai-Self-Improving-Agent-Loop

## Files in this Hugging Face repo

- `bonsai-8b-stage2-post-curriculum-q8.gguf` — the shipped Stage 2 checkpoint
- `README.md` — this model card
- `LICENSE` — Apache-2.0 license

## How to use

Recommended working config from the stable local run:

- `--ctx-size 40960`
- `--n-gpu-layers 37`

### llama.cpp

```bash
./llama-cli -m bonsai-8b-stage2-post-curriculum-q8.gguf \
  --ctx-size 40960 \
  --n-gpu-layers 37 \
  -p "Explain the CAP theorem for a backend engineer."
```

### llama-server

```bash
./llama-server -m bonsai-8b-stage2-post-curriculum-q8.gguf \
  --ctx-size 40960 \
  --n-gpu-layers 37 \
  --host 0.0.0.0 --port 8080
```

Then point your client at the local OpenAI-compatible endpoint exposed by `llama-server`.

## Notes

- This is a release checkpoint, not the full training corpus.
- The GitHub repo contains the code and documentation needed to reproduce the loop.
- The Hugging Face repo contains the model artifact that ships from that loop.
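As a concrete example of pointing a client at that endpoint, the sketch below uses only the Python standard library. The URL assumes the `--host 0.0.0.0 --port 8080` flags from the `llama-server` command above; `build_chat_body` and `chat` are illustrative helper names, not part of llama.cpp.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8080/v1"  # matches the --host/--port flags above

def build_chat_body(prompt):
    # Payload shape for llama-server's OpenAI-compatible
    # /v1/chat/completions endpoint.
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url=BASE_URL):
    # Send the request and return the assistant's reply text.
    # Requires a running llama-server (see the command above).
    body = json.dumps(build_chat_body(prompt)).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (with the server running):
#   print(chat("Explain the CAP theorem for a backend engineer."))
```

Any OpenAI-compatible client (for example the `openai` Python package with a custom `base_url`) works the same way against this endpoint.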