---
title: OpenSleuth Colab
emoji: 🕵️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# OpenSleuth: Colab quickstart Space

This Space is a thin landing page for the [`train_opensleuth_grpo.ipynb`](./train_opensleuth_grpo.ipynb) notebook, the **minimum reproducible Colab** for training an OpenSleuth agent end-to-end against the live env Space.

## What is OpenSleuth?

An **Algorithmic Detective** RL environment. An LLM agent reverse-engineers an unknown black-box Python function by **probing** it with inputs and then **submitting** a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant `if/else`.

## Try it

Click the badge to open the notebook in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/anugrah55/opensleuth-colab/blob/main/train_opensleuth_grpo.ipynb)

Or download `train_opensleuth_grpo.ipynb` from the **Files** tab and upload it to Colab manually. Set the runtime to **GPU → T4** and hit **Runtime → Run all**. End-to-end training completes in roughly 15–25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.

## What the notebook does

1. Pip-installs the pinned trainer stack (`transformers==4.51.3`, `trl==0.16.1`, `peft==0.14.0`, `accelerate==1.4.0`, `bitsandbytes==0.45.5`, `datasets==3.3.2`).
2. Hits the live env Space [`anugrah55/opensleuth-env-gemini-cli`](https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli) at `https://anugrah55-opensleuth-env-gemini-cli.hf.space` to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
3. Builds a synthesis dataset where each row is `(signature + observed probes) → expected python implementation`.
4. Loads `Qwen2.5-0.5B-Instruct` in 4-bit + LoRA so it fits on a T4.
5. Trains with HF TRL's `GRPOTrainer` using a two-part reward:
   - **env-verifier reward**: real fuzz-tested correctness against the hidden reference, with a complexity penalty.
   - **format reward**: tiny shaping signal for emitting a fenced `python` code block with the right function name.
6. Optionally pushes the trained LoRA adapter to your own Hub account.
7. Runs a 3-episode smoke eval and prints the agent's emitted code.

## Links

- **Env Space (REST API the notebook calls):** https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli
- **Training Space (full 3B retrain):** https://huggingface.co/spaces/anugrah55/opensleuth-training-gemini-cli
- **Open-ended task catalog (Hub dataset):** https://huggingface.co/datasets/anugrah55/opensleuth-tasks

## License

Apache-2.0.
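
## Example: format reward sketch

The format reward in step 5 of "What the notebook does" is described only as a tiny shaping signal for a fenced `python` block with the right function name. The following is a minimal sketch of what such a check could look like, not the notebook's actual implementation. The names `extract_code` and `format_reward` and the reward values `0.05` and `0.1` are assumptions chosen for illustration.

```python
import ast
import re
from typing import Optional

# Build the fence marker programmatically so this example can itself
# sit inside a fenced block without terminating it.
FENCE = "`" * 3
CODE_BLOCK_RE = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> Optional[str]:
    """Return the body of the first fenced python block, or None if absent."""
    match = CODE_BLOCK_RE.search(completion)
    return match.group(1) if match else None


def format_reward(completion: str, function_name: str) -> float:
    """Hypothetical shaping reward (values are assumptions, not the notebook's).

    0.0  -> no fenced python block, or the block does not parse
    0.05 -> parseable block that does not define `function_name`
    0.1  -> parseable block with a top-level `def function_name(...)`
    """
    code = extract_code(completion)
    if code is None:
        return 0.0
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    defines_target = any(
        isinstance(node, ast.FunctionDef) and node.name == function_name
        for node in tree.body
    )
    return 0.1 if defines_target else 0.05
```

Parsing with `ast` instead of matching `def name(` by regex avoids rewarding a function name that only appears in a comment or string. To use a check like this during training it would need to be wrapped so that it scores a batch of completions, and the reward should stay small relative to the env-verifier reward so that correctness dominates the signal.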