---
title: OpenSleuth Colab
emoji: 🕵️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---
# OpenSleuth: Colab quickstart Space
This Space is a thin landing page for the [`train_opensleuth_grpo.ipynb`](./train_opensleuth_grpo.ipynb) notebook, the **minimum reproducible Colab** for training an OpenSleuth agent end-to-end against the live env Space.
## What is OpenSleuth?
An **Algorithmic Detective** RL environment. An LLM agent reverse-engineers an unknown black-box Python function by **probing** it with inputs and then **submitting** a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant `if/else`.
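To make the scoring idea concrete, here is a minimal sketch of fuzz-testing a submission against a reference and subtracting a size-based penalty. It is not the env Space's actual scorer: `score_submission`, `complexity_penalty`, the AST-node budget, the penalty rate, and the integer input generator are all assumptions chosen for illustration.

```python
import ast
import random


def complexity_penalty(source: str, free_nodes: int = 60, per_node: float = 0.002) -> float:
    """Hypothetical penalty: grows with AST size beyond a free budget."""
    n_nodes = sum(1 for _ in ast.walk(ast.parse(source)))
    return max(0, n_nodes - free_nodes) * per_node


def score_submission(source: str, func_name: str, reference,
                     n_trials: int = 200, seed: int = 0) -> float:
    """Fraction of random inputs on which the submission agrees with the
    reference, minus the complexity penalty (clamped at zero)."""
    namespace = {}
    exec(source, namespace)  # illustration only: never exec untrusted code unsandboxed
    candidate = namespace[func_name]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.randint(-1000, 1000)  # stand-in for a domain-aware input generator
        try:
            hits += candidate(x) == reference(x)
        except Exception:
            pass  # a crash counts as a mismatch
    return max(0.0, hits / n_trials - complexity_penalty(source))
```

Under these assumptions, a short general solution keeps its full fuzz score, while a long chain of `if x == ...` branches that only covers the probed inputs both fails most random inputs and pays the size penalty.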
## Try it
Click the badge to open the notebook in Google Colab:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/anugrah55/opensleuth-colab/blob/main/train_opensleuth_grpo.ipynb)
Or download `train_opensleuth_grpo.ipynb` from the **Files** tab and upload it to Colab manually. Set the runtime to **GPU → T4** and hit **Runtime → Run all**. End-to-end training completes in roughly 15 to 25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.
## What the notebook does
1. Pip-installs the pinned trainer stack (`transformers==4.51.3`, `trl==0.16.1`, `peft==0.14.0`, `accelerate==1.4.0`, `bitsandbytes==0.45.5`, `datasets==3.3.2`).
2. Hits the live env Space [`anugrah55/opensleuth-env-gemini-cli`](https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli) at `https://anugrah55-opensleuth-env-gemini-cli.hf.space` to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
3. Builds a synthesis dataset where each row is `(signature + observed probes) → expected python implementation`.
4. Loads `Qwen2.5-0.5B-Instruct` in 4-bit + LoRA so it fits on a T4.
5. Trains with HF TRL's `GRPOTrainer` using a two-part reward:
   - **env-verifier reward**: real fuzz-tested correctness against the hidden reference, with a complexity penalty.
   - **format reward**: tiny shaping signal for emitting a fenced `python` code block with the right function name.
6. Optionally pushes the trained LoRA adapter to your own Hub account.
7. Runs a 3-episode smoke eval and prints the agent's emitted code.
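
Steps 3 and 5 can be sketched roughly as follows. This is an illustrative guess at the shape of the code, not the notebook's implementation: `build_prompt`, `format_reward`, the prompt wording, the reward values, and the `function_name` dataset column are all hypothetical. The reward function follows the general convention of TRL custom reward functions (a callable that takes `completions` plus extra dataset columns as keyword arguments and returns one float per completion), and it assumes plain-string completions rather than chat-formatted ones.

```python
import re

# Built programmatically so the example nests cleanly inside a Markdown fence.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(signature: str, probes: list) -> str:
    """Hypothetical prompt builder: signature plus observed (args, output) probes."""
    lines = [f"Function signature: {signature}", "Observed probes:"]
    lines += [f"  {args!r} -> {out!r}" for args, out in probes]
    lines.append("Reply with a Python implementation in a fenced python code block.")
    return "\n".join(lines)


def format_reward(completions, function_name, **kwargs):
    """Small shaping reward (values are assumptions):
    0.0 = no fenced python block, 0.1 = block present,
    0.2 = block defines the expected function name."""
    rewards = []
    for text, name in zip(completions, function_name):
        match = CODE_BLOCK.search(text)
        if match is None:
            rewards.append(0.0)
        elif re.search(rf"^\s*def\s+{re.escape(name)}\s*\(", match.group(1), re.MULTILINE):
            rewards.append(0.2)
        else:
            rewards.append(0.1)
    return rewards
```

Keeping the format reward much smaller than the correctness reward matches the "tiny shaping signal" described above: it nudges the model toward parseable output without letting well-formatted but wrong code score highly.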
## Links
- **Env Space (REST API the notebook calls):** https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli
- **Training Space (full 3B retrain):** https://huggingface.co/spaces/anugrah55/opensleuth-training-gemini-cli
- **Open-ended task catalog (Hub dataset):** https://huggingface.co/datasets/anugrah55/opensleuth-tasks
## License
Apache-2.0.