---
title: OpenSleuth Colab
emoji: 🕵️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# OpenSleuth: Colab quickstart Space

This Space is a thin landing page for the [`train_opensleuth_grpo.ipynb`](./train_opensleuth_grpo.ipynb) notebook, the **minimum reproducible Colab** for training an OpenSleuth agent end-to-end against the live env Space.

## What is OpenSleuth?

An **Algorithmic Detective** RL environment. An LLM agent reverse-engineers an unknown black-box Python function by **probing** it with inputs and then **submitting** a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant `if/else`.
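
To make the scoring idea concrete, here is a minimal sketch of fuzz-testing a submitted replica against a reference and subtracting a size penalty. It is not the environment's actual scorer: `reference`, `complexity`, `score`, the integer input domain, and the per-node penalty weight are all hypothetical choices made for illustration.

```python
import ast
import random


def reference(x: int) -> int:
    """Hypothetical stand-in for the hidden black-box function."""
    return x * x + 1


def complexity(source: str) -> int:
    """Count AST nodes as a crude size measure of a submission."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def score(source: str, func_name: str, trials: int = 200,
          penalty_per_node: float = 0.001, seed: int = 0) -> float:
    """Fraction of random inputs where the replica matches, minus a size penalty."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy; a real env would sandbox this
    candidate = namespace[func_name]
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        try:
            matches += candidate(x) == reference(x)
        except Exception:
            pass  # a crashing replica counts as a mismatch
    accuracy = matches / trials
    return max(0.0, accuracy - penalty_per_node * complexity(source))


# A replica that captures the rule.
general = "def f(x):\n    return x * x + 1\n"

# A replica that only memorises three probes and returns 0 otherwise.
memorised = (
    "def f(x):\n"
    "    if x == 0:\n        return 1\n"
    "    elif x == 1:\n        return 2\n"
    "    elif x == 2:\n        return 5\n"
    "    return 0\n"
)

print("general:  ", score(general, "f"))
print("memorised:", score(memorised, "f"))
```

Under these assumptions the memorising replica loses on both counts: it fails on almost every fuzzed input it never probed, and its longer `if/elif` chain has more AST nodes to penalise.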

## Try it

Open the notebook in Google Colab:

[Open in Colab](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/anugrah55/opensleuth-colab/blob/main/train_opensleuth_grpo.ipynb)

Or download `train_opensleuth_grpo.ipynb` from the **Files** tab and upload it to Colab manually. Set the runtime to a **T4 GPU** and hit **Runtime → Run all**. End-to-end training completes in roughly 15 to 25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.

## What the notebook does

1. Pip-installs the pinned trainer stack (`transformers==4.51.3`, `trl==0.16.1`, `peft==0.14.0`, `accelerate==1.4.0`, `bitsandbytes==0.45.5`, `datasets==3.3.2`).
2. Hits the live env Space [`anugrah55/opensleuth-env-gemini-cli`](https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli) at `https://anugrah55-opensleuth-env-gemini-cli.hf.space` to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
3. Builds a synthesis dataset where each row is `(signature + observed probes) → expected python implementation`.
4. Loads `Qwen2.5-0.5B-Instruct` in 4-bit + LoRA so it fits on a T4.
5. Trains with HF TRL's `GRPOTrainer` using a two-part reward:
   - **env-verifier reward**: real fuzz-tested correctness against the hidden reference, with a complexity penalty.
   - **format reward**: tiny shaping signal for emitting a fenced `python` code block with the right function name.
6. Optionally pushes the trained LoRA adapter to your own Hub account.
7. Runs a 3-episode smoke eval and prints the agent's emitted code.
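
Steps 3 and 5 can be sketched in a few lines. This is an illustrative guess at the shape of the prompt and the format reward, not the notebook's code: `build_prompt`, `format_reward`, `format_reward_func`, the prompt wording, the `0.1` bonus, and the `func_name` dataset column are all hypothetical. The wrapper assumes plain-string completions and relies on TRL passing extra dataset columns to reward functions as keyword arguments.

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in markdown
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(signature: str, probes: list[tuple[str, str]]) -> str:
    """Render a signature plus observed (call, result) probes as one prompt."""
    lines = [f"Signature: {signature}", "Observed probes:"]
    lines += [f"  {call} -> {result}" for call, result in probes]
    lines.append("Reply with a Python implementation in a fenced python block.")
    return "\n".join(lines)


def format_reward(completion: str, func_name: str) -> float:
    """Small shaping bonus for a fenced python block that defines func_name."""
    match = CODE_BLOCK.search(completion)
    if match is None:
        return 0.0
    pattern = rf"^\s*def\s+{re.escape(func_name)}\s*\("
    return 0.1 if re.search(pattern, match.group(1), re.MULTILINE) else 0.0


def format_reward_func(completions, func_name, **kwargs):
    """Batch wrapper in the style of a GRPOTrainer reward function.

    Assumes `func_name` is a dataset column, so it arrives as a list
    aligned with `completions`.
    """
    return [format_reward(c, n) for c, n in zip(completions, func_name)]


prompt = build_prompt("square_plus_one(x: int) -> int",
                      [("square_plus_one(2)", "5"), ("square_plus_one(3)", "10")])
good = f"{FENCE}python\ndef square_plus_one(x):\n    return x * x + 1\n{FENCE}"
bare = "def square_plus_one(x):\n    return x * x + 1\n"

print(prompt)
print(format_reward_func([good, bare], ["square_plus_one", "square_plus_one"]))
```

Keeping the format bonus small relative to the env-verifier reward matches the README's description of it as a shaping signal: it nudges the model toward parseable output without letting well-formatted but wrong code score highly.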

## Links

- **Env Space (REST API the notebook calls):** https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli
- **Training Space (full 3B retrain):** https://huggingface.co/spaces/anugrah55/opensleuth-training-gemini-cli
- **Open-ended task catalog (Hub dataset):** https://huggingface.co/datasets/anugrah55/opensleuth-tasks

## License

Apache-2.0.