title: OpenSleuth Colab
emoji: 🕵️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
OpenSleuth — Colab quickstart Space
This Space is a thin landing page for the train_opensleuth_grpo.ipynb notebook — the minimum reproducible Colab for training an OpenSleuth agent end-to-end against the live env Space.
What is OpenSleuth?
An Algorithmic Detective RL environment. An LLM agent reverse-engineers an unknown black-box Python function by probing it with inputs and then submitting a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant if/else.
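The scoring idea can be illustrated with a minimal sketch. This is not the environment's actual implementation: the names score_submission, complexity, node_budget and penalty_per_node are hypothetical, the AST node count is only an assumed stand-in for whatever complexity measure the env uses, and the input generator is a plain callable rather than a domain-aware fuzzer.

```python
import ast
import random


def complexity(source: str) -> int:
    """Count AST nodes as a crude proxy for program size (an assumption)."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def score_submission(source, func_name, reference, gen_input,
                     trials=200, node_budget=60, penalty_per_node=0.005, seed=0):
    """Return a score in [0, 1]: fuzz pass rate minus a size penalty."""
    namespace = {}
    try:
        # A real environment would run untrusted code in a sandbox.
        exec(source, namespace)
        candidate = namespace[func_name]
    except Exception:
        return 0.0  # unparsable code or missing function scores zero

    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        x = gen_input(rng)
        try:
            if candidate(x) == reference(x):
                passed += 1
        except Exception:
            pass  # a crash on this input counts as a failed trial

    pass_rate = passed / trials
    # Only nodes beyond the budget are penalised, so small programs are free.
    excess = max(0, complexity(source) - node_budget)
    return max(0.0, pass_rate - penalty_per_node * excess)
```

Under these assumed settings, a one-line replica of a simple reference stays within the node budget and keeps its full pass rate, while a replica that hard-codes dozens of probed input/output pairs in an if chain exceeds the budget and loses most or all of its score, even when it happens to pass every fuzz trial.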
Try it
Click the badge to open the notebook in Google Colab:
Or download train_opensleuth_grpo.ipynb from the Files tab and upload it to Colab manually. Set the runtime to GPU → T4 and hit Runtime → Run all. End-to-end training completes in roughly 15 to 25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.
What the notebook does
- Pip-installs the pinned trainer stack (transformers==4.51.3, trl==0.16.1, peft==0.14.0, accelerate==1.4.0, bitsandbytes==0.45.5, datasets==3.3.2).
- Hits the live env Space anugrah55/opensleuth-env-gemini-cli at https://anugrah55-opensleuth-env-gemini-cli.hf.space to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
- Builds a synthesis dataset where each row is (signature + observed probes) → expected python implementation.
- Loads Qwen2.5-0.5B-Instruct in 4-bit + LoRA so it fits on a T4.
- Trains with HF TRL's GRPOTrainer using a two-part reward:
  - env-verifier reward: real fuzz-tested correctness against the hidden reference, with a complexity penalty.
  - format reward: tiny shaping signal for emitting a fenced python code block with the right function name.
- Optionally pushes the trained LoRA adapter to your own Hub account.
- Runs a 3-episode smoke eval and prints the agent's emitted code.
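The format reward mentioned in the list above could look roughly like the following sketch. The function name format_reward, the bonus value of 0.1 and the regular expressions are illustrative assumptions; the notebook's actual reward code may differ.

```python
import re

# The fence marker is built programmatically so this example nests cleanly
# inside a Markdown code block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def format_reward(completion: str, func_name: str, bonus: float = 0.1) -> float:
    """Return `bonus` if the completion contains a fenced python block
    that defines `func_name`, otherwise 0.0."""
    match = FENCE_RE.search(completion)
    if match is None:
        return 0.0  # no fenced python block at all
    code = match.group(1)
    # Look for "def <func_name>(" at the start of any line inside the block.
    pattern = rf"^\s*def\s+{re.escape(func_name)}\s*\("
    return bonus if re.search(pattern, code, re.MULTILINE) else 0.0
```

Keeping the bonus small relative to the env-verifier reward matches the "tiny shaping signal" description: it nudges the model toward parseable output without letting well-formatted but wrong code outscore correct code. To plug a function like this into TRL, it would need a thin wrapper that accepts a batch of completions and returns a list of floats.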
Links
- Env Space (REST API the notebook calls): https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli
- Training Space (full 3B retrain): https://huggingface.co/spaces/anugrah55/opensleuth-training-gemini-cli
- Open-ended task catalog (Hub dataset): https://huggingface.co/datasets/anugrah55/opensleuth-tasks
License
Apache-2.0.