---
title: OpenSleuth Colab
emoji: 🕵️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# OpenSleuth — Colab quickstart Space

This Space is a thin landing page for the [`train_opensleuth_grpo.ipynb`](./train_opensleuth_grpo.ipynb) notebook — the **minimum reproducible Colab** for training an OpenSleuth agent end-to-end against the live env Space.

## What is OpenSleuth?

OpenSleuth is an **Algorithmic Detective** RL environment. An LLM agent reverse-engineers an unknown black-box Python function by **probing** it with inputs and then **submitting** a Python replica. The environment scores each submission by domain-aware fuzz-testing against the hidden reference, and applies a complexity penalty so the agent can't just memorise its probes inside a giant `if/else`.
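
To make the scoring idea concrete, here is a minimal sketch of fuzz-testing a submission against a reference and subtracting a complexity penalty. It is not the environment's actual scorer: `score_submission`, `complexity_penalty`, the branch-count proxy, and the penalty weights are all hypothetical choices made for illustration.

```python
import ast
import random
from typing import Any, Callable


def complexity_penalty(source: str, free_branches: int = 3, per_branch: float = 0.02) -> float:
    """Hypothetical penalty: count if-branches as a proxy for memorised probes."""
    tree = ast.parse(source)
    branches = sum(isinstance(node, (ast.If, ast.IfExp)) for node in ast.walk(tree))
    return per_branch * max(0, branches - free_branches)


def score_submission(
    source: str,
    func_name: str,
    reference: Callable[[Any], Any],
    sample_input: Callable[[random.Random], Any],
    trials: int = 200,
    seed: int = 0,
) -> float:
    """Fraction of random inputs where the submission matches the reference, minus the penalty."""
    rng = random.Random(seed)
    namespace: dict = {}
    exec(source, namespace)  # assumption: a real environment would sandbox untrusted code
    candidate = namespace[func_name]

    passed = 0
    for _ in range(trials):
        x = sample_input(rng)
        try:
            if candidate(x) == reference(x):
                passed += 1
        except Exception:
            pass  # a crash on any input counts as a failed trial
    accuracy = passed / trials
    return max(0.0, accuracy - complexity_penalty(source))
```

Under this sketch, a submission that generalises (for example, one line computing an absolute value) keeps its full accuracy, while a long chain of `if x == ...` branches both fails on unseen inputs and pays the branch penalty.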

## Try it

Click the badge to open the notebook in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/anugrah55/opensleuth-colab/blob/main/train_opensleuth_grpo.ipynb)

Or download `train_opensleuth_grpo.ipynb` from the **Files** tab and upload it to Colab manually. Set the runtime to **GPU → T4** and hit **Runtime → Run all**. End-to-end training completes in roughly 15 to 25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.
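
If you want to confirm the runtime has a GPU before running everything, a quick check along these lines can be pasted into a scratch cell. This is an optional sketch, not part of the notebook; `gpu_visible` is a hypothetical helper that only relies on the standard `nvidia-smi -L` listing.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU runtime detected")
else:
    print("No GPU found: switch the Colab runtime to a T4 GPU first")
```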

## What the notebook does

1. Pip-installs the pinned trainer stack (`transformers==4.51.3`, `trl==0.16.1`, `peft==0.14.0`, `accelerate==1.4.0`, `bitsandbytes==0.45.5`, `datasets==3.3.2`).
2. Hits the live env Space [`anugrah55/opensleuth-env-gemini-cli`](https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli) at `https://anugrah55-opensleuth-env-gemini-cli.hf.space` to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
3. Builds a synthesis dataset where each row is `(signature + observed probes) → expected python implementation`.
4. Loads `Qwen2.5-0.5B-Instruct` in 4-bit + LoRA so it fits on a T4.
5. Trains with HF TRL's `GRPOTrainer` using a two-part reward:
   - **env-verifier reward**: real fuzz-tested correctness against the hidden reference, with a complexity penalty.
   - **format reward**: tiny shaping signal for emitting a fenced `python` code block with the right function name.
6. Optionally pushes the trained LoRA adapter to your own Hub account.
7. Runs a 3-episode smoke eval and prints the agent's emitted code.
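
The format reward in step 5 can be pictured as a small regex check on the model's completion. The sketch below is an assumption about how such a shaping signal might look, not the notebook's code: `extract_code`, `format_reward`, `total_reward`, and the `0.1` bonus are hypothetical, and the env-verifier score is simply passed in as a number.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this example can itself live inside a fence
CODE_BLOCK = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> Optional[str]:
    """Return the body of the first fenced python block, or None if there is none."""
    match = CODE_BLOCK.search(completion)
    return match.group(1) if match else None


def format_reward(completion: str, func_name: str, bonus: float = 0.1) -> float:
    """Small bonus when the completion has a fenced block defining `func_name`."""
    code = extract_code(completion)
    if code is None:
        return 0.0
    defines = re.search(rf"^def\s+{re.escape(func_name)}\s*\(", code, re.MULTILINE)
    return bonus if defines else 0.0


def total_reward(env_score: float, completion: str, func_name: str) -> float:
    """Combine an externally computed env-verifier score with the format bonus."""
    return env_score + format_reward(completion, func_name)
```

Keeping the bonus small relative to the correctness score means the model is nudged toward parseable output without being able to earn much reward from formatting alone.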

## Links

- **Env Space (REST API the notebook calls):** https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli
- **Training Space (full 3B retrain):** https://huggingface.co/spaces/anugrah55/opensleuth-training-gemini-cli
- **Open-ended task catalog (Hub dataset):** https://huggingface.co/datasets/anugrah55/opensleuth-tasks

## License

Apache-2.0.