Spaces:

anugrah55
/

opensleuth-colab

Runtime error

App Files Files Community

opensleuth-colab / README.md

anugrah55

Initial commit: OpenSleuth Colab quickstart notebook + Gradio landing page

e8f2f91 verified 13 days ago

preview code

raw

history blame contribute delete

2.85 kB

	---
	title: OpenSleuth Colab
	emoji: 🕵️
	colorFrom: indigo
	colorTo: green
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	---

	# OpenSleuth — Colab quickstart Space

	This Space is a thin landing page for the [`train_opensleuth_grpo.ipynb`](./train_opensleuth_grpo.ipynb) notebook — the minimum reproducible Colab for training an OpenSleuth agent end-to-end against the live env Space.

	## What is OpenSleuth?

	An Algorithmic Detective RL environment. An LLM agent reverse-engineers an unknown black-box Python function by probing it with inputs and then submitting a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant `if/else`.

	## Try it

	Click the badge to open the notebook in Google Colab:

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/anugrah55/opensleuth-colab/blob/main/train_opensleuth_grpo.ipynb)

	Or download `train_opensleuth_grpo.ipynb` from the Files tab and upload it to Colab manually. Set the runtime to GPU → T4 and hit Runtime → Run all — end-to-end training completes in roughly 15 – 25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.

	## What the notebook does

	1. Pip-installs the pinned trainer stack (`transformers==4.51.3`, `trl==0.16.1`, `peft==0.14.0`, `accelerate==1.4.0`, `bitsandbytes==0.45.5`, `datasets==3.3.2`).
	2. Hits the live env Space [`anugrah55/opensleuth-env-gemini-cli`](https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli) at `https://anugrah55-opensleuth-env-gemini-cli.hf.space` to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
	3. Builds a synthesis dataset where each row is `(signature + observed probes) → expected python implementation`.
	4. Loads `Qwen2.5-0.5B-Instruct` in 4-bit + LoRA so it fits on a T4.
	5. Trains with HF TRL's `GRPOTrainer` using a two-part reward:
	- env-verifier reward: real fuzz-tested correctness against the hidden reference, with a complexity penalty.
	- format reward: tiny shaping signal for emitting a fenced ```python``` code block with the right function name.
	6. Optionally pushes the trained LoRA adapter to your own Hub account.
	7. Runs a 3-episode smoke eval and prints the agent's emitted code.

	## Links

	- Env Space (REST API the notebook calls): https://huggingface.co/spaces/anugrah55/opensleuth-env-gemini-cli
	- Training Space (full 3B retrain): https://huggingface.co/spaces/anugrah55/opensleuth-training-gemini-cli
	- Open-ended task catalog (Hub dataset): https://huggingface.co/datasets/anugrah55/opensleuth-tasks

	## License

	Apache-2.0.