---
title: OpenSleuth Colab
emoji: 🕵️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# OpenSleuth — Colab quickstart Space

This Space is a thin landing page for the `train_opensleuth_grpo.ipynb` notebook — the minimal reproducible Colab notebook for training an OpenSleuth agent end-to-end against the live env Space.

## What is OpenSleuth?

An Algorithmic Detective RL environment. An LLM agent reverse-engineers an unknown black-box Python function by probing it with inputs and then submitting a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant if/else.
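The probe-then-score loop can be illustrated with a toy sketch. Everything here — `hidden_fn`, `fuzz_score`, the linear penalty formula — is illustrative, not the env's actual API; the real environment fuzz-tests with domain-aware inputs and its own complexity metric:

```python
import random

def hidden_fn(x):
    """Stand-in for the env's hidden black-box reference function."""
    return x * x + 3

def agent_guess(x):
    """A replica the agent might submit after probing hidden_fn."""
    return x * x + 3

def fuzz_score(candidate, reference, trials=100, complexity=1, max_complexity=50):
    """Fuzz-test candidate against reference, then apply a complexity penalty
    so a giant if/else that memorises probes scores poorly."""
    probes = (random.randint(-100, 100) for _ in range(trials))
    correctness = sum(candidate(x) == reference(x) for x in probes) / trials
    penalty = min(complexity / max_complexity, 1.0)  # longer code -> bigger penalty
    return correctness * (1.0 - penalty)

score = fuzz_score(agent_guess, hidden_fn, complexity=5)
print(round(score, 2))  # prints 0.9 — fully correct, minus a small complexity penalty
```

A memorising submission would match only the probed inputs (low `correctness` on fresh fuzz inputs) while carrying high `complexity`, so it loses on both terms.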

## Try it

Click the badge to open the notebook in Google Colab:

Open In Colab

Or download `train_opensleuth_grpo.ipynb` from the Files tab and upload it to Colab manually. Set the runtime to GPU → T4 and hit Runtime → Run all — end-to-end training completes in roughly 15–25 minutes on a free-tier T4 with the default Qwen2.5-0.5B-Instruct config.

## What the notebook does

1. Pip-installs the pinned trainer stack (`transformers==4.51.3`, `trl==0.16.1`, `peft==0.14.0`, `accelerate==1.4.0`, `bitsandbytes==0.45.5`, `datasets==3.3.2`).
2. Hits the live env Space `anugrah55/opensleuth-env-gemini-cli` at https://anugrah55-opensleuth-env-gemini-cli.hf.space to discover all 15 tasks (9 builtins + 6 from the Hub task dataset).
3. Builds a synthesis dataset where each row maps (signature + observed probes) → expected Python implementation.
4. Loads Qwen2.5-0.5B-Instruct in 4-bit with a LoRA adapter so it fits on a T4.
5. Trains with HF TRL's `GRPOTrainer` using a two-part reward:
   - **env-verifier reward:** real fuzz-tested correctness against the hidden reference, with a complexity penalty.
   - **format reward:** a tiny shaping signal for emitting a fenced Python code block with the right function name.
6. Optionally pushes the trained LoRA adapter to your own Hub account.
7. Runs a 3-episode smoke eval and prints the agent's emitted code.
## Links

## License

Apache-2.0.