---
title: OpenSleuth — Live Agent Demo
emoji: 🕵
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
suggested_hardware: cpu-basic
suggested_storage: small
short_description: Watch an LLM reverse-engineer a hidden Python fn live
---
# OpenSleuth — live agent demo
Pick a hidden black-box Python function from the OpenSleuth catalog (15 tasks, easy → hard, a mix of builtin and Hub-pushed). Pick an agent backend (oracle, base Qwen 0.5B, trained Qwen 0.5B (LoRA), trained Qwen 3B (LoRA)). Watch the agent:

- Probe the env (6 inputs drawn from the same auto-fuzzer the verifier uses), one at a time, with each (input → output) pair streamed live.
- Submit a Python replica of the hidden function.
- Get verified by the env's domain-aware fuzzer: 100 random inputs plus the spec's must-pass edge cases, with stratified pass rates and a reward breakdown (execution / edge / complexity / hack penalties / perfect bonus).

The submitted code is shown syntax-highlighted, and an optional accordion runs a quick oracle vs trained-0.5B head-to-head reward comparison on the selected task.
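The reward components named above can be sketched as a single scoring function. This is a hedged illustration only: the weight values, the `combine_reward` name, and the exact penalty semantics are assumptions, not the env's actual implementation (they are chosen so that a flawless, penalty-free replica scores +100, matching the oracle row below).

```python
# Hedged sketch of the reward breakdown (execution / edge / complexity /
# hack penalties / perfect bonus). Weights and names are illustrative,
# NOT the env's real values.

def combine_reward(
    exec_pass_rate: float,    # fraction of the 100 random inputs matched
    edge_pass_rate: float,    # fraction of must-pass edge cases matched
    complexity_penalty: float,
    hack_penalty: float,
) -> float:
    reward = 60.0 * exec_pass_rate + 30.0 * edge_pass_rate
    reward -= complexity_penalty + hack_penalty
    if exec_pass_rate == 1.0 and edge_pass_rate == 1.0:
        reward += 10.0  # perfect bonus: a flawless replica totals +100
    return reward
```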
## Backends
| Backend | Source | Notes |
|---|---|---|
| oracle | `oracle.py` reference impl | Always +100; sanity-checks the env. |
| base Qwen 0.5B | `Qwen/Qwen2.5-0.5B-Instruct` | No fine-tuning. |
| trained Qwen 0.5B (LoRA) | `anugrah55/opensleuth-qwen2.5-0.5b-grpo` | GRPO LoRA on top of base 0.5B. |
| trained Qwen 3B (LoRA) | `anugrah55/opensleuth-qwen2.5-3b-grpo-v2` | 3B GRPO run; falls back to "adapter not yet trained" if the repo has no weights yet. |
## Architecture
```
[demo Space] ──HTTP──> [env Space]
      │   /tasks, /tasks/{name}/sample_inputs,
      │   /reset, /step (probe + submit)
      │
      └─ HF model load (lazy, cached): base + optional LoRA on CPU
```
- The env Space is `anugrah55/opensleuth-env-gemini-cli`.
- The task catalog is `anugrah55/opensleuth-tasks`.
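A minimal client for the HTTP flow in the diagram might look like the sketch below. The endpoints (`/tasks`, `/reset`, `/step`) come from the diagram, but the request/response payload shapes are assumptions; the env Space's actual schema may differ.

```python
import os

import requests

# Base URL of the env Space; OPENSLEUTH_ENV_URL is the documented override.
ENV_URL = os.environ.get(
    "OPENSLEUTH_ENV_URL",
    "https://anugrah55-opensleuth-env-gemini-cli.hf.space",
)


def list_tasks() -> list:
    """GET /tasks — fetch the task catalog (response shape assumed)."""
    return requests.get(f"{ENV_URL}/tasks", timeout=30).json()


def run_episode(task_name: str, probes: list, replica_code: str) -> dict:
    """Reset the env, probe it one input at a time, then submit a replica.

    The /reset and /step JSON bodies here are illustrative guesses,
    not the Space's actual schema.
    """
    requests.post(f"{ENV_URL}/reset", json={"task": task_name}, timeout=30)
    observations = []
    for x in probes:
        r = requests.post(
            f"{ENV_URL}/step",
            json={"action": "probe", "input": x},
            timeout=30,
        )
        observations.append(r.json())  # one (input → output) pair per probe
    # Final step: submit the candidate replica for fuzzer verification.
    final = requests.post(
        f"{ENV_URL}/step",
        json={"action": "submit", "code": replica_code},
        timeout=60,
    )
    return final.json()  # reward breakdown from the verifier
```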
## CPU-basic notes
The demo runs on CPU-basic. The first generation per backend cold-loads the model (~30–90 s for 0.5B). To keep latency bounded:

- `MAX_NEW_TOKENS=192` caps generation length.
- Models are cached across runs (in-process LRU).
- The 3B backend will only attempt a real load if the adapter repo has weights pushed; otherwise it short-circuits to a clear UI message.
Configure with env vars:
| Env var | Default |
|---|---|
| `OPENSLEUTH_ENV_URL` | `https://anugrah55-opensleuth-env-gemini-cli.hf.space` |
| `BASE_MODEL_ID` | `Qwen/Qwen2.5-0.5B-Instruct` |
| `BASE_MODEL_3B_ID` | `Qwen/Qwen2.5-3B-Instruct` |
| `ADAPTER_05B_ID` | `anugrah55/opensleuth-qwen2.5-0.5b-grpo` |
| `ADAPTER_3B_ID` | `anugrah55/opensleuth-qwen2.5-3b-grpo-v2` |
| `MAX_NEW_TOKENS` | `192` |
| `N_PROBES` | `6` |
| `HF_TOKEN` | (optional; set as a Space secret for gated models) |
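Reading this configuration in Python is a few `os.environ.get` calls with the table's defaults. A minimal sketch (how `app.py` actually structures this is not shown here):

```python
import os

# Defaults mirror the env-var table above.
OPENSLEUTH_ENV_URL = os.environ.get(
    "OPENSLEUTH_ENV_URL",
    "https://anugrah55-opensleuth-env-gemini-cli.hf.space",
)
BASE_MODEL_ID = os.environ.get("BASE_MODEL_ID", "Qwen/Qwen2.5-0.5B-Instruct")
BASE_MODEL_3B_ID = os.environ.get("BASE_MODEL_3B_ID", "Qwen/Qwen2.5-3B-Instruct")
ADAPTER_05B_ID = os.environ.get(
    "ADAPTER_05B_ID", "anugrah55/opensleuth-qwen2.5-0.5b-grpo"
)
ADAPTER_3B_ID = os.environ.get(
    "ADAPTER_3B_ID", "anugrah55/opensleuth-qwen2.5-3b-grpo-v2"
)
MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "192"))
N_PROBES = int(os.environ.get("N_PROBES", "6"))
HF_TOKEN = os.environ.get("HF_TOKEN")  # None unless set as a Space secret
```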