---
title: OpenSleuth Env
emoji: 🕵️
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
suggested_hardware: cpu-basic
---
# OpenSleuth Environment

A FastAPI service that exposes an OpenEnv-style `/reset` + `/step` API for the
Algorithmic Detective task. An agent has to figure out an unknown Python
function by probing it, then submit Python source that replicates it.
## Endpoints

| Method | Path | Body | Notes |
|---|---|---|---|
| GET | `/health` | – | Liveness probe (also reports Hub-catalog status). |
| GET | `/functions` | optional `?difficulty=easy\|medium\|hard` | Catalogue of the 9 builtin black boxes (back-compat shape). |
| GET | `/tasks` | optional `?source=builtin\|hub\|all` | Open-ended catalog (Level 2): builtins + Hub-loaded rows. |
| POST | `/reset` | `{"target_name": "fibonacci", "seed": 0}` or `{"target_code": "...", "target_function_name": "..."}` | Starts an episode. A caller-supplied `target_code` wins over `target_name`. |
| POST | `/step` | `{"episode_id": "...", "action": {...}}` | One agent action. |
| GET | `/state/{eid}` | – | Inspect the live state of an episode (debug). |
## Action shapes

```jsonc
{"action_type": "probe",  "input_repr": "5"}   // input_repr is parsed via ast.literal_eval
{"action_type": "submit", "code": "def fibonacci(n): ..."}
```
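Because `input_repr` goes through `ast.literal_eval`, any Python literal can be sent as a probe input. A minimal sketch of that parsing step (the helper name is mine, not the env's internals; whether the env unpacks a tuple literal into multiple arguments is also an assumption):

```python
import ast


def parse_probe_input(input_repr: str):
    """Parse a probe's input_repr the way the docs describe: via
    ast.literal_eval, which accepts only Python literals (numbers,
    strings, tuples, lists, dicts, sets, booleans, None) and rejects
    arbitrary expressions or calls."""
    return ast.literal_eval(input_repr)


print(parse_probe_input("5"))          # 5
print(parse_probe_input("[1, 2, 3]"))  # [1, 2, 3]
print(parse_probe_input("(2, 'ab')"))  # (2, 'ab') -- a natural shape for multi-arg targets
```

Note that `literal_eval` raises `ValueError`/`SyntaxError` on non-literal input such as `"fibonacci(3)"`, which is exactly why it is safe to use on agent-supplied strings.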
## Reward (v0.3: paper-driven update)

Inspired by Masud et al. 2026 (*Reward Engineering for RL in Software Tasks*, arXiv:2601.19100) and Ibrahim et al. 2024 (*Comprehensive Overview of Reward Engineering and Shaping*, arXiv:2408.10215).

- **Probe:** `-1` step cost, plus `+2` per newly-seen output, `+5` per newly-seen exception type, and `+0.5` per newly-explored input bucket (CovRL-Fuzz / SimHash-style coverage bonus).
- **Submit (terminal):** `execution_reward - complexity_penalty - reward_hack_penalty - floor_penalty` (plus a `+50` perfect bonus on a 100% match), where:
  - `execution_reward ∈ [0, 100]` is computed over stratified fuzz inputs: spec-defined `edge_cases` are always tested in addition to the random fuzz batch, and the per-category match counts are returned in `info["matches_by_category"]`.
  - `floor_penalty` is a hard `-25` for sub-50% match-rate submissions (Vul-R2 style; Wen et al. 2025), preventing agents from learning that emitting any function pays out.
  - `reward_hack_penalty` fires for static import-of-reference attempts (`+25`) and for "constant-output" collapse against a diverse reference (`+15`). The sandbox additionally blocks `__import__`, `open`, `eval`, `exec`, `compile`, etc.
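As a rough illustration of how the documented terms compose at submit time (this is a sketch, not the env's actual code; it assumes the perfect bonus triggers at exactly 100% match and omits the probe-side shaping):

```python
def terminal_reward(execution_reward: float,
                    complexity_penalty: float,
                    reward_hack_penalty: float,
                    match_rate: float) -> float:
    """Illustrative composition of the documented submit-time reward.

    floor_penalty: hard -25 when the match rate is below 50%.
    perfect bonus: +50 when the submission matches on every input.
    """
    floor_penalty = 25.0 if match_rate < 0.5 else 0.0
    perfect_bonus = 50.0 if match_rate == 1.0 else 0.0
    return (execution_reward - complexity_penalty
            - reward_hack_penalty - floor_penalty + perfect_bonus)


print(terminal_reward(100.0, 0.0, 0.0, 1.0))  # 150.0 -- perfect match
print(terminal_reward(40.0, 2.0, 0.0, 0.4))   # 13.0  -- floor penalty kicks in
```

The floor penalty is what makes "submit anything" a losing strategy: a 40% match is worth less than a handful of well-chosen probes.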
## Open-ended tasks (Level 2)

The env resolves a target function from three sources, in priority order:

1. **Caller-supplied:** `POST /reset` with `target_code` + `target_function_name` (and optionally `edge_cases` + `fuzz_spec`). The source is compiled in the same hardened sandbox the verifier uses for agent submissions; a static import of `opensleuth_*` is rejected up front. This lets a trainer hand the env an arbitrary unseen task per rollout without any redeploy.
2. **Hub dataset:** `anugrah55/opensleuth-tasks`. Loaded lazily on the first `/reset`, cached in-process. Each row has `{name, target_function_name, signature, description, difficulty, source_code, edge_cases_json, fuzz_spec_json}`.
3. **Builtin registry:** the original 9 functions in `black_box.py` are kept as the safety net so the in-flight trainer keeps working unchanged. Builtins win by name over Hub copies, so `target_name="fibonacci"` always resolves to the in-process oracle.
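The resolution order above can be sketched as follows (the function and its return shape are hypothetical, not the env's actual internals; note that caller-supplied code always wins, and that for name lookups builtins shadow Hub copies):

```python
def resolve_target(target_code=None, target_function_name=None,
                   target_name=None, builtins=None, hub_rows=None):
    """Hypothetical sketch of the documented resolution order:
    caller-supplied code first, then name lookup where the builtin
    registry shadows same-named Hub rows."""
    builtins = builtins or {}
    hub_rows = hub_rows or {}
    if target_code is not None:
        return ("caller", target_function_name)  # compiled in the sandbox
    if target_name in builtins:
        return ("builtin", target_name)          # in-process oracle wins by name
    if target_name in hub_rows:
        return ("hub", target_name)
    raise KeyError(f"unknown target: {target_name!r}")


# A builtin shadows a same-named Hub row:
print(resolve_target(target_name="fibonacci",
                     builtins={"fibonacci": ...},
                     hub_rows={"fibonacci": ...}))  # ('builtin', 'fibonacci')
```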
## Adding new tasks

- **Per-reset (one-shot):** pass `target_code` + `target_function_name` to `/reset`. Multi-arg signatures are supported via the auto-fuzzer (which introspects `inspect.signature` + `typing.get_type_hints`); pass `edge_cases` as a list of Python literal strings and `fuzz_spec` as a per-parameter override map.
- **Persistent:** append a row to the Hub dataset and the env will pick it up on its next process start. The bootstrap script (`opensleuth_env/scripts/bootstrap_tasks_dataset.py`) is idempotent: re-running it overwrites the dataset with the latest builtin + curated rows.

```bash
# Push the curated 9 + 6 = 15-task seed catalog.
PYTHONPATH=. python -m opensleuth_env.scripts.bootstrap_tasks_dataset
```
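A one-shot `/reset` payload for a two-argument target might look like the sketch below. The target function is made up for illustration, and the exact key names accepted inside `fuzz_spec` entries are an assumption; only `target_code`, `target_function_name`, `edge_cases`, and `fuzz_spec` themselves are documented above.

```python
import json

# Hypothetical unseen task handed to the env for a single rollout.
payload = {
    "target_code": (
        "def gcd(a, b):\n"
        "    while b:\n"
        "        a, b = b, a % b\n"
        "    return a"
    ),
    "target_function_name": "gcd",
    # edge_cases are Python literal strings, parsed like probe inputs.
    "edge_cases": ["(0, 0)", "(12, 18)", "(7, 1)"],
    # Assumed per-parameter override shape; check fuzz_spec docs for the real keys.
    "fuzz_spec": {"a": {"type": "int", "min": 0, "max": 1000},
                  "b": {"type": "int", "min": 0, "max": 1000}},
    "seed": 0,
    "max_steps": 25,
}
body = json.dumps(payload)  # POST this to /reset
```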
## Backwards compatibility

Existing trainer / eval clients only read `info["execution_reward"]`,
`info["matches"]`, `info["fuzz_count"]`, and `resp["reward"]`; all are preserved
with the same meaning. New fields (`difficulty`, `coverage_buckets_seen`,
`matches_by_category`, `edge_pass_rate`, `reward_hack_penalty`,
`floor_penalty`, `perfect_bonus`) are additive and ignored by older clients.

`/reset` retains its v0.3 shape: `{"target_name": "fibonacci", "seed": 0, "max_steps": 25}` works exactly as before. The four new optional fields
(`target_code`, `target_function_name`, `edge_cases`, `fuzz_spec`) are
additive. `/functions` returns the same shape as before (with one additive
`source` field). Open-ended/Hub tasks are exposed via the new `/tasks`
endpoint so older clients aren't surprised.
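A sketch of the additive-fields contract from a pre-v0.3 client's point of view; the response dict below is fabricated for illustration, with field values chosen arbitrarily:

```python
# Fabricated /step response illustrating the additive-fields contract.
resp = {
    "reward": 87.5,
    "done": True,
    "info": {
        # Fields an existing client reads (meaning preserved):
        "execution_reward": 90.0, "matches": 45, "fuzz_count": 50,
        # New additive fields, safely ignored by older clients:
        "matches_by_category": {"edge": 5, "fuzz": 40},
        "floor_penalty": 0.0, "perfect_bonus": 0.0,
    },
}


def legacy_view(resp):
    """What a pre-v0.3 client extracts; unknown keys are simply ignored."""
    info = resp["info"]
    return (resp["reward"], info["execution_reward"],
            info["matches"], info["fuzz_count"])


print(legacy_view(resp))  # (87.5, 90.0, 45, 50)
```

Since old clients index into the dict by key rather than validating its full shape, adding keys never breaks them; only renaming or removing one would.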
## OpenEnv conformance

This Space targets the meta-pytorch / OpenEnv v0.2.3 spec
(`pip install openenv-core==0.2.3`). The OpenEnv-conformant surface is mounted
at `/openenv/*` alongside (not on top of) the legacy endpoints listed above,
so the in-flight trainer keeps working unchanged.
| OpenEnv route | Path | Notes |
|---|---|---|
| `GET /health` | `/openenv/health` | `{"status": "healthy"}` |
| `GET /metadata` | `/openenv/metadata` | `EnvironmentMetadata` (name, description, version, ...) |
| `GET /schema` | `/openenv/schema` | JSON schemas for action, observation, state |
| `GET /state` | `/openenv/state` | Episode `State` (episode_id, step_count, ...) |
| `POST /reset` | `/openenv/reset` | Returns a `{"observation", "reward", "done"}` envelope |
| `POST /step` | `/openenv/step` | Body: `{"action": {"action_type": "probe" \| "submit", ...}}` |
| `WS /ws` | `/openenv/ws` | Persistent session: reset → step* → state → close |
`OpenSleuthEnvironment` (in `opensleuth_env/openenv_adapter.py`) subclasses
`openenv.core.env_server.interfaces.Environment`, so any OpenEnv-aware
harness (the `openenv` CLI, `GenericEnvClient`, TRL/torchforge integrations,
LightningAI Studio, ...) can pick it up via standard introspection.
## Talking to it as an OpenEnv client

```python
import asyncio

from openenv import GenericEnvClient, GenericAction


async def main():
    base = "https://anugrah55-opensleuth-env-gemini-cli.hf.space/openenv"
    async with GenericEnvClient(base_url=base) as env:
        result = await env.reset(target_name="fibonacci", max_steps=8)
        result = await env.step(GenericAction(action_type="probe", input_repr="10"))
        print(result.observation["probe_history"][-1])


asyncio.run(main())
```

A runnable end-to-end example lives in `example_client.py`.
## What is not yet conformant

- No MCP tool surface (RFC 003). Our actions are typed Pydantic models, not
  MCP tools, because the underlying probe/submit semantics map cleanly to a
  single `OpenSleuthAction` discriminator. Adding MCP would be additive.
- No Rubric/EvalHarness integration (RFC 004): reward shaping lives in
  `opensleuth_env/env.py` and is intentionally not split into a separate
  rubric for now.
## Hardware

CPU-only: `cpu-basic` is plenty. Do not assign a GPU to this Space.
## Running locally

```bash
pip install -r requirements.txt
uvicorn server:app --port 7860 --reload
# legacy contract: http://localhost:7860/{health,reset,step,state/{eid}}
# OpenEnv-conformant surface: http://localhost:7860/openenv/{health,reset,step,state,schema,metadata,ws}
```

To run only the OpenEnv conformance tests:

```bash
PYTHONPATH=. python -m pytest tests/test_openenv_conformance.py -v
```