---
title: OpenSleuth Env
emoji: 🕵️
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
suggested_hardware: cpu-basic
---
# OpenSleuth – Environment
FastAPI service that exposes an OpenEnv-style `/reset` + `/step` API for the
**Algorithmic Detective** task. An agent has to figure out an unknown Python
function by probing it, then submit Python source that replicates it.
## Endpoints
| Method | Path | Body | Notes |
|-------:|---------------|----------------------------------------|----------------------------------------|
| GET | `/health` | – | Liveness probe (also reports Hub-catalog status). |
| GET | `/functions` | optional `?difficulty=easy\|medium\|hard` | Catalogue of the 9 builtin black-boxes (back-compat shape). |
| GET | `/tasks` | optional `?source=builtin\|hub\|all` | Open-ended catalog (Level 2): builtins + Hub-loaded rows. |
| POST | `/reset` | `{"target_name": "fibonacci", "seed": 0}` *or* `{"target_code": "...", "target_function_name": "..."}` | Starts an episode. Caller-supplied target_code wins over target_name. |
| POST | `/step` | `{"episode_id": "...", "action": {...}}` | One agent action. |
| GET | `/state/{eid}`| – | Inspect the live state of an episode (debug). |
### Action shapes
```json
{"action_type": "probe", "input_repr": "5"} // input_repr is parsed via ast.literal_eval
{"action_type": "submit", "code": "def fibonacci(n):..."}
```
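Because `input_repr` goes through `ast.literal_eval`, a probe can carry any Python literal (ints, tuples for multi-arg targets, nested containers) but never executable code. A quick sketch of that parsing behavior:

```python
import ast

# Probe inputs arrive as strings and are parsed as Python literals,
# mirroring the env's use of ast.literal_eval on "input_repr".
print(ast.literal_eval("5"))         # a plain int
print(ast.literal_eval("(2, 3)"))    # a tuple, e.g. for multi-arg targets
print(ast.literal_eval("[1, 'a']"))  # nested literals are fine

# Arbitrary expressions are rejected, keeping probes side-effect free.
try:
    ast.literal_eval("__import__('os')")
except ValueError:
    print("rejected")
```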
### Reward (v0.3 – paper-driven update)
Inspired by Masud et al. 2026 (*Reward Engineering for RL in Software Tasks*,
arXiv:2601.19100) and Ibrahim et al. 2024 (*Comprehensive Overview of Reward
Engineering and Shaping*, arXiv:2408.10215).
* **Probe:** `-1` step cost, plus `+2` per newly-seen output, `+5` per
newly-seen exception type, **and `+0.5` per newly-explored input bucket**
(CovRL-Fuzz / SimHash-style coverage bonus).
* **Submit (terminal):**
  `execution_reward − complexity_penalty − reward_hack_penalty − floor_penalty
  (+50 perfect bonus if 100% match)` where:
  * `execution_reward` ∈ `[0, 100]` is computed over **stratified** fuzz
inputs: spec-defined `edge_cases` are *always* tested in addition to the
random fuzz batch, and the per-category match counts are returned in
`info["matches_by_category"]`.
* `floor_penalty` is a hard `-25` for sub-50% match-rate submissions
(Vul-R2 style; Wen et al. 2025), preventing agents from learning that
emitting *any* function pays out.
* `reward_hack_penalty` fires for static import-of-reference attempts
(`+25`) and for "constant-output" collapse against a diverse reference
(`+15`). The sandbox additionally **blocks** `__import__`, `open`,
`eval`, `exec`, `compile`, etc.
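The terminal formula above can be sketched as a small function. The −25 floor, the +50 perfect bonus, and the linear scaling of `execution_reward` into `[0, 100]` follow the description; treating `match_rate` as the sole driver of `execution_reward` is a simplifying assumption here, not the env's exact implementation.

```python
def submit_reward(match_rate: float,
                  complexity_penalty: float = 0.0,
                  reward_hack_penalty: float = 0.0) -> float:
    """Illustrative terminal reward following the v0.3 formula:
    execution_reward - complexity_penalty - reward_hack_penalty
    - floor_penalty (+50 perfect bonus at 100% match)."""
    execution_reward = 100.0 * match_rate              # scaled into [0, 100]
    floor_penalty = 25.0 if match_rate < 0.5 else 0.0  # Vul-R2-style hard floor
    perfect_bonus = 50.0 if match_rate == 1.0 else 0.0
    return (execution_reward - complexity_penalty - reward_hack_penalty
            - floor_penalty + perfect_bonus)

print(submit_reward(1.0))   # perfect match also earns the +50 bonus
print(submit_reward(0.4))   # sub-50% match eats the -25 floor penalty
```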
### Open-ended tasks (Level 2)
The env resolves a target function from three sources, in priority order:
1. **Caller-supplied** – `POST /reset` with `target_code` + `target_function_name`
(and optionally `edge_cases` + `fuzz_spec`). The source is compiled in the
same hardened sandbox the verifier uses for agent submissions; static-import
of `opensleuth_*` is rejected up front. This lets a trainer hand the env an
arbitrary unseen task per rollout without any redeploy.
2. **Hub dataset** – [`anugrah55/opensleuth-tasks`](https://huggingface.co/datasets/anugrah55/opensleuth-tasks).
Loaded lazily on first `/reset`, cached in-process. Each row has
`{name, target_function_name, signature, description, difficulty,
source_code, edge_cases_json, fuzz_spec_json}`.
3. **Builtin registry** – the original 9 functions in `black_box.py` are kept
as the safety-net so the in-flight trainer keeps working unchanged. Builtins
*win* by name over Hub copies, so `target_name="fibonacci"` always resolves
to the in-process oracle.
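A caller-supplied reset (source 1 above) can be driven from any HTTP client; this sketch just builds the JSON body. The field names follow the `/reset` contract, but the target function, edge cases, and in particular the `fuzz_spec` shape shown here are illustrative assumptions:

```python
import json

# Hypothetical one-shot target for a single episode.
reset_body = {
    "target_code": "def mystery(n):\n    return n * (n + 1) // 2\n",
    "target_function_name": "mystery",
    # Python literal strings; always tested on top of the random fuzz batch.
    "edge_cases": ["0", "1", "-3"],
    # Per-parameter override map for the auto-fuzzer (shape assumed here).
    "fuzz_spec": {"n": {"type": "int", "min": -100, "max": 100}},
}

payload = json.dumps(reset_body)
print(payload[:60])
# POST this to {base_url}/reset, e.g.:
#   requests.post(f"{base_url}/reset", json=reset_body)
```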
#### Adding new tasks
* **Per-reset (one-shot)**: pass `target_code` + `target_function_name` to
`/reset`. Multi-arg signatures are supported via the auto-fuzzer (which
introspects `inspect.signature` + `typing.get_type_hints`); pass
`edge_cases` as a list of Python literal strings and `fuzz_spec` as a
per-parameter override map.
* **Persistent**: append a row to the Hub dataset and the env will pick it
up on its next process-start. The bootstrap script
  (`opensleuth_env/scripts/bootstrap_tasks_dataset.py`) is idempotent –
re-running it overwrites the dataset with the latest builtin + curated
rows.
```bash
# Push the curated 9 + 6 = 15-task seed catalog.
PYTHONPATH=. python -m opensleuth_env.scripts.bootstrap_tasks_dataset
```
### Backwards compatibility
Existing trainer / eval clients only read `info["execution_reward"]`,
`info["matches"]`, `info["fuzz_count"]` and `resp["reward"]` – all preserved
with the same meaning. New fields (`difficulty`, `coverage_buckets_seen`,
`matches_by_category`, `edge_pass_rate`, `reward_hack_penalty`,
`floor_penalty`, `perfect_bonus`) are additive and ignored by older clients.
`/reset` retains its v0.3 shape: `{"target_name": "fibonacci", "seed": 0,
"max_steps": 25}` works exactly as before. The four new optional fields
(`target_code`, `target_function_name`, `edge_cases`, `fuzz_spec`) are
additive. `/functions` returns the same shape as before (with one *additive*
`source` field). Open-ended/Hub tasks are exposed via the new `/tasks`
endpoint so older clients aren't surprised.
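Since the new keys are purely additive, an older client's read path is untouched; a minimal sketch with a mocked-up response (values illustrative):

```python
# Illustrative /step response mixing legacy and v0.3-additive fields.
resp = {
    "reward": 112.5,
    "info": {
        "execution_reward": 87.5,   # legacy
        "matches": 35,              # legacy
        "fuzz_count": 40,           # legacy
        "matches_by_category": {"edge": 5, "fuzz": 30},  # additive, ignored
        "floor_penalty": 0,                              # additive, ignored
    },
}

# An older client only ever touches the legacy keys:
legacy_view = {k: resp["info"][k]
               for k in ("execution_reward", "matches", "fuzz_count")}
print(legacy_view, resp["reward"])
```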
## OpenEnv conformance
This Space targets the [meta-pytorch / OpenEnv](https://github.com/meta-pytorch/OpenEnv)
v0.2.3 spec (`pip install openenv-core==0.2.3`). The OpenEnv-conformant
surface is mounted at **`/openenv/*`** alongside (not on top of) the legacy
endpoints listed above so the in-flight trainer keeps working unchanged.
| OpenEnv route | Path | Notes |
|--------------------------|-----------------------|----------------------------------------------------------|
| `GET /health` | `/openenv/health` | `{"status": "healthy"}` |
| `GET /metadata` | `/openenv/metadata` | `EnvironmentMetadata` (name, description, version, ...) |
| `GET /schema` | `/openenv/schema` | JSON schemas for `action`, `observation`, `state` |
| `GET /state` | `/openenv/state` | Episode `State` (episode_id, step_count, ...) |
| `POST /reset` | `/openenv/reset` | Returns `{"observation", "reward", "done"}` envelope |
| `POST /step`             | `/openenv/step`       | Body: `{"action": {"action_type": "probe"\|"submit", ...}}` |
| `WS /ws`                 | `/openenv/ws`         | Persistent session: `reset` → `step`* → `state` → `close` |
`OpenSleuthEnvironment` (in `opensleuth_env/openenv_adapter.py`) subclasses
`openenv.core.env_server.interfaces.Environment`, so any OpenEnv-aware
harness (`openenv` CLI, `GenericEnvClient`, TRL/torchforge integrations,
LightningAI Studio, ...) can pick it up via standard introspection.
### Talking to it as an OpenEnv client
```python
import asyncio
from openenv import GenericEnvClient, GenericAction
async def main():
    base = "https://anugrah55-opensleuth-env-gemini-cli.hf.space/openenv"
    async with GenericEnvClient(base_url=base) as env:
        result = await env.reset(target_name="fibonacci", max_steps=8)
        result = await env.step(GenericAction(action_type="probe", input_repr="10"))
        print(result.observation["probe_history"][-1])

asyncio.run(main())
```
A runnable end-to-end example lives in [`example_client.py`](example_client.py).
### What is *not* yet conformant
* No MCP tool surface (RFC 003). Our actions are typed Pydantic models, not
MCP tools, because the underlying probe/submit semantics map cleanly to a
single `OpenSleuthAction` discriminator. Adding MCP would be additive.
* No Rubric/EvalHarness integration (RFC 004) – reward shaping lives in
`opensleuth_env/env.py` and is intentionally not split into a separate
rubric for now.
## Hardware
CPU-only – `cpu-basic` is plenty. Do **not** assign GPU to this Space.
## Running locally
```bash
pip install -r requirements.txt
uvicorn server:app --port 7860 --reload
# legacy contract: http://localhost:7860/{health,reset,step,state/{eid}}
# OpenEnv-conformant surface: http://localhost:7860/openenv/{health,reset,step,state,schema,metadata,ws}
```
To run only the OpenEnv conformance tests:
```bash
PYTHONPATH=. python -m pytest tests/test_openenv_conformance.py -v
```