# Environment API
Qubit-Medic exposes an OpenEnv-compliant HTTP server built on top of
`openenv.core.create_fastapi_app`. The server wraps an internal
`DecoderEnvironment` (Stim + PyMatching) through the standard
`Action` / `Observation` / `State` Pydantic shapes.
> **Simulation substrate.** Surface-code syndromes are generated with
> **Stim** ([Gidney 2021](https://arxiv.org/abs/2103.02202), *Quantum*
> 5:497), the field-standard Clifford simulator for quantum error
> correction. This is the same simulation engine used by AlphaQubit
> (Bausch et al., *Nature* 2024) and Willow (Acharya et al., 2024);
> training data is drawn from the same physical model as the published
> benchmarks, not a homemade simulator.
Source files:
- `qubit_medic/server/openenv_adapter.py`
- `qubit_medic/server/app.py`
- `qubit_medic/server/environment.py`
## OpenEnv contract
| Method | Path | Request model | Response model |
|--------|------|---------------|----------------|
| POST | `/reset` | `openenv.core.types.ResetRequest` | `openenv.core.types.ResetResponse` |
| POST | `/step` | `openenv.core.types.StepRequest` | `openenv.core.types.StepResponse` |
| GET | `/state` | (none) | `qubit_medic.server.openenv_adapter.QubitMedicState` |
| POST | `/state` | (none) | `dict` (mirror of GET; compliance audit 2026-04) |
| POST | `/close` | (none) | `{"ok": True, "closed": True}` |
| GET | `/schema` | (none) | JSON Schema for action/observation models |
| GET | `/metadata` | (none) | `EnvironmentMetadata` |
| GET | `/health` | (none) | liveness payload |
| GET | `/healthz` | (none) | versions probe (Stim, PyMatching, openenv, Python) |
| POST | `/decode` | `{"syndrome": [int], "level": str}` | PyMatching baseline result |
The OpenEnv canonical routes (`/reset`, `/step`, `/state`, `/health`,
`/schema`, `/metadata`, `/mcp`) are wired automatically by
`create_fastapi_app`. The `/healthz`, `/decode`, `POST /state`,
`POST /close`, and `/` (HTML landing) routes are mounted on top by
`qubit_medic/server/app.py`.
Server entry point: `python -m qubit_medic.server.app` or
`uvicorn qubit_medic.server.app:app --host 0.0.0.0 --port 7860`.
## Action dataclass
```python
class QubitMedicAction(Action):
    """LLM-emitted action: the raw text the model generated."""

    raw_response: str = Field(
        default="",
        description="Raw LLM completion text. Server parses to x/z error lists.",
    )
    parsed_x_errors: Optional[list[int]] = Field(
        default=None,
        description="Optional pre-parsed X-error qubit ids (LLM-space). "
        "When provided, the server skips text parsing.",
    )
    parsed_z_errors: Optional[list[int]] = Field(
        default=None,
        description="Optional pre-parsed Z-error qubit ids (LLM-space).",
    )
    episode_id: Optional[int] = Field(
        default=None,
        description="Server-assigned episode id from the matching reset(). "
        "If omitted, the most-recent active episode is used.",
    )
```
Field-level notes:
- `raw_response`: the canonical wire format. The server runs
`qubit_medic.prompts.parse_action(raw_response, num_data_qubits)` to
recover both error lists. Keeping the wire format as raw text means the
server retains full control over parsing, and unparseable outputs surface
cleanly via `format_compliance = 0`.
- `parsed_x_errors` / `parsed_z_errors`: a trainer-only escape hatch for
baseline policies and unit tests. When set, the server formats a
synthetic `<answer>X: ... | Z: ...</answer>` string before parsing; the
same parser path runs either way, so reward semantics are identical.
- `episode_id`: must match the `episode_id` returned by the matching
`reset()` call. If `None`, the adapter falls back to the most recent
active episode (`self._last_episode_id`). Stale or unknown ids raise
`ValueError` from `DecoderEnvironment.step` (compliance audit 2026-04).
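The docs don't show `parse_action` itself, but the wire format above is enough to sketch its contract. The following is an illustrative stand-in (the real parser lives in `qubit_medic.prompts`; the function name `parse_answer`, the tolerance for commas, and the "none" spelling are assumptions):

```python
import re


def parse_answer(raw: str, num_data_qubits: int):
    """Hypothetical sketch of the <answer>X: ... | Z: ...</answer> parse.

    Returns (x_errors, z_errors, parse_success). Any out-of-range or
    non-integer qubit id fails the whole parse, which is how an
    unparseable completion would surface as format_compliance = 0.
    """
    m = re.search(
        r"<answer>\s*X:\s*(.*?)\s*\|\s*Z:\s*(.*?)\s*</answer>", raw, re.DOTALL
    )
    if not m:
        return [], [], False

    def ids(chunk: str):
        chunk = chunk.strip()
        if not chunk or chunk.lower() == "none":
            return []
        out = []
        for tok in chunk.replace(",", " ").split():
            if not tok.isdigit() or int(tok) >= num_data_qubits:
                return None  # reject anything outside LLM-space 0..n-1
            out.append(int(tok))
        return sorted(set(out))

    xs, zs = ids(m.group(1)), ids(m.group(2))
    if xs is None or zs is None:
        return [], [], False
    return xs, zs, True
```

Keeping the failure mode all-or-nothing (rather than silently dropping bad ids) matches the document's claim that unparseable outputs surface cleanly rather than earning partial credit.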
## Observation dataclass
```python
class QubitMedicObservation(Observation):
    """OpenEnv observation - mirrors DecoderObservation plus done/reward."""

    model_config = ConfigDict(
        extra="forbid",
        validate_assignment=True,
        arbitrary_types_allowed=True,
    )

    prompt: str = Field(default="", description="Pre-formatted LLM prompt.")
    syndrome_bits: list[int] = Field(
        default_factory=list, description="Detector activations (0/1)."
    )
    distance: int = Field(default=0, description="Code distance for this episode.")
    rounds: int = Field(default=0, description="Number of stabilizer rounds.")
    p: float = Field(default=0.0, description="SI1000 base error rate.")
    curriculum_level: str = Field(default="", description="Curriculum level name.")
    episode_id: int = Field(default=0, description="Server-assigned episode counter.")
    dem_digest: str = Field(
        default="", description="Short hash of the detector error model."
    )
    info: dict[str, Any] = Field(
        default_factory=dict,
        description="Per-step extras (reward breakdown, ground-truth flip, "
        "PyMatching baseline, etc.).",
    )
```
Plus the standard inherited OpenEnv fields:
- `done: bool`: `True` after every `step` (single-step episodes).
- `reward: Optional[float]`: `None` on `reset`, the weighted total in
  `[0, 1]` after `step`.
`info` payload (after `step`) carries:
| Key | Type | Meaning |
|-----|------|---------|
| `rewards` | `dict[str, float]` | Per-component breakdown (`logical_correction`, `syndrome_consistency`, `hamming_overlap`, `format_compliance`, `pymatching_beat`, `total`) |
| `parsed_action` | `dict` | Deserialised `DecoderAction` (parsed x/z lists, `parse_success`) |
| `actual_observable_flip` | `int` | Stim ground-truth flip of the logical Z observable |
| `pymatching_observable_pred` | `int` | PyMatching's predicted observable flip |
| `pymatching_x_errors` | `list[int]` | PyMatching reference Pauli frame, X axis |
| `pymatching_z_errors` | `list[int]` | PyMatching reference Pauli frame, Z axis |
| `elapsed_seconds` | `float` | Wall time between `reset` and `step` |
| `timed_out` | `bool` | `True` iff `elapsed > EPISODE_TIMEOUT_SECONDS` |
| `curriculum_stats` | `dict` | Live promotion-tracker counters |
## State dataclass
```python
class QubitMedicState(State):
    """Externally-visible state. Physics-truth fields stay server-side."""

    model_config = ConfigDict(
        extra="allow",
        validate_assignment=True,
        arbitrary_types_allowed=True,
    )

    episodes_started: int = 0
    active_episodes: int = 0
    cached_levels: list[str] = Field(default_factory=list)
    curriculum: dict[str, Any] = Field(default_factory=dict)
    last_reward_breakdown: Optional[dict[str, float]] = None
```
The adapter populates a few inherited base-class fields too: `episode_id`
(stringified) and `step_count` (which equals `episodes_started`).
Crucially, `QubitMedicState` deliberately omits the ground-truth fields
held by the inner `DecoderState`: `true_x_errors`, `true_z_errors`,
`actual_observable_flip`, `pymatching_observable_pred`, `circuit_text`,
`dem_text`. Those are visible only inside the reward functions; see
`docs/REWARD_HACKING.md`.
## Episode lifecycle
Single-step episodes (`done=True` after every `step`):
```
client server
------ ------
POST /reset βββββββββββββΊ scheduler.sample(level)
_cache_for(level) (compile Stim circuit
and PyMatching matrix
once per level)
sample_episode(seed) (Stim shot ->
syndrome bits +
observable flip)
build_prompt(...)
βββββββββββββ Observation { prompt,
syndrome_bits,
distance, rounds, p,
curriculum_level,
episode_id,
dem_digest,
done=False,
reward=None }
POST /step (action) βββββββββββββΊ parse_action(raw_response)
compute_all_rewards(...)
scheduler.update(...) (curriculum promotion)
βββββββββββββ Observation { ..., done=True,
reward=total,
info={rewards: {...},
...} }
```
Calling `step()` with an unknown `episode_id` raises `ValueError` (turned
into HTTP 400). Calling `step()` after `EPISODE_TIMEOUT_SECONDS` returns
all-zero rewards and `info["timed_out"] = True`.
## Reward computation
After parsing, the env converts predicted qubit IDs from LLM-space
(`0..num_data_qubits-1`) into Stim's internal coordinate system via
`layout.llm_to_stim`, then runs `compute_all_rewards`
(`qubit_medic/server/rewards.py`). Each of the five rewards is a pure
function over `(parsed, sample, layout, final_detector_supports)`; the
combined total is a weighted sum (weights in
`qubit_medic.config.REWARD_WEIGHTS`, mirrored in `openenv.yaml`) clamped
to `[0, 1]`. The breakdown is exposed in `info["rewards"]`, the curriculum
scheduler is updated using only `logical_correction`, and the episode
bookkeeping is dropped (`self._active.pop(episode_id)`). See
`docs/REWARD_HACKING.md` for the per-reward semantics.
## Curriculum
Source: `openenv.yaml` (`curriculum:` block) plus
`qubit_medic.server.curriculum.CurriculumScheduler`.
| Level | Distance | Rounds | p (SI1000) | Promotion threshold |
|-------|----------|--------|------------|---------------------|
| `L1_warmup` | 3 | 1 | 0.0001 | 0.80 |
| `L2_target` | 3 | 3 | 0.001 | 0.70 |
| `L3_stretch` | 5 | 5 | 0.001 | 0.30 |
The scheduler samples a level on each `reset()`. Promotion thresholds
gate progression via the running `logical_correction` rate at the current
level. Levels `L1_warmup` and `L2_target` are pre-warmed at server boot
(`_get_shared_inner` in the adapter calls `_cache_for` on both);
`L3_stretch` compiles lazily on first selection.
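A minimal sketch of the promotion gate, under stated assumptions: the class name, the sliding-window mechanism, and the window size are all hypothetical (the real logic is `CurriculumScheduler` plus its promotion tracker); only the level names and thresholds come from the table above.

```python
from collections import deque


class PromotionTracker:
    """Illustrative promotion gate, not the real CurriculumScheduler:
    promote once the running logical_correction rate over a full
    sliding window clears the current level's threshold."""

    THRESHOLDS = {"L1_warmup": 0.80, "L2_target": 0.70, "L3_stretch": 0.30}
    ORDER = ["L1_warmup", "L2_target", "L3_stretch"]

    def __init__(self, window: int = 50):  # window size is an assumption
        self.level = self.ORDER[0]
        self.window = deque(maxlen=window)

    def update(self, logical_correction: float) -> str:
        self.window.append(logical_correction)
        if len(self.window) == self.window.maxlen:
            rate = sum(self.window) / len(self.window)
            i = self.ORDER.index(self.level)
            if i + 1 < len(self.ORDER) and rate >= self.THRESHOLDS[self.level]:
                self.level = self.ORDER[i + 1]
                self.window.clear()  # fresh stats at the new level
        return self.level
```

Requiring a full window before promoting avoids advancing on a lucky first episode, which is one plausible reading of "running rate" above.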
## Local rollout example
```python
from qubit_medic.server.openenv_adapter import (
QubitMedicAction,
QubitMedicEnvironment,
)
env = QubitMedicEnvironment()
obs = env.reset(seed=42) # QubitMedicObservation
print("level:", obs.curriculum_level, "syndrome bits:", len(obs.syndrome_bits))
print("prompt preview:", obs.prompt[:120], "...")
# Pretend the LLM emitted nothing useful: the parser will return empty
# lists, format_compliance = 0, syndrome_consistency capped at 0.5.
action = QubitMedicAction(
raw_response="X_ERRORS=[]\nZ_ERRORS=[]",
episode_id=obs.episode_id,
)
result = env.step(action)
print("reward:", result.reward, "done:", result.done)
print("breakdown:", result.info["rewards"])
print("pymatching reference frame:", result.info["pymatching_x_errors"],
result.info["pymatching_z_errors"])
```
For HTTP usage, hit the live server with `curl` against `/reset` then
`/step` (see the Swagger UI at `/docs`), or use any OpenEnv-compatible
client.