Spaces:

ronitraj
/

QuantumScribe

Sleeping

App Files Files Community

ronitraj commited on 13 days ago

Commit

0139454

verified ·

1 Parent(s): ae72eb9

Upload docs/ENVIRONMENT_API.md with huggingface_hub

Browse files

Files changed (1) hide show

docs/ENVIRONMENT_API.md +256 -0

docs/ENVIRONMENT_API.md ADDED Viewed

	@@ -0,0 +1,256 @@

+# Environment API
+Qubit-Medic exposes an OpenEnv-compliant HTTP server built on top of
+`openenv.core.create_fastapi_app`. The server wraps an internal
+`DecoderEnvironment` (Stim + PyMatching) through the standard
+`Action` / `Observation` / `State` Pydantic shapes.
+> **Simulation substrate.** Surface-code syndromes are generated with
+> **Stim** ([Gidney 2021](https://arxiv.org/abs/2103.02202), *Quantum*
+> 5:497), the field-standard Clifford simulator for quantum error
+> correction. This is the same simulation engine used by AlphaQubit
+> (Bausch et al., *Nature* 2024) and Willow (Acharya et al., 2024) —
+> training data is drawn from the same physical model as the published
+> benchmarks, not a homemade simulator.
+Source files:
+- `qubit_medic/server/openenv_adapter.py`
+- `qubit_medic/server/app.py`
+- `qubit_medic/server/environment.py`
+## OpenEnv contract
+| Method | Path | Request model | Response model |
+|--------|------|---------------|----------------|
+| POST | `/reset` | `openenv.core.types.ResetRequest` | `openenv.core.types.ResetResponse` |
+| POST | `/step` | `openenv.core.types.StepRequest` | `openenv.core.types.StepResponse` |
+| GET | `/state` | (none) | `qubit_medic.server.openenv_adapter.QubitMedicState` |
+| POST | `/state` | (none) | `dict` (mirror of GET; compliance audit 2026-04) |
+| POST | `/close` | (none) | `{"ok": True, "closed": True}` |
+| GET | `/schema` | (none) | JSON Schema for action/observation models |
+| GET | `/metadata` | (none) | `EnvironmentMetadata` |
+| GET | `/health` | (none) | liveness payload |
+| GET | `/healthz` | (none) | versions probe (Stim, PyMatching, openenv, Python) |
+| POST | `/decode` | `{"syndrome": [int], "level": str}` | PyMatching baseline result |
+The OpenEnv canonical routes (`/reset`, `/step`, `/state`, `/health`,
+`/schema`, `/metadata`, `/mcp`) are wired automatically by
+`create_fastapi_app`. The `/healthz`, `/decode`, `POST /state`,
+`POST /close`, and `/` (HTML landing) routes are mounted on top by
+`qubit_medic/server/app.py`.
+Server entry point: `python -m qubit_medic.server.app` or
+`uvicorn qubit_medic.server.app:app --host 0.0.0.0 --port 7860`.
+## Action dataclass
+```python
+class QubitMedicAction(Action):
+    """LLM-emitted action: the raw text the model generated."""
+    raw_response: str = Field(
+        default="",
+        description="Raw LLM completion text. Server parses to x/z error lists.",
+    )
+    parsed_x_errors: Optional[list[int]] = Field(
+        default=None,
+        description="Optional pre-parsed X-error qubit ids (LLM-space). "
+                    "When provided, the server skips text parsing.",
+    )
+    parsed_z_errors: Optional[list[int]] = Field(
+        default=None,
+        description="Optional pre-parsed Z-error qubit ids (LLM-space).",
+    )
+    episode_id: Optional[int] = Field(
+        default=None,
+        description="Server-assigned episode id from the matching reset(). "
+                    "If omitted, the most-recent active episode is used.",
+    )
+```
+Field-level notes:
+- `raw_response`: the canonical wire format. The server runs
+  `qubit_medic.prompts.parse_action(raw_response, num_data_qubits)` to
+  recover both error lists. Keeping the wire format as raw text means the
+  server retains full control over parsing, and unparseable outputs surface
+  cleanly via `format_compliance = 0`.
+- `parsed_x_errors` / `parsed_z_errors`: a trainer-only escape hatch for
+  baseline policies and unit tests. When set, the server formats a
+  synthetic `<answer>X: ... | Z: ...</answer>` string before parsing — the
+  same parser path runs either way, so reward semantics are identical.
+- `episode_id`: must match the `episode_id` returned by the matching
+  `reset()` call. If `None`, the adapter falls back to the most recent
+  active episode (`self._last_episode_id`). Stale or unknown ids raise
+  `ValueError` from `DecoderEnvironment.step` (compliance audit 2026-04).
+## Observation dataclass
+```python
+class QubitMedicObservation(Observation):
+    """OpenEnv observation - mirrors DecoderObservation plus done/reward."""
+    model_config = ConfigDict(extra="forbid", validate_assignment=True,
+                              arbitrary_types_allowed=True)
+    prompt: str = Field(default="", description="Pre-formatted LLM prompt.")
+    syndrome_bits: list[int] = Field(default_factory=list,
+                                     description="Detector activations (0/1).")
+    distance: int = Field(default=0, description="Code distance for this episode.")
+    rounds: int = Field(default=0, description="Number of stabilizer rounds.")
+    p: float = Field(default=0.0, description="SI1000 base error rate.")
+    curriculum_level: str = Field(default="",
+                                  description="Curriculum level name.")
+    episode_id: int = Field(default=0,
+                            description="Server-assigned episode counter.")
+    dem_digest: str = Field(default="",
+                            description="Short hash of the detector error model.")
+    info: dict[str, Any] = Field(default_factory=dict,
+                                 description="Per-step extras (reward "
+                                             "breakdown, ground-truth flip, "
+                                             "PyMatching baseline, etc.).")
+```
+Plus the standard inherited OpenEnv fields:
+- `done: bool` — `True` after every `step` (single-step episodes).
+- `reward: Optional[float]` — `None` on `reset`, the weighted total in
+  `[0, 1]` after `step`.
+`info` payload (after `step`) carries:
+| Key | Type | Meaning |
+|-----|------|---------|
+| `rewards` | `dict[str, float]` | Per-component breakdown (`logical_correction`, `syndrome_consistency`, `hamming_overlap`, `format_compliance`, `pymatching_beat`, `total`) |
+| `parsed_action` | `dict` | Deserialised `DecoderAction` (parsed x/z lists, `parse_success`) |
+| `actual_observable_flip` | `int` | Stim ground-truth flip of the logical Z observable |
+| `pymatching_observable_pred` | `int` | PyMatching's predicted observable flip |
+| `pymatching_x_errors` | `list[int]` | PyMatching reference Pauli frame, X axis |
+| `pymatching_z_errors` | `list[int]` | PyMatching reference Pauli frame, Z axis |
+| `elapsed_seconds` | `float` | Wall time between `reset` and `step` |
+| `timed_out` | `bool` | `True` iff `elapsed > EPISODE_TIMEOUT_SECONDS` |
+| `curriculum_stats` | `dict` | Live promotion-tracker counters |
+## State dataclass
+```python
+class QubitMedicState(State):
+    """Externally-visible state. Physics-truth fields stay server-side."""
+    model_config = ConfigDict(extra="allow", validate_assignment=True,
+                              arbitrary_types_allowed=True)
+    episodes_started: int = 0
+    active_episodes: int = 0
+    cached_levels: list[str] = Field(default_factory=list)
+    curriculum: dict[str, Any] = Field(default_factory=dict)
+    last_reward_breakdown: Optional[dict[str, float]] = None
+```
+The adapter populates a few inherited base-class fields too: `episode_id`
+(stringified) and `step_count` (which equals `episodes_started`).
+Crucially, `QubitMedicState` deliberately omits the ground-truth fields
+held by the inner `DecoderState`: `true_x_errors`, `true_z_errors`,
+`actual_observable_flip`, `pymatching_observable_pred`, `circuit_text`,
+`dem_text`. Those are visible only inside the reward functions — see
+`docs/REWARD_HACKING.md`.
+## Episode lifecycle
+Single-step episodes (`done=True` after every `step`):
+```
+client                                 server
+------                                 ------
+POST /reset            ────────────►   scheduler.sample(level)
+                                       _cache_for(level)            (compile Stim circuit
+                                                                     and PyMatching matrix
+                                                                     once per level)
+                                       sample_episode(seed)         (Stim shot ->
+                                                                     syndrome bits +
+                                                                     observable flip)
+                                       build_prompt(...)
+                       ◄────────────   Observation { prompt,
+                                                     syndrome_bits,
+                                                     distance, rounds, p,
+                                                     curriculum_level,
+                                                     episode_id,
+                                                     dem_digest,
+                                                     done=False,
+                                                     reward=None }
+POST /step (action)    ────────────►   parse_action(raw_response)
+                                       compute_all_rewards(...)
+                                       scheduler.update(...)        (curriculum promotion)
+                       ◄────────────   Observation { ..., done=True,
+                                                     reward=total,
+                                                     info={rewards: {...},
+                                                           ...} }
+```
+Calling `step()` with an unknown `episode_id` raises `ValueError` (turned
+into HTTP 400). Calling `step()` after `EPISODE_TIMEOUT_SECONDS` returns
+all-zero rewards and `info["timed_out"] = True`.
+## Reward computation
+After parsing, the env converts predicted qubit IDs from LLM-space
+(`0..num_data_qubits-1`) into Stim's internal coordinate system via
+`layout.llm_to_stim`, then runs `compute_all_rewards`
+(`qubit_medic/server/rewards.py`). Each of the five rewards is a pure
+function over `(parsed, sample, layout, final_detector_supports)`; the
+combined total is a weighted sum (weights in
+`qubit_medic.config.REWARD_WEIGHTS`, mirrored in `openenv.yaml`) clamped
+to `[0, 1]`. The breakdown is exposed in `info["rewards"]`, the curriculum
+scheduler is updated using only `logical_correction`, and the episode
+bookkeeping is dropped (`self._active.pop(episode_id)`). See
+`docs/REWARD_HACKING.md` for the per-reward semantics.
+## Curriculum
+Source: `openenv.yaml` (`curriculum:` block) plus
+`qubit_medic.server.curriculum.CurriculumScheduler`.
+| Level | Distance | Rounds | p (SI1000) | Promotion threshold |
+|-------|----------|--------|------------|---------------------|
+| `L1_warmup` | 3 | 1 | 0.0001 | 0.80 |
+| `L2_target` | 3 | 3 | 0.001 | 0.70 |
+| `L3_stretch` | 5 | 5 | 0.001 | 0.30 |
+The scheduler samples a level on each `reset()`. Promotion thresholds
+gate progression via the running `logical_correction` rate at the current
+level. Levels `L1_warmup` and `L2_target` are pre-warmed at server boot
+(`_get_shared_inner` in the adapter calls `_cache_for` on both);
+`L3_stretch` compiles lazily on first selection.
+## Local rollout example
+```python
+from qubit_medic.server.openenv_adapter import (
+    QubitMedicAction,
+    QubitMedicEnvironment,
+)
+env = QubitMedicEnvironment()
+obs = env.reset(seed=42)                 # QubitMedicObservation
+print("level:", obs.curriculum_level, "syndrome bits:", len(obs.syndrome_bits))
+print("prompt preview:", obs.prompt[:120], "...")
+# Pretend the LLM emitted nothing useful: the parser will return empty
+# lists, format_compliance = 0, syndrome_consistency capped at 0.5.
+action = QubitMedicAction(
+    raw_response="X_ERRORS=[]\nZ_ERRORS=[]",
+    episode_id=obs.episode_id,
+)
+result = env.step(action)
+print("reward:", result.reward, "done:", result.done)
+print("breakdown:", result.info["rewards"])
+print("pymatching reference frame:", result.info["pymatching_x_errors"],
+      result.info["pymatching_z_errors"])
+```
+For HTTP usage, hit the live server with `curl` against `/reset` then
+`/step` (see the swagger UI at `/docs`), or use any OpenEnv-compatible
+client.