Environment API
QuantumScribe exposes an OpenEnv-compliant HTTP server built on top of
`openenv.core.create_fastapi_app`. The server wraps an internal
`DecoderEnvironment` (Stim + PyMatching) through the standard
Action / Observation / State Pydantic shapes.
Simulation substrate. Surface-code syndromes are generated with Stim (Gidney 2021, Quantum 5:497), the field-standard Clifford simulator for quantum error correction. This is the same simulation engine used by AlphaQubit (Bausch et al., Nature 2024) and Willow (Acharya et al., 2024), so training data is drawn from the same physical model as the published benchmarks, not a homemade simulator.
Source files:
- `qubit_medic/server/openenv_adapter.py`
- `qubit_medic/server/app.py`
- `qubit_medic/server/environment.py`
OpenEnv contract
| Method | Path | Request model | Response model |
|---|---|---|---|
| POST | `/reset` | `openenv.core.types.ResetRequest` | `openenv.core.types.ResetResponse` |
| POST | `/step` | `openenv.core.types.StepRequest` | `openenv.core.types.StepResponse` |
| GET | `/state` | (none) | `qubit_medic.server.openenv_adapter.QubitMedicState` |
| POST | `/state` | (none) | `dict` (mirror of GET; compliance audit 2026-04) |
| POST | `/close` | (none) | `{"ok": True, "closed": True}` |
| GET | `/schema` | (none) | JSON Schema for action/observation models |
| GET | `/metadata` | (none) | `EnvironmentMetadata` |
| GET | `/health` | (none) | liveness payload |
| GET | `/healthz` | (none) | versions probe (Stim, PyMatching, openenv, Python) |
| POST | `/decode` | `{"syndrome": [int], "level": str}` | PyMatching baseline result |
The OpenEnv canonical routes (`/reset`, `/step`, `/state`, `/health`,
`/schema`, `/metadata`, `/mcp`) are wired automatically by
`create_fastapi_app`. The `/healthz`, `/decode`, `POST /state`,
`POST /close`, and `/` (HTML landing) routes are mounted on top by
`qubit_medic/server/app.py`.
Server entry point: `python -m qubit_medic.server.app` or
`uvicorn qubit_medic.server.app:app --host 0.0.0.0 --port 7860`.
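A minimal HTTP smoke test against a locally running server, as a sketch: it assumes the default host/port from the entry point above and the `httpx` client, and the all-zero syndrome is only a placeholder (its length presumably has to match the level's detector count).

```python
import httpx

BASE = "http://localhost:7860"

# Version probe: reports Stim, PyMatching, openenv, and Python versions.
print(httpx.get(f"{BASE}/healthz").json())

# PyMatching baseline decode. The payload shape follows the table above;
# the all-zero syndrome is a placeholder, not a meaningful shot.
resp = httpx.post(f"{BASE}/decode",
                  json={"syndrome": [0] * 8, "level": "L2_target"})
print(resp.json())
```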
Action dataclass
```python
class QubitMedicAction(Action):
    """LLM-emitted action: the raw text the model generated."""

    raw_response: str = Field(
        default="",
        description="Raw LLM completion text. Server parses to x/z error lists.",
    )
    parsed_x_errors: Optional[list[int]] = Field(
        default=None,
        description="Optional pre-parsed X-error qubit ids (LLM-space). "
        "When provided, the server skips text parsing.",
    )
    parsed_z_errors: Optional[list[int]] = Field(
        default=None,
        description="Optional pre-parsed Z-error qubit ids (LLM-space).",
    )
    episode_id: Optional[int] = Field(
        default=None,
        description="Server-assigned episode id from the matching reset(). "
        "If omitted, the most-recent active episode is used.",
    )
```
Field-level notes:

- `raw_response`: the canonical wire format. The server runs `qubit_medic.prompts.parse_action(raw_response, num_data_qubits)` to recover both error lists. Keeping the wire format as raw text means the server retains full control over parsing, and unparseable outputs surface cleanly via `format_compliance = 0`.
- `parsed_x_errors` / `parsed_z_errors`: a trainer-only escape hatch for baseline policies and unit tests. When set, the server formats a synthetic `<answer>X: ... | Z: ...</answer>` string before parsing; the same parser path runs either way, so reward semantics are identical (see the sketch after this list).
- `episode_id`: must match the `episode_id` returned by the matching `reset()` call. If `None`, the adapter falls back to the most recent active episode (`self._last_episode_id`). Stale or unknown ids raise `ValueError` from `DecoderEnvironment.step` (compliance audit 2026-04).
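A minimal sketch of the escape hatch, reusing `env` and `obs` from the local rollout example below; the qubit ids here are arbitrary placeholders.

```python
# Trainer-only path: skip text parsing by supplying pre-parsed error lists.
# The server synthesizes "<answer>X: ... | Z: ...</answer>" and runs the
# same parser, so the reward path is identical to raw-text actions.
action = QubitMedicAction(
    parsed_x_errors=[0, 4],   # arbitrary placeholder qubit ids (LLM-space)
    parsed_z_errors=[],
    episode_id=obs.episode_id,
)
result = env.step(action)
```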
Observation dataclass
```python
class QubitMedicObservation(Observation):
    """OpenEnv observation - mirrors DecoderObservation plus done/reward."""

    model_config = ConfigDict(extra="forbid", validate_assignment=True,
                              arbitrary_types_allowed=True)

    prompt: str = Field(default="", description="Pre-formatted LLM prompt.")
    syndrome_bits: list[int] = Field(default_factory=list,
                                     description="Detector activations (0/1).")
    distance: int = Field(default=0, description="Code distance for this episode.")
    rounds: int = Field(default=0, description="Number of stabilizer rounds.")
    p: float = Field(default=0.0, description="SI1000 base error rate.")
    curriculum_level: str = Field(default="",
                                  description="Curriculum level name.")
    episode_id: int = Field(default=0,
                            description="Server-assigned episode counter.")
    dem_digest: str = Field(default="",
                            description="Short hash of the detector error model.")
    info: dict[str, Any] = Field(default_factory=dict,
                                 description="Per-step extras (reward "
                                 "breakdown, ground-truth flip, "
                                 "PyMatching baseline, etc.).")
```
Plus the standard inherited OpenEnv fields:

- `done: bool`: `True` after every `step` (single-step episodes).
- `reward: Optional[float]`: `None` on `reset`, the weighted total in `[0, 1]` after `step`.
The `info` payload (after `step`) carries:

| Key | Type | Meaning |
|---|---|---|
| `rewards` | `dict[str, float]` | Per-component breakdown (`logical_correction`, `syndrome_consistency`, `hamming_overlap`, `format_compliance`, `pymatching_beat`, `total`) |
| `parsed_action` | `dict` | Deserialised `DecoderAction` (parsed x/z lists, `parse_success`) |
| `actual_observable_flip` | `int` | Stim ground-truth flip of the logical Z observable |
| `pymatching_observable_pred` | `int` | PyMatching's predicted observable flip |
| `pymatching_x_errors` | `list[int]` | PyMatching reference Pauli frame, X axis |
| `pymatching_z_errors` | `list[int]` | PyMatching reference Pauli frame, Z axis |
| `elapsed_seconds` | `float` | Wall time between `reset` and `step` |
| `timed_out` | `bool` | `True` iff elapsed > `EPISODE_TIMEOUT_SECONDS` |
| `curriculum_stats` | `dict` | Live promotion-tracker counters |
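A short sketch of consuming this payload, reusing `result` from the local rollout example below; key names follow the table above.

```python
# Inspect the per-component reward breakdown and the timeout flag.
breakdown = result.info["rewards"]
print(f"total={breakdown['total']:.3f} "
      f"logical={breakdown['logical_correction']:.3f}")
if result.info["timed_out"]:
    print("episode exceeded EPISODE_TIMEOUT_SECONDS; rewards were zeroed")

# Compare the PyMatching baseline against Stim's ground truth on this shot.
agree = (result.info["pymatching_observable_pred"]
         == result.info["actual_observable_flip"])
print("pymatching matched ground truth:", agree)
```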
State dataclass
```python
class QubitMedicState(State):
    """Externally-visible state. Physics-truth fields stay server-side."""

    model_config = ConfigDict(extra="allow", validate_assignment=True,
                              arbitrary_types_allowed=True)

    episodes_started: int = 0
    active_episodes: int = 0
    cached_levels: list[str] = Field(default_factory=list)
    curriculum: dict[str, Any] = Field(default_factory=dict)
    last_reward_breakdown: Optional[dict[str, float]] = None
```
The adapter populates a few inherited base-class fields too: `episode_id`
(stringified) and `step_count` (which equals `episodes_started`).
Crucially, `QubitMedicState` deliberately omits the ground-truth fields
held by the inner `DecoderState`: `true_x_errors`, `true_z_errors`,
`actual_observable_flip`, `pymatching_observable_pred`, `circuit_text`,
`dem_text`. Those are visible only inside the reward functions; see
`docs/REWARD_HACKING.md`.
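An illustrative check over HTTP, as a sketch assuming a server on the default port; the field names follow the dataclass above.

```python
import httpx

state = httpx.get("http://localhost:7860/state").json()
print(state["episodes_started"], state["cached_levels"])

# Ground-truth fields held by the inner DecoderState must never appear
# in the externally-visible state.
assert "true_x_errors" not in state
assert "actual_observable_flip" not in state
```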
Episode lifecycle
Single-step episodes (`done=True` after every `step`):

```
client                              server
------                              ------
POST /reset ─────────────────►  scheduler.sample(level)
                                _cache_for(level)      (compile Stim circuit
                                                        and PyMatching matrix
                                                        once per level)
                                sample_episode(seed)   (Stim shot ->
                                                        syndrome bits +
                                                        observable flip)
                                build_prompt(...)
            ◄─────────────────  Observation { prompt,
                                              syndrome_bits,
                                              distance, rounds, p,
                                              curriculum_level,
                                              episode_id,
                                              dem_digest,
                                              done=False,
                                              reward=None }

POST /step (action) ─────────►  parse_action(raw_response)
                                compute_all_rewards(...)
                                scheduler.update(...)  (curriculum promotion)
            ◄─────────────────  Observation { ..., done=True,
                                              reward=total,
                                              info={rewards: {...},
                                                    ...} }
```
Calling `step()` with an unknown `episode_id` raises `ValueError` (turned
into HTTP 400). Calling `step()` after `EPISODE_TIMEOUT_SECONDS` returns
all-zero rewards and `info["timed_out"] = True`.
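An illustration of the first failure mode, reusing `env` from the local rollout example below; `10**9` is simply an id assumed never to have been issued.

```python
# An unknown episode_id is rejected locally with ValueError; over HTTP
# the same condition surfaces as a 400 response.
stale = QubitMedicAction(raw_response="<answer>X: | Z: </answer>",
                         episode_id=10**9)
try:
    env.step(stale)
except ValueError as err:
    print("rejected:", err)
```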
Reward computation
After parsing, the env converts predicted qubit IDs from LLM-space
(`0..num_data_qubits-1`) into Stim's internal coordinate system via
`layout.llm_to_stim`, then runs `compute_all_rewards`
(`qubit_medic/server/rewards.py`). Each of the five rewards is a pure
function over `(parsed, sample, layout, final_detector_supports)`; the
combined total is a weighted sum (weights in
`qubit_medic.config.REWARD_WEIGHTS`, mirrored in `openenv.yaml`) clamped
to `[0, 1]`. The breakdown is exposed in `info["rewards"]`, the curriculum
scheduler is updated using only `logical_correction`, and the episode
bookkeeping is dropped (`self._active.pop(episode_id)`). See
`docs/REWARD_HACKING.md` for the per-reward semantics. A sketch of the
weighting step follows.
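A minimal sketch of the combination step, assuming `REWARD_WEIGHTS` maps component names (the keys of `info["rewards"]`, minus `total`) to floats; the actual implementation lives in `qubit_medic/server/rewards.py`.

```python
def weighted_total(components: dict[str, float],
                   weights: dict[str, float]) -> float:
    """Weighted sum of per-component rewards, clamped to [0, 1]."""
    total = sum(weights[name] * components[name] for name in weights)
    return max(0.0, min(1.0, total))
```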
Curriculum
Source: `openenv.yaml` (`curriculum:` block) plus
`qubit_medic.server.curriculum.CurriculumScheduler`.

| Level | Distance | Rounds | p (SI1000) | Promotion threshold |
|---|---|---|---|---|
| `L1_warmup` | 3 | 1 | 0.0001 | 0.80 |
| `L2_target` | 3 | 3 | 0.001 | 0.70 |
| `L3_stretch` | 5 | 5 | 0.001 | 0.30 |
The scheduler samples a level on each `reset()`. Promotion thresholds
gate progression via the running `logical_correction` rate at the current
level. Levels `L1_warmup` and `L2_target` are pre-warmed at server boot
(`_get_shared_inner` in the adapter calls `_cache_for` on both);
`L3_stretch` compiles lazily on first selection. An illustrative model of
the gating logic is sketched below.
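Not the scheduler's exact code, only a sketch: the thresholds come from the table above, and `min_episodes` is a hypothetical warm-up floor so a single lucky episode cannot trigger promotion.

```python
THRESHOLDS = {"L1_warmup": 0.80, "L2_target": 0.70, "L3_stretch": 0.30}

def should_promote(level: str, successes: int, episodes: int,
                   min_episodes: int = 20) -> bool:
    """Promote when the running logical_correction rate clears the bar."""
    if episodes < min_episodes:   # hypothetical warm-up floor
        return False
    return successes / episodes >= THRESHOLDS[level]
```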
Local rollout example
```python
from qubit_medic.server.openenv_adapter import (
    QubitMedicAction,
    QubitMedicEnvironment,
)

env = QubitMedicEnvironment()
obs = env.reset(seed=42)  # QubitMedicObservation
print("level:", obs.curriculum_level, "syndrome bits:", len(obs.syndrome_bits))
print("prompt preview:", obs.prompt[:120], "...")

# Pretend the LLM emitted nothing useful: the parser will return empty
# lists, format_compliance = 0, syndrome_consistency capped at 0.5.
action = QubitMedicAction(
    raw_response="X_ERRORS=[]\nZ_ERRORS=[]",
    episode_id=obs.episode_id,
)
result = env.step(action)
print("reward:", result.reward, "done:", result.done)
print("breakdown:", result.info["rewards"])
print("pymatching reference frame:", result.info["pymatching_x_errors"],
      result.info["pymatching_z_errors"])
```
For HTTP usage, hit the live server with curl against `/reset` then
`/step` (see the Swagger UI at `/docs`), or use any OpenEnv-compatible
client. A Python equivalent is sketched below.
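A hedged Python equivalent of that curl flow, assuming the default port; the exact wire shapes of `ResetRequest`/`StepRequest` and the `action` key are assumptions here, so consult `GET /schema` for the authoritative layout.

```python
import httpx

BASE = "http://localhost:7860"

# Reset: an empty JSON body is assumed sufficient (verify via GET /schema).
obs = httpx.post(f"{BASE}/reset", json={}).json()

# Step: the "action" payload key is an assumption; check GET /schema for
# the authoritative StepRequest layout before relying on it.
step = httpx.post(f"{BASE}/step", json={
    "action": {
        "raw_response": "X_ERRORS=[]\nZ_ERRORS=[]",
        "episode_id": obs.get("episode_id"),
    },
}).json()
print(step)
```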