---
title: StabilizerForge — Quantum-Code Synthesis Environment
emoji: ⚛️
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - quantum-error-correction
  - stim
  - stabilizer-codes
  - rlvr
---

# StabilizerForge: OpenEnv environment for Clifford circuit synthesis

An RL environment that scores candidate Clifford encoding circuits against a target stabilizer code in **polynomial time**, using the tableau simulator from [Stim](https://github.com/quantumlib/Stim) (the Aaronson–Gottesman algorithm). It is built for training small LLMs to perform automated quantum-error-correction code synthesis with verifier-grounded rewards (RLVR).

The environment ships with **29 training tasks + 10 held-out eval tasks** across three difficulty tiers, from Bell states (2 qubits, 2 gates) up to distance-5 surface, color, and Golay codes (≤25 qubits, 100+ gates).

## Action / Observation / Reward

**Action.** One Clifford gate per step (or `FINALIZE`):

```python
from stabilizer_forge import StabilizerAction

StabilizerAction(op="H", qubits=[0])      # Hadamard on qubit 0
StabilizerAction(op="S", qubits=[3])      # phase gate on qubit 3
StabilizerAction(op="CX", qubits=[0, 1])  # CNOT, control 0 -> target 1
StabilizerAction(op="FINALIZE")           # end episode, deliver terminal reward
```

Actions are validated with Pydantic. A malformed action incurs a format penalty and is treated as a no-op; after 5 consecutive format violations the episode terminates.

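The validation and termination rule described above can be sketched without Pydantic. This is a minimal stdlib-only illustration under stated assumptions, not the environment's actual code: `validate_action`, `ViolationTracker`, and the `ARITY` table are hypothetical names, and the gate set is limited to the gates shown in the example.

```python
from dataclasses import dataclass

# Assumed gate arities (hypothetical); the real environment may accept more gates.
ARITY = {"H": 1, "S": 1, "CX": 2, "FINALIZE": 0}
MAX_CONSECUTIVE_VIOLATIONS = 5  # termination threshold stated in the text


def validate_action(op, qubits, n_qubits):
    """Return None if the action is well formed, else a short error string."""
    if op not in ARITY:
        return f"unknown op {op!r}"
    qubits = list(qubits or [])
    if len(qubits) != ARITY[op]:
        return f"{op} expects {ARITY[op]} qubit(s), got {len(qubits)}"
    for q in qubits:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(q, int) or isinstance(q, bool) or not 0 <= q < n_qubits:
            return f"invalid qubit index {q!r} for {n_qubits} qubits"
    if len(set(qubits)) != len(qubits):
        return "a two-qubit gate needs two distinct qubits"
    return None


@dataclass
class ViolationTracker:
    format_violations: int = 0
    consecutive_violations: int = 0

    def record(self, error):
        """Update the counters; return True when the episode should terminate."""
        if error is None:
            self.consecutive_violations = 0  # a valid action resets the streak
            return False
        self.format_violations += 1
        self.consecutive_violations += 1
        return self.consecutive_violations >= MAX_CONSECUTIVE_VIOLATIONS
```

In this sketch a valid action resets only the consecutive counter, so the total violation count remains available for the format-compliance term of the reward.
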
**Observation.** Full state for the current episode:

| field | type | meaning |
|-------|------|---------|
| `task_id` | str | which task this episode is running |
| `target_stabilizers` | `list[str]` | Pauli strings, e.g. `["XZZXI", "IXZZX", ...]` |
| `n_qubits` | int | number of physical qubits |
| `gates_so_far` | `list[str]` | Stim instructions applied this episode |
| `current_circuit` | str | concatenated Stim text |
| `current_match` | `list[bool]` | whether each target stabilizer holds under the current circuit (computed live by Stim) |
| `match_fraction` | float | fraction of target stabilizers that hold (0..1) |
| `gates_emitted` | int | valid gates applied so far |
| `cnot_count` | int | number of CX gates |
| `nonadj_cnot_count` | int | number of CX gates across non-adjacent qubits |
| `gate_budget` / `gate_budget_remaining` | int | hard cap (`2 × benchmark_optimum`) and the remaining allowance |
| `benchmark_optimum` / `benchmark_optimum_2q` | int | reference encoder's total and two-qubit gate counts |
| `connectivity_edges` | `list[list[int]] \| None` | allowed CX pairs; `None` = all-to-all |
| `format_violations`, `consecutive_violations` | int | error tracking |
| `last_action_valid`, `last_action_error` | bool, str | parser feedback |
| `step_count`, `finalized` | int, bool | steps taken so far; whether `FINALIZE` has been issued |

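As an illustration of how `cnot_count` and `nonadj_cnot_count` relate to `gates_so_far` and `connectivity_edges`, here is a small sketch. It assumes the gate strings are Stim-style text such as `"CX 0 1"` and that edges are undirected; both are assumptions, and `count_cx` is a hypothetical helper rather than part of the package.

```python
def count_cx(gates_so_far, connectivity_edges=None):
    """Return (cnot_count, nonadj_cnot_count) for Stim-style gate strings."""
    # Assumption: edges are undirected. None means all-to-all connectivity.
    allowed = None
    if connectivity_edges is not None:
        allowed = {frozenset(edge) for edge in connectivity_edges}

    cnot_count = 0
    nonadj_cnot_count = 0
    for gate in gates_so_far:
        parts = gate.split()
        if not parts or parts[0] not in ("CX", "CNOT"):
            continue
        # One instruction may carry several (control, target) pairs: "CX 0 1 2 3".
        qubits = [int(token) for token in parts[1:]]
        for control, target in zip(qubits[0::2], qubits[1::2]):
            cnot_count += 1
            if allowed is not None and frozenset((control, target)) not in allowed:
                nonadj_cnot_count += 1
    return cnot_count, nonadj_cnot_count


# Linear chain 0-1-2: the CX between qubits 0 and 2 is non-adjacent.
print(count_cx(["H 0", "CX 0 1", "CX 0 2"], [[0, 1], [1, 2]]))
```
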
**Reward.** Delivered at `FINALIZE`, plus dense per-step shaping:

| component | weight | what it measures |
|-----------|--------|------------------|
| stabilizer-match fraction | **0.40** | primary correctness signal |
| gate-count efficiency `max(0, 1 − gates / (1.5 × bench_opt))` | 0.20 | circuit volume vs. the reference encoder |
| two-qubit-gate efficiency | 0.20 | CXs are expensive on real hardware |
| connectivity respect | 0.10 | −1 per CX across non-adjacent qubits |
| format compliance | 0.10 | −1 per malformed action |
| `0.05 × Δmatch_fraction` | per step | dense signal before the policy has learned to `FINALIZE` |

## Quick start (sync HTTP)

```python
from stabilizer_forge import StabilizerAction, StabilizerForgeEnv

client = StabilizerForgeEnv(base_url="http://localhost:8000")
with client.sync() as env:
    # Pass any task_id from tasks.jsonl, or omit it for a random task.
    r = env.reset(task_id="steane")
    print(r.observation.target_stabilizers)  # 6 Pauli strings on 7 qubits

    for op, qs in [("H", [0]), ("CX", [0, 1]), ("CX", [0, 2]), ("CX", [0, 3])]:
        r = env.step(StabilizerAction(op=op, qubits=qs))
        print(f"  match_fraction={r.observation.match_fraction:.2f}")

    r = env.step(StabilizerAction(op="FINALIZE"))
    print(f"terminal reward={r.reward:+.3f} done={r.done}")
```

To start the server locally:

```bash
python -m stabilizer_forge.server.app --port 8000
```

Alternatively, run it via Docker (see "Building the Docker image manually" below).

## Tasks

The env loads tasks from `stabilizer_forge/tasks.jsonl` by default. Override the path with the `STABILIZER_FORGE_TASKS` environment variable. Each task carries:

```json
{
  "task_id": "steane",
  "source_code": "Steane [[7,1,3]]",
  "n_qubits": 7,
  "target_stabilizers": ["XXIIXXI", "XIXIXIX", "IIIXXXX", "ZZIIZZI", "ZIZIZIZ", "IIIZZZZ"],
  "connectivity_edges": null,
  "gate_budget": 78,
  "benchmark_optimum": 26,
  "benchmark_optimum_2q": 23,
  "tier": 2
}
```

- Tier 1 (12 tasks): Bell, GHZ-3..8, [[4,2,2]] iceberg/detector, hypercube `l=1`, iceberg `m=4`.
- Tier 2 (12 tasks): Perfect [[5,1,3]], Steane, Shor, surface `d=3`, hex/square-octagon color `d=3`, GHZ-9..13, Carbon.
- Tier 3 (5 tasks): Tetrahedral, Hamming, surface `d=5`, hex/square-octagon color `d=5`.

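A task file in this format can be read with the standard library alone. The sketch below assumes one JSON object per line, as the `.jsonl` extension suggests (the pretty-printed record above is for readability). `load_tasks` is a hypothetical helper, not the environment's loader, and the length check is an extra sanity test added for illustration.

```python
import json
import os
from pathlib import Path

DEFAULT_TASKS = Path("stabilizer_forge") / "tasks.jsonl"


def load_tasks(path=None):
    """Load tasks keyed by task_id from a JSON Lines file.

    Resolution order: explicit path, then STABILIZER_FORGE_TASKS, then default.
    """
    path = Path(path or os.environ.get("STABILIZER_FORGE_TASKS") or DEFAULT_TASKS)
    tasks = {}
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            task = json.loads(line)
            # Sanity check: every Pauli string must span all physical qubits.
            for pauli in task["target_stabilizers"]:
                if len(pauli) != task["n_qubits"]:
                    raise ValueError(
                        f"line {line_no}: {pauli!r} does not have "
                        f"{task['n_qubits']} qubits"
                    )
            tasks[task["task_id"]] = task
    return tasks
```

Usage would look like `tasks = load_tasks()` followed by `tasks["steane"]["target_stabilizers"]`.
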
## Verifier

The match fraction is computed with Stim's `TableauSimulator.peek_observable_expectation`. For each target stabilizer `S_i`, the candidate circuit is applied to `|0⟩^n` and the verifier checks whether the resulting state is a `+1` eigenstate of `S_i`. The check is exact and runs in polynomial time, so there is no false-positive risk and no statistical noise. The verifier is vendored from [uw-math-ai/quantum-ai/tools/check_stabilizers.py](https://github.com/uw-math-ai/quantum-ai/blob/main/tools/check_stabilizers.py).

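To make the check concrete without depending on Stim, the sketch below implements the same test in pure Python by a different route: instead of simulating the state, it propagates each target Pauli backwards through the circuit and asks whether it ends up as a `+`-signed Z-type operator, which is exactly the condition for a `+1` eigenvalue on `|0⟩^n`. This is a stand-in for illustration, not the vendored verifier or the Stim call; it handles only `H`, `S`, and `CX`, and the function names are hypothetical.

```python
def stabilizer_holds(gates, pauli):
    """True if `pauli` has eigenvalue +1 on the state circuit|0...0>.

    gates: list of (op, qubits) with op in {"H", "S", "CX"}.
    pauli: string over I/X/Y/Z, optionally prefixed with "+" or "-".
    """
    negative = pauli.startswith("-")
    letters = pauli.lstrip("+-")
    x = [c in "XY" for c in letters]  # X component per qubit (Y has both)
    z = [c in "ZY" for c in letters]  # Z component per qubit

    # Undo the circuit gate by gate, last gate first, conjugating the Pauli.
    for op, qubits in reversed(gates):
        if op == "H":
            (q,) = qubits
            negative ^= x[q] and z[q]          # H Y H = -Y
            x[q], z[q] = z[q], x[q]            # X <-> Z
        elif op == "S":
            (q,) = qubits
            negative ^= x[q] and not z[q]      # S^dag X S = -Y, S^dag Y S = +X
            z[q] ^= x[q]
        elif op == "CX":
            c, t = qubits
            # Sign rule from the Aaronson-Gottesman tableau update.
            negative ^= x[c] and z[t] and (x[t] == z[c])
            x[t] ^= x[c]                       # X on control spreads to target
            z[c] ^= z[t]                       # Z on target spreads to control
        else:
            raise ValueError(f"unsupported gate {op!r}")

    # On |0...0>, only a Z-type Pauli with a + sign has eigenvalue +1.
    return not any(x) and not negative


def match_fraction(gates, target_stabilizers):
    """Fraction of target stabilizers that hold for the circuit."""
    hits = [stabilizer_holds(gates, s) for s in target_stabilizers]
    return sum(hits) / len(hits)


bell = [("H", [0]), ("CX", [0, 1])]
print(match_fraction(bell, ["XX", "ZZ"]))
```

Each gate updates the Pauli in time linear in nothing larger than the gate's support, so checking `m` stabilizers against a `g`-gate circuit costs `O(m · g)` in this formulation, consistent with the polynomial-time claim.
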
## Deploy

```bash
# requires `huggingface-cli login` first
openenv push
```

The deployed Space exposes `/health`, `/reset`, `/step`, `/state`, and `/schema` over HTTP, plus `/ws` for low-latency persistent sessions. Use `StabilizerForgeEnv(base_url="https://<your-space>.hf.space")` to connect.

## Building the Docker image manually

```bash
docker build -t stabilizer-forge-env:latest -f server/Dockerfile .
docker run -p 8000:8000 stabilizer-forge-env:latest
```

## Files

```
stabilizer_forge/
├── __init__.py
├── client.py           # StabilizerForgeEnv (sync/async HTTP client)
├── models.py           # StabilizerAction, StabilizerObservation
├── tasks.jsonl         # 29 training tasks
├── eval_tasks.jsonl    # 10 held-out eval tasks
├── pyproject.toml
├── openenv.yaml
└── server/
    ├── stabilizer_forge_environment.py   # core env (reward, termination, verifier wrap)
    ├── verifier.py                       # Stim-based check_stabilizers, match_fraction
    ├── app.py                            # FastAPI; max_concurrent_envs=64
    └── Dockerfile
```

## Citation / credits

- Verifier and benchmark catalog adapted from [uw-math-ai/quantum-ai](https://github.com/uw-math-ai/quantum-ai) (StabilizerBench, arXiv:2604.21287, April 2026).
- Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) and [Stim](https://github.com/quantumlib/Stim).