---
title: StabilizerForge — Quantum-Code Synthesis Environment
emoji: ⚛️
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- quantum-error-correction
- stim
- stabilizer-codes
- rlvr
---
# StabilizerForge — OpenEnv environment for Clifford circuit synthesis
An RL environment that scores candidate Clifford encoding circuits against a target stabilizer code in **polynomial time** using [Stim](https://github.com/quantumlib/Stim)'s tableau simulator (Aaronson–Gottesman). Built for training small LLMs to do automated quantum-error-correction code synthesis with verifier-grounded rewards (RLVR).
The environment ships with **29 training tasks + 10 held-out eval tasks** across three difficulty tiers — from Bell states (2 qubits, 2 gates) up to distance-5 surface, color, and Golay codes (≤25 qubits, 100+ gates).
## Action / Observation / Reward
**Action** — one Clifford gate per step (or `FINALIZE`):
```python
from stabilizer_forge import StabilizerAction
StabilizerAction(op="H", qubits=[0]) # Hadamard on qubit 0
StabilizerAction(op="S", qubits=[3]) # phase gate on qubit 3
StabilizerAction(op="CX", qubits=[0, 1]) # CNOT 0 -> 1
StabilizerAction(op="FINALIZE") # end episode, deliver terminal reward
```
Actions are validated with Pydantic. A malformed action incurs a format penalty and is treated as a no-op; after 5 consecutive format violations the episode terminates.
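
The penalty-and-termination rule can be sketched in plain Python. This is not the environment's code: `is_well_formed`, `FormatTracker`, and the arity table are hypothetical names, only the gates shown above are covered, and the sketch assumes a valid action resets the consecutive counter.

```python
from dataclasses import dataclass

# Assumed arities for the ops shown above; the real env may accept more gates.
ARITY = {"H": 1, "S": 1, "CX": 2, "FINALIZE": 0}
MAX_CONSECUTIVE = 5  # from the text: terminate after 5 consecutive violations


def is_well_formed(op: str, qubits: list[int], n_qubits: int) -> bool:
    """Check the op name, qubit count, distinctness, and index range."""
    if op not in ARITY or len(qubits) != ARITY[op]:
        return False
    if len(set(qubits)) != len(qubits):
        return False
    return all(isinstance(q, int) and 0 <= q < n_qubits for q in qubits)


@dataclass
class FormatTracker:
    format_violations: int = 0
    consecutive_violations: int = 0

    def record(self, valid: bool) -> bool:
        """Update the counters; return True if the episode should terminate."""
        if valid:
            # Assumption: any valid action breaks the streak.
            self.consecutive_violations = 0
            return False
        self.format_violations += 1
        self.consecutive_violations += 1
        return self.consecutive_violations >= MAX_CONSECUTIVE
```

The two counters mirror the `format_violations` and `consecutive_violations` observation fields, so a policy can see how close it is to forced termination.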
**Observation** — full state for the current episode:
| field | type | meaning |
|-------|------|---------|
| `task_id` | str | which task this episode is running |
| `target_stabilizers` | `list[str]` | Pauli strings, e.g. `["XZZXI", "IXZZX", ...]` |
| `n_qubits` | int | number of physical qubits |
| `gates_so_far` | `list[str]` | Stim instructions applied this episode |
| `current_circuit` | str | concatenated Stim text |
| `current_match` | `list[bool]` | per-stabilizer preservation under the current circuit (live from Stim) |
| `match_fraction` | float | fraction of target stabilizers preserved (0..1) |
| `gates_emitted` | int | valid gates applied so far |
| `cnot_count` | int | CX count |
| `nonadj_cnot_count` | int | CXs across non-adjacent qubits |
| `gate_budget` / `gate_budget_remaining` | int | hard cap (`2 × benchmark_optimum`) |
| `benchmark_optimum` / `benchmark_optimum_2q` | int | reference encoder's gate counts |
| `connectivity_edges` | `list[list[int]] \| None` | None = all-to-all |
| `format_violations`, `consecutive_violations` | int | error tracking |
| `last_action_valid`, `last_action_error` | bool, str | parser feedback |
| `step_count`, `finalized` | int, bool | steps taken so far; whether the episode has been finalized |
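
As an illustration of how `cnot_count` and `nonadj_cnot_count` relate to `gates_so_far` and `connectivity_edges`, here is a minimal sketch. The helper name `count_cx` is hypothetical, and it assumes that gate strings look like `"CX 0 1"` and that edges are undirected; the environment may differ on both points.

```python
from typing import Optional


def count_cx(gates: list[str], edges: Optional[list[list[int]]]) -> tuple[int, int]:
    """Return (cnot_count, nonadj_cnot_count) for Stim-style gate strings."""
    # None means all-to-all connectivity, so no CX can be non-adjacent.
    allowed = None if edges is None else {frozenset(e) for e in edges}
    total = nonadj = 0
    for gate in gates:
        parts = gate.split()
        if not parts or parts[0] not in ("CX", "CNOT"):
            continue
        targets = [int(p) for p in parts[1:]]
        # Stim lists targets as consecutive (control, target) pairs.
        for control, target in zip(targets[0::2], targets[1::2]):
            total += 1
            if allowed is not None and frozenset((control, target)) not in allowed:
                nonadj += 1
    return total, nonadj
```

For a three-qubit line with edges `[[0, 1], [1, 2]]`, the gates `["CX 0 1", "CX 0 2"]` give two CXs, one of which crosses a non-adjacent pair.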
**Reward** — delivered at `FINALIZE` plus dense per-step shaping:
| component | weight | what it measures |
|-----------|--------|------------------|
| stabilizer-match fraction | **0.40** | primary correctness signal |
| gate-count efficiency `max(0, 1 − gates / (1.5 × bench_opt))` | 0.20 | volume vs. reference |
| two-qubit-gate efficiency | 0.20 | CXs are expensive on real hardware |
| connectivity respect | 0.10 | −1 per CX across non-adjacent qubits |
| format compliance | 0.10 | −1 per malformed action |
| `0.05 × Δmatch_fraction` | per step | dense gradient before FINALIZE is learned |
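
One way the weights above might combine is sketched below. Only the weights, the gate-count formula, and the 0.05 shaping coefficient come from the table. Reusing the same efficiency formula for the CX count and applying the two penalties as an unclipped -1 per violation are assumptions, and the function names are hypothetical.

```python
W_MATCH, W_GATES, W_2Q, W_CONN, W_FORMAT = 0.40, 0.20, 0.20, 0.10, 0.10
SHAPING = 0.05


def efficiency(count: int, benchmark: int) -> float:
    """max(0, 1 - count / (1.5 * benchmark)), as given for the gate count."""
    return max(0.0, 1.0 - count / (1.5 * benchmark))


def terminal_reward(
    match_fraction: float,
    gates: int,
    cx: int,
    bench_opt: int,
    bench_opt_2q: int,
    nonadj_cx: int = 0,
    format_violations: int = 0,
) -> float:
    """Weighted sum of the five terminal components."""
    return (
        W_MATCH * match_fraction
        + W_GATES * efficiency(gates, bench_opt)
        + W_2Q * efficiency(cx, bench_opt_2q)  # assumption: same formula as gate count
        + W_CONN * (-nonadj_cx)                # assumption: unclipped -1 per violation
        + W_FORMAT * (-format_violations)      # assumption: unclipped -1 per violation
    )


def step_shaping(prev_match: float, new_match: float) -> float:
    """Dense per-step term: 0.05 times the change in match fraction."""
    return SHAPING * (new_match - prev_match)
```

Under these assumptions, a fully correct circuit that exactly matches both reference gate counts scores `0.40 + 0.20/3 + 0.20/3`, so beating the reference encoder is what pushes the terminal reward higher.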
## Quick start (sync HTTP)
```python
from stabilizer_forge import StabilizerAction, StabilizerForgeEnv
client = StabilizerForgeEnv(base_url="http://localhost:8000")
with client.sync() as env:
    r = env.reset(task_id="steane")  # pass any task_id from tasks.jsonl, or omit for random
    print(r.observation.target_stabilizers)  # 6 Pauli strings on 7 qubits
    for op, qs in [("H", [0]), ("CX", [0, 1]), ("CX", [0, 2]), ("CX", [0, 3])]:
        r = env.step(StabilizerAction(op=op, qubits=qs))
        print(f" match_fraction={r.observation.match_fraction:.2f}")
    r = env.step(StabilizerAction(op="FINALIZE"))
    print(f"terminal reward={r.reward:+.3f} done={r.done}")
```
To start the server locally:
```bash
python -m stabilizer_forge.server.app --port 8000
```
Or via Docker (see below).
## Tasks
The env loads tasks from `stabilizer_forge/tasks.jsonl` by default. Override with the `STABILIZER_FORGE_TASKS` environment variable. Each task carries:
```json
{
  "task_id": "steane",
  "source_code": "Steane [[7,1,3]]",
  "n_qubits": 7,
  "target_stabilizers": ["XXIIXXI", "XIXIXIX", "IIIXXXX", "ZZIIZZI", "ZIZIZIZ", "IIIZZZZ"],
  "connectivity_edges": null,
  "gate_budget": 78,
  "benchmark_optimum": 26,
  "benchmark_optimum_2q": 23,
  "tier": 2
}
```
Tier 1 (12 tasks): Bell, GHZ-3..8, [[4,2,2]] iceberg/detector, hypercube `l=1`, iceberg `m=4`.
Tier 2 (12 tasks): Perfect [[5,1,3]], Steane, Shor, surface `d=3`, hex/square-octagon color `d=3`, GHZ-9..13, Carbon.
Tier 3 (5 tasks): Tetrahedral, Hamming, surface `d=5`, hex/square-octagon color `d=5`.
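
For working with the task file outside the environment, a loader along these lines may be useful. It is a sketch, not the environment's own loader: `load_tasks` is a hypothetical name, and the only behavior taken from the text is the default path and the `STABILIZER_FORGE_TASKS` override.

```python
import json
import os
from pathlib import Path

DEFAULT_TASKS = Path("stabilizer_forge/tasks.jsonl")


def load_tasks(default: Path = DEFAULT_TASKS) -> dict[str, dict]:
    """Read one JSON object per line and key the tasks by task_id."""
    path = Path(os.environ.get("STABILIZER_FORGE_TASKS", default))
    tasks: dict[str, dict] = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            task = json.loads(line)
            # Sanity check: every Pauli string acts on exactly n_qubits qubits.
            for stab in task["target_stabilizers"]:
                if len(stab) != task["n_qubits"]:
                    raise ValueError(f"{task['task_id']}: bad stabilizer {stab!r}")
            tasks[task["task_id"]] = task
    return tasks
```

Keying by `task_id` makes it easy to look up the record behind a value passed to `env.reset(task_id=...)`.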
## Verifier
The match fraction comes from Stim's `TableauSimulator.peek_observable_expectation`. The candidate circuit is applied to `|0⟩^n`; for each target stabilizer `S_i`, the call returns the expectation of `S_i` in the resulting state, which for a stabilizer state is exactly `+1`, `−1`, or `0`. `S_i` counts as preserved only when the value is `+1`. The check is exact and runs in polynomial time, so there is no false-positive risk and no statistical noise. Vendored from [uw-math-ai/quantum-ai/tools/check_stabilizers.py](https://github.com/uw-math-ai/quantum-ai/blob/main/tools/check_stabilizers.py).
## Deploy
```bash
openenv push
# requires `huggingface-cli login` first
```
The deployed Space exposes `/health`, `/reset`, `/step`, `/state`, `/schema` over HTTP, and `/ws` for low-latency persistent sessions. Use `StabilizerForgeEnv(base_url="https://<your-space>.hf.space")` to connect.
## Building the Docker image manually
```bash
docker build -t stabilizer-forge-env:latest -f server/Dockerfile .
docker run -p 8000:8000 stabilizer-forge-env:latest
```
## Files
```
stabilizer_forge/
├── __init__.py
├── client.py # StabilizerForgeEnv (sync/async HTTP client)
├── models.py # StabilizerAction, StabilizerObservation
├── tasks.jsonl # 29 training tasks
├── eval_tasks.jsonl # 10 held-out eval tasks
├── pyproject.toml
├── openenv.yaml
└── server/
    ├── stabilizer_forge_environment.py # core env (reward, termination, verifier wrap)
    ├── verifier.py # Stim-based check_stabilizers, match_fraction
    ├── app.py # FastAPI; max_concurrent_envs=64
    └── Dockerfile
```
## Citation / credits
- Verifier and benchmark catalog adapted from [uw-math-ai/quantum-ai](https://github.com/uw-math-ai/quantum-ai) (StabilizerBench, arXiv:2604.21287, April 2026).
- Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) and [Stim](https://github.com/quantumlib/Stim).