---
title: StabilizerForge — Quantum-Code Synthesis Environment
emoji: ⚛️
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - quantum-error-correction
  - stim
  - stabilizer-codes
  - rlvr
---
# StabilizerForge — OpenEnv environment for Clifford circuit synthesis
An RL environment that scores candidate Clifford encoding circuits against a target stabilizer code in polynomial time using Stim's tableau simulator (Aaronson–Gottesman). Built for training small LLMs to do automated quantum-error-correction code synthesis with verifier-grounded rewards (RLVR).
The environment ships with 29 training tasks + 10 held-out eval tasks across three difficulty tiers — from Bell states (2 qubits, 2 gates) up to distance-5 surface, color, and Golay codes (≤25 qubits, 100+ gates).
## Action / Observation / Reward
**Action** — one Clifford gate per step (or `FINALIZE`):
```python
from stabilizer_forge import StabilizerAction

StabilizerAction(op="H", qubits=[0])      # Hadamard on qubit 0
StabilizerAction(op="S", qubits=[3])      # phase gate on qubit 3
StabilizerAction(op="CX", qubits=[0, 1])  # CNOT 0 -> 1
StabilizerAction(op="FINALIZE")           # end episode, deliver terminal reward
```
Actions are validated with Pydantic; a malformed action incurs a format penalty and is treated as a no-op. After 5 consecutive format violations the episode terminates.
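Such validation might look like the following sketch (a hypothetical `Action` model with an abbreviated gate set; the real `StabilizerAction` in `models.py` supports the full Clifford vocabulary and may be structured differently):

```python
from typing import Literal

from pydantic import BaseModel, model_validator

# How many qubit arguments each op takes (abbreviated set for illustration).
ARITY = {"H": 1, "S": 1, "CX": 2, "FINALIZE": 0}


class Action(BaseModel):
    op: Literal["H", "S", "CX", "FINALIZE"]
    qubits: list[int] = []

    @model_validator(mode="after")
    def check_arity(self) -> "Action":
        # Reject actions whose qubit count does not match the gate's arity.
        if len(self.qubits) != ARITY[self.op]:
            raise ValueError(f"{self.op} takes {ARITY[self.op]} qubit argument(s)")
        return self
```

A `ValidationError` raised here is what the environment would map to a format penalty plus no-op.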
**Observation** — full state for the current episode:
| field | type | meaning |
|---|---|---|
| `task_id` | `str` | which task this episode is running |
| `target_stabilizers` | `list[str]` | Pauli strings, e.g. `["XZZXI", "IXZZX", ...]` |
| `n_qubits` | `int` | number of physical qubits |
| `gates_so_far` | `list[str]` | Stim instructions applied this episode |
| `current_circuit` | `str` | concatenated Stim text |
| `current_match` | `list[bool]` | per-stabilizer preservation under the current circuit (live from Stim) |
| `match_fraction` | `float` | fraction of target stabilizers preserved (0..1) |
| `gates_emitted` | `int` | valid gates applied so far |
| `cnot_count` | `int` | CX count |
| `nonadj_cnot_count` | `int` | CXs across non-adjacent qubits |
| `gate_budget` / `gate_budget_remaining` | `int` | hard cap (2 × benchmark_optimum) |
| `benchmark_optimum` / `benchmark_optimum_2q` | `int` | reference encoder's gate counts |
| `connectivity_edges` | `list[list[int]] \| None` | `None` = all-to-all |
| `format_violations`, `consecutive_violations` | `int` | error tracking |
| `last_action_valid`, `last_action_error` | `bool`, `str` | parser feedback |
| `step_count`, `finalized` | `int`, `bool` | episode progress |
**Reward** — delivered at `FINALIZE`, plus dense per-step shaping:
| component | weight | what it measures |
|---|---|---|
| stabilizer-match fraction | 0.40 | primary correctness signal |
| gate-count efficiency: `max(0, 1 − gates / (1.5 × bench_opt))` | 0.20 | gate volume vs. reference |
| two-qubit-gate efficiency | 0.20 | CXs are expensive on real hardware |
| connectivity respect | 0.10 | −1 per CX across non-adjacent qubits |
| format compliance | 0.10 | −1 per malformed action |
| `0.05 × Δmatch_fraction` | per step | dense gradient before FINALIZE is learned |
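The terminal mixture above can be sketched as a plain function (weights taken from the table; the real environment's exact clipping and normalization may differ):

```python
def terminal_reward(obs: dict) -> float:
    """Weighted terminal reward, sketched from the component table.

    `obs` uses the observation field names from the table above.
    """
    gate_eff = max(0.0, 1.0 - obs["gates_emitted"] / (1.5 * obs["benchmark_optimum"]))
    cx_eff = max(0.0, 1.0 - obs["cnot_count"] / (1.5 * obs["benchmark_optimum_2q"]))
    connectivity = -1.0 * obs["nonadj_cnot_count"]   # -1 per non-adjacent CX
    fmt = -1.0 * obs["format_violations"]            # -1 per malformed action
    return (
        0.40 * obs["match_fraction"]
        + 0.20 * gate_eff
        + 0.20 * cx_eff
        + 0.10 * connectivity
        + 0.10 * fmt
    )
```

A perfect match with zero gates would score 0.40 + 0.20 + 0.20 = 0.80 under this sketch; the 0.05 × Δmatch_fraction shaping is added separately at each step.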
## Quick start (sync HTTP)
```python
from stabilizer_forge import StabilizerAction, StabilizerForgeEnv

client = StabilizerForgeEnv(base_url="http://localhost:8000")
with client.sync() as env:
    r = env.reset(task_id="steane")  # pass any task_id from tasks.jsonl, or omit for random
    print(r.observation.target_stabilizers)  # 6 Pauli strings on 7 qubits
    for op, qs in [("H", [0]), ("CX", [0, 1]), ("CX", [0, 2]), ("CX", [0, 3])]:
        r = env.step(StabilizerAction(op=op, qubits=qs))
        print(f"  match_fraction={r.observation.match_fraction:.2f}")
    r = env.step(StabilizerAction(op="FINALIZE"))
    print(f"terminal reward={r.reward:+.3f} done={r.done}")
```
To start the server locally:

```shell
python -m stabilizer_forge.server.app --port 8000
```

Or via Docker (see below).
## Tasks
The env loads tasks from `stabilizer_forge/tasks.jsonl` by default. Override with the `STABILIZER_FORGE_TASKS` environment variable. Each task carries:
```json
{
  "task_id": "steane",
  "source_code": "Steane [[7,1,3]]",
  "n_qubits": 7,
  "target_stabilizers": ["XXIIXXI", "XIXIXIX", "IIIXXXX", "ZZIIZZI", "ZIZIZIZ", "IIIZZZZ"],
  "connectivity_edges": null,
  "gate_budget": 78,
  "benchmark_optimum": 26,
  "benchmark_optimum_2q": 23,
  "tier": 2
}
```
- **Tier 1** (12 tasks): Bell, GHZ-3..8, [[4,2,2]] iceberg/detector, hypercube l=1, iceberg m=4.
- **Tier 2** (12 tasks): Perfect [[5,1,3]], Steane, Shor, surface d=3, hex/square-octagon color d=3, GHZ-9..13, Carbon.
- **Tier 3** (5 tasks): Tetrahedral, Hamming, surface d=5, hex/square-octagon color d=5.
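A catalog in this JSONL shape can be loaded in a few lines (a sketch of what the environment does internally when it reads `tasks.jsonl`):

```python
import json
from typing import Iterable


def load_tasks(lines: Iterable[str]) -> dict[str, dict]:
    """Parse JSONL task records into a task_id -> task mapping."""
    tasks: dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tasks[record["task_id"]] = record
    return tasks
```

For example, `load_tasks(open("stabilizer_forge/tasks.jsonl"))` yields a dict keyed by `task_id`, so `tasks["steane"]["target_stabilizers"]` gives the Steane code's Pauli strings.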
## Verifier
The match fraction comes from Stim's `TableauSimulator.peek_observable_expectation`. For each target stabilizer `S_i`, we apply the candidate circuit to `|0⟩^n`, then check whether the resulting state has +1 eigenvalue under `S_i`. This is exact and polynomial — there's no false-positive risk and no statistical noise. Vendored from `uw-math-ai/quantum-ai/tools/check_stabilizers.py`.
## Deploy
```shell
# requires `huggingface-cli login` first
openenv push
```
The deployed Space exposes `/health`, `/reset`, `/step`, `/state`, and `/schema` over HTTP, plus `/ws` for low-latency persistent sessions. Use `StabilizerForgeEnv(base_url="https://<your-space>.hf.space")` to connect.
## Building the Docker image manually
```shell
docker build -t stabilizer-forge-env:latest -f server/Dockerfile .
docker run -p 8000:8000 stabilizer-forge-env:latest
```
## Files
```
stabilizer_forge/
├── __init__.py
├── client.py              # StabilizerForgeEnv (sync/async HTTP client)
├── models.py              # StabilizerAction, StabilizerObservation
├── tasks.jsonl            # 29 training tasks
├── eval_tasks.jsonl       # 10 held-out eval tasks
├── pyproject.toml
├── openenv.yaml
└── server/
    ├── stabilizer_forge_environment.py  # core env (reward, termination, verifier wrap)
    ├── verifier.py                      # Stim-based check_stabilizers, match_fraction
    ├── app.py                           # FastAPI; max_concurrent_envs=64
    └── Dockerfile
```
## Citation / credits
- Verifier and benchmark catalog adapted from uw-math-ai/quantum-ai (StabilizerBench, arXiv:2604.21287, April 2026).
- Built on OpenEnv and Stim.