---
title: StabilizerForge Quantum-Code Synthesis Environment
emoji: ⚛️
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - quantum-error-correction
  - stim
  - stabilizer-codes
  - rlvr
---

# StabilizerForge — OpenEnv environment for Clifford circuit synthesis

An RL environment that scores candidate Clifford encoding circuits against a target stabilizer code in **polynomial time** using [Stim](https://github.com/quantumlib/Stim)'s tableau simulator (Aaronson–Gottesman). Built for training small LLMs to do automated quantum-error-correction code synthesis with verifier-grounded rewards (RLVR).

The environment ships with **29 training tasks + 10 held-out eval tasks** across three difficulty tiers — from Bell states (2 qubits, 2 gates) up to distance-5 surface, color, and Golay codes (≤25 qubits, 100+ gates).

## Action / Observation / Reward

**Action** — one Clifford gate per step (or `FINALIZE`):

```python
from stabilizer_forge import StabilizerAction
StabilizerAction(op="H",  qubits=[0])      # Hadamard on qubit 0
StabilizerAction(op="S",  qubits=[3])      # phase gate on qubit 3
StabilizerAction(op="CX", qubits=[0, 1])   # CNOT 0 -> 1
StabilizerAction(op="FINALIZE")            # end episode, deliver terminal reward
```

Actions are validated with Pydantic. A malformed action incurs a format penalty and is treated as a no-op; after 5 consecutive format violations the episode terminates.
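The validation and termination rule can be sketched without the package. This is a stdlib-only stand-in, not the environment's Pydantic model: `ARITY`, `check_action`, and `run_episode` are hypothetical names, the gate set is limited to the ops shown above, and it assumes the consecutive-violation counter resets on any valid action.

```python
from dataclasses import dataclass

# Assumed arities, covering only the ops shown in this README.
ARITY = {"H": 1, "S": 1, "CX": 2, "FINALIZE": 0}


@dataclass
class ActionCheck:
    valid: bool
    error: str = ""
    stim_text: str = ""  # Stim instruction text, e.g. "CX 0 1"


def check_action(op, qubits, n_qubits):
    """Validate one (op, qubits) pair against the assumed gate set."""
    if op not in ARITY:
        return ActionCheck(False, f"unknown op {op!r}")
    if len(qubits) != ARITY[op]:
        return ActionCheck(False, f"{op} expects {ARITY[op]} qubit(s)")
    if any(not isinstance(q, int) or not 0 <= q < n_qubits for q in qubits):
        return ActionCheck(False, "qubit index out of range")
    if len(set(qubits)) != len(qubits):
        return ActionCheck(False, "repeated qubit")
    text = "" if op == "FINALIZE" else f"{op} {' '.join(map(str, qubits))}"
    return ActionCheck(True, "", text)


def run_episode(actions, n_qubits, max_consecutive=5):
    """Return (applied Stim lines, terminated_by_violations)."""
    applied, consecutive = [], 0
    for op, qubits in actions:
        result = check_action(op, qubits, n_qubits)
        if not result.valid:
            # Malformed action: a no-op that counts toward the violation limit.
            consecutive += 1
            if consecutive >= max_consecutive:
                return applied, True
            continue
        consecutive = 0  # assumption: any valid action resets the counter
        if op == "FINALIZE":
            break
        applied.append(result.stim_text)
    return applied, False
```

For example, `run_episode([("H", [0]), ("BAD", []), ("CX", [0, 1])], n_qubits=2)` skips the malformed action and keeps the two valid gates.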

**Observation** — full state for the current episode:

| field | type | meaning |
|-------|------|---------|
| `task_id` | str | which task this episode is running |
| `target_stabilizers` | `list[str]` | Pauli strings, e.g. `["XZZXI", "IXZZX", ...]` |
| `n_qubits` | int | number of physical qubits |
| `gates_so_far` | `list[str]` | Stim instructions applied this episode |
| `current_circuit` | str | concatenated Stim text |
| `current_match` | `list[bool]` | per-stabilizer preservation under the current circuit (live from Stim) |
| `match_fraction` | float | fraction of target stabilizers preserved (0..1) |
| `gates_emitted` | int | valid gates applied so far |
| `cnot_count` | int | CX count |
| `nonadj_cnot_count` | int | CXs across non-adjacent qubits |
| `gate_budget` / `gate_budget_remaining` | int | hard cap (`2 × benchmark_optimum`) |
| `benchmark_optimum` / `benchmark_optimum_2q` | int | reference encoder's gate counts |
| `connectivity_edges` | `list[list[int]] \| None` | None = all-to-all |
| `format_violations`, `consecutive_violations` | int | error tracking |
| `last_action_valid`, `last_action_error` | bool, str | parser feedback |
| `step_count`, `finalized` | int, bool | steps taken this episode; whether the episode has been finalized |

**Reward** — delivered at `FINALIZE` plus dense per-step shaping:

| component | weight | what it measures |
|-----------|--------|------------------|
| stabilizer-match fraction | **0.40** | primary correctness signal |
| gate-count efficiency `max(0, 1 − gates / (1.5 × bench_opt))` | 0.20 | volume vs. reference |
| two-qubit-gate efficiency | 0.20 | CXs are expensive on real hardware |
| connectivity respect | 0.10 | −1 per CX across non-adjacent qubits |
| format compliance | 0.10 | −1 per malformed action |
| `0.05 × Δmatch_fraction` | per step | dense gradient before FINALIZE is learned |
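One way to read the table is as a weighted sum, sketched below. Only the weights and the gate-count formula come from the table. The rest is assumed for illustration: that the two-qubit term uses the same `1.5 ×` form against `benchmark_optimum_2q`, that the connectivity and format terms are raw negative counts scaled by their weights, and the function names themselves.

```python
def efficiency(count: int, reference: int) -> float:
    """max(0, 1 - count / (1.5 * reference)), as given for gate count."""
    if reference <= 0:
        # Guard for a zero reference; not specified in the table.
        return 1.0 if count == 0 else 0.0
    return max(0.0, 1.0 - count / (1.5 * reference))


def terminal_reward(
    match_fraction: float,
    gates: int,
    cx: int,
    nonadj_cx: int,
    format_violations: int,
    bench_opt: int,
    bench_opt_2q: int,
) -> float:
    """Weighted sum of the five terminal components (assumed combination)."""
    return (
        0.40 * match_fraction
        + 0.20 * efficiency(gates, bench_opt)
        + 0.20 * efficiency(cx, bench_opt_2q)  # assumed same form as gate count
        + 0.10 * (-nonadj_cx)                  # -1 per non-adjacent CX
        + 0.10 * (-format_violations)          # -1 per malformed action
    )


def step_shaping(prev_match: float, new_match: float) -> float:
    """Dense per-step term: 0.05 * change in match fraction."""
    return 0.05 * (new_match - prev_match)
```

Under these assumptions, a circuit that matches every stabilizer at exactly the reference gate counts scores `0.40 + 0.20/3 + 0.20/3`, since each efficiency term equals `1 − 1/1.5 = 1/3`; using fewer gates than the reference raises both terms toward 1.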

## Quick start (sync HTTP)

```python
from stabilizer_forge import StabilizerAction, StabilizerForgeEnv

client = StabilizerForgeEnv(base_url="http://localhost:8000")
with client.sync() as env:
    r = env.reset(task_id="steane")          # pass any task_id from tasks.jsonl, or omit for random
    print(r.observation.target_stabilizers)  # 6 Pauli strings on 7 qubits

    for op, qs in [("H",[0]), ("CX",[0,1]), ("CX",[0,2]), ("CX",[0,3])]:
        r = env.step(StabilizerAction(op=op, qubits=qs))
        print(f"  match_fraction={r.observation.match_fraction:.2f}")

    r = env.step(StabilizerAction(op="FINALIZE"))
    print(f"terminal reward={r.reward:+.3f} done={r.done}")
```

To start the server locally:

```bash
python -m stabilizer_forge.server.app --port 8000
```

Or via Docker (see below).

## Tasks

The env loads tasks from `stabilizer_forge/tasks.jsonl` by default. Override with the `STABILIZER_FORGE_TASKS` environment variable. Each task carries:

```json
{
  "task_id": "steane",
  "source_code": "Steane [[7,1,3]]",
  "n_qubits": 7,
  "target_stabilizers": ["XXIIXXI", "XIXIXIX", "IIIXXXX", "ZZIIZZI", "ZIZIZIZ", "IIIZZZZ"],
  "connectivity_edges": null,
  "gate_budget": 78,
  "benchmark_optimum": 26,
  "benchmark_optimum_2q": 23,
  "tier": 2
}
```

- Tier 1 (12 tasks): Bell, GHZ-3..8, [[4,2,2]] iceberg/detector, hypercube `l=1`, iceberg `m=4`.
- Tier 2 (12 tasks): Perfect [[5,1,3]], Steane, Shor, surface `d=3`, hex/square-octagon color `d=3`, GHZ-9..13, Carbon.
- Tier 3 (5 tasks): Tetrahedral, Hamming, surface `d=5`, hex/square-octagon color `d=5`.
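When adding or overriding tasks, a record can be sanity-checked before it reaches the environment. The sketch below uses only the standard library; `load_tasks`, `paulis_commute`, and `check_task` are hypothetical helpers rather than part of the package. It checks that every Pauli string has length `n_qubits` and that the target stabilizers commute pairwise, which any valid stabilizer group must satisfy.

```python
import json
import os
from itertools import combinations


def load_tasks(path=None):
    """Read a JSONL task file, honoring STABILIZER_FORGE_TASKS if set."""
    path = path or os.environ.get(
        "STABILIZER_FORGE_TASKS", "stabilizer_forge/tasks.jsonl"
    )
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def paulis_commute(a: str, b: str) -> bool:
    """Two Pauli strings commute iff they differ on an even number of
    positions where both are non-identity."""
    clashes = sum(1 for p, q in zip(a, b) if p != "I" and q != "I" and p != q)
    return clashes % 2 == 0


def check_task(task: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    stabilizers = task["target_stabilizers"]
    for s in stabilizers:
        if len(s) != task["n_qubits"]:
            problems.append(f"{s}: length {len(s)} != n_qubits")
        if set(s) - set("IXYZ"):
            problems.append(f"{s}: unexpected characters")
    for a, b in combinations(stabilizers, 2):
        if not paulis_commute(a, b):
            problems.append(f"{a} and {b} anticommute")
    return problems
```

Running `check_task` on the Steane record shown above returns an empty list: each X-type generator overlaps each Z-type generator on an even number of qubits.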

## Verifier

The match fraction comes from Stim's `TableauSimulator.peek_observable_expectation`. For each target stabilizer `S_i`, we apply the candidate circuit to `|0⟩^n`, then check whether the resulting state is a `+1` eigenstate of `S_i`. The check is exact and runs in polynomial time, so there is no false-positive risk and no statistical noise. Vendored from [uw-math-ai/quantum-ai/tools/check_stabilizers.py](https://github.com/uw-math-ai/quantum-ai/blob/main/tools/check_stabilizers.py).

## Deploy

```bash
openenv push
# requires `huggingface-cli login` first
```

The deployed Space exposes `/health`, `/reset`, `/step`, `/state`, `/schema` over HTTP, and `/ws` for low-latency persistent sessions. Use `StabilizerForgeEnv(base_url="https://<your-space>.hf.space")` to connect.

## Building the Docker image manually

```bash
docker build -t stabilizer-forge-env:latest -f server/Dockerfile .
docker run -p 8000:8000 stabilizer-forge-env:latest
```

## Files

```
stabilizer_forge/
├── __init__.py
├── client.py              # StabilizerForgeEnv (sync/async HTTP client)
├── models.py              # StabilizerAction, StabilizerObservation
├── tasks.jsonl            # 29 training tasks
├── eval_tasks.jsonl       # 10 held-out eval tasks
├── pyproject.toml
├── openenv.yaml
└── server/
    ├── stabilizer_forge_environment.py   # core env (reward, termination, verifier wrap)
    ├── verifier.py                       # Stim-based check_stabilizers, match_fraction
    ├── app.py                            # FastAPI; max_concurrent_envs=64
    └── Dockerfile
```

## Citation / credits

- Verifier and benchmark catalog adapted from [uw-math-ai/quantum-ai](https://github.com/uw-math-ai/quantum-ai) (StabilizerBench, arXiv:2604.21287, April 2026).
- Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) and [Stim](https://github.com/quantumlib/Stim).