	---
	title: StabilizerForge — Quantum-Code Synthesis Environment
	emoji: ⚛️
	colorFrom: indigo
	colorTo: green
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	- quantum-error-correction
	- stim
	- stabilizer-codes
	- rlvr
	---

	# StabilizerForge — OpenEnv environment for Clifford circuit synthesis

	An RL environment that scores candidate Clifford encoding circuits against a target stabilizer code in polynomial time using [Stim](https://github.com/quantumlib/Stim)'s tableau simulator (Aaronson–Gottesman). Built for training small LLMs to do automated quantum-error-correction code synthesis with verifier-grounded rewards (RLVR).

	The environment ships with 29 training tasks + 10 held-out eval tasks across three difficulty tiers — from Bell states (2 qubits, 2 gates) up to distance-5 surface, color, and Golay codes (≤25 qubits, 100+ gates).

	## Action / Observation / Reward

	Action — one Clifford gate per step (or `FINALIZE`):

	```python
	from stabilizer_forge import StabilizerAction
	StabilizerAction(op="H", qubits=[0]) # Hadamard on qubit 0
	StabilizerAction(op="S", qubits=[3]) # phase gate on qubit 3
	StabilizerAction(op="CX", qubits=[0, 1]) # CNOT 0 -> 1
	StabilizerAction(op="FINALIZE") # end episode, deliver terminal reward
	```

	Actions are validated with Pydantic. Malformed actions incur a format penalty and are treated as no-ops; after 5 consecutive format violations the episode terminates.
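
The validation and termination rule can be sketched without Pydantic. This is a stdlib-only stand-in, not the environment's own code: `validate_action`, `run_episode`, and `ARITY` are hypothetical names, and the gate set is assumed to be just the ops shown in the example above.

```python
from typing import Optional

# Assumed gate set: only the ops shown in the README example.
ARITY = {"H": 1, "S": 1, "CX": 2, "FINALIZE": 0}
MAX_CONSECUTIVE_VIOLATIONS = 5  # termination threshold stated in the README


def validate_action(action: dict, n_qubits: int) -> Optional[str]:
    """Return an error message for a malformed action, or None if it is valid."""
    op = action.get("op")
    qubits = action.get("qubits") or []
    if op not in ARITY:
        return f"unknown op {op!r}"
    if not isinstance(qubits, list) or not all(type(q) is int for q in qubits):
        return "qubits must be a list of ints"
    if len(qubits) != ARITY[op]:
        return f"{op} expects {ARITY[op]} qubit(s), got {len(qubits)}"
    if any(q < 0 or q >= n_qubits for q in qubits):
        return "qubit index out of range"
    if len(set(qubits)) != len(qubits):
        return "duplicate qubit index"
    return None


def run_episode(actions: list, n_qubits: int) -> dict:
    """Apply actions in order; malformed ones are counted and skipped (no-ops)."""
    applied, violations, consecutive = [], 0, 0
    for action in actions:
        if validate_action(action, n_qubits) is not None:
            violations += 1
            consecutive += 1
            if consecutive >= MAX_CONSECUTIVE_VIOLATIONS:
                return {"applied": applied, "violations": violations,
                        "terminated_by": "format"}
            continue
        consecutive = 0  # a valid action resets the streak
        if action["op"] == "FINALIZE":
            return {"applied": applied, "violations": violations,
                    "terminated_by": "finalize"}
        applied.append(action)
    return {"applied": applied, "violations": violations, "terminated_by": None}
```

Only the consecutive counter resets on a valid action; the total violation count keeps growing, which matches the two separate fields in the observation.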

	Observation — full state for the current episode:

	| field | type | meaning |
	|-------|------|---------|
	| `task_id` | str | which task this episode is running |
	| `target_stabilizers` | `list[str]` | Pauli strings, e.g. `["XZZXI", "IXZZX", ...]` |
	| `n_qubits` | int | number of physical qubits |
	| `gates_so_far` | `list[str]` | Stim instructions applied this episode |
	| `current_circuit` | str | concatenated Stim text |
	| `current_match` | `list[bool]` | per-stabilizer preservation under the current circuit (live from Stim) |
	| `match_fraction` | float | fraction of target stabilizers preserved (0..1) |
	| `gates_emitted` | int | valid gates applied so far |
	| `cnot_count` | int | CX count |
	| `nonadj_cnot_count` | int | CXs across non-adjacent qubits |
	| `gate_budget` / `gate_budget_remaining` | int | hard cap (`2 × benchmark_optimum`) |
	| `benchmark_optimum` / `benchmark_optimum_2q` | int | reference encoder's gate counts |
	| `connectivity_edges` | `list[list[int]] \| None` | None = all-to-all |
	| `format_violations`, `consecutive_violations` | int | error tracking |
	| `last_action_valid`, `last_action_error` | bool, str | parser feedback |
	| `step_count`, `finalized` | int, bool | steps taken so far; whether `FINALIZE` has been issued |
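
As a small illustration of how these fields relate, the sketch below recomputes `match_fraction` from `current_match` and lists the stabilizers still unmet. It works on a plain dict using the field names from the table; the helper names are hypothetical, and it assumes `current_match` is in the same order as `target_stabilizers`.

```python
def match_fraction(obs: dict) -> float:
    """Fraction of target stabilizers currently satisfied (0..1)."""
    flags = obs["current_match"]
    return sum(flags) / len(flags) if flags else 0.0


def unmet_stabilizers(obs: dict) -> list:
    """Target Pauli strings whose flag in `current_match` is False."""
    # Assumes current_match[i] refers to target_stabilizers[i].
    return [s for s, ok in zip(obs["target_stabilizers"], obs["current_match"])
            if not ok]
```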

	Reward — delivered at `FINALIZE` plus dense per-step shaping:

	| component | weight | what it measures |
	|-----------|--------|------------------|
	| stabilizer-match fraction | 0.40 | primary correctness signal |
	| gate-count efficiency `max(0, 1 − gates / (1.5 × bench_opt))` | 0.20 | volume vs. reference |
	| two-qubit-gate efficiency | 0.20 | CXs are expensive on real hardware |
	| connectivity respect | 0.10 | −1 per CX across non-adjacent qubits |
	| format compliance | 0.10 | −1 per malformed action |
	| `0.05 × Δmatch_fraction` | per step | dense gradient before FINALIZE is learned |
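
One way the table could combine into a single number is sketched below. The weights and the gate-count formula come from the table. Two things are assumptions: that two-qubit efficiency uses the same formula with `cnot_count` and `benchmark_optimum_2q`, and that the two penalty rows are applied as weight × raw count with no clipping. The environment's actual normalization may differ, and the function names are hypothetical.

```python
def efficiency(count: int, reference: int) -> float:
    """max(0, 1 - count / (1.5 * reference)), the gate-count formula from the table."""
    if reference <= 0:  # guard for degenerate references (assumption)
        return 1.0 if count == 0 else 0.0
    return max(0.0, 1.0 - count / (1.5 * reference))


def terminal_reward(match_fraction: float, gates: int, bench_opt: int,
                    cnots: int, bench_opt_2q: int,
                    nonadj_cnots: int, format_violations: int) -> float:
    """Weighted sum of the five terminal components."""
    return (
        0.40 * match_fraction
        + 0.20 * efficiency(gates, bench_opt)
        + 0.20 * efficiency(cnots, bench_opt_2q)  # assumed: same formula for CXs
        - 0.10 * nonadj_cnots                     # assumed: -1 per offending CX
        - 0.10 * format_violations                # assumed: -1 per malformed action
    )


def step_shaping(prev_match: float, new_match: float) -> float:
    """Dense per-step term: 0.05 * change in match_fraction."""
    return 0.05 * (new_match - prev_match)
```

Under these assumptions, a circuit that matches every stabilizer at exactly the reference gate counts scores `0.40 + 0.20/3 + 0.20/3`, since each efficiency term equals `1 − 1/1.5 = 1/3` at the reference count.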

	## Quick start (sync HTTP)

	```python
	from stabilizer_forge import StabilizerAction, StabilizerForgeEnv

	client = StabilizerForgeEnv(base_url="http://localhost:8000")
	with client.sync() as env:
	    r = env.reset(task_id="steane")  # pass any task_id from tasks.jsonl, or omit for random
	    print(r.observation.target_stabilizers)  # 6 Pauli strings on 7 qubits

	    for op, qs in [("H", [0]), ("CX", [0, 1]), ("CX", [0, 2]), ("CX", [0, 3])]:
	        r = env.step(StabilizerAction(op=op, qubits=qs))
	        print(f" match_fraction={r.observation.match_fraction:.2f}")

	    r = env.step(StabilizerAction(op="FINALIZE"))
	    print(f"terminal reward={r.reward:+.3f} done={r.done}")
	```

	To start the server locally:

	```bash
	python -m stabilizer_forge.server.app --port 8000
	```

	Or via Docker (see below).

	## Tasks

	The env loads tasks from `stabilizer_forge/tasks.jsonl` by default. Override with the `STABILIZER_FORGE_TASKS` environment variable. Each task carries:

	```json
	{
	  "task_id": "steane",
	  "source_code": "Steane [[7,1,3]]",
	  "n_qubits": 7,
	  "target_stabilizers": ["XXIIXXI", "XIXIXIX", "IIIXXXX", "ZZIIZZI", "ZIZIZIZ", "IIIZZZZ"],
	  "connectivity_edges": null,
	  "gate_budget": 78,
	  "benchmark_optimum": 26,
	  "benchmark_optimum_2q": 23,
	  "tier": 2
	}
	```

	- Tier 1 (12 tasks): Bell, GHZ-3..8, [[4,2,2]] iceberg/detector, hypercube `l=1`, iceberg `m=4`.
	- Tier 2 (12 tasks): Perfect [[5,1,3]], Steane, Shor, surface `d=3`, hex/square-octagon color `d=3`, GHZ-9..13, Carbon.
	- Tier 3 (5 tasks): Tetrahedral, Hamming, surface `d=5`, hex/square-octagon color `d=5`.
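
Loading a task file with the environment-variable override might look like the sketch below. `load_tasks` is a hypothetical helper, not the package's API, and the length check on each Pauli string is an added sanity check rather than documented behaviour.

```python
import json
import os
from pathlib import Path

DEFAULT_TASKS = Path("stabilizer_forge/tasks.jsonl")


def load_tasks(default_path: Path = DEFAULT_TASKS) -> dict:
    """Read one JSON task per line, keyed by task_id.

    STABILIZER_FORGE_TASKS, if set, overrides the default path.
    """
    path = Path(os.environ.get("STABILIZER_FORGE_TASKS", str(default_path)))
    tasks = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            task = json.loads(line)
            # Sanity check (assumption): every stabilizer spans all n_qubits.
            if any(len(s) != task["n_qubits"] for s in task["target_stabilizers"]):
                raise ValueError(f"bad stabilizer length in {task['task_id']}")
            tasks[task["task_id"]] = task
    return tasks
```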

	## Verifier

	The match fraction comes from Stim's `TableauSimulator.peek_observable_expectation`. For each target stabilizer `S_i`, the verifier applies the candidate circuit to `|0⟩^n` and checks whether the resulting state is a `+1` eigenstate of `S_i`. The check is exact and runs in polynomial time, so there is no false-positive risk and no statistical noise. Vendored from [uw-math-ai/quantum-ai/tools/check_stabilizers.py](https://github.com/uw-math-ai/quantum-ai/blob/main/tools/check_stabilizers.py).
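
To show why the check is exact and cheap, here is a stdlib-only sketch of the same test. It does not use Stim and is not the vendored verifier: instead of simulating the state, it pulls each Pauli string backwards through the circuit (`P → U† P U`, last gate first) and reads off its expectation on `|0⟩^n`. The function names are hypothetical, only `H`, `S`, and `CX` are handled, and Pauli strings are assumed to carry no sign prefix.

```python
def expectation(gates: list, pauli: str) -> int:
    """Expectation of `pauli` on C|0...0>, where C is `gates` applied in order.

    `gates` is a list of (op, qubits) pairs. Returns +1, -1, or 0.
    """
    # Symplectic form: X -> (x=1, z=0), Z -> (0, 1), Y -> (1, 1), I -> (0, 0).
    x = [c in "XY" for c in pauli]
    z = [c in "ZY" for c in pauli]
    negative = False
    for op, qubits in reversed(gates):
        if op == "H":
            (q,) = qubits
            negative ^= x[q] and z[q]          # H Y H = -Y
            x[q], z[q] = z[q], x[q]            # X <-> Z
        elif op == "S":
            (q,) = qubits
            negative ^= x[q] and not z[q]      # S† X S = -Y, S† Y S = +X
            z[q] ^= x[q]
        elif op == "CX":
            c, t = qubits
            negative ^= x[c] and z[t] and (x[t] == z[c])
            x[t] ^= x[c]                       # X on control spreads to target
            z[c] ^= z[t]                       # Z on target spreads to control
        else:
            raise ValueError(f"unsupported op {op!r}")
    if any(x):
        return 0  # an X or Y component remains: zero expectation on |0...0>
    return -1 if negative else 1


def match_fraction(gates: list, stabilizers: list) -> float:
    """Fraction of stabilizers with expectation exactly +1."""
    return sum(expectation(gates, s) == 1 for s in stabilizers) / len(stabilizers)
```

For the Bell-state circuit `[("H", [0]), ("CX", [0, 1])]`, both `XX` and `ZZ` evaluate to `+1`, so `match_fraction` is `1.0`. Each gate costs constant work per Pauli string, which is where the polynomial bound comes from.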

	## Deploy

	```bash
	openenv push
	# requires `huggingface-cli login` first
	```

	The deployed Space exposes `/health`, `/reset`, `/step`, `/state`, `/schema` over HTTP, and `/ws` for low-latency persistent sessions. Use `StabilizerForgeEnv(base_url="https://<your-space>.hf.space")` to connect.

	## Building the Docker image manually

	```bash
	docker build -t stabilizer-forge-env:latest -f server/Dockerfile .
	docker run -p 8000:8000 stabilizer-forge-env:latest
	```

	## Files

	```
	stabilizer_forge/
	├── __init__.py
	├── client.py # StabilizerForgeEnv (sync/async HTTP client)
	├── models.py # StabilizerAction, StabilizerObservation
	├── tasks.jsonl # 29 training tasks
	├── eval_tasks.jsonl # 10 held-out eval tasks
	├── pyproject.toml
	├── openenv.yaml
	└── server/
	    ├── stabilizer_forge_environment.py # core env (reward, termination, verifier wrap)
	    ├── verifier.py # Stim-based check_stabilizers, match_fraction
	    ├── app.py # FastAPI; max_concurrent_envs=64
	    └── Dockerfile
	```

	## Citation / credits

	- Verifier and benchmark catalog adapted from [uw-math-ai/quantum-ai](https://github.com/uw-math-ai/quantum-ai) (StabilizerBench, arXiv:2604.21287, April 2026).
	- Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) and [Stim](https://github.com/quantumlib/Stim).