---
title: StabilizerForge — Quantum-Code Synthesis Environment
emoji: ⚛️
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - quantum-error-correction
  - stim
  - stabilizer-codes
  - rlvr
---
# StabilizerForge — OpenEnv environment for Clifford circuit synthesis
An RL environment that scores candidate Clifford encoding circuits against a target stabilizer code in polynomial time using Stim's tableau simulator (Aaronson–Gottesman). Built for training small LLMs to do automated quantum-error-correction code synthesis with verifier-grounded rewards (RLVR).
The environment ships with 29 training tasks + 10 held-out eval tasks across three difficulty tiers — from Bell states (2 qubits, 2 gates) up to distance-5 surface, color, and Golay codes (≤25 qubits, 100+ gates).
## Action / Observation / Reward
**Action** — one Clifford gate per step (or `FINALIZE`):
```python
from stabilizer_forge import StabilizerAction

StabilizerAction(op="H", qubits=[0])      # Hadamard on qubit 0
StabilizerAction(op="S", qubits=[3])      # phase gate on qubit 3
StabilizerAction(op="CX", qubits=[0, 1])  # CNOT 0 -> 1
StabilizerAction(op="FINALIZE")           # end episode, deliver terminal reward
```
Actions are validated with Pydantic; a malformed action incurs a format penalty and is treated as a no-op. After 5 consecutive format violations the episode terminates.
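Such validation might look like the following sketch (a hypothetical `Action` model with an abbreviated gate set; the real `StabilizerAction` in `models.py` supports the full Clifford vocabulary and may be structured differently):

```python
from typing import Literal

from pydantic import BaseModel, model_validator

# How many qubit arguments each op takes (abbreviated set for illustration).
ARITY = {"H": 1, "S": 1, "CX": 2, "FINALIZE": 0}


class Action(BaseModel):
    op: Literal["H", "S", "CX", "FINALIZE"]
    qubits: list[int] = []

    @model_validator(mode="after")
    def check_arity(self) -> "Action":
        # Reject actions whose qubit count does not match the gate's arity.
        if len(self.qubits) != ARITY[self.op]:
            raise ValueError(f"{self.op} takes {ARITY[self.op]} qubit argument(s)")
        return self
```

A `ValidationError` raised here is what the environment would map to a format penalty plus no-op.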
**Observation** — full state for the current episode:
| field | type | meaning |
|---|---|---|
| `task_id` | `str` | which task this episode is running |
| `target_stabilizers` | `list[str]` | Pauli strings, e.g. `["XZZXI", "IXZZX", ...]` |
| `n_qubits` | `int` | number of physical qubits |
| `gates_so_far` | `list[str]` | Stim instructions applied this episode |
| `current_circuit` | `str` | concatenated Stim text |
| `current_match` | `list[bool]` | per-stabilizer preservation under the current circuit (live from Stim) |
| `match_fraction` | `float` | fraction of target stabilizers preserved (0..1) |
| `gates_emitted` | `int` | valid gates applied so far |
| `cnot_count` | `int` | CX count |
| `nonadj_cnot_count` | `int` | CXs across non-adjacent qubits |
| `gate_budget` / `gate_budget_remaining` | `int` | hard cap (2 × benchmark_optimum) |
| `benchmark_optimum` / `benchmark_optimum_2q` | `int` | reference encoder's gate counts |
| `connectivity_edges` | `list[list[int]] \| None` | `None` = all-to-all |
| `format_violations`, `consecutive_violations` | `int` | error tracking |
| `last_action_valid`, `last_action_error` | `bool`, `str` | parser feedback |
| `step_count`, `finalized` | `int`, `bool` | episode progress |
**Reward** — delivered at `FINALIZE`, plus dense per-step shaping:
| component | weight | what it measures |
|---|---|---|
| stabilizer-match fraction | 0.40 | primary correctness signal |
| gate-count efficiency: `max(0, 1 − gates / (1.5 × bench_opt))` | 0.20 | gate volume vs. reference |
| two-qubit-gate efficiency | 0.20 | CXs are expensive on real hardware |
| connectivity respect | 0.10 | −1 per CX across non-adjacent qubits |
| format compliance | 0.10 | −1 per malformed action |
| `0.05 × Δmatch_fraction` | per step | dense gradient before FINALIZE is learned |
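The terminal mixture above can be sketched as a plain function (weights taken from the table; the real environment's exact clipping and normalization may differ):

```python
def terminal_reward(obs: dict) -> float:
    """Weighted terminal reward, sketched from the component table.

    `obs` uses the observation field names from the table above.
    """
    gate_eff = max(0.0, 1.0 - obs["gates_emitted"] / (1.5 * obs["benchmark_optimum"]))
    cx_eff = max(0.0, 1.0 - obs["cnot_count"] / (1.5 * obs["benchmark_optimum_2q"]))
    connectivity = -1.0 * obs["nonadj_cnot_count"]   # -1 per non-adjacent CX
    fmt = -1.0 * obs["format_violations"]            # -1 per malformed action
    return (
        0.40 * obs["match_fraction"]
        + 0.20 * gate_eff
        + 0.20 * cx_eff
        + 0.10 * connectivity
        + 0.10 * fmt
    )
```

A perfect match with zero gates would score 0.40 + 0.20 + 0.20 = 0.80 under this sketch; the 0.05 × Δmatch_fraction shaping is added separately at each step.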
## Quick start (sync HTTP)
```python
from stabilizer_forge import StabilizerAction, StabilizerForgeEnv

client = StabilizerForgeEnv(base_url="http://localhost:8000")
with client.sync() as env:
    r = env.reset(task_id="steane")  # pass any task_id from tasks.jsonl, or omit for random
    print(r.observation.target_stabilizers)  # 6 Pauli strings on 7 qubits
    for op, qs in [("H", [0]), ("CX", [0, 1]), ("CX", [0, 2]), ("CX", [0, 3])]:
        r = env.step(StabilizerAction(op=op, qubits=qs))
        print(f"  match_fraction={r.observation.match_fraction:.2f}")
    r = env.step(StabilizerAction(op="FINALIZE"))
    print(f"terminal reward={r.reward:+.3f} done={r.done}")
```
To start the server locally:

```shell
python -m stabilizer_forge.server.app --port 8000
```

Or via Docker (see below).
## Tasks
The env loads tasks from `stabilizer_forge/tasks.jsonl` by default. Override with the `STABILIZER_FORGE_TASKS` environment variable. Each task carries:
```json
{
  "task_id": "steane",
  "source_code": "Steane [[7,1,3]]",
  "n_qubits": 7,
  "target_stabilizers": ["XXIIXXI", "XIXIXIX", "IIIXXXX", "ZZIIZZI", "ZIZIZIZ", "IIIZZZZ"],
  "connectivity_edges": null,
  "gate_budget": 78,
  "benchmark_optimum": 26,
  "benchmark_optimum_2q": 23,
  "tier": 2
}
```
- **Tier 1** (12 tasks): Bell, GHZ-3..8, [[4,2,2]] iceberg/detector, hypercube l=1, iceberg m=4.
- **Tier 2** (12 tasks): Perfect [[5,1,3]], Steane, Shor, surface d=3, hex/square-octagon color d=3, GHZ-9..13, Carbon.
- **Tier 3** (5 tasks): Tetrahedral, Hamming, surface d=5, hex/square-octagon color d=5.
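A catalog in this JSONL shape can be loaded in a few lines (a sketch of what the environment does internally when it reads `tasks.jsonl`):

```python
import json
from typing import Iterable


def load_tasks(lines: Iterable[str]) -> dict[str, dict]:
    """Parse JSONL task records into a task_id -> task mapping."""
    tasks: dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tasks[record["task_id"]] = record
    return tasks
```

For example, `load_tasks(open("stabilizer_forge/tasks.jsonl"))` yields a dict keyed by `task_id`, so `tasks["steane"]["target_stabilizers"]` gives the Steane code's Pauli strings.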
## Verifier
The match fraction comes from Stim's `TableauSimulator.peek_observable_expectation`. For each target stabilizer `S_i`, we apply the candidate circuit to `|0⟩^n`, then check whether the resulting state has +1 eigenvalue under `S_i`. This is exact and polynomial — there's no false-positive risk and no statistical noise. Vendored from `uw-math-ai/quantum-ai/tools/check_stabilizers.py`.
## Deploy
```shell
# requires `huggingface-cli login` first
openenv push
```
The deployed Space exposes `/health`, `/reset`, `/step`, `/state`, and `/schema` over HTTP, plus `/ws` for low-latency persistent sessions. Use `StabilizerForgeEnv(base_url="https://<your-space>.hf.space")` to connect.
## Building the Docker image manually
```shell
docker build -t stabilizer-forge-env:latest -f server/Dockerfile .
docker run -p 8000:8000 stabilizer-forge-env:latest
```
## Files
```
stabilizer_forge/
├── __init__.py
├── client.py              # StabilizerForgeEnv (sync/async HTTP client)
├── models.py              # StabilizerAction, StabilizerObservation
├── tasks.jsonl            # 29 training tasks
├── eval_tasks.jsonl       # 10 held-out eval tasks
├── pyproject.toml
├── openenv.yaml
└── server/
    ├── stabilizer_forge_environment.py  # core env (reward, termination, verifier wrap)
    ├── verifier.py                      # Stim-based check_stabilizers, match_fraction
    ├── app.py                           # FastAPI; max_concurrent_envs=64
    └── Dockerfile
```
## Citation / credits
- Verifier and benchmark catalog adapted from uw-math-ai/quantum-ai (StabilizerBench, arXiv:2604.21287, April 2026).
- Built on OpenEnv and Stim.