	# Architecture - Qubit-Medic

	The system has three concentric layers, each behind a clean contract.

	```
	+-------------------------------------------------------------+
	|  LLM trainer                                                |
	|  (TRL GRPOTrainer + Unsloth)                                |
	|                                                             |
	|  for each step:                                             |
	|    prompts = sample(prompt_pool)                            |
	|    completions = model.generate(prompts, n=4)               |
	|    for c in completions:                                    |
	|        rewards = env_client.step(c).info["rewards"]         |
	+----------------------------+--------------------------------+
	                             |  HTTP (or in-process)
	                             v
	+-------------------------------------------------------------+
	|  FastAPI server: qubit_medic.server.app                     |
	|                                                             |
	|    POST /reset  -> DecoderObservation                       |
	|    POST /step   -> StepResult (reward + info breakdown)     |
	|    GET  /health -> liveness + curriculum stats              |
	|    POST /decode -> baseline PyMatching prediction           |
	+----------------------------+--------------------------------+
	                             |
	                             v
	+-------------------------------------------------------------+
	|  DecoderEnvironment (qubit_medic.server.environment)        |
	|                                                             |
	|  reset():                                                   |
	|    1. CurriculumScheduler.sample()                          |
	|    2. cached: stim.Circuit + DEM + pymatching.Matching      |
	|    3. compile_detector_sampler().sample(1) -> syndrome      |
	|    4. build_prompt(...) -> DecoderObservation               |
	|                                                             |
	|  step(raw_response):                                        |
	|    1. parse_action() -> ParseResult (X/Z error sets)        |
	|    2. layout.llm_to_stim() remap to Stim qubit IDs          |
	|    3. compute_all_rewards():                                |
	|      - logical_correction   (Stim ground truth)             |
	|      - syndrome_consistency (final-round detectors)         |
	|      - hamming_overlap      (vs PyMatching reference frame) |
	|      - format_compliance    (parser output)                 |
	|      - pymatching_beat      (LLM right & PM wrong)          |
	|    4. CurriculumScheduler.update(level, logical_correct)    |
	|    5. return StepResult                                     |
	+-------------------------------------------------------------+
	```
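
The trainer-side loop in the top box can be sketched against an in-process stand-in for the environment. This is only an illustration: `StubEnv`, its toy scoring rule, the placeholder prompt, and the `score_group` helper are hypothetical names and values, not the project's API. Only the `reset`/`step` shape and the `info["rewards"]` lookup come from the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class StepResult:
    reward: float
    info: Dict[str, Dict[str, float]] = field(default_factory=dict)


class EnvClient(Protocol):
    """The contract the trainer depends on, whether HTTP or in-process."""

    def reset(self) -> str: ...
    def step(self, raw_response: str) -> StepResult: ...


class StubEnv:
    """Hypothetical in-process stand-in; the real server scores against Stim."""

    def reset(self) -> str:
        return "syndrome: 0 1 0 0"  # placeholder prompt text

    def step(self, raw_response: str) -> StepResult:
        # Toy rule: reward only responses that start with "X:".
        # This is NOT the real reward logic.
        ok = 1.0 if raw_response.startswith("X:") else 0.0
        return StepResult(reward=ok, info={"rewards": {"format_compliance": ok}})


def score_group(env: EnvClient, completions: List[str]) -> List[Dict[str, float]]:
    """Score each completion and return its per-reward breakdown."""
    return [env.step(c).info["rewards"] for c in completions]


env = StubEnv()
prompt = env.reset()
completions = ["X: [1] Z: []", "garbage", "X: [] Z: []", "no idea"]  # n=4
breakdowns = score_group(env, completions)
print(breakdowns)
```

Because the trainer only touches the `EnvClient` shape, swapping the stub for an HTTP client should not require changes to the loop itself.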

	## Trust boundaries

	```
	+-----------+      prompt + syndrome      +--------------+
	|    LLM    | <-------------------------- | Observation  |
	+-----------+                             +--------------+
	      |
	      v raw text
	+-----------+        parse + remap        +-----------+
	|  Action   | --> [LLM ID space] -------> |  Stim ID  |
	+-----------+                             +-----------+
	                                                |
	                                                v scoring
	                                          +-----------+
	                                          |   State   |
	                                          | (server)  |
	                                          +-----------+
	```

	The `DecoderState` (server-side) holds the ground-truth observable flip,
	the true error pattern (PyMatching reference frame), and the seed used for
	sampling. None of this is ever returned to the LLM. This is the
	participant guide's `"avoid unrestricted global state"` discipline made
	concrete by Pydantic schemas.
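
A minimal sketch of that separation follows. It uses stdlib dataclasses in place of Pydantic to stay dependency-free, and every field name (`reference_errors`, `observation`, and so on) and the `to_wire` helper are assumptions for illustration rather than the project's actual schema.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DecoderObservation:
    """What the LLM is allowed to see."""
    prompt: str
    syndrome: List[int]


@dataclass(frozen=True)
class DecoderState:
    """Server-side only: ground truth used for scoring."""
    observable_flip: bool
    reference_errors: List[int]  # hypothetical field name
    seed: int
    observation: DecoderObservation


def to_wire(state: DecoderState) -> dict:
    """Serialize only the observation, so the secrets never enter the payload."""
    return asdict(state.observation)


state = DecoderState(
    observable_flip=True,
    reference_errors=[4],
    seed=1234,
    observation=DecoderObservation(prompt="Decode this syndrome.", syndrome=[1, 1, 0, 0]),
)
payload = to_wire(state)
print(sorted(payload))  # ['prompt', 'syndrome']
```

Keeping the observation as its own type means a new server-side field cannot leak by accident; it has to be added to the observation explicitly.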

	## Why a terminal Pauli frame, and what it costs

	The LLM emits two integer lists: the data qubits that carry an X error and
	those that carry a Z error at the moment of final measurement (a
	terminal Pauli frame). For the rotated `memory_z` task this is sufficient
	to determine the logical observable: the observable is read out from the
	destructive Z-basis measurement of the data qubits, and an X error on a
	data qubit in the observable's support flips that qubit's measurement
	outcome.
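
The parity argument above can be sketched in a few lines. The function name and the three-qubit support are hypothetical; the real support depends on the circuit layout. Z errors do not appear because they commute with a Z-basis measurement and leave its outcome unchanged.

```python
from typing import Iterable, Set


def predicts_observable_flip(x_errors: Iterable[int], observable_support: Set[int]) -> bool:
    """True if an odd number of X errors land on the observable's support."""
    hits = set(x_errors) & observable_support
    return len(hits) % 2 == 1


# Hypothetical layout: the logical Z observable is read from data qubits 0, 1, 2.
support = {0, 1, 2}
print(predicts_observable_flip([1], support))     # True: one hit on the support
print(predicts_observable_flip([0, 2], support))  # False: two hits cancel
print(predicts_observable_flip([5], support))     # False: outside the support
```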

	The trade-off is that an end-of-circuit Pauli frame only constrains the
	final-round detectors (the ones that incorporate the destructive Z
	measurement results). Earlier-round detectors respond to errors that
	occur and propagate during the stabilizer rounds, which a terminal frame
	cannot represent. Reward 2 (syndrome consistency) therefore grades only
	the final-round detector bits, which matches the representation's
	expressive power. The remaining detector bits are still present in the
	prompt for the LLM to reason about, but they are unscored.
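
One way to picture a final-round-only score is sketched below. It rests on several assumptions: the stabilizer supports are a hypothetical distance-3 toy layout, the final-round detectors are taken to be the last entries of the syndrome vector (one per Z stabilizer, in the same order), and the score is taken to be the fraction of matching bits. The source does not specify any of these.

```python
from typing import List, Sequence, Set


def final_round_consistency(
    x_errors: Sequence[int],
    z_stabilizers: List[Set[int]],
    syndrome: Sequence[int],
) -> float:
    """Fraction of final-round detector bits reproduced by a terminal X frame.

    Assumption: the final-round detectors are the last len(z_stabilizers)
    entries of `syndrome`, one per Z stabilizer, in the same order.
    """
    final_bits = syndrome[-len(z_stabilizers):]
    xs = set(x_errors)
    # A final-round detector fires when an odd number of X errors hit its support.
    implied = [len(xs & support) % 2 for support in z_stabilizers]
    matches = sum(a == b for a, b in zip(implied, final_bits))
    return matches / len(z_stabilizers)


# Hypothetical toy layout; the first four bits (earlier round) are never scored.
stabs = [{0, 1, 3, 4}, {4, 5, 7, 8}, {2, 5}, {3, 6}]
syndrome = [0, 1, 0, 0, 1, 1, 0, 0]  # earlier round, then final round
print(final_round_consistency([4], stabs, syndrome))  # 1.0
print(final_round_consistency([], stabs, syndrome))   # 0.5
```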

	## Why five rewards instead of one

	The participant guide is emphatic: *"use multiple independent reward
	functions, not just one."* Each of our five rewards is independently
	verifiable in well under a millisecond and disagrees with at least one
	other on degenerate inputs:

	* All-zeros agent on a syndrome with a logical-but-undetectable error:
	`logical_correction = 0` but `syndrome_consistency = 1`. The disagreement
	between R1 (`logical_correction`) and R2 (`syndrome_consistency`)
	exposes the failure.
	* Random-qubit agent that lands on the right observable parity by luck:
	`logical_correction = 1` but `syndrome_consistency` and
	`hamming_overlap` are both low. R1 alone would over-reward this agent; the
	other two expose the lack of understanding.
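
Both degenerate cases can be reproduced on a toy example. The sketch below is not the project's `compute_all_rewards()`: the distance-3 supports are hypothetical, the true error set stands in for the PyMatching reference frame, and Jaccard similarity stands in for the real `hamming_overlap`.

```python
from typing import Dict, List, Set

# Hypothetical distance-3 toy layout (not taken from the project).
Z_STABILIZERS: List[Set[int]] = [{0, 1, 3, 4}, {4, 5, 7, 8}, {2, 5}, {3, 6}]
OBSERVABLE: Set[int] = {0, 1, 2}


def parity(qubits: Set[int], support: Set[int]) -> int:
    return len(qubits & support) % 2


def toy_rewards(guess: Set[int], truth: Set[int]) -> Dict[str, float]:
    """Score a guessed X-error set against the true one with three toy rewards."""
    observed = [parity(truth, s) for s in Z_STABILIZERS]
    implied = [parity(guess, s) for s in Z_STABILIZERS]
    union = guess | truth
    return {
        "logical_correction": float(parity(guess, OBSERVABLE) == parity(truth, OBSERVABLE)),
        "syndrome_consistency": sum(a == b for a, b in zip(implied, observed)) / len(observed),
        # Assumption: Jaccard similarity stands in for the real hamming_overlap.
        "hamming_overlap": len(guess & truth) / len(union) if union else 1.0,
    }


# Case 1: all-zeros agent; the truth is an undetectable logical X on qubits 0, 3, 6.
all_zeros = toy_rewards(set(), {0, 3, 6})
print(all_zeros)
# {'logical_correction': 0.0, 'syndrome_consistency': 1.0, 'hamming_overlap': 0.0}

# Case 2: lucky random agent; the truth is a single X on qubit 0.
lucky = toy_rewards({2, 8}, {0})
print(lucky)
# {'logical_correction': 1.0, 'syndrome_consistency': 0.25, 'hamming_overlap': 0.0}
```

In the first case only `logical_correction` flags the failure; in the second it is the only reward that looks good, which is the disagreement the bullets describe.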

	This decomposition is what the guide calls *"hard to game by
	construction."*