Spaces:

ronitraj
/

QuantumScribe

Sleeping

App Files Files Community

QuantumScribe / docs /architecture.md

ronitraj

Upload docs/architecture.md with huggingface_hub

7e06782 verified 12 days ago

preview code

raw

history blame contribute delete

5.57 kB

Architecture - Qubit-Medic

The system has three concentric layers, each behind a clean contract.

+-------------------------------------------------------------+
|                       LLM trainer                          |
|                  (TRL GRPOTrainer + Unsloth)                |
|                                                             |
|   for each step:                                            |
|     prompts = sample(prompt_pool)                          |
|     completions = model.generate(prompts, n=4)             |
|     for c in completions:                                   |
|         rewards = env_client.step(c).info["rewards"]       |
+----------------------------+--------------------------------+
                             | HTTP (or in-process)
                             v
+-------------------------------------------------------------+
|              FastAPI server: qubit_medic.server.app         |
|                                                             |
|   POST /reset    -> DecoderObservation                      |
|   POST /step     -> StepResult (reward + info breakdown)    |
|   GET  /health   -> liveness + curriculum stats             |
|   POST /decode   -> baseline PyMatching prediction          |
+----------------------------+--------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|         DecoderEnvironment (qubit_medic.server.environment) |
|                                                             |
|   reset():                                                  |
|     1. CurriculumScheduler.sample()                         |
|     2. cached: stim.Circuit + DEM + pymatching.Matching     |
|     3. compile_detector_sampler().sample(1) -> syndrome     |
|     4. build_prompt(...)  -> DecoderObservation             |
|                                                             |
|   step(raw_response):                                       |
|     1. parse_action()  ->  ParseResult (X/Z error sets)     |
|     2. layout.llm_to_stim()  remap to Stim qubit IDs        |
|     3. compute_all_rewards():                               |
|        - logical_correction (Stim ground truth)             |
|        - syndrome_consistency (final-round detectors)       |
|        - hamming_overlap (vs PyMatching reference frame)    |
|        - format_compliance (parser output)                  |
|        - pymatching_beat (LLM right & PM wrong)             |
|     4. CurriculumScheduler.update(level, logical_correct)   |
|     5. return StepResult                                    |
+-------------------------------------------------------------+

Trust boundaries

+-----------+     prompt + syndrome      +--------------+
|    LLM    | <-------------------------- | Observation  |
+-----------+                             +--------------+
      |
      v raw text
+-----------+     parse + remap          +-----------+
|  Action   | --> [LLM ID space] -----> | Stim ID    |
+-----------+                            +-----------+
                                              |
                                              v scoring
                                        +-----------+
                                        |   State   |
                                        | (server)  |
                                        +-----------+

The DecoderState (server-side) holds the ground-truth observable flip, the true error pattern (PyMatching reference frame), and the seed used for sampling. None of this is ever returned to the LLM. This is the participant guide's "avoid unrestricted global state" discipline made concrete by Pydantic schemas.

Why a terminal Pauli frame, and what it costs

The LLM emits two integer lists: which data qubits suffered an X error and which suffered a Z error, at the moment of final measurement (a terminal Pauli frame). For the rotated memory_z task this is sufficient for the logical observable - the destructive Z measurement is exactly the Z observable, and an X error on a data qubit in the observable's support flips its measurement outcome.

The trade-off is that an end-of-circuit Pauli frame only constrains the final-round detectors (the ones that incorporate the destructive Z measurement results). Earlier-round detectors fire only in response to errors that propagate through the stabilizer rounds, and a terminal frame cannot say anything about them. Reward 2 (syndrome consistency) explicitly grades only the final-round detector bits, which matches the representation's expressive power. The remaining detector bits are implicitly available in the prompt for the LLM to reason about, but unscored.

Why five rewards instead of one

The participant guide is emphatic: "use multiple independent reward functions, not just one." Each of our five rewards is independently verifiable in well under a millisecond and disagrees with at least one other on degenerate inputs:

All-zeros agent on a syndrome with a logical-but-undetectable error: logical_correction = 0 but syndrome_consistency = 1. The R2 - R1 disagreement exposes the failure case.
Random-qubit agent that lands on the right observable parity by luck: logical_correction = 1 but syndrome_consistency and hamming_overlap are both low. R1 alone over-rewards; the others expose the lack of understanding.

This decomposition is what the guide calls "hard to game by construction."