ronitraj committed · Commit 7e06782 · verified · Parent(s): 4693f9a

Upload docs/architecture.md with huggingface_hub

# Architecture - Qubit-Medic

The system has three concentric layers, each behind a clean contract.

```
+-------------------------------------------------------------+
|                         LLM trainer                         |
|                 (TRL GRPOTrainer + Unsloth)                 |
|                                                             |
|  for each step:                                             |
|      prompts = sample(prompt_pool)                          |
|      completions = model.generate(prompts, n=4)             |
|      for c in completions:                                  |
|          rewards = env_client.step(c).info["rewards"]       |
+----------------------------+--------------------------------+
                             | HTTP (or in-process)
                             v
+-------------------------------------------------------------+
|            FastAPI server: qubit_medic.server.app           |
|                                                             |
|  POST /reset   -> DecoderObservation                        |
|  POST /step    -> StepResult (reward + info breakdown)      |
|  GET  /health  -> liveness + curriculum stats               |
|  POST /decode  -> baseline PyMatching prediction            |
+----------------------------+--------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|     DecoderEnvironment (qubit_medic.server.environment)     |
|                                                             |
|  reset():                                                   |
|    1. CurriculumScheduler.sample()                          |
|    2. cached: stim.Circuit + DEM + pymatching.Matching      |
|    3. compile_detector_sampler().sample(1) -> syndrome      |
|    4. build_prompt(...) -> DecoderObservation               |
|                                                             |
|  step(raw_response):                                        |
|    1. parse_action() -> ParseResult (X/Z error sets)        |
|    2. layout.llm_to_stim() remap to Stim qubit IDs          |
|    3. compute_all_rewards():                                |
|         - logical_correction (Stim ground truth)            |
|         - syndrome_consistency (final-round detectors)      |
|         - hamming_overlap (vs PyMatching reference frame)   |
|         - format_compliance (parser output)                 |
|         - pymatching_beat (LLM right & PM wrong)            |
|    4. CurriculumScheduler.update(level, logical_correct)    |
|    5. return StepResult                                     |
+-------------------------------------------------------------+
```
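
A minimal trainer-side helper for the loop above might look like the sketch below. The endpoint paths come from the diagram; `BASE_URL`, the payload fields (`"episode_id"`, `"response"`), and the `StepResult` layout are assumptions made for this sketch, not the actual qubit_medic schemas.

```python
import json
from urllib import request

# BASE_URL and all field names below are illustrative assumptions,
# not the real qubit_medic Pydantic schemas.
BASE_URL = "http://localhost:8000"

def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the environment server and decode the reply."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

def extract_rewards(step_result: dict) -> dict:
    """Pull the per-component reward breakdown out of a StepResult dict,
    mirroring rewards = env_client.step(c).info["rewards"] above."""
    return step_result.get("info", {}).get("rewards", {})
```

The trainer only ever sees what `POST /step` returns; the reward breakdown arrives inside `info`, never the ground truth used to compute it.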

## Trust boundaries

```
+-----------+      prompt + syndrome      +--------------+
|    LLM    | <-------------------------- |  Observation |
+-----------+                             +--------------+
      |
      v  raw text
+-----------+       parse + remap       +-----------+
|  Action   | --> [LLM ID space] -----> |  Stim ID  |
+-----------+                           +-----------+
      |
      v  scoring
+-----------+
|   State   |
| (server)  |
+-----------+
```

The `DecoderState` (server-side) holds the ground-truth observable flip,
the true error pattern (PyMatching reference frame), and the seed used for
sampling. **None** of this is ever returned to the LLM. This is the
participant guide's `"avoid unrestricted global state"` discipline made
concrete by Pydantic schemas.
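
One way to picture the boundary is the split below, with plain dataclasses standing in for the actual Pydantic models (the field names are illustrative): everything that lives in `DecoderState` but outside `DecoderObservation` never leaves the server.

```python
from dataclasses import dataclass, asdict

# Plain dataclasses standing in for the real Pydantic models; field names
# are illustrative. The invariant they sketch: only the observation is
# ever serialized toward the LLM.

@dataclass
class DecoderObservation:
    """What the LLM is allowed to see."""
    prompt: str
    syndrome: list

@dataclass
class DecoderState:
    """Server-only episode state; holds the ground truth."""
    observation: DecoderObservation
    observable_flip: bool   # ground-truth logical flip
    true_error: list        # PyMatching reference frame
    seed: int               # sampling seed

def to_wire(state: DecoderState) -> dict:
    """Serialize only the observation; ground truth never crosses the
    trust boundary."""
    return asdict(state.observation)
```

Because `to_wire` takes the whole state but returns only the observation, the "never returned to the LLM" guarantee is enforced by the schema rather than by discipline at each call site.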

## Why a terminal Pauli frame, and what it costs

The LLM emits two integer lists: which data qubits suffered an X error and
which suffered a Z error, **at the moment of final measurement** (a
terminal Pauli frame). For the rotated `memory_z` task this is sufficient
for the logical observable - the destructive Z measurement is exactly the
Z observable, and an X error on a data qubit in the observable's support
flips its measurement outcome.

The trade-off is that an end-of-circuit Pauli frame *only* constrains the
final-round detectors (the ones that incorporate the destructive Z
measurement results). Earlier-round detectors fire only in response to
errors that propagate through the stabilizer rounds, and a terminal frame
cannot say anything about them. Reward 2 (syndrome consistency)
explicitly grades only the final-round detector bits, which matches the
representation's expressive power. The remaining detector bits are
implicitly *available* in the prompt for the LLM to reason about, but
unscored.
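
The claim about X errors and the Z observable is plain parity arithmetic, which can be sketched as follows (the qubit indexing is an assumption for illustration, not the project's actual layout):

```python
# Parity sketch of the claim above. The logical Z observable is the parity
# of destructive Z-measurement outcomes over its support; a terminal-frame
# X error on a supported qubit flips that qubit's outcome, and hence the
# observable's parity. Indexing is illustrative.

def logical_z(z_outcomes: dict, support: frozenset, x_errors: set) -> int:
    """Parity of the Z observable after applying terminal-frame X errors."""
    parity = 0
    for q in support:
        parity ^= z_outcomes[q] ^ (1 if q in x_errors else 0)
    return parity
```

An X error outside the support leaves the parity unchanged, which is why a terminal frame can be graded against the final-round detectors and the observable, but not against earlier rounds.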

## Why five rewards instead of one

The participant guide is emphatic: *"use multiple independent reward
functions, not just one."* Each of our five rewards is independently
verifiable in well under a millisecond and disagrees with at least one
other on degenerate inputs:

* All-zeros agent on a syndrome with a logical-but-undetectable error:
  `logical_correction = 0` but `syndrome_consistency = 1`. The R2 vs R1
  disagreement exposes the failure case.
* Random-qubit agent that lands on the right observable parity by luck:
  `logical_correction = 1` but `syndrome_consistency` and
  `hamming_overlap` are both low. R1 alone over-rewards; the others
  expose the lack of understanding.

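The two degenerate agents can be sketched numerically. The component names match the architecture diagram; the reward values below are made-up stand-ins for the sketch, not outputs of the real scorers.

```python
# Illustrative reward vectors for the two degenerate agents above.
# Values are invented for the sketch; only the disagreement pattern
# matters: no single component separates both failure modes, but the
# R1/R2 pair does.

COMPONENTS = ("logical_correction", "syndrome_consistency",
              "hamming_overlap", "format_compliance", "pymatching_beat")

all_zeros_agent = dict(zip(COMPONENTS, (0.0, 1.0, 0.2, 1.0, 0.0)))
lucky_random_agent = dict(zip(COMPONENTS, (1.0, 0.1, 0.1, 1.0, 0.0)))

def r1_r2_gap(rewards: dict) -> float:
    """Disagreement between logical correctness (R1) and syndrome
    consistency (R2): large for both degenerate agents, near zero for a
    decoder that actually explains the syndrome it was shown."""
    return abs(rewards["logical_correction"] - rewards["syndrome_consistency"])
```
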
This decomposition is what the guide calls *"hard to game by
construction."*