# Architecture - Qubit-Medic

The system has three concentric layers, each behind a clean contract.

```
+-------------------------------------------------------------+
|                         LLM trainer                         |
|                 (TRL GRPOTrainer + Unsloth)                 |
|                                                             |
|  for each step:                                             |
|      prompts = sample(prompt_pool)                          |
|      completions = model.generate(prompts, n=4)             |
|      for c in completions:                                  |
|          rewards = env_client.step(c).info["rewards"]       |
+----------------------------+--------------------------------+
                             | HTTP (or in-process)
                             v
+-------------------------------------------------------------+
|           FastAPI server: qubit_medic.server.app            |
|                                                             |
|  POST /reset   -> DecoderObservation                        |
|  POST /step    -> StepResult (reward + info breakdown)      |
|  GET  /health  -> liveness + curriculum stats               |
|  POST /decode  -> baseline PyMatching prediction            |
+----------------------------+--------------------------------+
                             |
                             v
+-------------------------------------------------------------+
|     DecoderEnvironment (qubit_medic.server.environment)     |
|                                                             |
|  reset():                                                   |
|    1. CurriculumScheduler.sample()                          |
|    2. cached: stim.Circuit + DEM + pymatching.Matching      |
|    3. compile_detector_sampler().sample(1) -> syndrome      |
|    4. build_prompt(...) -> DecoderObservation               |
|                                                             |
|  step(raw_response):                                        |
|    1. parse_action() -> ParseResult (X/Z error sets)        |
|    2. layout.llm_to_stim() remap to Stim qubit IDs          |
|    3. compute_all_rewards():                                |
|       - logical_correction (Stim ground truth)              |
|       - syndrome_consistency (final-round detectors)        |
|       - hamming_overlap (vs PyMatching reference frame)     |
|       - format_compliance (parser output)                   |
|       - pymatching_beat (LLM right & PM wrong)              |
|    4. CurriculumScheduler.update(level, logical_correct)    |
|    5. return StepResult                                     |
+-------------------------------------------------------------+
```

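The first thing `step()` does is parse the raw completion. The sketch below is a simplified stand-in for the real `parse_action` (the actual grammar and return type may differ); it assumes the LLM answers with bracketed integer lists like `X: [1, 5]` and `Z: []`:

```python
import re


def parse_action(raw_response: str):
    """Extract X/Z data-qubit error lists from raw LLM text.

    Simplified stand-in for the real parser: expects lines like
    'X: [1, 5]' and 'Z: []'. Returns (x_set, z_set, ok_flag).
    """
    sets = {}
    for pauli in ("X", "Z"):
        m = re.search(rf"{pauli}\s*:\s*\[([^\]]*)\]", raw_response)
        if m is None:
            # A malformed answer is not an exception: format_compliance
            # simply scores it zero and the other rewards see empty sets.
            return set(), set(), False
        body = m.group(1).strip()
        sets[pauli] = {int(tok) for tok in body.split(",")} if body else set()
    return sets["X"], sets["Z"], True


x_errors, z_errors, ok = parse_action("X: [1, 5]\nZ: []")
```

Returning a flag instead of raising keeps every completion scorable, which is what lets `format_compliance` be a reward rather than a crash.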
## Trust boundaries

```
+-----------+     prompt + syndrome      +--------------+
|    LLM    | <------------------------- | Observation  |
+-----------+                            +--------------+
      |
      v  raw text
+-----------+     parse + remap          +-----------+
|  Action   | --> [LLM ID space] ------> |  Stim ID  |
+-----------+                            +-----------+
      |
      v  scoring
+-----------+
|   State   |
| (server)  |
+-----------+
```

The `DecoderState` (server-side) holds the ground-truth observable flip,
the true error pattern (PyMatching reference frame), and the seed used for
sampling. **None** of this is ever returned to the LLM. This is the
participant guide's `"avoid unrestricted global state"` discipline made
concrete by Pydantic schemas.

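The boundary can be sketched in a few lines. This uses stdlib dataclasses as a stand-in for the project's Pydantic models, and the field names are illustrative, not the real schema:

```python
from dataclasses import dataclass, fields


@dataclass
class DecoderObservation:
    """Wire schema: everything here crosses the trust boundary."""
    prompt: str
    syndrome: list  # detector bits, visible to the LLM


@dataclass
class DecoderState:
    """Server-only: never serialized into a response."""
    observable_flip: int  # ground-truth logical outcome
    true_error: list      # PyMatching reference frame
    seed: int             # sampling seed


def to_wire(obs: DecoderObservation) -> dict:
    # Only the observation's declared fields are serialized. DecoderState
    # has no serializer at all, so ground truth cannot leak by accident.
    return {f.name: getattr(obs, f.name) for f in fields(obs)}


payload = to_wire(DecoderObservation(prompt="...", syndrome=[0, 1, 0]))
```

The point of schema-first serialization is that leaking a field would require *adding* it to the wire model, which is an auditable diff, rather than forgetting to remove it from a shared dict.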
## Why a terminal Pauli frame, and what it costs

The LLM emits two integer lists: which data qubits suffered an X error and
which suffered a Z error, **at the moment of final measurement** (a
terminal Pauli frame). For the rotated `memory_z` task this is sufficient
for the logical observable - the destructive Z measurement is exactly the
Z observable, and an X error on a data qubit in the observable's support
flips its measurement outcome.

The trade-off is that an end-of-circuit Pauli frame *only* constrains the
final-round detectors (the ones that incorporate the destructive Z
measurement results). Earlier-round detectors fire only in response to
errors that propagate through the stabilizer rounds, and a terminal frame
cannot say anything about them. Reward 2 (syndrome consistency)
explicitly grades only the final-round detector bits, which matches the
representation's expressive power. The remaining detector bits are
implicitly *available* in the prompt for the LLM to reason about, but
unscored.

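The "X error flips the Z observable" claim reduces to a parity count over the observable's support. A minimal sketch (the support set below is illustrative, not the real rotated-code layout):

```python
def observable_flip(x_errors: set, z_support: set) -> int:
    """Parity of terminal X errors landing on the Z observable's support.

    Each X error on a supported data qubit flips that qubit's destructive
    Z-measurement outcome, so the observable flips iff the overlap is odd.
    """
    return len(x_errors & z_support) % 2


support = {0, 1, 2}  # illustrative support, not the real layout

print(observable_flip({1}, support))     # one supported X error -> flip (1)
print(observable_flip({1, 2}, support))  # two flips cancel -> no net flip (0)
print(observable_flip({7}, support))     # outside the support -> no effect (0)
```

The cancellation case is why `hamming_overlap` is scored separately: two predictions with the same parity can still differ wildly in which physical errors they name.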
## Why five rewards instead of one

The participant guide is emphatic: *"use multiple independent reward
functions, not just one."* Each of our five rewards is independently
verifiable in well under a millisecond and disagrees with at least one
other on degenerate inputs:

* All-zeros agent on a syndrome with a logical-but-undetectable error:
  `logical_correction = 0` but `syndrome_consistency = 1`. The R2 - R1
  disagreement exposes the failure case.
* Random-qubit agent that lands on the right observable parity by luck:
  `logical_correction = 1` but `syndrome_consistency` and
  `hamming_overlap` are both low. R1 alone over-rewards; the others
  expose the lack of understanding.

This decomposition is what the guide calls *"hard to game by
construction."*