# Figures
All plots are saved as PNG (150 dpi unless noted) with axis labels carrying
explicit units. Reproduction commands are listed under each section.
## Money plot: before vs after RLHF training
* `before_after_comparison.png`: two side-by-side bar charts comparing the
four canonical conditions (Random baseline, Base Qwen2.5-3B, SFT-only,
SFT + GRPO) on `logical_correction_rate` (left, fraction of shots in
[0, 1]) and `pymatching_beat_rate` (right, fraction of shots in [0, 1]).
This is the headline "money plot" for the judges' rubric: the SFT + GRPO bar
should clearly dominate the untrained conditions on the left panel and
show a non-zero beat rate on the right panel.
Reproduce (after running per-condition evals into `data/eval/*.json`):
```
python scripts/make_comparison_plot.py --eval-dir data/eval \
--out figures/before_after_comparison.png
```
The script prints a helpful error listing every expected JSON file if any
eval result is missing.
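
The missing-file check described above could look roughly like the sketch below. The per-condition file names (`random.json`, `base.json`, `sft.json`, `sft_grpo.json`) and both helper functions are hypothetical, chosen only for illustration; the actual script may name and structure things differently.

```python
from pathlib import Path

# Hypothetical file names, one per canonical condition; the real script may differ.
EXPECTED_EVAL_FILES = ("random.json", "base.json", "sft.json", "sft_grpo.json")


def find_missing_evals(eval_dir, expected=EXPECTED_EVAL_FILES):
    """Return the expected eval files that are absent from eval_dir, in order."""
    root = Path(eval_dir)
    return [name for name in expected if not (root / name).is_file()]


def format_missing_error(eval_dir, missing, expected=EXPECTED_EVAL_FILES):
    """Build a message listing every expected file and flagging the missing ones."""
    lines = [f"Missing eval results in {eval_dir}. Expected files:"]
    for name in expected:
        status = "MISSING" if name in missing else "ok"
        lines.append(f"  {name}: {status}")
    return "\n".join(lines)
```

Listing every expected file, not just the first missing one, lets all outstanding evals be queued in a single pass.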
## Training trajectories (synthetic / baseline-anchored)
* `total_reward.png`: mean total episode reward (y, dimensionless 0-1
composite of logical/syndrome/hamming/format/beat sub-rewards) vs
training step (x, gradient updates). Horizontal lines mark Random,
All-zeros, and PyMatching-imitator reward floors so the trained-model
curve can be read against fixed baselines.
* `logical_correction.png`: logical correction rate (y, fraction of
shots in [0, 1]) vs training step (x). Reference lines show
PyMatching, AlphaQubit (Bausch et al., Nature 2024, ~0.973), and
All-zeros (~0.985) on the same axes for direct comparison.
* `pymatching_beat_rate.png`: fraction of syndromes (y, in [0, 1])
where the LLM corrects but PyMatching does not, vs training step (x).
This is the "we moved past pure imitation" diagnostic: non-zero is
the win condition.
Reproduce: `python -m scripts.plot_results --baselines data/baseline_results.json --out-dir figures`
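
As a reading aid for the beat-rate curve, the metric as described above can be sketched as follows. This is an illustration under stated assumptions: `beat_rate` and its two boolean input sequences are hypothetical names, and the repository's own implementation may define the inputs or the denominator differently.

```python
def beat_rate(llm_correct, pm_correct):
    """Fraction of syndromes where the LLM is correct and PyMatching is not.

    Both arguments are equal-length sequences of booleans, one entry per
    syndrome. The denominator is all syndromes (an assumption that matches
    the "fraction of syndromes" wording).
    """
    if len(llm_correct) != len(pm_correct):
        raise ValueError("inputs must have the same length")
    if len(llm_correct) == 0:
        return 0.0
    beats = sum(1 for llm, pm in zip(llm_correct, pm_correct) if llm and not pm)
    return beats / len(llm_correct)


# Example: the LLM beats PyMatching on one of four syndromes.
llm = [True, True, False, True]
pm = [True, False, False, True]
print(beat_rate(llm, pm))  # 0.25
```

Cases where both decoders succeed or both fail contribute nothing, so a policy that only imitates PyMatching scores exactly zero.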
## Data-driven summaries (from `data/*.json`)
* `eval_metrics_bars.png`: horizontal bars of held-out eval metrics
(logical correction, format, syndrome consistency, mean Hamming
overlap, mean total reward, etc.) for the trained model. X-axis is
score in [0, 1]; one row per metric. Sourced from
`data/eval_grpo.json`.
* `sft_curriculum_mix.png`: vertical bars showing rows-per-curriculum
level (y, integer counts) in the SFT training split (L1 warmup / L2
target / L3 stretch). Confirms the 40/50/10 curriculum mix used to
bootstrap the policy before GRPO.
Reproduce: `python -m scripts.plot_data_figures --out-dir figures`
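
A quick way to confirm the 40/50/10 mix independently of the plot is to count rows per level and compare the fractions against the target. The sketch below assumes each row is a dict with a `level` key holding `"L1"`, `"L2"`, or `"L3"`; that field name, the tolerance, and the helper names are hypothetical.

```python
from collections import Counter

# Target shares taken from the 40/50/10 mix stated above.
TARGET_MIX = {"L1": 0.40, "L2": 0.50, "L3": 0.10}


def curriculum_fractions(rows, key="level"):
    """Return the share of rows at each curriculum level, keyed by level name."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: counts[level] / total for level in sorted(counts)}


def matches_target(fractions, target=TARGET_MIX, tol=0.02):
    """True if every target level is within tol of its intended share."""
    return all(
        abs(fractions.get(level, 0.0) - share) <= tol
        for level, share in target.items()
    )


# Toy split with ten rows in a 4/5/1 ratio.
rows = [{"level": "L1"}] * 4 + [{"level": "L2"}] * 5 + [{"level": "L3"}] * 1
fractions = curriculum_fractions(rows)
print(fractions)  # {'L1': 0.4, 'L2': 0.5, 'L3': 0.1}
```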
## Scene / animation assets
* `grid_hero.png`: single-frame static visualisation of the distance-3
rotated surface-code data-qubit grid with one example error +
prediction overlay. Used in the README header. Axes are spatial qubit
coordinates (no numeric units; legend identifies data qubits, actual
errors, predicted corrections, and the logical-Z support).
* `grid_animation.gif`: short animated rollout of the same grid across
episodes, useful for talks and the README banner. Each frame shows
one syndrome → action → outcome cycle.
## Figure-by-figure rubric audit (2026-04)
| File | X-axis (units) | Y-axis (units) | Title present | Thumbnail-legible |
| --- | --- | --- | --- | --- |
| `total_reward.png` | Training step (steps) | Mean total reward (0-1) | yes | yes |
| `logical_correction.png` | Training step (steps) | Logical correction rate (0-1) | yes | yes |
| `pymatching_beat_rate.png` | Training step (steps) | Fraction of syndromes where LLM beats PM (0-1) | yes | yes |
| `eval_metrics_bars.png` | Score (0-1) | metric labels (categorical) | yes | yes |
| `sft_curriculum_mix.png` | curriculum-level labels (categorical) | Rows in SFT train split (count) | yes | yes |
| `grid_hero.png` | spatial (legend) | spatial (legend) | yes (frame caption) | yes |
| `grid_animation.gif` | spatial (legend) | spatial (legend) | per-frame caption | yes |
| `before_after_comparison.png` | Decoder condition (categorical) | LCR / PM-beat (fraction, 0-1) | yes | yes (will be) |