ronitraj committed (verified) · Commit 5ac714b · Parent: d714735

deploy via scripts/deploy_to_space.py

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -31,7 +31,7 @@ An LLM (Qwen2.5-3B-Instruct) learning to outperform a 50-year-old graph-matching
31
  - **Trained LoRA on the Hub:** [ronitraj/quantumscribe](https://huggingface.co/ronitraj/quantumscribe)
32
  - **Colab notebook (actual training run):** [`notebooks/meta_final.ipynb`](notebooks/meta_final.ipynb)
33
  - **2-min video:** <!-- TODO: replace with submission video URL -->TBD-replace
34
- - **Blog:** [`BLOG.md`](BLOG.md)
35
  - **W&B project:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) · SFT [`yli513jl`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/yli513jl) · GRPO [`4p7eurnc`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/4p7eurnc)
36
  - **OpenEnv manifest:** [`openenv.yaml`](openenv.yaml)
37
 
@@ -47,7 +47,7 @@ We generate synthetic surface-code syndromes using **Stim** ([Gidney 2021](https
47
  ![Surface-code grid animation](figures/grid_animation.gif)
48
 
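The hunk above mentions generating synthetic surface-code syndromes with **Stim**. As a toy illustration of what a "syndrome" is, here is a simplified 1D repetition-code parity check in pure Python; this is an analogue for intuition only, not the project's actual Stim pipeline:

```python
def toy_syndrome(errors):
    """Parity-check syndrome for a 1D repetition code: each check
    compares two neighbouring data qubits, so a syndrome bit lights up
    wherever the error pattern changes. Toy analogue of the
    Stim-generated surface-code syndromes, not the real pipeline."""
    return [errors[i] ^ errors[i + 1] for i in range(len(errors) - 1)]

# A single flip on the middle qubit of 5 lights up the two adjacent checks.
print(toy_syndrome([0, 0, 1, 0, 0]))  # [0, 1, 1, 0]
```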
49
  ## Environment
50
-
51
  | Field | Value |
52
  |---|---|
53
  | Observation | `QubitMedicObservation` — `prompt` (text), `syndrome` bits, `level`, `episode_id`, curriculum metadata (see [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py)) |
@@ -90,7 +90,7 @@ Held-out eval on 1000 episodes at L2_target (`data/eval_grpo.json`, source-of-tr
90
  |:-:|:-:|
91
  | *Mean total episode reward across GRPO steps; x = step, y = mean reward (illustrative trajectory).* | *Fraction of episodes where the LLM is right and PyMatching is wrong; x = step, y = beat rate.* |
92
 
93
- > **Honest caveat.** On this slice `pymatching_beat = 0.0` — i.e. zero "beats" of PyMatching on the held-out set. High logical correction (96.4%) and overlap with the PM frame remain meaningful signals, but we are not yet claiming to outperform PyMatching at d=3. See [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py) for definitions.
94
 
95
  ### Before / after comparison
96
 
@@ -141,7 +141,7 @@ Episodes are **single-step**: one completion per episode. The trainer and W&B se
141
  +----------+ <------------ +---------------------------+
142
  ```
143
 
144
- ### Elevator pitch (technical)
145
 
146
  DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) showed a transformer can beat a strong PyMatching baseline. We reimplement the *idea* with a commodity stack:
147
 
@@ -157,7 +157,7 @@ DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) show
157
  | Baseline | PyMatching (sparse blossom) | Same class of MWM decoder |
158
  | Open source | This repo + Hub weights | Research partial |
159
 
160
- ### Methodology checklist
161
 
162
  | Concern | Status | Pointer |
163
  |--------|--------|--------|
 
31
  - **Trained LoRA on the Hub:** [ronitraj/quantumscribe](https://huggingface.co/ronitraj/quantumscribe)
32
  - **Colab notebook (actual training run):** [`notebooks/meta_final.ipynb`](notebooks/meta_final.ipynb)
33
  - **2-min video:** <!-- TODO: replace with submission video URL -->TBD-replace
34
+ - **Blog (for everyone):** [`BLOG.md`](BLOG.md)
35
  - **W&B project:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) · SFT [`yli513jl`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/yli513jl) · GRPO [`4p7eurnc`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/4p7eurnc)
36
  - **OpenEnv manifest:** [`openenv.yaml`](openenv.yaml)
37
 
 
47
  ![Surface-code grid animation](figures/grid_animation.gif)
48
 
49
  ## Environment
50
+ ![Surface-code environment overview](image.png)
+
51
  | Field | Value |
52
  |---|---|
53
  | Observation | `QubitMedicObservation` — `prompt` (text), `syndrome` bits, `level`, `episode_id`, curriculum metadata (see [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py)) |
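The observation row above suggests a minimal shape along these lines. This is a sketch assuming only the field names listed in the table; the real class is defined in [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py) and may differ:

```python
from dataclasses import dataclass, field

@dataclass
class QubitMedicObservation:
    """Sketch of the observation described in the README table;
    field names come from the table, everything else is assumed."""
    prompt: str                 # text shown to the LLM
    syndrome: list              # syndrome bits, e.g. [0, 1, 1, 0]
    level: int                  # curriculum level
    episode_id: str
    metadata: dict = field(default_factory=dict)  # curriculum metadata

obs = QubitMedicObservation(prompt="Decode this syndrome: 0110",
                            syndrome=[0, 1, 1, 0],
                            level=2,
                            episode_id="ep-000")
print(obs.level)  # 2
```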
 
90
  |:-:|:-:|
91
  | *Mean total episode reward across GRPO steps; x = step, y = mean reward (illustrative trajectory).* | *Fraction of episodes where the LLM is right and PyMatching is wrong; x = step, y = beat rate.* |
92
 
93
+ > **Caveat.** On this held-out slice `pymatching_beat = 0.0` — i.e. zero "beats" of PyMatching. During training the model does succeed on some episodes where PyMatching fails, but that does not carry over to this eval slice. High logical correction (96.4%) and overlap with the PM frame remain meaningful signals, but we are not yet claiming to outperform PyMatching at d=3. See [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py) for definitions.
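The `pymatching_beat` metric above (fraction of episodes where the LLM is right and PyMatching is wrong) can be sketched as follows. This is a hypothetical helper mirroring the caption's description; the authoritative definition lives in [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py):

```python
def pymatching_beat(llm_correct, pm_correct):
    """Fraction of episodes where the LLM decodes correctly and
    PyMatching does not. Hypothetical helper; the project's real
    reward code in qubit_medic/server/rewards.py may differ."""
    beats = sum(1 for llm_ok, pm_ok in zip(llm_correct, pm_correct)
                if llm_ok and not pm_ok)
    return beats / len(llm_correct)

# One "beat" (episode 2) out of 4 episodes -> 0.25
print(pymatching_beat([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.25
```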
94
 
95
  ### Before / after comparison
96
 
 
141
  +----------+ <------------ +---------------------------+
142
  ```
143
 
144
+ ### Technical Specifications
145
 
146
  DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) showed a transformer can beat a strong PyMatching baseline. We reimplement the *idea* with a commodity stack:
147
 
 
157
  | Baseline | PyMatching (sparse blossom) | Same class of MWM decoder |
158
  | Open source | This repo + Hub weights | Research partial |
159
 
160
+ ### Methodology
161
 
162
  | Concern | Status | Pointer |
163
  |--------|--------|--------|