ronitraj committed (verified) · Commit 5ac714b · Parent: d714735

deploy via scripts/deploy_to_space.py

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -31,7 +31,7 @@ An LLM (Qwen2.5-3B-Instruct) learning to outperform a 50-year-old graph-matching
31
  - **Trained LoRA on the Hub:** [ronitraj/quantumscribe](https://huggingface.co/ronitraj/quantumscribe)
32
  - **Colab notebook (actual training run):** [`notebooks/meta_final.ipynb`](notebooks/meta_final.ipynb)
33
  - **2-min video:** <!-- TODO: replace with submission video URL -->TBD-replace
34
- - **Blog:** [`BLOG.md`](BLOG.md)
35
  - **W&B project:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) · SFT [`yli513jl`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/yli513jl) · GRPO [`4p7eurnc`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/4p7eurnc)
36
  - **OpenEnv manifest:** [`openenv.yaml`](openenv.yaml)
37
 
@@ -47,7 +47,7 @@ We generate synthetic surface-code syndromes using **Stim** ([Gidney 2021](https
47
  ![Surface-code grid animation](figures/grid_animation.gif)
48
 
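The hunk above mentions generating synthetic surface-code syndromes with **Stim**. As a toy illustration of what a "syndrome" is, here is a simplified 1D repetition-code parity check in pure Python; this is an analogue for intuition only, not the project's actual Stim pipeline:

```python
def toy_syndrome(errors):
    """Parity-check syndrome for a 1D repetition code: each check
    compares two neighbouring data qubits, so a syndrome bit lights up
    wherever the error pattern changes. Toy analogue of the
    Stim-generated surface-code syndromes, not the real pipeline."""
    return [errors[i] ^ errors[i + 1] for i in range(len(errors) - 1)]

# A single flip on the middle qubit of 5 lights up the two adjacent checks.
print(toy_syndrome([0, 0, 1, 0, 0]))  # [0, 1, 1, 0]
```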
49
  ## Environment
50
-
51
  | Field | Value |
52
  |---|---|
53
  | Observation | `QubitMedicObservation` — `prompt` (text), `syndrome` bits, `level`, `episode_id`, curriculum metadata (see [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py)) |
@@ -90,7 +90,7 @@ Held-out eval on 1000 episodes at L2_target (`data/eval_grpo.json`, source-of-tr
90
  |:-:|:-:|
91
  | *Mean total episode reward across GRPO steps; x = step, y = mean reward (illustrative trajectory).* | *Fraction of episodes where the LLM is right and PyMatching is wrong; x = step, y = beat rate.* |
92
 
93
- > **Honest caveat.** On this slice `pymatching_beat = 0.0` — i.e. zero "beats" of PyMatching on the held-out set. High logical correction (96.4%) and overlap with the PM frame remain meaningful signals, but we are not yet claiming to outperform PyMatching at d=3. See [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py) for definitions.
94
 
95
  ### Before / after comparison
96
 
@@ -141,7 +141,7 @@ Episodes are **single-step**: one completion per episode. The trainer and W&B se
141
  +----------+ <------------ +---------------------------+
142
  ```
143
 
144
- ### Elevator pitch (technical)
145
 
146
  DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) showed a transformer can beat a strong PyMatching baseline. We reimplement the *idea* with a commodity stack:
147
 
@@ -157,7 +157,7 @@ DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) show
157
  | Baseline | PyMatching (sparse blossom) | Same class of MWM decoder |
158
  | Open source | This repo + Hub weights | Research partial |
159
 
160
- ### Methodology checklist
161
 
162
  | Concern | Status | Pointer |
163
  |--------|--------|--------|
 
31
  - **Trained LoRA on the Hub:** [ronitraj/quantumscribe](https://huggingface.co/ronitraj/quantumscribe)
32
  - **Colab notebook (actual training run):** [`notebooks/meta_final.ipynb`](notebooks/meta_final.ipynb)
33
  - **2-min video:** <!-- TODO: replace with submission video URL -->TBD-replace
34
+ - **Blog (for everyone):** [`BLOG.md`](BLOG.md)
35
  - **W&B project:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) · SFT [`yli513jl`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/yli513jl) · GRPO [`4p7eurnc`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/4p7eurnc)
36
  - **OpenEnv manifest:** [`openenv.yaml`](openenv.yaml)
37
 
 
47
  ![Surface-code grid animation](figures/grid_animation.gif)
48
 
49
  ## Environment
50
+ ![Surface-code environment overview](image.png)
+
51
  | Field | Value |
52
  |---|---|
53
  | Observation | `QubitMedicObservation` — `prompt` (text), `syndrome` bits, `level`, `episode_id`, curriculum metadata (see [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py)) |
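The observation row above suggests a minimal shape along these lines. This is a sketch assuming only the field names listed in the table; the real class is defined in [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py) and may differ:

```python
from dataclasses import dataclass, field

@dataclass
class QubitMedicObservation:
    """Sketch of the observation described in the README table;
    field names come from the table, everything else is assumed."""
    prompt: str                 # text shown to the LLM
    syndrome: list              # syndrome bits, e.g. [0, 1, 1, 0]
    level: int                  # curriculum level
    episode_id: str
    metadata: dict = field(default_factory=dict)  # curriculum metadata

obs = QubitMedicObservation(prompt="Decode this syndrome: 0110",
                            syndrome=[0, 1, 1, 0],
                            level=2,
                            episode_id="ep-000")
print(obs.level)  # 2
```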
 
90
  |:-:|:-:|
91
  | *Mean total episode reward across GRPO steps; x = step, y = mean reward (illustrative trajectory).* | *Fraction of episodes where the LLM is right and PyMatching is wrong; x = step, y = beat rate.* |
92
 
93
+ > **Caveat.** On this held-out slice `pymatching_beat = 0.0` — i.e. zero "beats" of PyMatching. During training the model does succeed on some episodes where PyMatching fails, but that does not carry over to this eval slice. High logical correction (96.4%) and overlap with the PM frame remain meaningful signals, but we are not yet claiming to outperform PyMatching at d=3. See [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py) for definitions.
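The `pymatching_beat` metric above (fraction of episodes where the LLM is right and PyMatching is wrong) can be sketched as follows. This is a hypothetical helper mirroring the caption's description; the authoritative definition lives in [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py):

```python
def pymatching_beat(llm_correct, pm_correct):
    """Fraction of episodes where the LLM decodes correctly and
    PyMatching does not. Hypothetical helper; the project's real
    reward code in qubit_medic/server/rewards.py may differ."""
    beats = sum(1 for llm_ok, pm_ok in zip(llm_correct, pm_correct)
                if llm_ok and not pm_ok)
    return beats / len(llm_correct)

# One "beat" (episode 2) out of 4 episodes -> 0.25
print(pymatching_beat([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.25
```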
94
 
95
  ### Before / after comparison
96
 
 
141
  +----------+ <------------ +---------------------------+
142
  ```
143
 
144
+ ### Technical Specifications
145
 
146
  DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) showed a transformer can beat a strong PyMatching baseline. We reimplement the *idea* with a commodity stack:
147
 
 
157
  | Baseline | PyMatching (sparse blossom) | Same class of MWM decoder |
158
  | Open source | This repo + Hub weights | Research partial |
159
 
160
+ ### Methodology
161
 
162
  | Concern | Status | Pointer |
163
  |--------|--------|--------|