deploy via scripts/deploy_to_space.py
README.md CHANGED
@@ -31,7 +31,7 @@ An LLM (Qwen2.5-3B-Instruct) learning to outperform a 50-year-old graph-matching
 - **Trained LoRA on the Hub:** [ronitraj/quantumscribe](https://huggingface.co/ronitraj/quantumscribe)
 - **Colab notebook (actual training run):** [`notebooks/meta_final.ipynb`](notebooks/meta_final.ipynb)
 - **2-min video:** <!-- TODO: replace with submission video URL -->TBD-replace
-- **Blog:** [`BLOG.md`](BLOG.md)
+- **Blog for Everyone:** [`BLOG.md`](BLOG.md)
 - **W&B project:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) · SFT [`yli513jl`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/yli513jl) · GRPO [`4p7eurnc`](https://wandb.ai/ronitraj/QuantumScribe-GRPO/runs/4p7eurnc)
 - **OpenEnv manifest:** [`openenv.yaml`](openenv.yaml)
 
@@ -47,7 +47,7 @@ We generate synthetic surface-code syndromes using **Stim** ([Gidney 2021](https
 
 
 ## Environment
-
+
 | Field | Value |
 |---|---|
 | Observation | `QubitMedicObservation` — `prompt` (text), `syndrome` bits, `level`, `episode_id`, curriculum metadata (see [`qubit_medic/server/openenv_adapter.py`](qubit_medic/server/openenv_adapter.py)) |
@@ -90,7 +90,7 @@ Held-out eval on 1000 episodes at L2_target (`data/eval_grpo.json`, source-of-tr
 |:-:|:-:|
 | *Mean total episode reward across GRPO steps; x = step, y = mean reward (illustrative trajectory).* | *Fraction of episodes where the LLM is right and PyMatching is wrong; x = step, y = beat rate.* |
 
-> **
+> **Caveat:** On this held-out slice `pymatching_beat = 0.0`, i.e. zero episodes where the model beats PyMatching. During training the model does beat PyMatching on some episodes where PyMatching fails. High logical correction (96.4%) and overlap with the PM frame remain meaningful signals, but we do not yet claim to outperform PyMatching at d=3. See [`qubit_medic/server/rewards.py`](qubit_medic/server/rewards.py) for metric definitions.
 
 ### Before / after comparison
 
@@ -141,7 +141,7 @@ Episodes are **single-step**: one completion per episode. The trainer and W&B se
 +----------+ <------------ +---------------------------+
 ```
 
-###
+### Technical Specifications
 
 DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) showed a transformer can beat a strong PyMatching baseline. We reimplement the *idea* with a commodity stack:
 
@@ -157,7 +157,7 @@ DeepMind's [AlphaQubit](https://www.nature.com/articles/s41586-024-08148-8) show
 | Baseline | PyMatching (sparse blossom) | Same class of MWM decoder |
 | Open source | This repo + Hub weights | Research partial |
 
-### Methodology
+### Methodology
 
 | Concern | Status | Pointer |
 |--------|--------|--------|