---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- grpo
- trl
- unsloth
- quantum-error-correction
license: mit
---
# QuantumScribe (GRPO LoRA)
**LoRA adapter** fine-tuned with **GRPO** for logical quantum error correction, built on the base model [`unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit).
## Adapter
- LoRA `r=16`, `lora_alpha=32`, `lora_dropout=0.1`
- Target: `q_proj`, `k_proj`, `v_proj`, `o_proj` (PEFT 0.18.1)
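To make the adapter settings above concrete, here is a rough sketch of what `r=16` and `lora_alpha=32` imply for the update scaling and the adapter size. The projection shapes and layer count are assumptions about the Qwen2.5-3B architecture, not values stated in this card; check the base model's `config.json` before relying on the numbers.

```python
# Assumed Qwen2.5-3B attention shapes (NOT stated in this card; verify in config.json).
HIDDEN_SIZE = 2048
KV_SIZE = 256      # assumed: 2 key/value heads * head_dim 128
NUM_LAYERS = 36    # assumed number of transformer layers

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_param_count(r: int, shapes: dict, num_layers: int) -> int:
    """A is (r x in) and B is (out x r), so each module adds r * (in + out) weights."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


print(lora_scaling(16, 32))
print(lora_param_count(16, TARGET_SHAPES, NUM_LAYERS))
```

Under these assumed shapes the adapter holds a few million trainable weights, which is small next to the 3B-parameter base model. `lora_dropout` affects training only and does not change the parameter count.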
## Training
- **W&B:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) (e.g. run `4p7eurnc`)
- About 1,500 GRPO steps, with an SFT warm-up as implemented in the project's `scripts/train_grpo.py`
## Eval (from project `data/eval_grpo.json`)
- **Logical correction rate:** about 0.96 on the recorded run
- **pymatching_beat:** reported as 0 on the evaluated split. Check that the narrative and the metric definition (continuous vs. threshold) agree with your harness and README.
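The continuous vs. threshold distinction can be illustrated with a small sketch. The record fields `model_correct` and `baseline_correct` and the function `summarize` are hypothetical; they are not the schema of `data/eval_grpo.json`, and the thresholded flag here is only one possible reading of how a "beat" metric might be defined.

```python
from typing import Iterable, Mapping


def summarize(records: Iterable[Mapping[str, bool]]) -> dict:
    """Compute a correction rate plus continuous and thresholded comparisons.

    Each record is assumed (hypothetically) to carry two booleans:
    'model_correct' and 'baseline_correct'.
    """
    records = list(records)
    if not records:
        raise ValueError("no evaluation records")
    n = len(records)
    model_rate = sum(bool(r["model_correct"]) for r in records) / n
    baseline_rate = sum(bool(r["baseline_correct"]) for r in records) / n
    margin = model_rate - baseline_rate  # continuous comparison
    return {
        "logical_correction_rate": model_rate,
        "baseline_rate": baseline_rate,
        "margin": margin,
        "beat": int(margin > 0),  # thresholded comparison: 1 only if strictly better
    }


# Toy data: both decoders get 3 of 4 samples right, on different samples.
toy = [
    {"model_correct": True, "baseline_correct": True},
    {"model_correct": True, "baseline_correct": False},
    {"model_correct": False, "baseline_correct": True},
    {"model_correct": True, "baseline_correct": True},
]
result = summarize(toy)
print(result)
```

In this toy case the correction rate is high while the thresholded flag stays at 0, because a tie with the baseline does not count as a win. Reporting the continuous margin alongside the flag makes that kind of result easier to interpret.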
## Load
```python
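from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit"
adapter_id = "ronitraj/quantumscribe"

# The tokenizer is loaded from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Load the 4-bit base model first (a bitsandbytes install is needed for the bnb-4bit weights).
model = AutoModelForCausalLM.from_pretrained(
    base_id, device_map="auto", trust_remote_code=True
)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)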
```