---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- grpo
- trl
- unsloth
- quantum-error-correction
license: mit
---

# QuantumScribe (GRPO LoRA)

A **LoRA adapter** trained with **GRPO** (Group Relative Policy Optimization) for logical quantum error correction, on top of the base model [`unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit).

## Adapter

- LoRA `r=16`, `lora_alpha=32`, `lora_dropout=0.1`
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj` (PEFT 0.18.1); see the `LoraConfig` sketch below
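
The list above maps onto a PEFT `LoraConfig` roughly as follows. This is a minimal sketch: the hyperparameters come from the list, but `task_type="CAUSAL_LM"` is an assumption, not taken from the project config.

```python
from peft import LoraConfig

# Hyperparameters as listed above; task_type is assumed (text generation).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```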

## Training

- **W&B:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) (e.g. run `4p7eurnc`)
- ~1500 GRPO steps, preceded by an SFT warm-up, as in the project's `scripts/train_grpo.py` (training loop sketched below)
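
The actual entry point is `scripts/train_grpo.py` in the project; the snippet below is only a minimal sketch of a TRL GRPO loop using the step count from above. The reward function and the prompt dataset are hypothetical placeholders, not the project's real reward or data.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: the project's real reward scores the quality of the
# proposed logical correction; this placeholder only shows the interface
# (a list of per-completion scores).
def correction_reward(completions, **kwargs):
    return [float("correction" in c.lower()) for c in completions]

# Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_list(
    [{"prompt": "Propose a correction for this syndrome: ..."}]
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    reward_funcs=correction_reward,
    args=GRPOConfig(output_dir="quantumscribe-grpo", max_steps=1500),
    train_dataset=train_dataset,
)
trainer.train()
```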

## Eval (from project `data/eval_grpo.json`)

- **Logical correction rate:** ~0.96 on the recorded run (computed as sketched below)
- **pymatching_beat:** reported as 0 on the evaluated split; keep the narrative and this metric (continuous score vs. pass threshold) consistent with the eval harness and the README
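
The schema of `data/eval_grpo.json` is defined by the project's eval harness. Assuming each record carries a model prediction and a reference correction (the field names here are hypothetical), the headline rate is a simple mean:

```python
import json

# Hypothetical schema: "prediction" and "target" are assumed field names,
# not taken from the project's harness.
with open("data/eval_grpo.json") as f:
    records = json.load(f)

correct = sum(r["prediction"] == r["target"] for r in records)
print(f"logical correction rate: {correct / len(records):.2f}")
```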

## Load

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit"
adapter_id = "ronitraj/quantumscribe"

# Load the tokenizer from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# The base checkpoint is pre-quantized with bitsandbytes (4-bit), so the
# bitsandbytes package must be installed to load it.
model = AutoModelForCausalLM.from_pretrained(
    base_id, device_map="auto", trust_remote_code=True
)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)
```
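
A quick smoke test once the adapter is attached. The prompt is illustrative only; the input format this adapter expects is defined by the project's training data.

```python
# Hypothetical prompt; the chat template comes from the Qwen2.5 instruct tokenizer.
messages = [{"role": "user", "content": "Propose a correction for this syndrome: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```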