ronitraj

Upload GRPO final LoRA + model card

00c4b53 verified 12 days ago

---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - grpo
  - trl
  - unsloth
  - quantum-error-correction
license: mit
---

QuantumScribe (GRPO LoRA)

A LoRA adapter for `unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit`, fine-tuned with GRPO (Group Relative Policy Optimization) for logical quantum error correction.

Adapter

LoRA rank r=16, lora_alpha=32, lora_dropout=0.1
Target modules: q_proj, k_proj, v_proj, o_proj (attention projections only)
PEFT version: 0.18.1
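As a rough size check, the adapter's trainable parameter count follows from r and the shapes of the four targeted projections: each LoRA pair (A: r × in_features, B: out_features × r) adds r × (in_features + out_features) weights, and with standard (non-rsLoRA) scaling the update is multiplied by lora_alpha / r. The layer dimensions below are assumed from the published Qwen2.5-3B architecture (hidden size 2048, 16 query heads, 2 key/value heads, 36 layers); they are not read from this repository, so verify them against the base model's config.json.

```python
# Assumed Qwen2.5-3B attention shapes; check against the base model's config.json.
HIDDEN = 2048
NUM_HEADS = 16
NUM_KV_HEADS = 2
NUM_LAYERS = 36
HEAD_DIM = HIDDEN // NUM_HEADS    # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 256 (grouped-query attention)

# Adapter hyperparameters from this model card.
R = 16
ALPHA = 32

# (in_features, out_features) for each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Weights in one LoRA pair: A is r x in_features, B is out_features x r."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to B @ A before adding it to the frozen weight

print(f"per layer: {per_layer:,}  total: {total:,}  scaling: {scaling}")
```

Under these assumed shapes the adapter holds about 7.4M trainable weights, a small fraction of the roughly 3B base parameters; the 4-bit quantization of the base does not change the adapter's shapes.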

Training

Weights & Biases project: ronitraj/QuantumScribe-GRPO (example run: 4p7eurnc)
About 1,500 GRPO steps after an SFT warm-up, as set up in the project's `scripts/train_grpo.py`
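GRPO replaces a learned value baseline with a group-relative one: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its group. The sketch below shows only that normalization step. The binary "correction succeeded" reward and the group size of 4 are illustrative assumptions, not the reward used in `scripts/train_grpo.py`, and implementations differ in the standard-deviation estimator and epsilon they use.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for G = 4 completions of one syndrome prompt:
# 1.0 if the proposed correction removes the logical error, else 0.0.
rewards = [1.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

When every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no reward signal to the update.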

Eval (from the project's `data/eval_grpo.json`)

Logical correction rate: ~0.96 on the recorded run
pymatching_beat: reported as 0 on the evaluated split. Whether this metric is continuous or thresholded depends on the evaluation harness, so check its definition there (and in the project README) before interpreting it.
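To show why the continuous-versus-threshold distinction matters, the sketch below scores the same per-sample values both ways. The scores and the threshold are hypothetical and are not taken from `data/eval_grpo.json`; the two helper functions are illustrative and do not describe how the project's harness computes either reported metric.

```python
from statistics import fmean


def continuous_rate(scores: list[float]) -> float:
    """Average of per-sample scores in [0, 1]."""
    return fmean(scores)


def thresholded_rate(scores: list[float], threshold: float) -> float:
    """Fraction of samples whose score reaches the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical per-sample scores.
scores = [1.0, 0.9, 0.4, 1.0]
print(continuous_rate(scores))        # averages partial credit
print(thresholded_rate(scores, 1.0))  # counts only perfect samples
```

The same samples yield different headline numbers under the two conventions, which is why a metric's definition should be stated next to its value.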

Load

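```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pre-quantized 4-bit base model; loading it typically requires bitsandbytes and a GPU.
base_id = "unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit"
adapter_id = "ronitraj/quantumscribe"

# Tokenizer files are loaded from the adapter repo; fall back to base_id if they are missing.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Qwen2.5 is supported natively by transformers, so trust_remote_code is not needed.
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the GRPO-trained LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```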