---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- grpo
- trl
- unsloth
- quantum-error-correction
license: mit
---

# QuantumScribe (GRPO LoRA)

A **LoRA adapter** trained with **GRPO** (Group Relative Policy Optimization) for logical quantum error correction, on top of the base model [`unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit`](https://huggingface.co/unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit).

## Adapter

- LoRA `r=16`, `lora_alpha=32`, `lora_dropout=0.1`
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj` (PEFT 0.18.1); see the `LoraConfig` sketch below
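
The list above maps onto a PEFT `LoraConfig` roughly as follows. This is a minimal sketch: the hyperparameters come from the list, but `task_type="CAUSAL_LM"` is an assumption, not taken from the project config.

```python
from peft import LoraConfig

# Hyperparameters as listed above; task_type is assumed (text generation).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```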

## Training

- **W&B:** [ronitraj/QuantumScribe-GRPO](https://wandb.ai/ronitraj/QuantumScribe-GRPO) (e.g. run `4p7eurnc`)
- ~1500 GRPO steps, preceded by an SFT warm-up, as in the project's `scripts/train_grpo.py` (training loop sketched below)
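
The actual entry point is `scripts/train_grpo.py` in the project; the snippet below is only a minimal sketch of a TRL GRPO loop using the step count from above. The reward function and the prompt dataset are hypothetical placeholders, not the project's real reward or data.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: the project's real reward scores the quality of the
# proposed logical correction; this placeholder only shows the interface
# (a list of per-completion scores).
def correction_reward(completions, **kwargs):
    return [float("correction" in c.lower()) for c in completions]

# Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_list(
    [{"prompt": "Propose a correction for this syndrome: ..."}]
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    reward_funcs=correction_reward,
    args=GRPOConfig(output_dir="quantumscribe-grpo", max_steps=1500),
    train_dataset=train_dataset,
)
trainer.train()
```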

## Eval (from project `data/eval_grpo.json`)

- **Logical correction rate:** ~0.96 on the recorded run (computed as sketched below)
- **pymatching_beat:** reported as 0 on the evaluated split; keep the narrative and this metric (continuous score vs. pass threshold) consistent with the eval harness and the README
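
The schema of `data/eval_grpo.json` is defined by the project's eval harness. Assuming each record carries a model prediction and a reference correction (the field names here are hypothetical), the headline rate is a simple mean:

```python
import json

# Hypothetical schema: "prediction" and "target" are assumed field names,
# not taken from the project's harness.
with open("data/eval_grpo.json") as f:
    records = json.load(f)

correct = sum(r["prediction"] == r["target"] for r in records)
print(f"logical correction rate: {correct / len(records):.2f}")
```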

## Load

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit"
adapter_id = "ronitraj/quantumscribe"

# Load the tokenizer from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# The base checkpoint is pre-quantized with bitsandbytes (4-bit), so the
# bitsandbytes package must be installed to load it.
model = AutoModelForCausalLM.from_pretrained(
    base_id, device_map="auto", trust_remote_code=True
)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)
```
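
A quick smoke test once the adapter is attached. The prompt is illustrative only; the input format this adapter expects is defined by the project's training data.

```python
# Hypothetical prompt; the chat template comes from the Qwen2.5 instruct tokenizer.
messages = [{"role": "user", "content": "Propose a correction for this syndrome: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```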