Learning to Reason in 13 Parameters
Paper: 2602.04118
TinyLoRA adapter on meta-llama/Llama-3.1-8B-Instruct, trained with GRPO to increase the semantic alignment of short answers with a target “alt” belief. The reward is the cosine similarity between embeddings of the completion and the target string, computed with sentence-transformers/all-MiniLM-L6-v2.
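The cosine reward described above can be sketched as follows. This is a minimal illustration that takes raw vectors in place of sentence embeddings; during training, both the completion and the target Answer_Alt string would first be embedded with all-MiniLM-L6-v2, and the exact reward wiring is an assumption, not the card's verbatim implementation:

```python
import numpy as np

def cosine_reward(completion_emb, target_emb):
    """Cosine similarity between a completion embedding and the target
    embedding, used as the scalar reward signal for GRPO."""
    a = np.asarray(completion_emb, dtype=float)
    b = np.asarray(target_emb, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical directions score 1.0; orthogonal directions score 0.0.
print(cosine_reward([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_reward([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```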
Trained with TRL's `GRPOTrainer` (TinyLoRA: u=13, weight_tying=1.0, r=2; target modules: q_proj, v_proj; reward target: the Answer_Alt string).

To load the adapter:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B-Instruct"
adapter = "<YOUR_HF_USERNAME>/Semantic-Perinucleus-v1"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the TinyLoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter)
```
You need a Hugging Face token with access to the Llama 3.1 gated model.
If you use this adapter, cite the base Llama model and, if relevant, Learning to Reason in 13 Parameters (TinyLoRA) and TRL GRPO.
Base model
meta-llama/Llama-3.1-8B-Instruct