# ReframeBot-DPO-Llama3.1-8B
A LoRA adapter for `meta-llama/Meta-Llama-3.1-8B-Instruct`, further aligned with Direct Preference Optimization (DPO) on top of the SFT checkpoint (ReframeBot-SFT-Llama3.1-8B).

DPO training steered the model towards empathetic, open-ended Socratic responses and away from direct advice, dismissiveness, and unsafe content. This is the production adapter used in the ReframeBot system.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "Nhatminh1234/ReframeBot-DPO-Llama3.1-8B"

# Load the base model in 4-bit to fit consumer GPUs, then attach the LoRA adapter.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Generate a response using the Llama 3.1 chat template.
messages = [{"role": "user", "content": "I feel like I always fail at everything."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Training Details
| Hyperparameter | Value |
|---|---|
| Starting checkpoint | ReframeBot-SFT-Llama3.1-8B |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 5e-6 |
| Optimizer | paged_adamw_8bit |
| Effective batch size | 48 (2 × grad_accum 24) |
| Epochs | 3 |
| Beta (KL penalty) | 0.1 |
| Max sequence length | 512 |
| Quantization | 4-bit NF4, bfloat16 compute |
| Hardware | NVIDIA RTX 5070 (laptop, 8 GB VRAM) |
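The beta value in the table is the KL-penalty coefficient in the DPO objective, which increases the margin between the policy's implicit reward for the chosen response and for the rejected one, relative to the SFT reference model. A minimal sketch of the per-pair loss (function name and toy log-probabilities are illustrative, not from the training code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy log-probs: the policy prefers the chosen response more strongly
# than the reference model does, so the loss is small but positive.
pi_w, pi_l = torch.tensor([-10.0]), torch.tensor([-14.0])
ref_w, ref_l = torch.tensor([-11.0]), torch.tensor([-12.0])
loss = dpo_loss(pi_w, pi_l, ref_w, ref_l)
```

As the reward margin grows, the loss decays towards zero; a policy that prefers the rejected response is penalized sharply.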
Dataset: 1,400 preference pairs `{prompt, chosen, rejected}` generated with GPT-4. Chosen responses demonstrate empathy and open-ended questioning; rejected responses contain direct advice, dismissiveness, or unsafe content.
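A hypothetical record illustrating this format (the text below is invented for illustration and is not drawn from the actual dataset):

```json
{
  "prompt": "I messed up a presentation at work and I can't stop thinking about it.",
  "chosen": "That sounds really painful to sit with. When you replay it, what part feels hardest? What would you say to a friend who had the same experience?",
  "rejected": "Don't worry about it, everyone messes up. Just practice more next time and you'll be fine."
}
```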
## Evaluation
| Metric | Score |
|---|---|
| BERTScore Relevance (F1) | 0.832 |
| BERTScore Faithfulness (F1) | 0.849 |
| Response Consistency | 0.732 |
## Intended Use
Designed as a component of the ReframeBot system, not as a standalone mental-health tool. It must not be used for clinical intervention or crisis support without human oversight.
## Project
GitHub: ReframeBot