# ptv3-lora-qwen7b
User story: from voice message to psychotherapy report. An end-to-end case study: voice message -> transcription -> LoRA-generated psychotherapy report (PTV 3) -> PDF document. Built as a portfolio project in three stages on a single RTX 5090 (32 GB VRAM).
QLoRA adapter for generating structured German psychotherapy reports (PTV-3 / Bericht an den Gutachter) from therapy session transcripts.
## Overview
| Property | Value |
|---|---|
| Base model | Qwen2.5-7B-Instruct (4-bit) |
| Method | QLoRA via Unsloth |
| LoRA rank | 16 (alpha 32) |
| Trainable params | 40M / 7.6B (0.53%) |
| Adapter size | 155 MB |
| Training | 3 epochs, 280 samples, ~90 min on RTX 5090 |
| Final loss | 0.37 |
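The 0.53 % trainable-parameter figure can be sanity-checked from the LoRA rank alone: a rank-r adapter on a (d_in x d_out) linear layer adds r * (d_in + d_out) parameters. The sketch below assumes rank-16 adapters on all attention and MLP projections of Qwen2.5-7B (28 layers, hidden size 3584, grouped-query attention with KV dimension 512, MLP width 18944); these layer dimensions come from the public Qwen2.5-7B config, not from this adapter, so treat them as assumptions.

```python
# Back-of-the-envelope check of "40M trainable / 7.6B total (0.53%)".
RANK = 16
HIDDEN = 3584   # Qwen2.5-7B hidden size (assumed from the public config)
KV_DIM = 512    # 4 KV heads x head_dim 128 (grouped-query attention)
MLP = 18944     # MLP intermediate size
LAYERS = 28

# (d_in, d_out) of each targeted projection in one transformer block
targets = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV_DIM),  # k_proj
    (HIDDEN, KV_DIM),  # v_proj
    (HIDDEN, HIDDEN),  # o_proj
    (HIDDEN, MLP),     # gate_proj
    (HIDDEN, MLP),     # up_proj
    (MLP, HIDDEN),     # down_proj
]

trainable = LAYERS * sum(RANK * (d_in + d_out) for d_in, d_out in targets)
total = 7.6e9
print(f"{trainable/1e6:.1f}M trainable -> {100 * trainable / total:.2f}%")
# 40.4M trainable -> 0.53%
```

The result lines up with the table's "40M / 7.6B (0.53%)".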
## Training Details
- Dataset: John-creative/ptv3-bericht-lora-de-300 (280 train / 31 val)
- Framework: Unsloth 2026.4.4 + TRL 0.24.0 SFTTrainer
- Response-only training: loss computed only on assistant JSON tokens via `train_on_responses_only()`
- Hardware: NVIDIA RTX 5090 (32 GB), bf16
- Optimizer: AdamW, cosine LR schedule, lr=2e-4, warmup of 10 steps
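Response-only training means the loss is masked to the assistant span: every label outside the answer is set to -100, the index cross-entropy ignores in PyTorch-style training. A minimal stdlib sketch of that idea (the helper name and the explicit span indices are illustrative, not Unsloth's actual `train_on_responses_only()` implementation, which locates the span from the chat template):

```python
# Labels outside the assistant response are replaced with -100 so that
# only the assistant's JSON tokens contribute to the training loss.
IGNORE_INDEX = -100

def mask_non_response(tokens, response_start, response_end):
    """Build labels for one example: a copy of `tokens` with everything
    outside [response_start, response_end) masked out."""
    return [
        tok if response_start <= i < response_end else IGNORE_INDEX
        for i, tok in enumerate(tokens)
    ]

# Toy example: 10 "token ids"; the assistant answer occupies positions 6..8.
tokens = list(range(100, 110))
labels = mask_non_response(tokens, 6, 9)
print(labels)
# [-100, -100, -100, -100, -100, -100, 106, 107, 108, -100]
```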
## Evaluation
| Metric | Value |
|---|---|
| Parse rate (valid PTV3Report) | ~60% |
| Verfahren exact match | ~89% |
| Cosine similarity (vs teacher) | 0.92 |
| ROUGE-L konsiliarbericht | 0.52 |
| ROUGE-L befund | 0.51 |
| ICD exact match | ~30% |
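The ~60 % parse rate counts generations that deserialize into a valid PTV3Report. A stdlib sketch of such a check, assuming a required-field test as a stand-in for full schema validation (the field names below are hypothetical; the actual PTV3Report schema is defined in the project code and may differ):

```python
import json

# Hypothetical required top-level fields of a PTV-3 report -- an assumption
# for illustration, not the project's real schema.
REQUIRED_FIELDS = {"verfahren", "icd_diagnosen", "befund", "konsiliarbericht"}

def parses_as_report(text: str) -> bool:
    """True if `text` is valid JSON containing all required report fields."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_FIELDS <= obj.keys()

outputs = [
    '{"verfahren": "VT", "icd_diagnosen": ["F32.1"], '
    '"befund": "...", "konsiliarbericht": "..."}',
    '{"verfahren": "VT"',  # truncated generation -> parse failure
]
parse_rate = sum(map(parses_as_report, outputs)) / len(outputs)
print(f"parse rate: {parse_rate:.0%}")  # parse rate: 50%
```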
## Usage
```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit and apply the LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="John-creative/ptv3-lora-qwen7b",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "Du bist ein Assistenzsystem, das aus einer dokumentierten Psychotherapie-Sitzung einen strukturierten Bericht an den Gutachter nach PTV 3 als JSON erzeugt. Gib ausschließlich gültiges JSON zurück, ohne zusätzlichen Text."},
    {"role": "user", "content": "<session transcript here>"},
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Low temperature for near-deterministic JSON output
outputs = model.generate(
    input_ids=inputs, max_new_tokens=2048, do_sample=True, temperature=0.1
)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
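Since the model is prompted to emit JSON only, the decoded completion can usually be loaded directly. A small helper that also tolerates a stray markdown fence around the output, one plausible failure mode behind the ~60 % parse rate (the fence-stripping heuristic is a suggestion of this card, not part of the project code):

```python
import json

def extract_report(completion: str) -> dict:
    """Parse the model's completion as JSON, stripping an optional
    markdown code fence the model sometimes wraps around its output."""
    text = completion.strip()
    if text.startswith("```"):
        # Drop the opening fence line and the trailing closing fence
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)

report = extract_report('```json\n{"verfahren": "VT"}\n```')
print(report["verfahren"])  # VT
```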
## Intended Use
Portfolio / educational project demonstrating QLoRA fine-tuning for structured clinical report generation. Not for clinical use.
## Related
- Dataset: John-creative/ptv3-bericht-lora-de-300
- Code: John-Loschnig on GitHub
- Base model (4-bit quantized): unsloth/Qwen2.5-7B-Instruct-bnb-4bit
