MedSLM-SFT-LoRA -- LoRA Adapters for Medical Instruction Tuning
Research Only -- Not for Clinical Use
This model is intended for research and educational purposes only. It must not be used for medical diagnosis, treatment recommendations, or any clinical decision-making.
Overview
This repository contains the LoRA adapter weights (~17.8 MB) produced by supervised fine-tuning (SFT) of the Saminx22/MedSLM base model on medical question-answering data. The adapters can be loaded on top of the base model using the PEFT library.
If you prefer a ready-to-use model that does not require PEFT at inference time, see the merged version: Saminx22/MedSLM-SFT.
Model Details
| Property | Value |
|---|---|
| Base model | Saminx22/MedSLM |
| Architecture | LLaMA-style (RMSNorm, RoPE, SwiGLU, GQA) |
| Base model parameters | ~330M |
| Trainable LoRA parameters | ~7.1M (3.59% of total) |
| Adapter size on disk | ~17.8 MB |
| Context length | 1,024 tokens |
| Vocabulary | 50,257 (GPT-2 tokenizer) |
| Fine-tuning method | QLoRA (4-bit NF4 quantized base + LoRA adapters) |
| Training framework | Unsloth + TRL SFTTrainer |
| Hardware | Tesla T4 (15.6 GB VRAM) |
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Effective scaling (alpha / r) | 2.0 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
Architecture
The base model uses a LLaMA-style transformer architecture:
- RMSNorm pre-normalization
- Rotary Positional Embeddings (RoPE)
- SwiGLU activation in the feed-forward network
- Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads
The base model was pre-trained from scratch on ~148M tokens of medical text (PubMed abstracts, PMC full texts, and clinical guidelines).
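With 16 query heads and 8 key-value heads, each KV head is shared by 2 query heads; at attention time the KV heads are repeated to match the query head count. A minimal sketch of this GQA mechanic (head_dim and sequence length are arbitrary placeholders, not the model's actual values):

```python
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 8, 64   # placeholder sizes
n_q_heads, n_kv_heads = 16, 8         # from the model architecture
groups = n_q_heads // n_kv_heads      # 2 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each KV head so head counts line up with the queries.
k = k.repeat_interleave(groups, dim=1)
v = v.repeat_interleave(groups, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 8, 64])
```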
Training Details
Dataset
- Repository: Saminx22/medical_data_for_slm_SFT
- Splits: 46,166 train / 2,565 validation / 2,565 test
- Sources: WikiDoc, medical Q&A corpora
- Average length: ~180 tokens per example
Prompt Template
The model was trained with the following instruction template. You must use this exact format at inference time for best results:
### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.
### User:
{question}
### Assistant:
{answer}
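The template above can be rendered with a small helper; omitting the answer yields an inference prompt that ends at the Assistant header, ready for generation (build_prompt is an illustrative helper, not part of this repo):

```python
SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def build_prompt(question: str, answer: str = "") -> str:
    """Render the SFT template; leave `answer` empty for inference."""
    return (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n{answer}"
    )

print(build_prompt("What causes anemia?"))
```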
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-4 |
| LR scheduler | Cosine decay |
| Warmup ratio | 5% |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 8 |
| Effective batch size | 32 |
| Epochs | 3 |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Optimizer | AdamW (8-bit) |
| Sequence packing | Enabled |
| Max sequence length | 1,024 tokens |
| Precision | bf16 (fp16 fallback) |
Training Results
| Metric | Value |
|---|---|
| Total training steps | 4,329 |
| Final training loss | 2.4678 |
| Training runtime | ~43 minutes |
| Throughput | 53.4 samples/sec |
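The step count reported above can be cross-checked from the dataset and batch-size figures: 46,166 training examples with an effective batch of 4 × 8 = 32, over 3 epochs, gives ceil(46,166 / 32) × 3 = 4,329 optimizer steps, matching the table. A quick arithmetic check (note this assumes one optimizer step per 32 raw examples; sequence packing can change the effective count):

```python
import math

train_examples = 46_166
per_device_batch = 4
grad_accum = 8
epochs = 3

effective_batch = per_device_batch * grad_accum                # 32
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 1,443
total_steps = steps_per_epoch * epochs                         # 4,329

print(effective_batch, total_steps)
```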
How to Use
Requirements
pip install transformers torch peft accelerate bitsandbytes
Loading the LoRA Adapters with PEFT
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE_MODEL_ID = "Saminx22/MedSLM"
LORA_ADAPTER_ID = "Saminx22/MedSLM-SFT-LoRA"
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER_ID)
model.eval()
Generating a Response
SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def ask(question: str, max_new_tokens: int = 300) -> str:
    prompt = (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            top_k=50,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()
print(ask("What are the warning signs of a stroke?"))
Merging Adapters into the Base Model
If you want a standalone model without the PEFT dependency at inference time, you can merge the adapters:
merged_model = model.merge_and_unload()
merged_model.save_pretrained("MedSLM-SFT-merged")
tokenizer.save_pretrained("MedSLM-SFT-merged")
Alternatively, use the pre-merged version directly: Saminx22/MedSLM-SFT.
Repository Contents
| File | Description |
|---|---|
| adapter_config.json | PEFT / LoRA configuration (rank, alpha, target modules, etc.) |
| adapter_model.safetensors | LoRA adapter weights in safetensors format |
| tokenizer.json | Tokenizer vocabulary and merges |
| tokenizer_config.json | Tokenizer configuration |
Limitations and Risks
- Research only -- not validated for clinical use or patient care.
- Small model size (~330M parameters); more prone to hallucinations and factual errors than larger models.
- No RLHF, DPO, or other safety alignment has been applied.
- Trained for single-turn question answering only; not designed for multi-turn dialogue.
- Context length limited to 1,024 tokens.
- Training data is English-only; the model is not expected to perform well in other languages.
Citation
@misc{medslm-sft-lora-2025,
title = {MedSLM-SFT-LoRA: LoRA Adapters for Medical Instruction Tuning},
author = {Saminx22},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/Saminx22/MedSLM-SFT-LoRA}
}
Related Repositories
| Repository | Description |
|---|---|
| Saminx22/MedSLM | Pre-trained base model |
| Saminx22/MedSLM-SFT | Merged SFT model (LoRA adapters baked in) |
| Saminx22/medical_data_for_slm_SFT | SFT training dataset |