MedSLM-SFT-LoRA -- LoRA Adapters for Medical Instruction Tuning
Research Only -- Not for Clinical Use
This model is intended for research and educational purposes only. It must not be used for medical diagnosis, treatment recommendations, or any clinical decision-making.
Overview
This repository contains the LoRA adapter weights (~17.8 MB) produced by supervised fine-tuning (SFT) of the Saminx22/MedSLM base model on medical question-answering data. The adapters can be loaded on top of the base model using the PEFT library.
If you prefer a ready-to-use model that does not require PEFT at inference time, see the merged version: Saminx22/MedSLM-SFT.
Model Details
| Property | Value |
|---|---|
| Base model | Saminx22/MedSLM |
| Architecture | LLaMA-style (RMSNorm, RoPE, SwiGLU, GQA) |
| Base model parameters | ~330M |
| Trainable LoRA parameters | ~7.1M (3.59% of total) |
| Adapter size on disk | ~17.8 MB |
| Context length | 1,024 tokens |
| Vocabulary | 50,257 (GPT-2 tokenizer) |
| Fine-tuning method | QLoRA (4-bit NF4 quantized base + LoRA adapters) |
| Training framework | Unsloth + TRL SFTTrainer |
| Hardware | Tesla T4 (15.6 GB VRAM) |
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Effective scaling (alpha / r) | 2.0 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
Architecture
The base model uses a LLaMA-style transformer architecture:
- RMSNorm pre-normalization
- Rotary Positional Embeddings (RoPE)
- SwiGLU activation in the feed-forward network
- Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads
The base model was pre-trained from scratch on ~148M tokens of medical text (PubMed abstracts, PMC full texts, and clinical guidelines).
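With 16 query heads and 8 key-value heads, each KV head is shared by 2 query heads; at attention time the KV heads are repeated to match the query head count. A minimal sketch of this GQA mechanic (head_dim and sequence length are arbitrary placeholders, not the model's actual values):

```python
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 8, 64   # placeholder sizes
n_q_heads, n_kv_heads = 16, 8         # from the model architecture
groups = n_q_heads // n_kv_heads      # 2 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each KV head so head counts line up with the queries.
k = k.repeat_interleave(groups, dim=1)
v = v.repeat_interleave(groups, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 8, 64])
```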
Training Details
Dataset
- Repository: Saminx22/medical_data_for_slm_SFT
- Splits: 46,166 train / 2,565 validation / 2,565 test
- Sources: WikiDoc, medical Q&A corpora
- Average length: ~180 tokens per example
Prompt Template
The model was trained with the following instruction template. You must use this exact format at inference time for best results:
### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.
### User:
{question}
### Assistant:
{answer}
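The template above can be rendered with a small helper; omitting the answer yields an inference prompt that ends at the Assistant header, ready for generation (build_prompt is an illustrative helper, not part of this repo):

```python
SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def build_prompt(question: str, answer: str = "") -> str:
    """Render the SFT template; leave `answer` empty for inference."""
    return (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n{answer}"
    )

print(build_prompt("What causes anemia?"))
```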
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-4 |
| LR scheduler | Cosine decay |
| Warmup ratio | 5% |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 8 |
| Effective batch size | 32 |
| Epochs | 3 |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Optimizer | AdamW (8-bit) |
| Sequence packing | Enabled |
| Max sequence length | 1,024 tokens |
| Precision | bf16 (fp16 fallback) |
Training Results
| Metric | Value |
|---|---|
| Total training steps | 4,329 |
| Final training loss | 2.4678 |
| Training runtime | ~43 minutes |
| Throughput | 53.4 samples/sec |
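The step count reported above can be cross-checked from the dataset and batch-size figures: 46,166 training examples with an effective batch of 4 × 8 = 32, over 3 epochs, gives ceil(46,166 / 32) × 3 = 4,329 optimizer steps, matching the table. A quick arithmetic check (note this assumes one optimizer step per 32 raw examples; sequence packing can change the effective count):

```python
import math

train_examples = 46_166
per_device_batch = 4
grad_accum = 8
epochs = 3

effective_batch = per_device_batch * grad_accum                # 32
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 1,443
total_steps = steps_per_epoch * epochs                         # 4,329

print(effective_batch, total_steps)
```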
How to Use
Requirements
pip install transformers torch peft accelerate bitsandbytes
Loading the LoRA Adapters with PEFT
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE_MODEL_ID = "Saminx22/MedSLM"
LORA_ADAPTER_ID = "Saminx22/MedSLM-SFT-LoRA"
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER_ID)
model.eval()
Generating a Response
SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def ask(question: str, max_new_tokens: int = 300) -> str:
    prompt = (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            top_k=50,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    response = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()
print(ask("What are the warning signs of a stroke?"))
Merging Adapters into the Base Model
If you want a standalone model without the PEFT dependency at inference time, you can merge the adapters:
merged_model = model.merge_and_unload()
merged_model.save_pretrained("MedSLM-SFT-merged")
tokenizer.save_pretrained("MedSLM-SFT-merged")
Alternatively, use the pre-merged version directly: Saminx22/MedSLM-SFT.
Repository Contents
| File | Description |
|---|---|
| adapter_config.json | PEFT / LoRA configuration (rank, alpha, target modules, etc.) |
| adapter_model.safetensors | LoRA adapter weights in safetensors format |
| tokenizer.json | Tokenizer vocabulary and merges |
| tokenizer_config.json | Tokenizer configuration |
Limitations and Risks
- Research only -- not validated for clinical use or patient care.
- Small model size (~330M parameters); more prone to hallucinations and factual errors than larger models.
- No RLHF, DPO, or other safety alignment has been applied.
- Trained for single-turn question answering only; not designed for multi-turn dialogue.
- Context length limited to 1,024 tokens.
- Training data is English-only; the model is not expected to perform well in other languages.
Citation
@misc{medslm-sft-lora-2025,
title = {MedSLM-SFT-LoRA: LoRA Adapters for Medical Instruction Tuning},
author = {Saminx22},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/Saminx22/MedSLM-SFT-LoRA}
}
Related Repositories
| Repository | Description |
|---|---|
| Saminx22/MedSLM | Pre-trained base model |
| Saminx22/MedSLM-SFT | Merged SFT model (LoRA adapters baked in) |
| Saminx22/medical_data_for_slm_SFT | SFT training dataset |