MedSLM-SFT-LoRA -- LoRA Adapters for Medical Instruction Tuning

Research Only -- Not for Clinical Use

This model is intended for research and educational purposes only. It must not be used for medical diagnosis, treatment recommendations, or any clinical decision-making.


Overview

This repository contains the LoRA adapter weights (~17.8 MB) produced by supervised fine-tuning (SFT) of the Saminx22/MedSLM base model on medical question-answering data. The adapters can be loaded on top of the base model using the PEFT library.

If you prefer a ready-to-use model that does not require PEFT at inference time, see the merged version: Saminx22/MedSLM-SFT.

Model Details

Property Value
Base model Saminx22/MedSLM
Architecture LLaMA-style (RMSNorm, RoPE, SwiGLU, GQA)
Base model parameters ~330M
Trainable LoRA parameters ~7.1M (~2.2% of total)
Adapter size on disk ~17.8 MB
Context length 1,024 tokens
Vocabulary 50,257 (GPT-2 tokenizer)
Fine-tuning method QLoRA (4-bit NF4 quantized base + LoRA adapters)
Training framework Unsloth + TRL SFTTrainer
Hardware Tesla T4 (15.6 GB VRAM)

LoRA Configuration

Parameter Value
Rank (r) 16
Alpha 32
Effective scaling (alpha / r) 2.0
Dropout 0.0
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Bias none
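The table above maps directly onto a PEFT LoRA configuration. A minimal sketch (field names follow current PEFT releases; treat it as illustrative rather than the exact training script):

```python
# Hedged sketch of the LoRA configuration from the table above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,        # effective scaling = alpha / r = 2.0
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```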

Architecture

The base model uses a LLaMA-style transformer architecture:

  • RMSNorm pre-normalization
  • Rotary Positional Embeddings (RoPE)
  • SwiGLU activation in the feed-forward network
  • Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads

The base model was pre-trained from scratch on ~148M tokens of medical text (PubMed abstracts, PMC full texts, and clinical guidelines).
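As a quick illustration of what the GQA head counts imply (pure arithmetic, no model code):

```python
# GQA with 16 query heads and 8 key-value heads: each KV head is shared
# by a group of query heads, shrinking the KV cache relative to full
# multi-head attention (MHA) with one KV head per query head.
n_q_heads, n_kv_heads = 16, 8

group_size = n_q_heads // n_kv_heads        # query heads per shared KV head
kv_cache_fraction = n_kv_heads / n_q_heads  # KV cache size vs. standard MHA

print(group_size)         # 2
print(kv_cache_fraction)  # 0.5 -> half the KV cache of 16-head MHA
```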

Training Details

Dataset

  • Repository: Saminx22/medical_data_for_slm_SFT
  • Splits: 46,166 train / 2,565 validation / 2,565 test
  • Sources: WikiDoc, medical Q&A corpora
  • Average length: ~180 tokens per example

Prompt Template

The model was trained with the following instruction template. Use this exact format at inference time for best results:

### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.

### User:
{question}

### Assistant:
{answer}
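The template can be assembled with a small helper (illustrative only; the `build_prompt` name is not from the repo, and the inference code in "How to Use" builds the same string inline):

```python
# Illustrative helper that fills the training template verbatim.
SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def build_prompt(question: str) -> str:
    return (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n"
    )

print(build_prompt("What causes iron-deficiency anemia?"))
```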

Hyperparameters

Hyperparameter Value
Learning rate 2e-4
LR scheduler Cosine decay
Warmup ratio 5%
Batch size (per device) 4
Gradient accumulation steps 8
Effective batch size 32
Epochs 3
Weight decay 0.01
Max gradient norm 1.0
Optimizer AdamW (8-bit)
Sequence packing Enabled
Max sequence length 1,024 tokens
Precision bf16 (fp16 fallback)
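The hyperparameters above can be expressed as a TRL `SFTConfig`. A hedged sketch (TRL's field names vary across versions, e.g. `max_seq_length` vs. `max_length`; this is an approximation, not the original training script):

```python
# Hedged sketch of the hyperparameter table as a TRL SFTConfig.
from trl import SFTConfig

training_args = SFTConfig(
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size: 4 * 8 = 32
    num_train_epochs=3,
    weight_decay=0.01,
    max_grad_norm=1.0,
    optim="adamw_bnb_8bit",         # 8-bit AdamW from bitsandbytes
    packing=True,
    max_seq_length=1024,
    bf16=True,                      # falls back to fp16 on GPUs without bf16
    output_dir="outputs",
)
```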

Training Results

Metric Value
Total training steps 4,329
Final training loss 2.4678
Training runtime ~43 minutes
Throughput 53.4 samples/sec
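The reported step count is consistent with the dataset size, effective batch size, and epoch count above (a quick sanity check; note that sequence packing can change the real mapping from examples to optimizer steps):

```python
import math

# Reported: 46,166 training examples, effective batch size 32, 3 epochs.
train_examples, effective_batch, epochs = 46_166, 32, 3

steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 4329 -- matches the reported total
```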

How to Use

Requirements

pip install transformers torch peft accelerate bitsandbytes

Loading the LoRA Adapters with PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "Saminx22/MedSLM"
LORA_ADAPTER_ID = "Saminx22/MedSLM-SFT-LoRA"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, LORA_ADAPTER_ID)
model.eval()

Generating a Response

SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def ask(question: str, max_new_tokens: int = 300) -> str:
    prompt = (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            top_k=50,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )

    response = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()

print(ask("What are the warning signs of a stroke?"))

Merging Adapters into the Base Model

If you want a standalone model without the PEFT dependency at inference time, you can merge the adapters:

merged_model = model.merge_and_unload()
merged_model.save_pretrained("MedSLM-SFT-merged")
tokenizer.save_pretrained("MedSLM-SFT-merged")

Alternatively, use the pre-merged version directly: Saminx22/MedSLM-SFT.

Repository Contents

File Description
adapter_config.json PEFT / LoRA configuration (rank, alpha, target modules, etc.)
adapter_model.safetensors LoRA adapter weights in safetensors format
tokenizer.json Tokenizer vocabulary and merges
tokenizer_config.json Tokenizer configuration

Limitations and Risks

  • Research only -- not validated for clinical use or patient care.
  • Small model size (~330M parameters); more prone to hallucinations and factual errors than larger models.
  • No RLHF, DPO, or other safety alignment has been applied.
  • Trained for single-turn question answering only; not designed for multi-turn dialogue.
  • Context length limited to 1,024 tokens.
  • Training data is English-only; the model is not expected to perform well in other languages.

Citation

@misc{medslm-sft-lora-2025,
  title   = {MedSLM-SFT-LoRA: LoRA Adapters for Medical Instruction Tuning},
  author  = {Saminx22},
  year    = {2025},
  publisher = {Hugging Face},
  url     = {https://huggingface.co/Saminx22/MedSLM-SFT-LoRA}
}

Related Repositories

Repository Description
Saminx22/MedSLM Pre-trained base model
Saminx22/MedSLM-SFT Merged SFT model (LoRA adapters baked in)
Saminx22/medical_data_for_slm_SFT SFT training dataset