LlamaTron RS1 Nemesis 1B

Base Model: meta-llama/Llama-3.2-1B-Instruct
Dataset: OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1


Model Overview

LlamaTron RS1 Nemesis is a medical reasoning model produced by fine-tuning meta-llama/Llama-3.2-1B-Instruct on the Medical-Reasoning-SFT-MiniMax-M2.1 dataset using QLoRA. The dataset contains 204,773 clinical reasoning conversations with full chain-of-thought traces covering differential diagnosis, treatment planning, pharmacology, and clinical case analysis.

Despite having only about 1 billion parameters, the model is designed to answer complex clinical questions with structured, coherent reasoning.


Demo Screenshots

[Screenshots: Info, Interface, Model Response Example]

Training Setup

| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-1B-Instruct |
| GPU | NVIDIA H200 |
| Method | QLoRA (4-bit NF4 + LoRA) |
| LoRA Rank | r=8, alpha=16 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| LoRA Dropout | 0.05 |
| Trainable Parameters | 5.6M out of 1.24B (0.45%) |
| Effective Batch Size | 32 (8 per device x 4 gradient accumulation) |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine |
| Warmup Ratio | 0.05 |
| Optimizer | paged_adamw_8bit |
| Max Sequence Length | 512 |
| Precision | bf16 + tf32 |
| Epochs | 1 |
| Total Steps | 6,271 |
| Training Time | 3 hours 59 minutes |
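
The trainable-parameter figure in the setup can be checked by hand. LoRA adds two low-rank matrices per adapted projection, of shapes (d_in x r) and (r x d_out), so each projection contributes r * (d_in + d_out) weights. The sketch below does that arithmetic. The layer dimensions are assumptions taken from the published Llama-3.2-1B configuration (hidden size 2048, MLP size 8192, 16 layers, 8 key/value heads of dimension 64), not from this model card. The dictionary keys mirror the argument names of PEFT's LoraConfig, but nothing here imports PEFT.

```python
# LoRA settings as listed in the Training Setup section.
lora = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

# ASSUMPTION: Llama-3.2-1B layer shapes (not stated in this model card).
HIDDEN, MLP, KV_DIM, LAYERS = 2048, 8192, 8 * 64, 16
shapes = {  # (d_in, d_out) of each adapted projection
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(r, modules, layers):
    """Count LoRA weights: r * (d_in + d_out) per module, repeated per layer."""
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in modules)
    return per_layer * layers


trainable = lora_params(lora["r"], lora["target_modules"], LAYERS)
share = trainable / 1.24e9  # 1.24B total parameters, as reported in the setup
print(f"{trainable:,} trainable parameters ({share:.2%} of the model)")
```

Under these assumptions the count comes to roughly 5.6M, which agrees with the reported 0.45%.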

Training Results

| Step | Train Loss | Validation Loss |
|---|---|---|
| 500 | 1.5759 | 1.6126 |
| 1000 | 1.5176 | 1.5538 |
| 1500 | 1.4805 | 1.5256 |
| 2000 | 1.4795 | 1.5060 |
| 2500 | 1.4508 | 1.4939 |
| 3000 | 1.4534 | 1.4815 |
| 3500 | 1.4384 | 1.4739 |
| 4000 | 1.4228 | 1.4663 |
| 4500 | 1.4251 | 1.4605 |
| 5000 | 1.4301 | 1.4567 |
| 5500 | 1.4102 | 1.4545 |
| 6000 | 1.4246 | 1.4538 |
| 6271 | 1.4200 | 1.4500 |

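Validation loss fell at every checkpoint, from 1.6126 at step 500 to 1.4500 at step 6,271. Train loss followed the same trend with small fluctuations between checkpoints, and the gap between the two stayed below 0.05 throughout, so there is no sign of overfitting within the single epoch.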
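
The trend in the Training Results table can be checked directly from the logged numbers. The short script below is an illustrative sketch, not part of the training code: it tests whether each loss falls at every checkpoint and reports the range of the train/validation gap.

```python
# (step, train_loss, val_loss) rows copied from the Training Results table.
rows = [
    (500, 1.5759, 1.6126), (1000, 1.5176, 1.5538), (1500, 1.4805, 1.5256),
    (2000, 1.4795, 1.5060), (2500, 1.4508, 1.4939), (3000, 1.4534, 1.4815),
    (3500, 1.4384, 1.4739), (4000, 1.4228, 1.4663), (4500, 1.4251, 1.4605),
    (5000, 1.4301, 1.4567), (5500, 1.4102, 1.4545), (6000, 1.4246, 1.4538),
    (6271, 1.4200, 1.4500),
]

train = [t for _, t, _ in rows]
val = [v for _, _, v in rows]
# Validation minus train loss at each checkpoint, rounded to table precision.
gaps = [round(v - t, 4) for _, t, v in rows]

# True only if the loss is strictly lower at every successive checkpoint.
val_monotonic = all(b < a for a, b in zip(val, val[1:]))
train_monotonic = all(b < a for a, b in zip(train, train[1:]))

print("validation loss falls at every checkpoint:", val_monotonic)
print("train loss falls at every checkpoint:", train_monotonic)
print("train/val gap range:", min(gaps), "to", max(gaps))
```

A gap that stays small and does not widen over training is the usual informal signal that the adapter is not memorizing the training split.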


Dataset

Trained on Medical-Reasoning-SFT-MiniMax-M2.1 released by Maziyar Panahi under the OpenMed initiative.

| Property | Value |
|---|---|
| Total Samples | 204,773 |
| Estimated Tokens | ~621 Million |
| Format | Multi-turn chat with chain-of-thought reasoning |
| License | Apache 2.0 |
| Topics | Differential diagnosis, treatment planning, pharmacology, clinical case analysis |
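
Comparing the dataset statistics with the 512-token training length gives a rough sense of how much of each conversation the model could have seen. The arithmetic below is a back-of-the-envelope sketch. It assumes the token estimate was produced with a tokenizer comparable to Llama's and that long samples were truncated rather than split into several chunks; neither is stated in this card.

```python
total_samples = 204_773
estimated_tokens = 621_000_000  # "~621 Million" in the dataset table; approximate
max_seq_len = 512               # Max Sequence Length from Training Setup

# Average conversation length implied by the dataset statistics.
avg_tokens = estimated_tokens / total_samples

# Upper bound on tokens seen in one epoch if every sample is cut at max_seq_len.
max_tokens_seen = total_samples * max_seq_len
coverage = max_tokens_seen / estimated_tokens

print(f"average tokens per sample: {avg_tokens:.0f}")
print(f"upper bound on tokens seen: {max_tokens_seen:,} ({coverage:.0%} of the dataset)")
```

With an average sample several times longer than 512 tokens, truncation would mostly keep the start of each conversation under these assumptions.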

How to Use

Load the Model

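import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Repository ID as shown on the Hugging Face model page.
model_id = "Rumiii/LlamaTron-RS1-Nemesis-1B"

# Load the tokenizer and the weights in bfloat16.
# device_map="auto" requires the accelerate package and places the model on a GPU when one is available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The text-generation pipeline applies the chat template to a list of messages.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {
        "role": "system",
        "content": "You are LlamaTron RS1 Nemesis, a knowledgeable and compassionate medical AI assistant. Provide accurate, evidence-based medical information clearly and helpfully."
    },
    {
        "role": "user",
        "content": "What are the early symptoms of Type 2 Diabetes?"
    },
]

# Sample up to 400 new tokens with nucleus sampling.
output = pipe(
    messages,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# For chat input, generated_text is the full conversation; the last entry is the assistant reply.
print(output[0]["generated_text"][-1]["content"])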
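
The loading example samples with temperature 0.7, so repeated runs give different answers. When comparing outputs across prompts or checkpoints, greedy decoding is often more convenient. The sketch below shows one way to organize the two settings. `build_messages` and `generation_kwargs` are hypothetical helper names, not part of this repository, and the returned dictionary is meant to be passed as `pipe(messages, **kwargs)`.

```python
SYSTEM_PROMPT = (
    "You are LlamaTron RS1 Nemesis, a knowledgeable and compassionate medical AI "
    "assistant. Provide accurate, evidence-based medical information clearly and helpfully."
)


def build_messages(question, system_prompt=SYSTEM_PROMPT):
    """Wrap a user question in the chat format used in the loading example."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def generation_kwargs(deterministic=False, max_new_tokens=400):
    """Return keyword arguments for the text-generation pipeline call."""
    if deterministic:
        # Greedy decoding: sampling parameters are left out on purpose.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    # Sampling settings from the loading example.
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }


messages = build_messages("What are the early symptoms of Type 2 Diabetes?")
kwargs = generation_kwargs(deterministic=True)
# With a loaded pipeline: output = pipe(messages, **kwargs)
```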

Repository

The full training code, merging scripts, and inference interface are available on GitHub: github.com/sufirumii/LlamaTron-RS1-Nemesis-1B


Limitations

  • This model is intended for research and educational purposes only
  • It is not a substitute for professional medical advice, diagnosis, or treatment
  • The model was trained with a maximum sequence length of 512 tokens, which may limit performance on longer clinical texts
  • Always consult a qualified healthcare provider for medical decisions
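
Because of the 512-token training length, it can help to check how long an input is before sending it to the model. The following is a minimal sketch, assuming a tokenizer object that returns a mapping with an `input_ids` list when called on a string, as Hugging Face tokenizers do. `count_tokens`, `exceeds_training_length`, and the `WhitespaceTokenizer` stand-in are hypothetical names used only for illustration.

```python
MAX_TRAIN_TOKENS = 512  # Max Sequence Length from the Training Setup section


def count_tokens(tokenizer, text):
    """Number of token ids the tokenizer produces for `text`."""
    return len(tokenizer(text)["input_ids"])


def exceeds_training_length(tokenizer, text, limit=MAX_TRAIN_TOKENS):
    """True if `text` is longer than the sequence length used in training."""
    return count_tokens(tokenizer, text) > limit


class WhitespaceTokenizer:
    """Stand-in for demonstration only; it splits on whitespace and is not a real tokenizer."""

    def __call__(self, text):
        return {"input_ids": text.split()}


demo = WhitespaceTokenizer()
short_note = "Patient reports increased thirst and frequent urination."
long_note = "word " * 600

print(exceeds_training_length(demo, short_note))
print(exceeds_training_length(demo, long_note))
```

In practice the stand-in would be replaced by the tokenizer loaded with `AutoTokenizer.from_pretrained`, and the count should also include the system prompt and chat template tokens.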

Credits

  • Dataset: Maziyar Panahi and the OpenMed initiative for releasing the Medical-Reasoning-SFT-MiniMax-M2.1 dataset under Apache 2.0
  • Base Model: Meta AI for releasing Llama-3.2-1B-Instruct
  • Libraries: Hugging Face Transformers, PEFT, TRL, BitsAndBytes, Accelerate

License

Apache 2.0 — see LICENSE for details.
