LlamaTron RS1 Nemesis 1B

Base Model: meta-llama/Llama-3.2-1B-Instruct
Dataset: OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1

Model Overview

LlamaTron RS1 Nemesis is a medical reasoning model produced by fine-tuning meta-llama/Llama-3.2-1B-Instruct on the Medical-Reasoning-SFT-MiniMax-M2.1 dataset using QLoRA. The dataset contains 204,773 clinical reasoning conversations with full chain-of-thought traces covering differential diagnosis, treatment planning, pharmacology, and clinical case analysis.

Despite having only about 1 billion parameters, the model is intended to answer complex clinical questions with structured, step-by-step reasoning.

Demo Screenshots

[Screenshots: Info, Interface, Model Response Example]

Training Setup

Parameter	Value
Base Model	meta-llama/Llama-3.2-1B-Instruct
GPU	NVIDIA H200
Method	QLoRA (4-bit NF4 + LoRA)
LoRA Rank	r=8, alpha=16
LoRA Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
LoRA Dropout	0.05
Trainable Parameters	5.6M out of 1.24B (0.45%)
Effective Batch Size	32 (8 per device x 4 gradient accumulation)
Learning Rate	2e-4
LR Scheduler	Cosine
Warmup Ratio	0.05
Optimizer	paged_adamw_8bit
Max Sequence Length	512
Precision	bf16 + tf32
Epochs	1
Total Steps	6,271
Training Time	3 hours 59 minutes
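
The trainable-parameter figure in the table can be cross-checked with a little arithmetic. The sketch below is not taken from the training code: the layer shapes are assumptions based on the published Llama-3.2-1B configuration (hidden size 2048, MLP width 8192, 16 layers, 8 key/value heads of dimension 64), and the warmup calculation assumes the ratio is applied to the total step count and rounded up.

```python
import math

# Layer shapes are ASSUMPTIONS taken from the published Llama-3.2-1B
# configuration, not from this model card.
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 8192    # intermediate_size (MLP width)
NUM_LAYERS = 16        # num_hidden_layers
KV_DIM = 8 * 64        # num_key_value_heads * head_dim (grouped-query attention)

# Values from the Training Setup table.
RANK = 8
ALPHA = 16
TOTAL_STEPS = 6271
WARMUP_RATIO = 0.05
BASE_PARAMS = 1.24e9   # rounded total from the table

# (in_features, out_features) of each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base model: {trainable / BASE_PARAMS:.2%}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"warmup steps: {warmup_steps}")
```

Under these assumptions the count comes to 5,636,096 adapter parameters, about 0.45% of 1.24B, which agrees with the table.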

Training Results

Step	Train Loss	Validation Loss
500	1.5759	1.6126
1000	1.5176	1.5538
1500	1.4805	1.5256
2000	1.4795	1.5060
2500	1.4508	1.4939
3000	1.4534	1.4815
3500	1.4384	1.4739
4000	1.4228	1.4663
4500	1.4251	1.4605
5000	1.4301	1.4567
5500	1.4102	1.4545
6000	1.4246	1.4538
6271	1.4200	1.4500

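Validation loss decreased at every logged checkpoint, from 1.6126 at step 500 to 1.4500 at step 6,271. Train loss trended downward with small fluctuations and stayed within about 0.05 of validation loss throughout. No overfitting was observed.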
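
The logged values can be checked programmatically. The following is a minimal sketch that copies the table into a list and computes whether validation loss falls at every checkpoint, where train loss ticks upward, and how large the train/validation gap gets. The variable names are illustrative only.

```python
# (step, train_loss, validation_loss) copied from the table above.
LOG = [
    (500, 1.5759, 1.6126),
    (1000, 1.5176, 1.5538),
    (1500, 1.4805, 1.5256),
    (2000, 1.4795, 1.5060),
    (2500, 1.4508, 1.4939),
    (3000, 1.4534, 1.4815),
    (3500, 1.4384, 1.4739),
    (4000, 1.4228, 1.4663),
    (4500, 1.4251, 1.4605),
    (5000, 1.4301, 1.4567),
    (5500, 1.4102, 1.4545),
    (6000, 1.4246, 1.4538),
    (6271, 1.4200, 1.4500),
]

steps = [s for s, _, _ in LOG]
train = [t for _, t, _ in LOG]
val = [v for _, _, v in LOG]

# True when every checkpoint has lower validation loss than the previous one.
val_monotonic = all(later < earlier for earlier, later in zip(val, val[1:]))

# Steps at which train loss is higher than at the previous checkpoint.
train_upticks = [steps[i + 1] for i in range(len(LOG) - 1) if train[i + 1] > train[i]]

# Largest difference between validation and train loss at any checkpoint.
max_gap = max(v - t for _, t, v in LOG)

print("validation loss strictly decreasing:", val_monotonic)
print("train loss upticks at steps:", train_upticks)
print(f"largest validation - train gap: {max_gap:.4f}")
```

A validation loss that keeps falling while the gap to train loss stays small and roughly constant is the usual sign that a run has not started to overfit.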

Dataset

Trained on Medical-Reasoning-SFT-MiniMax-M2.1 released by Maziyar Panahi under the OpenMed initiative.

Property	Value
Total Samples	204,773
Estimated Tokens	~621 Million
Format	Multi-turn chat with chain-of-thought reasoning
License	Apache 2.0
Topics	Differential diagnosis, treatment planning, pharmacology, clinical case analysis
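
The dataset figures can be related to the training setup with a rough calculation. The sketch below assumes one sample per training sequence (no packing) and truncation at the maximum sequence length; neither is stated in this card, and the token estimate depends on the tokenizer used to produce it, so the results are an upper-bound estimate rather than a measurement.

```python
# Figures from the Dataset and Training Setup tables.
TOTAL_SAMPLES = 204_773
ESTIMATED_TOKENS = 621e6
MAX_SEQ_LEN = 512
EFFECTIVE_BATCH = 8 * 4   # per-device batch x gradient accumulation
TOTAL_STEPS = 6_271

# Average length of one conversation according to the dataset estimate.
avg_tokens = ESTIMATED_TOKENS / TOTAL_SAMPLES

# ASSUMPTION: one truncated sample per sequence, no packing.
sequences_seen = TOTAL_STEPS * EFFECTIVE_BATCH
share_of_samples = sequences_seen / TOTAL_SAMPLES

# Upper bound on tokens seen if every sequence were filled to 512 tokens.
token_ceiling = sequences_seen * MAX_SEQ_LEN
share_of_tokens = token_ceiling / ESTIMATED_TOKENS

print(f"average tokens per sample: {avg_tokens:,.0f}")
print(f"sequences seen in one epoch: {sequences_seen:,} ({share_of_samples:.1%} of samples)")
print(f"token ceiling at 512 per sequence: {token_ceiling:,} ({share_of_tokens:.1%} of estimate)")
```

Two things follow under these assumptions. The roughly 2% of samples not covered by the step count would be consistent with a small held-out validation split, although the split size is not given here. And since the average conversation is estimated at about 3,000 tokens, a 512-token limit means most samples were likely truncated during training.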

How to Use

Load the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Hugging Face repository id (hyphenated, matching the published repository name)
model_id = "Rumiii/LlamaTron-RS1-Nemesis-1B"

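# Load the tokenizer and the model weights in bfloat16.
# device_map="auto" requires the accelerate package and places the model
# on a GPU when one is available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Wrap the model and tokenizer in a text-generation pipeline that accepts chat messages.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)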

messages = [
    {
        "role": "system",
        "content": "You are LlamaTron RS1 Nemesis, a knowledgeable and compassionate medical AI assistant. Provide accurate, evidence-based medical information clearly and helpfully."
    },
    {
        "role": "user",
        "content": "What are the early symptoms of Type 2 Diabetes?"
    },
]

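# Generate up to 400 new tokens with nucleus sampling.
output = pipe(
    messages,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# The pipeline returns the whole conversation; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])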
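
For repeated or multi-turn queries, the steps above can be wrapped in small helpers. This is a minimal sketch, not part of the released code: `build_messages` and `ask` are hypothetical names introduced here, and `pipe` is assumed to be the pipeline object created above.

```python
# System prompt copied from the example above.
SYSTEM_PROMPT = (
    "You are LlamaTron RS1 Nemesis, a knowledgeable and compassionate medical "
    "AI assistant. Provide accurate, evidence-based medical information "
    "clearly and helpfully."
)


def build_messages(question, history=None, system_prompt=SYSTEM_PROMPT):
    """Build a chat message list: system prompt, earlier turns, new question.

    `history` is an optional list of {"role": ..., "content": ...} dicts
    from previous user and assistant turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": question})
    return messages


def ask(pipe, question, history=None, **overrides):
    """Send one question through the pipeline and return the reply text.

    Generation settings default to the values used above; keyword
    arguments override them.
    """
    params = {
        "max_new_tokens": 400,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    }
    params.update(overrides)
    output = pipe(build_messages(question, history), **params)
    # The last message of the returned conversation is the model's reply.
    return output[0]["generated_text"][-1]["content"]
```

A call such as `ask(pipe, "What are the early symptoms of Type 2 Diabetes?")` reproduces the example above, and `ask(pipe, question, max_new_tokens=200)` shortens the reply. Because training used a 512-token limit, long histories may fall outside what the model saw during fine-tuning.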

Repository

The full training code, merging scripts, and inference interface are available on GitHub: github.com/sufirumii/LlamaTron-RS1-Nemesis-1B

Limitations

This model is intended for research and educational purposes only
It is not a substitute for professional medical advice, diagnosis, or treatment
The model was trained with a maximum sequence length of 512 tokens, which may limit performance on longer clinical texts
Always consult a qualified healthcare provider for medical decisions

Credits

Dataset: Maziyar Panahi and the OpenMed initiative for releasing the Medical-Reasoning-SFT-MiniMax-M2.1 dataset under Apache 2.0
Base Model: Meta AI for releasing Llama-3.2-1B-Instruct
Libraries: Hugging Face Transformers, PEFT, TRL, BitsAndBytes, Accelerate

License

Apache 2.0 — see LICENSE for details.
