**Tags:** Text Generation, PEFT, gemma4, medical, healthcare, community-health, lora, unsloth, who-imci, saheli, fhir, function-calling, thinking-mode, semantic-rag

# 🏥 SAHELI v2 — Gemma 4 E4B Medical Fine-Tune

*Smart Adaptive Health Engine for Local Intelligence*

A LoRA fine-tune of Google Gemma 4 E4B-it with seven technical novelties for clinical decision support in low-resource settings.

## 7 Novel Features

| # | Novelty | Description | Based On |
|---|---------|-------------|----------|
| 1 | Transparent Reasoning | `enable_thinking=True` shows step-by-step clinical reasoning chains | ArgMed-Agents (arXiv:2403.06294) |
| 2 | FHIR Function Calling | Native tool use generates HL7 FHIR R4 records with ICD-10 codes | MedAgentBench (arXiv:2501.14654) |
| 3 | Semantic RAG | GTE-small + FAISS vector search over 11 WHO guideline areas | MIRAGE (arXiv:2402.13178) |
| 4 | Multi-Agent Triage | 3-tier severity routing adapts response depth to case complexity | MDAgents (NeurIPS 2024, arXiv:2404.15155) |
| 5 | Multimodal | Single model handles text + images + audio (no separate pipelines) | Gemma 4 native |
| 6 | Medical Benchmarks | Evaluated on MedQA, MedMCQA, PubMedQA, AfriMed-QA v2 | |
| 7 | Edge Deployment | GGUF Q4_K_M for Ollama/llama.cpp on a $150 Android phone | |
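The retrieval step behind Semantic RAG can be sketched as embedding-based lookup over guideline chunks. The real pipeline uses GTE-small sentence embeddings stored in a FAISS index; the minimal sketch below substitutes toy random vectors and a brute-force cosine search, which behaves identically at small scale. All names and guideline strings here are illustrative, not taken from the repo.

```python
import numpy as np

# Toy stand-ins for GTE-small embeddings of WHO guideline chunks.
# The real system embeds each chunk with a sentence-embedding model
# and searches a FAISS index; at small scale a brute-force cosine
# search over a numpy matrix gives the same results.
guideline_chunks = [
    "Fast breathing in a child 12-59 months: 40 breaths/min or more",
    "Fever above 38.5C with stiff neck: refer urgently",
    "Diarrhoea with sunken eyes: treat for some dehydration",
]

rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(len(guideline_chunks), 8))
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

def retrieve(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Return the k guideline chunks most similar to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = chunk_vecs @ q          # cosine similarity (rows are unit-norm)
    top = np.argsort(-scores)[:k]
    return [guideline_chunks[i] for i in top]

# Pretend the query embeds very close to the first chunk.
query = chunk_vecs[0] + rng.normal(scale=0.1, size=8)
print(retrieve(query, k=1)[0])
```

The retrieved chunks are then concatenated into the model's context before generation, so answers can cite the matching guideline text.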

## Model Details

| Property | Value |
|----------|-------|
| Base Model | `google/gemma-4-E4B-it` (8B params, 4.5B effective) |
| Method | QLoRA (4-bit NF4) via Unsloth + TRL SFT |
| LoRA Rank / Alpha | 16 / 16 |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Training Epochs | 3 |
| Learning Rate | 2e-4 (cosine schedule) |
| Max Seq Length | 2048 |
| Optimizer | AdamW 8-bit |
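The adapter hyperparameters above map onto a PEFT `LoraConfig` roughly as follows. This is a sketch of the configuration implied by the table, not an excerpt from `train_saheli.py`; the dropout value is an assumption, since the card does not state it.

```python
from peft import LoraConfig

# Sketch of the LoRA adapter config implied by the Model Details table.
# The authoritative values live in train_saheli.py.
lora_config = LoraConfig(
    r=16,                      # LoRA rank from the table
    lora_alpha=16,             # alpha from the table
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,          # assumption: not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)
```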

## Training Data

| Dataset | Size | Purpose |
|---------|------|---------|
| FreedomIntelligence/medical-o1-reasoning-SFT | ~31 MB | Chain-of-thought clinical reasoning |
| lavita/medical-qa-datasets | 148 MB | Broad medical QA dialogues |

## Architecture

```
Patient Input (Voice / Photo / Text)
     |
[Complexity Triage] → LOW / MODERATE / HIGH
     |
[Gemma 4 E4B + Thinking Mode] → Clinical reasoning
     |
[Semantic RAG: GTE-small + FAISS] → WHO guidelines
     |
[Function Calling: FHIR Tools] → Structured records
     |
Answer + Reasoning Chain + FHIR JSON + Triage Level
```
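The complexity triage at the top of the pipeline can be sketched as a simple rule-based router. The danger signs below loosely follow WHO IMCI wording, but the sign lists and thresholds are illustrative assumptions; the shipped router in `app_v2.py` may weigh different signals.

```python
# Illustrative 3-tier complexity triage: route a case description to a
# severity tier that controls how much reasoning/retrieval is spent on it.
# Sign lists loosely follow WHO IMCI danger signs (illustrative only).
DANGER_SIGNS = ("convulsion", "unconscious", "unable to drink", "stiff neck")
MODERATE_SIGNS = ("fever", "fast breathing", "diarrhoea", "vomiting")

def triage(case: str) -> str:
    text = case.lower()
    if any(sign in text for sign in DANGER_SIGNS):
        return "HIGH"      # immediate referral, full reasoning chain
    if any(sign in text for sign in MODERATE_SIGNS):
        return "MODERATE"  # guideline retrieval + structured assessment
    return "LOW"           # brief advice, no retrieval needed

print(triage("2-year-old, fast breathing 52/min, fever"))  # → MODERATE
print(triage("child had a convulsion this morning"))       # → HIGH
```

Routing cheap cases to the LOW tier is what makes the pipeline viable on a low-end phone: full thinking-mode generation and retrieval are only paid for when the case warrants them.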

## Files

| File | Description |
|------|-------------|
| `train_saheli.py` | Complete fine-tuning script (Unsloth + TRL) |
| `app_v2.py` | Enhanced Gradio app with all 7 novelties |
| `eval_benchmarks.py` | Medical benchmark evaluation (MedQA / MedMCQA / PubMedQA / AfriMed-QA) |
| `Modelfile` | Ollama deployment config |
| `setup.sh` | One-command setup |
| `KAGGLE_WRITEUP.md` | Hackathon submission writeup |
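For the edge-deployment path, the `Modelfile` listed above would look roughly like the sketch below: point Ollama at the Q4_K_M GGUF export and pin the sampling parameters used elsewhere in this card. The file path and system prompt are assumptions, not copied from the repo.

```
FROM ./saheli-gemma4-e4b-medical-Q4_K_M.gguf
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 64
SYSTEM You are SAHELI, a medical AI assistant for community health workers.
```

With this in place, `ollama create saheli -f Modelfile` followed by `ollama run saheli` serves the quantized model locally.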

## Quick Start

```python
from transformers import AutoProcessor, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it", dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("google/gemma-4-E4B-it")

messages = [
    {"role": "system", "content": "You are SAHELI, a medical AI for community health workers."},
    {"role": "user", "content": "2-year-old, cough 3 days, breathing fast 52/min, temp 38.5C"}
]

# With thinking mode enabled, the template inserts the reasoning-chain markers
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=1.0, top_p=0.95, top_k=64)

# Keep special tokens so parse_response can split thinking from the answer
response = processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False)
parsed = processor.parse_response(response)
print("Thinking:", parsed.get("thinking", ""))
print("Answer:", parsed.get("answer", ""))
```
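The FHIR function-calling novelty can be illustrated with a minimal builder for the kind of record the tools emit: a FHIR R4 `Condition` resource carrying an ICD-10 coding. This is a hand-written sketch of the target JSON shape, not the repo's actual tool schema; the function name and patient ID are invented for the example.

```python
import json

# Standard FHIR R4 system URI for ICD-10 codes
ICD10_SYSTEM = "http://hl7.org/fhir/sid/icd-10"

def make_condition(patient_id: str, icd10_code: str, display: str) -> dict:
    """Build a minimal FHIR R4 Condition resource (illustrative fields only)."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [
                {"system": ICD10_SYSTEM, "code": icd10_code, "display": display}
            ],
            "text": display,
        },
        "clinicalStatus": {
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
                "code": "active",
            }]
        },
    }

# J18.9 is the ICD-10 code for pneumonia, unspecified organism
record = make_condition("pat-001", "J18.9", "Pneumonia, unspecified organism")
print(json.dumps(record, indent=2))
```

Because the model emits this structure via native tool use, the JSON can be posted directly to any FHIR R4 server rather than parsed out of free text.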

## Links

| Resource | URL |
|----------|-----|
| Live Demo | HF Spaces |
| Model | HuggingFace |
| Base Model | Gemma 4 E4B-it |

## Hackathon Tracks

Main Track | Health & Sciences | Digital Equity | Safety & Trust | Unsloth | Ollama | llama.cpp

## License

Apache 2.0 (following the base model)
