# 🩺 House M.D. — SFT (Gemma 3 4B-IT)
A LoRA adapter for `unsloth/gemma-3-4b-it` that warm-starts the policy used as the GRPO seed in the House M.D. clinical-reasoning RL environment (Apr '26 Meta OpenEnv Hackathon submission).

It learns to emit valid OpenEnv actions for the `SnehShah/house-md-env` Space — `INTERVIEW`, `EXAMINE`, `ORDER_TEST`, `UPDATE_DIFFERENTIAL`, `DIAGNOSE` — so GRPO does not have to learn the format rubric and the action vocabulary from scratch.
- **GitHub repo** (training pipeline, notebooks, eval, blog): https://github.com/sneh2909/Overfitters
- **Live env:** https://huggingface.co/spaces/SnehShah/house-md-env
- **W&B run** (production GRPO that built on this adapter): https://wandb.ai/sneh2909-christ-university/house-md?nw=nwusersneh2909
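As a toy illustration of what "valid actions" means here, the sketch below parses a model completion against the five-action vocabulary. The `ACTION: argument` line format is an assumption made for illustration only; the real schema is defined by the `house-md-env` Space, not by this card.

```python
import re

# The five action names listed in the card; the line format below is hypothetical.
ACTIONS = {"INTERVIEW", "EXAMINE", "ORDER_TEST", "UPDATE_DIFFERENTIAL", "DIAGNOSE"}

def parse_action(completion: str):
    """Extract (action, argument) from a completion shaped like 'ACTION: argument'.

    Returns None for anything outside the vocabulary — the kind of malformed
    output that a format rubric would score zero during GRPO.
    """
    m = re.match(r"\s*([A-Z_]+)\s*:\s*(.*)", completion, re.DOTALL)
    if not m or m.group(1) not in ACTIONS:
        return None
    return m.group(1), m.group(2).strip()

print(parse_action("ORDER_TEST: CBC with differential"))
```

A warm-started policy that already emits parseable actions lets the RL reward focus on clinical quality rather than syntax.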
## How it was trained
| Setting | Value |
| --- | --- |
| Base model | `unsloth/gemma-3-4b-it-unsloth-bnb-4bit` |
| Method | Supervised fine-tuning on oracle traces (TRL `SFTTrainer` + Unsloth) |
| Adapter | LoRA, r=16, alpha=32, targets: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Dataset | `SnehShah/house-md-sft-data` (oracle trajectories) |
| Examples | 2,151 (`sft_dataset.jsonl`) |
| Epochs | 1 |
| Effective batch size | 32 (per-device 2 × grad accum 16) |
| Learning rate | 2e-4, cosine schedule, warmup ratio 0.05 |
| Sequence length | 4096 |
| Loss | Completion tokens only |
| Hardware | 1× L4 via HF Jobs (~12 min) |
- Training script: `scripts/train_sft.py`
- Reproduction notebook: `notebooks/02_sft.ipynb`
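The adapter and optimizer settings in the table translate to an Unsloth + TRL setup roughly like the following. This is a sketch, not the contents of `scripts/train_sft.py`: the dropout value, `bias` setting, and trainer wiring are assumptions, and only the hyperparameters shown in the table are taken from the card.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# LoRA adapter matching the table: r=16, alpha=32, attention + MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0,  # assumption; not stated in the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",     # assumption
)

# Optimizer settings from the table; eff. batch = 2 x 16 = 32, one epoch.
args = SFTConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    output_dir="outputs",
)
# trainer = SFTTrainer(model=model, args=args, train_dataset=..., ...)
```

Completion-only loss would additionally mask the prompt tokens so gradients come only from the action text the model is meant to learn.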
## Use
```python
from unsloth import FastLanguageModel

model, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
model.load_adapter("SnehShah/house-md-sft-gemma3-4b")
FastLanguageModel.for_inference(model)
```
```python
# Then run an episode against the live Space:
from house_md_env import HouseMDEnv, HouseMDAction

with HouseMDEnv(base_url="https://snehshah-house-md-env.hf.space") as env:
    res = env.reset(seed=0)
    # ...feed res.observation to the prompt builder, decode an action, env.step(...)
```
See `scripts/eval_hf.py` for the exact harness used to produce `results/eval_sft.json`.
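The episode loop in the harness presumably follows the standard OpenEnv reset/step pattern. The sketch below uses a stub environment so the control flow runs without the live Space; the stub's class name, observations, rewards, and three-turn cap are all made up for illustration and do not describe the real `HouseMDEnv`.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool

class StubHouseMDEnv:
    """Toy stand-in for house_md_env.HouseMDEnv: same loop shape, fake data."""
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def reset(self, seed=0):
        self._turn = 0
        return StepResult("Patient presents with chest pain.", 0.0, False)

    def step(self, action: str):
        self._turn += 1
        done = action.startswith("DIAGNOSE") or self._turn >= 3
        return StepResult(f"Turn {self._turn} result.", 1.0 if done else 0.0, done)

def run_episode(env, policy):
    """Roll out one episode: observe, act, accumulate reward until done."""
    res = env.reset(seed=0)
    total = 0.0
    while not res.done:
        action = policy(res.observation)  # prompt-build + decode in the real harness
        res = env.step(action)
        total += res.reward
    return total

with StubHouseMDEnv() as env:
    total = run_episode(env, lambda obs: "DIAGNOSE: MI")
print(total)
```

In the real harness, `policy` would format `res.observation` into the Gemma chat template, generate with the adapter-loaded model, and parse the completion into a `HouseMDAction`.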
## License
Apache 2.0. The base model carries the Gemma Terms of Use, which also apply to this adapter.