🩺 House M.D. — SFT (Gemma 3 4B-IT)

A LoRA adapter for unsloth/gemma-3-4b-it that warm-starts the policy used as the seed for GRPO (Group Relative Policy Optimization) in the House M.D. clinical-reasoning RL environment (Apr '26 Meta OpenEnv Hackathon submission).

It learns to emit valid OpenEnv actions for the SnehShah/house-md-env Space (INTERVIEW, EXAMINE, ORDER_TEST, UPDATE_DIFFERENTIAL, DIAGNOSE), so GRPO doesn't have to learn the format rubric and the action vocabulary from scratch.
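For illustration, decoded completions can be gated against that action vocabulary before they ever reach the env. This is a hypothetical sketch: the JSON action schema and the `parse_action` helper are assumptions for the example, not the Space's actual wire format.

```python
import json

# The five actions the adapter is trained to emit (from the card above).
ACTIONS = {"INTERVIEW", "EXAMINE", "ORDER_TEST", "UPDATE_DIFFERENTIAL", "DIAGNOSE"}

def parse_action(completion: str) -> dict:
    """Parse a completion assumed to look like
    '{"action": "ORDER_TEST", "argument": "CBC"}' and reject
    anything outside the action vocabulary."""
    payload = json.loads(completion)
    if payload.get("action") not in ACTIONS:
        raise ValueError(f"unknown action: {payload.get('action')!r}")
    return payload
```

A gate like this is useful during rollout collection: malformed or out-of-vocabulary completions can be dropped (or penalized) before `env.step(...)` is called.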

GitHub repo (training pipeline, notebooks, eval, blog): https://github.com/sneh2909/Overfitters
Live env: https://huggingface.co/spaces/SnehShah/house-md-env
W&B run (production GRPO that built on this adapter): https://wandb.ai/sneh2909-christ-university/house-md?nw=nwusersneh2909


How it was trained

Base model: unsloth/gemma-3-4b-it-unsloth-bnb-4bit
Method: supervised fine-tuning on oracle traces (TRL SFTTrainer + Unsloth)
Adapter: LoRA, r=16, alpha=32, targets=q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Dataset: SnehShah/house-md-sft-data (oracle trajectories)
Examples: 2,151 (sft_dataset.jsonl)
Epochs: 1
Effective batch size: 32 (per_device=2 × grad_accum=16)
Learning rate: 2e-4, cosine schedule, warmup ratio 0.05
Sequence length: 4096
Loss on completion only: yes
Hardware: HF Jobs, 1× L4 (~12 min)

Training script: scripts/train_sft.py. Reproduction notebook: notebooks/02_sft.ipynb.
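The hyperparameters above map onto a TRL run configuration roughly as follows. This is a minimal sketch assuming the standard Unsloth + TRL recipe; argument names vary slightly across TRL versions, and scripts/train_sft.py in the repo is the authoritative setup.

```python
from trl import SFTConfig

# Run config implied by the table above; values mirror the card.
config = SFTConfig(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 16,   # effective batch size 2 x 16 = 32
    num_train_epochs = 1,
    learning_rate = 2e-4,
    lr_scheduler_type = "cosine",
    warmup_ratio = 0.05,
    max_seq_length = 4096,              # named max_length in newer TRL releases
    output_dir = "outputs",             # assumed output path
)
# "Loss on completion only" is handled separately, e.g. via TRL's
# completion-only collator or the equivalent SFTConfig flag, depending on version.
```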


Use

from unsloth import FastLanguageModel

model, tok = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length = 4096,
    load_in_4bit = True,
)
model.load_adapter("SnehShah/house-md-sft-gemma3-4b")  # attach this LoRA adapter from the Hub
FastLanguageModel.for_inference(model)  # switch Unsloth to its fast inference path

# Then run an episode against the live Space:
from house_md_env import HouseMDEnv, HouseMDAction
with HouseMDEnv(base_url="https://snehshah-house-md-env.hf.space") as env:
    res = env.reset(seed=0)
    # ...feed res.observation to the prompt builder, decode an action, env.step(...)

See scripts/eval_hf.py for the exact harness used to produce results/eval_sft.json.
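The episode loop that harness runs has roughly the shape sketched below. `FakeEnv`, `StepResult`, and the reward values are illustrative stand-ins so the example is self-contained; the real env, action, and result types come from house_md_env.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str
    reward: float = 0.0
    done: bool = False

class FakeEnv:
    """Toy stand-in env: small reward per step, big reward on DIAGNOSE."""
    def reset(self, seed=0):
        return StepResult("Patient presents with fever and joint pain.")
    def step(self, action: str):
        done = action.startswith("DIAGNOSE")
        return StepResult("...", reward=1.0 if done else 0.1, done=done)

def run_episode(env, policy, max_steps=8):
    """Feed each observation to the policy, step the env, sum rewards."""
    res = env.reset(seed=0)
    total = 0.0
    for _ in range(max_steps):
        res = env.step(policy(res.observation))
        total += res.reward
        if res.done:
            break
    return total

# Trivial scripted policy: order one test, then diagnose.
steps = iter(["ORDER_TEST: CBC", "DIAGNOSE: lupus"])
print(run_episode(FakeEnv(), lambda obs: next(steps)))  # 0.1 + 1.0 = 1.1
```

In the real harness the policy is the SFT model: the observation goes through the prompt builder, the model's decoded completion becomes a HouseMDAction, and `env.step(...)` returns the next observation and reward.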


License

Apache 2.0 for the adapter weights. The base model is distributed under the Gemma Terms of Use, which also apply to use of this adapter.
