# 🩺 House M.D. — SFT (Gemma 3 4B-IT)
A LoRA adapter for `unsloth/gemma-3-4b-it` that warm-starts the policy used as the GRPO seed in the House M.D. clinical-reasoning RL environment (Apr '26 Meta OpenEnv Hackathon submission).

It learns to emit valid OpenEnv actions for the `SnehShah/house-md-env` Space — `INTERVIEW`, `EXAMINE`, `ORDER_TEST`, `UPDATE_DIFFERENTIAL`, `DIAGNOSE` — so GRPO does not have to learn the format rubric and the action vocabulary from scratch.
- **GitHub repo** (training pipeline, notebooks, eval, blog): https://github.com/sneh2909/Overfitters
- **Live env:** https://huggingface.co/spaces/SnehShah/house-md-env
- **W&B run** (production GRPO that built on this adapter): https://wandb.ai/sneh2909-christ-university/house-md?nw=nwusersneh2909
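As a toy illustration of what "valid actions" means here, the sketch below parses a model completion against the five-action vocabulary. The `ACTION: argument` line format is an assumption made for illustration only; the real schema is defined by the `house-md-env` Space, not by this card.

```python
import re

# The five action names listed in the card; the line format below is hypothetical.
ACTIONS = {"INTERVIEW", "EXAMINE", "ORDER_TEST", "UPDATE_DIFFERENTIAL", "DIAGNOSE"}

def parse_action(completion: str):
    """Extract (action, argument) from a completion shaped like 'ACTION: argument'.

    Returns None for anything outside the vocabulary — the kind of malformed
    output that a format rubric would score zero during GRPO.
    """
    m = re.match(r"\s*([A-Z_]+)\s*:\s*(.*)", completion, re.DOTALL)
    if not m or m.group(1) not in ACTIONS:
        return None
    return m.group(1), m.group(2).strip()

print(parse_action("ORDER_TEST: CBC with differential"))
```

A warm-started policy that already emits parseable actions lets the RL reward focus on clinical quality rather than syntax.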
## How it was trained
| Setting | Value |
| --- | --- |
| Base model | `unsloth/gemma-3-4b-it-unsloth-bnb-4bit` |
| Method | Supervised fine-tuning on oracle traces (TRL `SFTTrainer` + Unsloth) |
| Adapter | LoRA, r=16, alpha=32, targets: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Dataset | `SnehShah/house-md-sft-data` (oracle trajectories) |
| Examples | 2,151 (`sft_dataset.jsonl`) |
| Epochs | 1 |
| Effective batch size | 32 (per-device 2 × grad accum 16) |
| Learning rate | 2e-4, cosine schedule, warmup ratio 0.05 |
| Sequence length | 4096 |
| Loss | Completion tokens only |
| Hardware | 1× L4 via HF Jobs (~12 min) |
- Training script: `scripts/train_sft.py`
- Reproduction notebook: `notebooks/02_sft.ipynb`
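The adapter and optimizer settings in the table translate to an Unsloth + TRL setup roughly like the following. This is a sketch, not the contents of `scripts/train_sft.py`: the dropout value, `bias` setting, and trainer wiring are assumptions, and only the hyperparameters shown in the table are taken from the card.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# LoRA adapter matching the table: r=16, alpha=32, attention + MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0,  # assumption; not stated in the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",     # assumption
)

# Optimizer settings from the table; eff. batch = 2 x 16 = 32, one epoch.
args = SFTConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1,
    output_dir="outputs",
)
# trainer = SFTTrainer(model=model, args=args, train_dataset=..., ...)
```

Completion-only loss would additionally mask the prompt tokens so gradients come only from the action text the model is meant to learn.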
## Use
```python
from unsloth import FastLanguageModel

model, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
model.load_adapter("SnehShah/house-md-sft-gemma3-4b")
FastLanguageModel.for_inference(model)
```
```python
# Then run an episode against the live Space:
from house_md_env import HouseMDEnv, HouseMDAction

with HouseMDEnv(base_url="https://snehshah-house-md-env.hf.space") as env:
    res = env.reset(seed=0)
    # ...feed res.observation to the prompt builder, decode an action, env.step(...)
```
See `scripts/eval_hf.py` for the exact harness used to produce `results/eval_sft.json`.
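The episode loop in the harness presumably follows the standard OpenEnv reset/step pattern. The sketch below uses a stub environment so the control flow runs without the live Space; the stub's class name, observations, rewards, and three-turn cap are all made up for illustration and do not describe the real `HouseMDEnv`.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool

class StubHouseMDEnv:
    """Toy stand-in for house_md_env.HouseMDEnv: same loop shape, fake data."""
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def reset(self, seed=0):
        self._turn = 0
        return StepResult("Patient presents with chest pain.", 0.0, False)

    def step(self, action: str):
        self._turn += 1
        done = action.startswith("DIAGNOSE") or self._turn >= 3
        return StepResult(f"Turn {self._turn} result.", 1.0 if done else 0.0, done)

def run_episode(env, policy):
    """Roll out one episode: observe, act, accumulate reward until done."""
    res = env.reset(seed=0)
    total = 0.0
    while not res.done:
        action = policy(res.observation)  # prompt-build + decode in the real harness
        res = env.step(action)
        total += res.reward
    return total

with StubHouseMDEnv() as env:
    total = run_episode(env, lambda obs: "DIAGNOSE: MI")
print(total)
```

In the real harness, `policy` would format `res.observation` into the Gemma chat template, generate with the adapter-loaded model, and parse the completion into a `HouseMDAction`.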
## License
Apache 2.0. The base model carries the Gemma Terms of Use, which also apply to this adapter.