# physground-judger9B – Anonymous Judge LoRA Adapter
LoRA adapter trained as a judge model that scores generated videos against
prompt-alignment, temporal, persistence, and 13 physical-law sub-rubrics.
Released anonymously alongside the companion dataset
anonymouscla/physground.
The base model identifier required to attach this adapter is recorded in
`adapter_config.json` (`base_model_name_or_path`); the inference script
reads it automatically.
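
If you need the base model id outside of `infer.py`, it can be read straight from the adapter config. A minimal sketch, assuming only the standard PEFT `adapter_config.json` schema:

```python
# Fetch the adapter config from the Hub and print the recorded base model id.
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download("anonymouscla/physground-judger9B", "adapter_config.json")
with open(cfg_path) as f:
    print(json.load(f)["base_model_name_or_path"])
```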
## Files

| File | Purpose |
|---|---|
| `adapter_config.json` | PEFT/LoRA config (records the base model id) |
| `adapter_model.safetensors` | LoRA weights (~167 MB) |
| `additional_config.json` | ms-swift extras (lora_dtype / lr ratios) |
| `training_args.json` | sanitized training hyperparameters |
| `subq+human.yaml` | prompt template used at training and inference time |
| `infer.py` | standalone end-to-end inference script |
## Setup

```bash
pip install "transformers>=4.49" peft accelerate pyyaml \
    "qwen-vl-utils[decord]" huggingface_hub
```

Loading the base model in bf16 needs roughly 24 GB of GPU memory.
## Quickstart – Hugging Face Hub
`infer.py` accepts either a local folder or a HF Hub repo id via
`--adapter-dir`; the default value already points at this repo, so the
following commands work without cloning anything.
```bash
# General axes (1–5 each): SA / PTV / persistence
python infer.py \
    --video /path/to/video.mp4 \
    --caption "A ball rolls down a ramp and knocks over a block." \
    --metric SA

# Physical-law axes (1–5 each): one of the 13 laws below
python infer.py \
    --video /path/to/video.mp4 \
    --caption "A ball rolls down a ramp and knocks over a block." \
    --law gravity
```
`infer.py` will:

- Resolve `--adapter-dir` to a local directory (via `huggingface_hub.snapshot_download` if it is a Hub id); a sketch of this resolution step follows the list.
- Read `adapter_config.json` to find the base model and load it via `transformers`.
- Attach the LoRA adapter via PEFT.
- Render the scoring prompt from `subq+human.yaml`, plus the relevant sub-questions / per-law criterion (constants embedded in `infer.py`).
- Run greedy decoding with `--max-new-tokens 64` (matches training).
- Parse the JSON object and print the integer score.
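
A minimal sketch of that resolution step (an illustrative reconstruction, not the exact code in `infer.py`):

```python
# Treat --adapter-dir as a local path if it exists, otherwise as a Hub repo id.
from pathlib import Path
from huggingface_hub import snapshot_download

def resolve_adapter_dir(adapter_dir: str) -> Path:
    p = Path(adapter_dir)
    return p if p.is_dir() else Path(snapshot_download(repo_id=adapter_dir))
```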
Output is a single JSON line:

```json
{"key": "gravity", "score": 4, "raw": "{\"gravity\": 4}"}
```
`--metric` choices: `SA`, `PTV`, `persistence`.

`--law` choices: `gravity`, `inertia`, `momentum`, `impenetrability`,
`collision`, `material`, `buoyancy`, `displacement`, `flow_dynamics`,
`boundary_interaction`, `fluid_continuity`, `reflection`, `shadow`.

Add `--print-prompt` to inspect the exact rendered system + user prompt
before generation.
## Programmatic use

```python
from pathlib import Path
import torch
from infer import (
build_messages,
build_prompt,
decode_generated,
load_model,
load_yaml,
parse_score,
prepare_inputs,
)
processor, model, adapter_dir = load_model(
"anonymouscla/physground-judger9B",
dtype=torch.bfloat16,
device_map="auto",
)
cfg = load_yaml(adapter_dir / "subq+human.yaml")
system, user, key = build_prompt(
cfg,
caption="A ball rolls down a ramp and knocks over a block.",
law="gravity",
)
messages = build_messages(system, user, Path("video.mp4"))
inputs = prepare_inputs(
processor,
messages,
next(model.parameters()).device,
fps=2.0,
max_pixels=360 * 640,
)
with torch.inference_mode():
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
raw = decode_generated(processor, inputs, out)
print({"key": key, "score": parse_score(raw, key), "raw": raw})
## Prompt templates
Both training and inference prompts are rendered from two sources:
- `subq+human.yaml` – the system prompt, the SA / PTV / persistence templates for the general axes, and the `physical_template` shared by all 13 physical-law axes (with `{prompt}`, `{law}`, `{criteria}`, `{questions_block}` placeholders). Use `--print-prompt` to dump the fully rendered system + user prompt.
- `infer.py` – the per-axis sub-question lists (`GENERAL_SUB_QUESTIONS`, `PHYSICAL_SUB_QUESTIONS`) and per-law criteria (`PHYSICAL_CRITERIA`) that are spliced into the YAML templates. Override any criterion at inference time with `--criteria "..."` instead of editing the source (a placeholder-filling sketch follows this list).
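
To make the placeholder wiring concrete, here is a hedged illustration; the template string below is a stand-in invented for this example, not the real `physical_template` from `subq+human.yaml`, and the criterion and sub-question are likewise made up:

```python
# Stand-in template showing how the four placeholders could be filled.
physical_template = (
    "Caption: {prompt}\n"
    "Law under test: {law}\n"
    "Criterion: {criteria}\n"
    "Sub-questions:\n{questions_block}"
)

user_prompt = physical_template.format(
    prompt="A ball rolls down a ramp and knocks over a block.",
    law="gravity",
    criteria="Unsupported objects should accelerate downward at a constant rate.",  # example only
    questions_block="- Does the ball speed up while rolling down the ramp?",
)
print(user_prompt)
```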
The judge always replies with a single JSON object containing one key (the metric or law name) and an integer score in 1–5.
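
A minimal parser consistent with that contract, as a simplified stand-in for the `parse_score` helper in `infer.py` (the real helper may handle edge cases differently):

```python
# Extract the integer score for `key` from the judge's raw reply.
import json
import re

def parse_judge_reply(raw: str, key: str) -> int:
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)  # tolerate stray text around the JSON
    if match is None:
        raise ValueError(f"no JSON object found in reply: {raw!r}")
    score = int(json.loads(match.group(0))[key])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score

print(parse_judge_reply('{"gravity": 4}', "gravity"))  # -> 4
```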
## Training summary
LoRA via PEFT (rank 32, α 64, dropout 0.05) over the language-tower
linear layers, vision encoder frozen, bf16 + gradient checkpointing,
AdamW with lr 1e-4 and a cosine schedule, 1.0 epoch / 294 steps on the
subq+human split (automatically derived sub-question judgements +
human-rated samples). Full hyperparameters are in `training_args.json`
and `additional_config.json`; the exact LoRA target regex and rank are
in `adapter_config.json`. Framework: ms-swift 4.1.2, PEFT 0.19.1,
DeepSpeed ZeRO-2.
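
For reference, a PEFT configuration reconstructed from these numbers would look roughly like the following; `target_modules` is an assumption, since the exact target regex is recorded in `adapter_config.json`:

```python
from peft import LoraConfig

# Hedged reconstruction of the training-time LoRA setup (not the shipped config).
lora_cfg = LoraConfig(
    r=32,              # LoRA rank
    lora_alpha=64,     # alpha
    lora_dropout=0.05,
    target_modules=[   # assumed language-tower linear layers; the vision encoder stays frozen
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```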
See the companion dataset `anonymouscla/physground` for prompts,
physical-law tags, and example videos.
## License
The base model is released by its respective authors; this LoRA adapter is shared for anonymous review purposes. No identifying metadata is included.