---
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- judge
- video-evaluation
- anonymous-release
---

# physground-judger9B — Anonymous Judge LoRA Adapter

LoRA adapter trained as a judge model that scores generated videos against prompt-alignment, temporal, persistence, and 13 physical-law sub-rubrics. Released anonymously alongside the companion dataset [`anonymouscla/physground`](https://huggingface.co/datasets/anonymouscla/physground).

The base model identifier required to attach this adapter is recorded in `adapter_config.json` (`base_model_name_or_path`); the inference script reads it automatically.

## Files

| File | Purpose |
| --- | --- |
| `adapter_config.json` | PEFT/LoRA config (records the base model id) |
| `adapter_model.safetensors` | LoRA weights (~167 MB) |
| `additional_config.json` | ms-swift extras (`lora_dtype`, learning-rate ratios) |
| `training_args.json` | Sanitized training hyperparameters |
| `subq+human.yaml` | Prompt template used at training and inference time |
| `infer.py` | Standalone end-to-end inference script |

## Setup

```bash
pip install "transformers>=4.49" peft accelerate pyyaml \
  "qwen-vl-utils[decord]" huggingface_hub
```

Loading the base model in bf16 requires roughly 24 GB of GPU memory.

## Quickstart — Hugging Face Hub

`infer.py` accepts either a local folder or a Hugging Face Hub repo id via `--adapter-dir`; the default value already points at this repo, so the following commands work without cloning anything.

```bash
# General axes (scored 1-5 each): SA / PTV / persistence
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --metric SA

# Physical-law axes (scored 1-5 each): one of the 13 laws listed below
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --law gravity
```

`infer.py` will:

1. Resolve `--adapter-dir` to a local directory (via `huggingface_hub.snapshot_download` if it is a Hub id).
2. Read `adapter_config.json` to find the base model and load it via `transformers` (see the sketch after this section).
3. Attach the LoRA adapter via PEFT.
4. Render the scoring prompt from `subq+human.yaml`, plus the relevant sub-questions / per-law criterion (constants embedded in `infer.py`).
5. Run greedy decoding with `--max-new-tokens 64` (matching training).
6. Parse the JSON object and print the integer score.

Output is a single JSON line:

```json
{"key": "gravity", "score": 4, "raw": "{\"gravity\": 4}"}
```

`--metric` choices: `SA`, `PTV`, `persistence`.
`--law` choices: `gravity`, `inertia`, `momentum`, `impenetrability`, `collision`, `material`, `buoyancy`, `displacement`, `flow_dynamics`, `boundary_interaction`, `fluid_continuity`, `reflection`, `shadow`.

Add `--print-prompt` to inspect the exact rendered system + user prompt before generation.
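If you only need steps 1–2 above, for example to check which base model the adapter expects before committing GPU memory, the same resolution can be reproduced directly with `huggingface_hub`. A minimal sketch:

```python
# Minimal sketch of steps 1-2: resolve the adapter repo to a local
# directory, then read the base model id that infer.py will load.
import json
from pathlib import Path

from huggingface_hub import snapshot_download

adapter_dir = Path(snapshot_download("anonymouscla/physground-judger9B"))
adapter_cfg = json.loads((adapter_dir / "adapter_config.json").read_text())
print(adapter_cfg["base_model_name_or_path"])  # base model to pair with this adapter
```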
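Because the script prints exactly one JSON line, it is also easy to drive from another process. A sketch, assuming `infer.py` sits in the current directory and using a placeholder video path:

```python
# Sketch: invoke infer.py as a subprocess and parse its JSON output line.
import json
import subprocess

proc = subprocess.run(
    [
        "python", "infer.py",
        "--video", "/path/to/video.mp4",  # placeholder
        "--caption", "A ball rolls down a ramp and knocks over a block.",
        "--law", "gravity",
    ],
    capture_output=True,
    text=True,
    check=True,
)
record = json.loads(proc.stdout.strip().splitlines()[-1])  # JSON record is the last line
print(record["key"], record["score"])  # e.g. gravity 4
```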
## Programmatic use

```python
from pathlib import Path

import torch

from infer import (
    build_messages,
    build_prompt,
    decode_generated,
    load_model,
    load_yaml,
    parse_score,
    prepare_inputs,
)

# Load the base model (read from adapter_config.json) and attach the LoRA.
processor, model, adapter_dir = load_model(
    "anonymouscla/physground-judger9B",
    dtype=torch.bfloat16,
    device_map="auto",
)

# Render the scoring prompt for one physical-law axis.
cfg = load_yaml(adapter_dir / "subq+human.yaml")
system, user, key = build_prompt(
    cfg,
    caption="A ball rolls down a ramp and knocks over a block.",
    law="gravity",
)

messages = build_messages(system, user, Path("video.mp4"))
inputs = prepare_inputs(
    processor,
    messages,
    next(model.parameters()).device,
    fps=2.0,
    max_pixels=360 * 640,
)

# Greedy decoding with at most 64 new tokens, matching training.
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

raw = decode_generated(processor, inputs, out)
print({"key": key, "score": parse_score(raw, key), "raw": raw})
```

## Prompt templates

Both training and inference prompts are rendered from two sources:

- `subq+human.yaml` — the system prompt, the SA / PTV / persistence templates for the general axes, and the `physical_template` shared by all 13 physical-law axes (with `{prompt}`, `{law}`, `{criteria}`, `{questions_block}` placeholders). Use `--print-prompt` to dump the fully rendered system + user prompt.
- `infer.py` — the per-axis sub-question lists (`GENERAL_SUB_QUESTIONS`, `PHYSICAL_SUB_QUESTIONS`) and per-law criteria (`PHYSICAL_CRITERIA`) that are spliced into the YAML templates. Override any criterion at inference time with `--criteria "..."` instead of editing the source.

The judge always replies with a single JSON object containing one key (the metric or law name) and an integer score from 1 to 5.

## Training summary

LoRA via PEFT (rank 32, α 64, dropout 0.05) over the language-tower linear layers, with the vision encoder frozen; bf16 + gradient checkpointing; AdamW with lr = 1e-4 and a cosine schedule; 1.0 epoch / 294 steps on the `subq+human` split (automatically derived sub-question judgements plus human-rated samples).

Full hyperparameters are in `training_args.json` and `additional_config.json`; the exact LoRA target regex and rank are in `adapter_config.json` (a rough `LoraConfig` reconstruction is sketched at the end of this card). Framework: ms-swift 4.1.2, PEFT 0.19.1, DeepSpeed ZeRO-2.

See the companion dataset [`anonymouscla/physground`](https://huggingface.co/datasets/anonymouscla/physground) for prompts, physical-law tags, and example videos.

## License

The base model is released by its respective authors; this LoRA adapter is shared for anonymous review purposes. No identifying metadata is included.
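For orientation, the training summary above corresponds roughly to the PEFT configuration below. This is a sketch, not the exact training config: `target_modules` is an assumption, and the authoritative rank, alpha, and target-module regex are recorded in `adapter_config.json`.

```python
# Rough reconstruction of the LoRA setup described in the training summary.
# NOTE: target_modules is an illustrative assumption; the exact target
# regex used during training is recorded in adapter_config.json.
from peft import LoraConfig

lora_cfg = LoraConfig(
    r=32,            # LoRA rank
    lora_alpha=64,   # scaling factor alpha
    lora_dropout=0.05,
    target_modules="all-linear",  # assumption; training targeted language-tower linear layers only
    task_type="CAUSAL_LM",
)
```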