---
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- judge
- video-evaluation
- anonymous-release
---

# physground-judger9B – Anonymous Judge LoRA Adapter

LoRA adapter trained as a judge model that scores generated videos against
prompt-alignment, temporal, persistence, and 13 physical-law sub-rubrics.
Released anonymously alongside the companion dataset
[`anonymouscla/physground`](https://huggingface.co/datasets/anonymouscla/physground).

The base model identifier required to attach this adapter is recorded in
`adapter_config.json` (`base_model_name_or_path`); the inference script
reads it automatically.
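
If you need the base model id yourself (for example to pre-download the
weights), it can be read straight from the config; a minimal sketch:

```python
import json
from pathlib import Path

# `base_model_name_or_path` is the field PEFT records in adapter_config.json
config = json.loads(Path("adapter_config.json").read_text())
print(config["base_model_name_or_path"])
```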

## Files

| File | Purpose |
| --- | --- |
| `adapter_config.json` | PEFT/LoRA config (records base model id) |
| `adapter_model.safetensors` | LoRA weights (~167 MB) |
| `additional_config.json` | ms-swift extras (lora_dtype / lr ratios) |
| `training_args.json` | sanitized training hyperparameters |
| `subq+human.yaml` | prompt template used at training and inference time |
| `infer.py` | standalone end-to-end inference script |

## Setup

```bash
pip install "transformers>=4.49" peft accelerate pyyaml \
    "qwen-vl-utils[decord]" huggingface_hub
```

Loading the base model in bf16 needs roughly 24 GB of GPU memory.

## Quickstart – Hugging Face Hub

`infer.py` accepts either a local folder or a HF Hub repo id via
`--adapter-dir`; the default value already points at this repo, so the
following commands work without cloning anything.
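
The only file you need locally is the script itself; one way to fetch it
from this repo without a full clone (assuming a recent `huggingface_hub`
CLI) is:

```bash
huggingface-cli download anonymouscla/physground-judger9B infer.py --local-dir .
```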

```bash
# General axes (1–5 each): SA / PTV / persistence
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --metric SA

# Physical-law axes (1–5 each): one of the 13 laws below
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --law gravity
```
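
To resolve the adapter from a local copy instead of the Hub, pass the
folder explicitly (the path below is illustrative):

```bash
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --metric SA \
  --adapter-dir ./physground-judger9B
```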

`infer.py` will:

1. Resolve `--adapter-dir` to a local directory (`huggingface_hub.snapshot_download`
   if it is a Hub id); see the sketch after this list.
2. Read `adapter_config.json` to find the base model and load it via
   `transformers`.
3. Attach the LoRA adapter via PEFT.
4. Render the scoring prompt from `subq+human.yaml`, plus the relevant
   sub-questions / per-law criterion (constants embedded in `infer.py`).
5. Run greedy decoding with `--max-new-tokens 64` (matches training).
6. Parse the JSON object and print the integer score.
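
Step 1 corresponds roughly to the following (a minimal sketch, not the
exact code in `infer.py`):

```python
from pathlib import Path

from huggingface_hub import snapshot_download

def resolve_adapter_dir(adapter_dir: str) -> Path:
    """Return a local directory for either a path or a Hub repo id."""
    path = Path(adapter_dir)
    if path.is_dir():
        return path
    # not a local folder: treat the string as a Hub repo id and download
    return Path(snapshot_download(repo_id=adapter_dir))
```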

Output is a single JSON line:

```json
{"key": "gravity", "score": 4, "raw": "{\"gravity\": 4}"}
```
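
Because the result is one JSON object on stdout, it composes well with
shell tooling (assuming `jq` is installed):

```bash
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --law gravity | jq .score
```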

`--metric` choices: `SA`, `PTV`, `persistence`.
`--law` choices: `gravity`, `inertia`, `momentum`, `impenetrability`,
`collision`, `material`, `buoyancy`, `displacement`, `flow_dynamics`,
`boundary_interaction`, `fluid_continuity`, `reflection`, `shadow`.
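
To score one video on every physical law in one go (each call reloads the
model, so prefer the programmatic API below for large batches):

```bash
for law in gravity inertia momentum impenetrability collision material \
           buoyancy displacement flow_dynamics boundary_interaction \
           fluid_continuity reflection shadow; do
  python infer.py \
    --video /path/to/video.mp4 \
    --caption "A ball rolls down a ramp and knocks over a block." \
    --law "$law"
done
```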

Add `--print-prompt` to inspect the exact rendered system + user prompt
before generation.

## Programmatic use

```python
from pathlib import Path
import torch

from infer import (
    build_messages,
    build_prompt,
    decode_generated,
    load_model,
    load_yaml,
    parse_score,
    prepare_inputs,
)

processor, model, adapter_dir = load_model(
    "anonymouscla/physground-judger9B",
    dtype=torch.bfloat16,
    device_map="auto",
)
cfg = load_yaml(adapter_dir / "subq+human.yaml")

system, user, key = build_prompt(
    cfg,
    caption="A ball rolls down a ramp and knocks over a block.",
    law="gravity",
)
messages = build_messages(system, user, Path("video.mp4"))
inputs = prepare_inputs(
    processor,
    messages,
    next(model.parameters()).device,
    fps=2.0,
    max_pixels=360 * 640,
)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

raw = decode_generated(processor, inputs, out)
print({"key": key, "score": parse_score(raw, key), "raw": raw})
```
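
Once the model is loaded, scoring several axes is just a loop over
`build_prompt`, reusing the objects from the snippet above:

```python
caption = "A ball rolls down a ramp and knocks over a block."
scores = {}
for law in ("gravity", "inertia", "collision"):  # any subset of the 13 laws
    system, user, key = build_prompt(cfg, caption=caption, law=law)
    messages = build_messages(system, user, Path("video.mp4"))
    inputs = prepare_inputs(
        processor,
        messages,
        next(model.parameters()).device,
        fps=2.0,
        max_pixels=360 * 640,
    )
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    scores[key] = parse_score(decode_generated(processor, inputs, out), key)
print(scores)
```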

## Prompt templates

Both training and inference prompts are rendered from two sources:

- `subq+human.yaml` – the system prompt, the SA / PTV / persistence
  templates for the general axes, and the `physical_template` shared by
  all 13 physical-law axes (with `{prompt}`, `{law}`, `{criteria}`,
  `{questions_block}` placeholders). Use `--print-prompt` to dump the
  fully rendered system + user prompt.
- `infer.py` – the per-axis sub-question lists (`GENERAL_SUB_QUESTIONS`,
  `PHYSICAL_SUB_QUESTIONS`) and per-law criteria (`PHYSICAL_CRITERIA`)
  that are spliced into the YAML templates. Override any criterion at
  inference time with `--criteria "..."` instead of editing the source;
  see the example after this list.
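
For example, to try out a custom gravity criterion (the criterion text
below is illustrative, not the shipped one):

```bash
python infer.py \
  --video /path/to/video.mp4 \
  --caption "A ball rolls down a ramp and knocks over a block." \
  --law gravity \
  --criteria "Unsupported objects accelerate downward; nothing hovers or drifts upward." \
  --print-prompt
```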

The judge always replies with a single JSON object containing one key
(the metric or law name) and an integer score in 1–5.
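
`infer.py` ships `parse_score` to enforce this contract; a minimal
re-implementation (an illustrative sketch, not the shipped code) looks
like:

```python
import json
import re

def parse_score(raw: str, key: str) -> int:
    """Extract the integer score for `key` from the judge's reply."""
    match = re.search(r"\{.*?\}", raw, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in reply: {raw!r}")
    score = int(json.loads(match.group(0))[key])
    if not 1 <= score <= 5:
        raise ValueError(f"score {score} outside the 1-5 range")
    return score
```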

## Training summary

LoRA via PEFT (rank 32, α 64, dropout 0.05) over the language-tower
linear layers, with the vision encoder frozen; bf16 + gradient
checkpointing; AdamW with lr = 1e-4 and a cosine schedule; 1.0 epoch /
294 steps on the `subq+human` split (automatically derived sub-question
judgements + human-rated samples). Full hyperparameters are in
`training_args.json` and `additional_config.json`; the exact LoRA target
regex and rank are in `adapter_config.json`. Framework: ms-swift 4.1.2,
PEFT 0.19.1, DeepSpeed ZeRO-2.
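
In PEFT terms the recipe corresponds to roughly the following config
(illustrative; `target_modules` is omitted here because the exact regex
is recorded in `adapter_config.json`):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                 # LoRA rank
    lora_alpha=64,        # scaling alpha
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # target_modules: use the exact regex from adapter_config.json
)
```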

See the companion dataset
[`anonymouscla/physground`](https://huggingface.co/datasets/anonymouscla/physground)
for prompts, physical-law tags, and example videos.

## License

The base model is released separately by its authors; this LoRA adapter
is shared for anonymous review purposes. No identifying metadata is
included.