---
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- judge
- video-evaluation
- anonymous-release
---
# physground-judger9B: Anonymous Judge LoRA Adapter
LoRA adapter trained as a judge model that scores generated videos on
prompt-alignment, temporal, and persistence axes plus 13 physical-law
sub-rubrics.
Released anonymously alongside the companion dataset
[`anonymouscla/physground`](https://huggingface.co/datasets/anonymouscla/physground).
The base model identifier required to attach this adapter is recorded in
`adapter_config.json` (`base_model_name_or_path`); the inference script
reads it automatically.
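The field is plain JSON, so it is easy to inspect without loading anything
(a minimal sketch; assumes the file has already been downloaded locally):
```python
# Sketch: print the base model id this adapter attaches to.
import json

with open("adapter_config.json") as f:
    print(json.load(f)["base_model_name_or_path"])
```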
## Files
| File | Purpose |
| --- | --- |
| `adapter_config.json` | PEFT/LoRA config (records base model id) |
| `adapter_model.safetensors` | LoRA weights (~167 MB) |
| `additional_config.json` | ms-swift extras (lora_dtype / lr ratios) |
| `training_args.json` | sanitized training hyperparameters |
| `subq+human.yaml` | prompt templates used at training and inference time |
| `infer.py` | standalone end-to-end inference script |
## Setup
```bash
pip install "transformers>=4.49" peft accelerate pyyaml \
"qwen-vl-utils[decord]" huggingface_hub
```
Loading the base model in bf16 needs roughly 24 GB of GPU memory.
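If you want to verify headroom first, a quick check (a sketch; assumes a
single CUDA device):
```python
# Sketch: report total memory on the first CUDA device.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB")
```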
## Quickstart: Hugging Face Hub
`infer.py` accepts either a local folder or a HF Hub repo id via
`--adapter-dir`; the default value already points at this repo, so the
following commands work without cloning anything.
```bash
# General axes (1–5 each): SA / PTV / persistence
python infer.py \
--video /path/to/video.mp4 \
--caption "A ball rolls down a ramp and knocks over a block." \
--metric SA
# Physical-law axes (1–5 each): one of the 13 laws below
python infer.py \
--video /path/to/video.mp4 \
--caption "A ball rolls down a ramp and knocks over a block." \
--law gravity
```
`infer.py` will:
1. Resolve `--adapter-dir` to a local directory (`huggingface_hub.snapshot_download`
if it is a Hub id).
2. Read `adapter_config.json` to find the base model and load it via
`transformers`.
3. Attach the LoRA adapter via PEFT.
4. Render the scoring prompt from `subq+human.yaml`, splicing in the
   relevant sub-questions and per-law criterion (constants embedded in
   `infer.py`).
5. Run greedy decoding with `--max-new-tokens 64` (matches training).
6. Parse the JSON object and print the integer score.
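Steps 1–3 correspond to standard Hub and PEFT calls. A minimal sketch of
that loading path (the `AutoModelForVision2Seq` class here is an
assumption for illustration; see `infer.py` for the actual loading code):
```python
# Sketch of steps 1-3; infer.py wraps the equivalent calls.
import torch
from huggingface_hub import snapshot_download
from peft import PeftConfig, PeftModel
from transformers import AutoModelForVision2Seq, AutoProcessor

adapter_dir = snapshot_download("anonymouscla/physground-judger9B")   # step 1
base_id = PeftConfig.from_pretrained(adapter_dir).base_model_name_or_path
processor = AutoProcessor.from_pretrained(base_id)                    # step 2
model = AutoModelForVision2Seq.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)                 # step 3
```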
Output is a single JSON line:
```json
{"key": "gravity", "score": 4, "raw": "{\"gravity\": 4}"}
```
`--metric` choices: `SA`, `PTV`, `persistence`.
`--law` choices: `gravity`, `inertia`, `momentum`, `impenetrability`,
`collision`, `material`, `buoyancy`, `displacement`, `flow_dynamics`,
`boundary_interaction`, `fluid_continuity`, `reflection`, `shadow`.
Add `--print-prompt` to inspect the exact rendered system + user prompt
before generation.
## Programmatic use
```python
from pathlib import Path
import torch
from infer import (
build_messages,
build_prompt,
decode_generated,
load_model,
load_yaml,
parse_score,
prepare_inputs,
)
processor, model, adapter_dir = load_model(
"anonymouscla/physground-judger9B",
dtype=torch.bfloat16,
device_map="auto",
)
cfg = load_yaml(adapter_dir / "subq+human.yaml")
system, user, key = build_prompt(
cfg,
caption="A ball rolls down a ramp and knocks over a block.",
law="gravity",
)
messages = build_messages(system, user, Path("video.mp4"))
inputs = prepare_inputs(
processor,
messages,
next(model.parameters()).device,
fps=2.0,
max_pixels=360 * 640,
)
with torch.inference_mode():
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
raw = decode_generated(processor, inputs, out)
print({"key": key, "score": parse_score(raw, key), "raw": raw})
```
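To score several axes without reloading weights, keep the `processor`
and `model` pair and loop; a sketch continuing directly from the example
above:
```python
# Sketch: score one video against all 13 physical laws with a single load.
LAWS = [
    "gravity", "inertia", "momentum", "impenetrability", "collision",
    "material", "buoyancy", "displacement", "flow_dynamics",
    "boundary_interaction", "fluid_continuity", "reflection", "shadow",
]

scores = {}
for law in LAWS:
    system, user, key = build_prompt(
        cfg,
        caption="A ball rolls down a ramp and knocks over a block.",
        law=law,
    )
    messages = build_messages(system, user, Path("video.mp4"))
    inputs = prepare_inputs(
        processor, messages, next(model.parameters()).device,
        fps=2.0, max_pixels=360 * 640,
    )
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    scores[key] = parse_score(decode_generated(processor, inputs, out), key)
print(scores)
```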
## Prompt templates
Both training and inference prompts are rendered from two sources:
- `subq+human.yaml`: the system prompt, the SA / PTV / persistence
  templates for the general axes, and the `physical_template` shared by
  all 13 physical-law axes (with `{prompt}`, `{law}`, `{criteria}`,
  `{questions_block}` placeholders). Use `--print-prompt` to dump the
  fully rendered system + user prompt.
- `infer.py`: the per-axis sub-question lists (`GENERAL_SUB_QUESTIONS`,
  `PHYSICAL_SUB_QUESTIONS`) and per-law criteria (`PHYSICAL_CRITERIA`)
  that are spliced into the YAML templates. Override any criterion at
  inference time with `--criteria "..."` instead of editing the source.
The judge always replies with a single JSON object containing one key
(the metric or law name) and an integer score in 1–5.
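If you post-process raw generations yourself, the contract is small
enough to validate defensively; a sketch (the actual parsing lives in
`parse_score` in `infer.py`):
```python
# Sketch: validate a raw judge reply against the expected contract.
import json

def check_reply(raw: str, expected_key: str) -> int:
    obj = json.loads(raw)                    # reply must be one JSON object
    assert list(obj) == [expected_key], obj  # exactly one key: the axis name
    score = int(obj[expected_key])
    assert 1 <= score <= 5, score            # integer score in 1-5
    return score

print(check_reply('{"gravity": 4}', "gravity"))  # -> 4
```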
## Training summary
LoRA via PEFT (rank 32, α = 64, dropout 0.05) over the language-tower
linear layers with the vision encoder frozen; bf16 with gradient
checkpointing; AdamW, lr 1e-4 with a cosine schedule; 1.0 epoch
(294 steps) on the `subq+human` split (automatically derived
sub-question judgements plus human-rated samples).
Full hyperparameters are in `training_args.json` and
`additional_config.json`; the exact LoRA target regex and rank are in
`adapter_config.json`. Framework: ms-swift 4.1.2, PEFT 0.19.1,
DeepSpeed ZeRO-2.
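For orientation, those hyperparameters map onto a PEFT config roughly
like this (a sketch only; the authoritative target-module regex lives in
`adapter_config.json`):
```python
# Sketch of the LoRA configuration summarized above; target_modules is
# left as a placeholder rather than guessing the real regex.
from peft import LoraConfig

lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=None,   # real value: language-tower linear-layer regex
    task_type="CAUSAL_LM",
)
```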
See the companion dataset
[`anonymouscla/physground`](https://huggingface.co/datasets/anonymouscla/physground)
for prompts, physical-law tags, and example videos.
## License
The base model is released by its original authors; this LoRA adapter
is shared for anonymous review purposes. No identifying metadata is
included.