# phyjudge-9B / subq+human.yaml
# juyil, initial release: phyjudge-9B LoRA judge adapter (commit 4aee60a)
scheme: subq_hint
description: |-
  JSON-only per-task prompts with observable sub-questions/checklists. The
  subq+human setting uses sub-question prompts as input and human scores as
  training targets.
sub_questions:
  source: static
  answer_format: hint
system_prompt: You are a strict video evaluation model.
general_keys:
  - SA
  - PTV
  - persistence
eval_prompts:
  SA: |-
    Evaluate Prompt Alignment (SA).
    Caption:
    "{prompt}"
    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
    Sub-questions to consider before scoring:
    {questions_block}
    Score 1-5:
    5=fully aligned
    4=mostly aligned with minor deviations
    3=partially aligned with notable gaps
    2=mostly misaligned
    1=not aligned
    Then output ONLY a JSON object with exactly one key: SA.
    Example:
    {{"SA": 3}}
  PTV: |-
    Evaluate Temporal Coherence (PTV).
    Caption:
    "{prompt}"
    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
    Sub-questions to consider before scoring:
    {questions_block}
    Score 1-5:
    5=fully plausible event order
    4=mostly plausible with minor timing issues
    3=partially plausible
    2=mostly implausible
    1=completely implausible order
    Then output ONLY a JSON object with exactly one key: PTV.
    Example:
    {{"PTV": 4}}
  persistence: |-
    Evaluate Object Persistence.
    Caption, for context only:
    "{prompt}"
    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
    Sub-questions to consider before scoring:
    {questions_block}
    Score 1-5:
    5=fully consistent
    4=mostly consistent with minor flicker
    3=noticeable issues
    2=major inconsistencies
    1=severe disappearance or identity changes
    Then output ONLY a JSON object with exactly one key: persistence.
    Example:
    {{"persistence": 4}}
physical_sub_questions: true
physical_template: |-
  Evaluate physical realism for one physical law: {law}.
  Criterion:
  {criteria}
  Caption, for context only:
  "{prompt}"
  Sub-questions to consider before scoring:
  {questions_block}
  Judge the video itself. Do not penalize prompt mismatch unless it affects whether this physical law can be evaluated.
  Score 1-5:
  5=clearly correct
  4=mostly correct with minor issues
  3=partially correct or ambiguous
  2=mostly incorrect
  1=severely incorrect
  Then output ONLY a JSON object with exactly one key: {law}.
  Example:
  {{"{law}": 3}}
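The `{prompt}`, `{questions_block}`, `{law}`, and `{criteria}` placeholders in these templates follow Python format-string conventions, which is why the JSON examples double their braces: `{{"SA": 3}}` renders as the literal `{"SA": 3}`. A minimal rendering sketch, assuming the adapter fills templates with `str.format` (the `render()` helper and the sample caption/sub-questions here are hypothetical, not part of the config):

```python
# Abbreviated copy of the SA template above; placeholder names come from
# the config, the surrounding code is illustrative.
SA_TEMPLATE = """Evaluate Prompt Alignment (SA).
Caption:
"{prompt}"
Sub-questions to consider before scoring:
{questions_block}
Then output ONLY a JSON object with exactly one key: SA.
Example:
{{"SA": 3}}"""

def render(template: str, **fields: str) -> str:
    # str.format fills {prompt} and {questions_block}; the doubled braces
    # {{"SA": 3}} are format-string escapes that emit a literal {"SA": 3}.
    return template.format(**fields)

prompt = render(
    SA_TEMPLATE,
    prompt="A glass falls off a table and shatters.",
    questions_block="- Does the glass actually fall?\n- Does it shatter on impact?",
)
print(prompt)
```

The same escaping applies to the `physical_template`, where `{{"{law}": 3}}` first resolves `{law}` and then collapses the outer braces, e.g. to `{"gravity": 3}` for a hypothetical `gravity` law key.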