scheme: subq_hint
description: |-
  JSON-only per-task prompts with observable sub-questions/checklists.
  The subq+human setting uses sub-question prompts as input and human scores as training targets.
sub_questions:
  source: static
  answer_format: hint
system_prompt: You are a strict video evaluation model.
general_keys:
  - SA
  - PTV
  - persistence
eval_prompts:
  SA: |-
    Evaluate Prompt Alignment (SA).
    Caption: "{prompt}"
    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
    Sub-questions to consider in your mind before scoring:
    {questions_block}
    Score 1-5:
    5=fully aligned
    4=mostly aligned with minor deviations
    3=partially aligned with notable gaps
    2=mostly misaligned
    1=not aligned
    Then output ONLY a JSON object with exactly one key: SA.
    Example: {{"SA": 3}}
  PTV: |-
    Evaluate Temporal Coherence (PTV).
    Caption: "{prompt}"
    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
    Sub-questions to consider in your mind before scoring:
    {questions_block}
    Score 1-5:
    5=fully plausible event order
    4=mostly plausible with minor timing issues
    3=partially plausible
    2=mostly implausible
    1=completely implausible order
    Then output ONLY a JSON object with exactly one key: PTV.
    Example: {{"PTV": 4}}
  persistence: |-
    Evaluate Object Persistence.
    Caption, for context only: "{prompt}"
    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
    Sub-questions to consider in your mind before scoring:
    {questions_block}
    Score 1-5:
    5=fully consistent
    4=mostly consistent with minor flicker
    3=noticeable issues
    2=major inconsistencies
    1=severe disappearance or identity changes
    Then output ONLY a JSON object with exactly one key: persistence.
    Example: {{"persistence": 4}}
physical_sub_questions: true
physical_template: |-
  Evaluate physical realism for one physical law: {law}.
  Criterion: {criteria}
  Caption, for context only: "{prompt}"
  Sub-questions to consider in your mind before scoring:
  {questions_block}
  Judge the video itself. Do not penalize prompt mismatch unless it affects whether this physical law can be evaluated.
  Score 1-5:
  5=clearly correct
  4=mostly correct with minor issues
  3=partially correct or ambiguous
  2=mostly incorrect
  1=severely incorrect
  Then output ONLY a JSON object with exactly one key: {law}.
  Example: {{"{law}": 3}}