scheme: subq_hint
description: |-
  JSON-only per-task prompts with observable sub-questions/checklists. The
  subq+human setting uses sub-question prompts as input and human scores as
  training targets.
sub_questions:
  source: static
  answer_format: hint
system_prompt: You are a strict video evaluation model.
general_keys:
- SA
- PTV
- persistence
eval_prompts:
  SA: |-
    Evaluate Prompt Alignment (SA).

    Caption:
    "{prompt}"

    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.

    Sub-questions to consider in your mind before scoring:
    {questions_block}

    Score 1-5:
    5=fully aligned
    4=mostly aligned with minor deviations
    3=partially aligned with notable gaps
    2=mostly misaligned
    1=not aligned

    Then output ONLY a JSON object with exactly one key: SA.

    Example:
    {{"SA": 3}}
  PTV: |-
    Evaluate Temporal Coherence (PTV).

    Caption:
    "{prompt}"

    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.

    Sub-questions to consider in your mind before scoring:
    {questions_block}

    Score 1-5:
    5=fully plausible event order
    4=mostly plausible with minor timing issues
    3=partially plausible
    2=mostly implausible
    1=completely implausible order

    Then output ONLY a JSON object with exactly one key: PTV.

    Example:
    {{"PTV": 4}}
  persistence: |-
    Evaluate Object Persistence.

    Caption, for context only:
    "{prompt}"

    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.

    Sub-questions to consider in your mind before scoring:
    {questions_block}

    Score 1-5:
    5=fully consistent
    4=mostly consistent with minor flicker
    3=noticeable issues
    2=major inconsistencies
    1=severe disappearance or identity changes

    Then output ONLY a JSON object with exactly one key: persistence.

    Example:
    {{"persistence": 4}}
physical_sub_questions: true
physical_template: |-
  Evaluate physical realism for one physical law: {law}.

  Criterion:
  {criteria}

  Caption, for context only:
  "{prompt}"

  Sub-questions to consider in your mind before scoring:
  {questions_block}

  Judge the video itself. Do not penalize prompt mismatch unless it affects whether this physical law can be evaluated.

  Score 1-5:
  5=clearly correct
  4=mostly correct with minor issues
  3=partially correct or ambiguous
  2=mostly incorrect
  1=severely incorrect

  Then output ONLY a JSON object with exactly one key: {law}.

  Example:
  {{"{law}": 3}}