anonymouscla/phyjudge-9B

+scheme: subq_hint
+description: |-
+  Per-task prompts that require JSON-only output and include observable
+  sub-questions (checklists) as hints. In the subq+human setting, the
+  sub-question prompts are the model input and human scores are the
+  training targets.
+sub_questions:
+  source: static
+  answer_format: hint
+system_prompt: You are a strict video evaluation model.
+general_keys:
+- SA
+- PTV
+- persistence
+eval_prompts:
+  SA: |-
+    Evaluate Prompt Alignment (SA).
+    Caption:
+    "{prompt}"
+    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
+    Sub-questions to consider in your mind before scoring:
+    {questions_block}
+    Score 1-5:
+    5=fully aligned
+    4=mostly aligned with minor deviations
+    3=partially aligned with notable gaps
+    2=mostly misaligned
+    1=not aligned
+    Then output ONLY a JSON object with exactly one key: SA.
+    Example:
+    {{"SA": 3}}
+  PTV: |-
+    Evaluate Temporal Coherence (PTV).
+    Caption:
+    "{prompt}"
+    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
+    Sub-questions to consider in your mind before scoring:
+    {questions_block}
+    Score 1-5:
+    5=fully plausible event order
+    4=mostly plausible with minor timing issues
+    3=partially plausible
+    2=mostly implausible
+    1=completely implausible order
+    Then output ONLY a JSON object with exactly one key: PTV.
+    Example:
+    {{"PTV": 4}}
+  persistence: |-
+    Evaluate Object Persistence.
+    Caption, for context only:
+    "{prompt}"
+    The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
+    Sub-questions to consider in your mind before scoring:
+    {questions_block}
+    Score 1-5:
+    5=fully consistent
+    4=mostly consistent with minor flicker
+    3=noticeable issues
+    2=major inconsistencies
+    1=severe disappearance or identity changes
+    Then output ONLY a JSON object with exactly one key: persistence.
+    Example:
+    {{"persistence": 4}}
+physical_sub_questions: true
+physical_template: |-
+  Evaluate physical realism for one physical law: {law}.
+  Criterion:
+  {criteria}
+  Caption, for context only:
+  "{prompt}"
+  Sub-questions to consider in your mind before scoring:
+  {questions_block}
+  Judge the video itself. Do not penalize prompt mismatch unless it affects whether this physical law can be evaluated.
+  Score 1-5:
+  5=clearly correct
+  4=mostly correct with minor issues
+  3=partially correct or ambiguous
+  2=mostly incorrect
+  1=severely incorrect
+  Then output ONLY a JSON object with exactly one key: {law}.
+  Example:
+  {{"{law}": 3}}
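
The doubled braces in the `Example:` lines (`{{"SA": 3}}`, `{{"{law}": 3}}`) suggest the templates are filled with Python `str.format`-style substitution, where `{{` and `}}` render as literal braces. The sketch below works under that assumption and is not taken from the repository: the helper names (`format_questions`, `render`, `parse_score`), the numbered layout used for `{questions_block}`, and the sample law, criterion, and caption are all hypothetical. The template is abbreviated from `physical_template` above.

```python
import json

# Abbreviated copy of physical_template from the config above.
PHYSICAL_TEMPLATE = (
    "Evaluate physical realism for one physical law: {law}.\n"
    "Criterion:\n"
    "{criteria}\n"
    "Caption, for context only:\n"
    '"{prompt}"\n'
    "Sub-questions to consider in your mind before scoring:\n"
    "{questions_block}\n"
    "Score 1-5:\n"
    "Then output ONLY a JSON object with exactly one key: {law}.\n"
    "Example:\n"
    '{{"{law}": 3}}'
)


def format_questions(questions):
    """Hypothetical layout for {questions_block}: one numbered line per sub-question."""
    return "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))


def render(template, **fields):
    """Fill a template with str.format; '{{' and '}}' become literal braces."""
    return template.format(**fields)


def parse_score(raw, key):
    """Accept only a JSON object with exactly one key whose value is an integer 1-5."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != {key}:
        raise ValueError(f"expected a JSON object with exactly one key: {key}")
    score = obj[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"score for {key} must be an integer from 1 to 5")
    return score


# Illustrative values only; none of them come from the dataset.
rendered = render(
    PHYSICAL_TEMPLATE,
    law="gravity",
    criteria="Unsupported objects fall downward.",
    prompt="A ball rolls off a table.",
    questions_block=format_questions(
        ["Does the ball fall after leaving the table?", "Does it speed up while falling?"]
    ),
)
print(rendered.splitlines()[-1])  # {"gravity": 3}
print(parse_score('{"gravity": 4}', "gravity"))  # 4
```

Checking the reply for exactly one key mirrors the "exactly one key" instruction in each prompt, so a response that adds extra keys or explanatory text is rejected instead of being silently accepted.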