anonymouscla committed on
Commit 55aca10 · verified · 1 Parent(s): c59fc70

sync local anonymous/model/ contents

Files changed (1)
  1. subq+human.yaml +106 -0
subq+human.yaml ADDED
@@ -0,0 +1,106 @@
+ scheme: subq_hint
+ description: |-
+   JSON-only per-task prompts with observable sub-questions/checklists. The
+   subq+human setting uses sub-question prompts as input and human scores as
+   training targets.
+ sub_questions:
+   source: static
+   answer_format: hint
+ system_prompt: You are a strict video evaluation model.
+ general_keys:
+ - SA
+ - PTV
+ - persistence
+ eval_prompts:
+   SA: |-
+     Evaluate Prompt Alignment (SA).
+
+     Caption:
+     "{prompt}"
+
+     The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
+
+     Sub-questions to consider in your mind before scoring:
+     {questions_block}
+
+     Score 1-5:
+     5=fully aligned
+     4=mostly aligned with minor deviations
+     3=partially aligned with notable gaps
+     2=mostly misaligned
+     1=not aligned
+
+     Then output ONLY a JSON object with exactly one key: SA.
+
+     Example:
+     {{"SA": 3}}
+   PTV: |-
+     Evaluate Temporal Coherence (PTV).
+
+     Caption:
+     "{prompt}"
+
+     The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
+
+     Sub-questions to consider in your mind before scoring:
+     {questions_block}
+
+     Score 1-5:
+     5=fully plausible event order
+     4=mostly plausible with minor timing issues
+     3=partially plausible
+     2=mostly implausible
+     1=completely implausible order
+
+     Then output ONLY a JSON object with exactly one key: PTV.
+
+     Example:
+     {{"PTV": 4}}
+   persistence: |-
+     Evaluate Object Persistence.
+
+     Caption, for context only:
+     "{prompt}"
+
+     The video was generated using a text+image-to-video (ti2v) model, conditioned on the first frame and the text prompt above.
+
+     Sub-questions to consider in your mind before scoring:
+     {questions_block}
+
+     Score 1-5:
+     5=fully consistent
+     4=mostly consistent with minor flicker
+     3=noticeable issues
+     2=major inconsistencies
+     1=severe disappearance or identity changes
+
+     Then output ONLY a JSON object with exactly one key: persistence.
+
+     Example:
+     {{"persistence": 4}}
+ physical_sub_questions: true
+ physical_template: |-
+   Evaluate physical realism for one physical law: {law}.
+
+   Criterion:
+   {criteria}
+
+   Caption, for context only:
+   "{prompt}"
+
+   Sub-questions to consider in your mind before scoring:
+   {questions_block}
+
+   Judge the video itself. Do not penalize prompt mismatch unless it affects whether this physical law can be evaluated.
+
+   Score 1-5:
+   5=clearly correct
+   4=mostly correct with minor issues
+   3=partially correct or ambiguous
+   2=mostly incorrect
+   1=severely incorrect
+
+   Then output ONLY a JSON object with exactly one key: {law}.
+
+   Example:
+   {{"{law}": 3}}
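
The doubled braces in the `Example:` lines suggest the templates are filled with Python `str.format`-style substitution, where `{{` and `}}` render as literal braces. The commit does not show the consuming code, so the following is only a sketch under that assumption. The helper names (`build_questions_block`, `render`, `parse_score`), the `- ` bullet style for sub-questions, and the sample caption are all hypothetical; the two templates are shortened stand-ins that keep the placeholder syntax from the file.

```python
import json

# Shortened stand-ins for two templates from the YAML file; the wording between
# placeholders is abbreviated, the placeholder and brace syntax is unchanged.
SA_TEMPLATE = (
    'Caption:\n"{prompt}"\n\n'
    "Sub-questions to consider in your mind before scoring:\n"
    "{questions_block}\n\n"
    'Example:\n{{"SA": 3}}'
)
PHYSICAL_TEMPLATE = (
    "Evaluate physical realism for one physical law: {law}.\n\n"
    "Criterion:\n{criteria}\n\n"
    'Caption, for context only:\n"{prompt}"\n\n'
    "Sub-questions to consider in your mind before scoring:\n"
    "{questions_block}\n\n"
    'Example:\n{{"{law}": 3}}'
)


def build_questions_block(questions):
    """Join sub-questions into one block (the '- ' bullet style is an assumption)."""
    return "\n".join(f"- {q}" for q in questions)


def render(template, **fields):
    """Fill placeholders with str.format; '{{' and '}}' become literal braces."""
    return template.format(**fields)


def parse_score(reply, key):
    """Check that a reply is a JSON object with exactly one key and a 1-5 integer."""
    data = json.loads(reply)
    if not isinstance(data, dict) or set(data) != {key}:
        raise ValueError(f"expected a JSON object with exactly one key: {key}")
    score = data[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"score for {key} must be an integer from 1 to 5")
    return score


questions = build_questions_block(["Does a ball appear?", "Does it leave the table?"])
sa_prompt = render(SA_TEMPLATE, prompt="A ball rolls off a table.", questions_block=questions)
physical_prompt = render(
    PHYSICAL_TEMPLATE,
    law="gravity",
    criteria="Unsupported objects fall downward.",
    prompt="A ball rolls off a table.",
    questions_block=questions,
)
```

Under this assumption, the rendered SA prompt ends with the literal example `{"SA": 3}`, and the physical template's `{{"{law}": 3}}` becomes `{"gravity": 3}` because only the inner `{law}` is treated as a placeholder. Substituted values are not re-parsed by `str.format`, so a caption that itself contains braces would not break rendering.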