add side-by-side baseline-vs-trained demo
docs/side_by_side.html ADDED (+146 -0)
@@ -0,0 +1,146 @@
<!DOCTYPE html>
<html><head>
<meta charset="utf-8">
<title>Subtext Arena — baseline vs trained, hand-picked clips</title>
<style>
body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
max-width: 1200px; margin: 40px auto; padding: 0 20px; color: #222; }
h1 { font-size: 24px; }
.clip { border: 1px solid #ddd; border-radius: 8px; padding: 16px;
margin-bottom: 24px; background: #fafafa; }
.clip h2 { font-size: 18px; margin-top: 0; }
.gold-sarcastic { color: #b3274d; }
.gold-sincere { color: #1d7a4a; }
.columns { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }
.col { padding: 12px; border-radius: 6px; }
.baseline { background: #fff5f5; border: 1px solid #f8c4c4; }
.trained { background: #effaf3; border: 1px solid #b6e2c1; }
.col h3 { margin-top: 0; font-size: 14px; text-transform: uppercase;
letter-spacing: 0.05em; color: #666; }
.verdict-correct { color: #1d7a4a; font-weight: bold; }
.verdict-wrong { color: #b3274d; font-weight: bold; }
pre { white-space: pre-wrap; word-wrap: break-word; font-size: 13px;
line-height: 1.4; background: white; padding: 8px; border-radius: 4px;
border: 1px solid #eee; }
.transcript { font-style: italic; color: #555; margin-bottom: 12px; }
</style>
</head><body>
<h1>Subtext Arena — baseline vs trained</h1>
<p>The same prompt is fed to the base Qwen2.5-3B-Instruct model (left) and to the GRPO-trained
checkpoint (right). Each column shows the model's prediction, its reasoning trace, and its final answer.</p>
<div class="clip">
<h2>Clip 1_8136 — speaker: PERSON, gold: <span class="gold-sarcastic">sarcastic</span></h2>
<div class="transcript">"Yeah, I'm really looking forward to it. In fact, there's no place I would rather be than here."</div>
<div class="columns">
<div class="col baseline">
<h3>Baseline (no training)</h3>
<p>predicted: <span class="verdict-wrong">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The target utterance "Yeah, I'm really looking forward to it. In fact, there's no place I would rather be than here." contains positive words and a clear indication of anticipation ("looking forward to it"). However, the prosody features suggest a flat affect with no internal pauses and a pitch contour that is mostly flat with only minor fluctuations. This combination of prosody and content is more likely to indicate sincerity rather than sarcasm. Additionally, the speaker is agreeing with Raj about the loneliness of others on Valentine's Day, which aligns with the positive sentiment expressed in the target utterance.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
<div class="col trained">
<h3>Trained checkpoint</h3>
<p>predicted: <span class="verdict-wrong">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The target utterance "Yeah, I'm really looking forward to it. In fact, there's no place I would rather be than here." contains positive words like "looking forward" and "no place I would rather be." However, the prosody does not support a sarcastic tone. The pitch contour shows a flat affect with little variation, which is consistent with sincerity. Additionally, there are no pauses or emphasis that might indicate sarcasm. The speaker seems genuinely enthusiastic about the event.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
</div>
</div>

<div class="clip">
<h2>Clip 1_5581 — speaker: SHELDON, gold: <span class="gold-sincere">sincere</span></h2>
<div class="transcript">"I gargled with tequila and may have swallowed a teeny bit."</div>
<div class="columns">
<div class="col baseline">
<h3>Baseline (no training)</h3>
<p>predicted: <span class="verdict-correct">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The prosody analysis shows a flat affect with no internal pauses, which is consistent with a sincere delivery. Sheldon's tone does not rise or dip significantly, and there are no pauses that might indicate he is holding back something. His statement about gargling with tequila seems to be meant seriously, likely as a joke rather than sarcasm. The prosody does not suggest any hidden meaning or irony.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
<div class="col trained">
<h3>Trained checkpoint</h3>
<p>predicted: <span class="verdict-correct">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The prosody analysis shows that Sheldon's speech is flat and lacks any significant pitch variation or emphasis. He does not pause or emphasize any particular word, which suggests sincerity rather than sarcasm. Additionally, his tone is neutral, without any hints of mockery or irony. The phrase "I gargled with tequila and may have swallowed a teenie bit" is a straightforward statement about his alcohol consumption, without any underlying meaning. Howard's previous comments about engagement and parental protection suggest a serious tone, but Sheldon's response does not contradict or undermine these sentiments. Instead, it seems to be a simple factual declaration.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
</div>
</div>

<div class="clip">
<h2>Clip 1_2616 — speaker: LEONARD, gold: <span class="gold-sarcastic">sarcastic</span></h2>
<div class="transcript">"Sure, maybe a tableau of me trying to pummel you to death."</div>
<div class="columns">
<div class="col baseline">
<h3>Baseline (no training)</h3>
<p>predicted: <span class="verdict-wrong">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The prosody analysis shows that Leonard's pitch is relatively flat with minimal variability, and there are no internal pauses or emphasis on certain words. The tone doesn't seem to match the potentially violent implication of his words. Additionally, the prosodic features indicate low energy and no significant pitch changes, which suggests a lack of emotional intensity. Given these factors, it seems unlikely that Leonard is being sarcastic about a hypothetical violent encounter with Sheldon.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
<div class="col trained">
<h3>Trained checkpoint</h3>
<p>predicted: <span class="verdict-wrong">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
Leonard's statement "Sure, maybe a tableau of me trying to pummel you to death" is a clear contradiction to Sheldon's previous statements about being a theoretical physicist and being famous. The prosody doesn't provide any strong indicators of sarcasm; Leonard speaks with a normal tone, no internal pauses, and uses neutral content. There are no pitch variations that would suggest he is mocking Sheldon's situation. Instead, his statement seems sincere, as it directly contradicts Sheldon's boastful claims.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
</div>
</div>

<div class="clip">
<h2>Clip 1_3660 — speaker: PENNY, gold: <span class="gold-sarcastic">sarcastic</span></h2>
<div class="transcript">"There a new girlfriend in there? 'Cause you might need one."</div>
<div class="columns">
<div class="col baseline">
<h3>Baseline (no training)</h3>
<p>predicted: <span class="verdict-wrong">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The tone of Penny's line seems flat and matter-of-fact, with no pitch variation or emphasis that would suggest sarcasm. Additionally, the prosody shows a steady pitch without any rise, which is more indicative of sincerity rather than sarcasm. The use of "new girlfriend" is straightforward and doesn't imply any hidden meaning. The context also supports a sincere statement about Leonard needing a new girlfriend due to their recent sexual activity.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
<div class="col trained">
<h3>Trained checkpoint</h3>
<p>predicted: <span class="verdict-correct">sarcastic</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The prosody features indicate a high pitch variability with a rising pitch contour, which suggests sarcasm. Additionally, there are no pauses or breaks in the speech that would normally be expected if the speaker was being sincere. The tone is mocking and questioning, implying that the other person might need a new girlfriend. This is a clear example of sarcastic speech.
&lt;/think&gt;
&lt;final&gt;{"label":"sarcastic","confidence":0.9}&lt;/final&gt;</pre>
</div>
</div>
</div>

<div class="clip">
<h2>Clip 1_12331 — speaker: PENNY, gold: <span class="gold-sincere">sincere</span></h2>
<div class="transcript">"Wow, there's a Denny's in Vegas you can actually get married in."</div>
<div class="columns">
<div class="col baseline">
<h3>Baseline (no training)</h3>
<p>predicted: <span class="verdict-correct">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The prosody analysis shows that Penny's tone is quite flat, with minimal pitch variation. There are no emphatic pauses or rising intonation that would suggest sarcasm. The content itself could be interpreted as sincere if taken literally, but the lack of any sarcastic elements makes it more likely to be sincere. Additionally, the context of her statement about a wedding location in Vegas suggests she might be joking, which aligns with a sincere tone rather than a sarcastic one.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
<div class="col trained">
<h3>Trained checkpoint</h3>
<p>predicted: <span class="verdict-correct">sincere</span> (conf 0.90)</p>
<pre>&lt;think&gt;
The prosody features show a flat affect and no significant pitch variation throughout the utterance. There are no internal pauses or emphatic pauses, which would suggest sarcasm. The statement "Wow, there's a Denny's in Vegas you can actually get married in" is a common joke about the location of wedding venues. However, the flat tone of the speech and lack of any emotional inflection makes it unlikely that Penny is being sarcastic here. Instead, she seems to be making a light-hearted comment without any underlying sarcasm.
&lt;/think&gt;
&lt;final&gt;{"label":"sincere","confidence":0.9}&lt;/final&gt;</pre>
</div>
</div>
</div>

</body></html>
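
Every trace on the page ends in a `<final>` tag wrapping a small JSON object with `label` and `confidence` keys. As a rough illustration only (this is not the project's scoring code, which is not shown here), the Python sketch below parses that shape and tallies accuracy over the five clips. The names `parse_final`, `accuracy`, `fake_completion`, and `CLIPS` are hypothetical; the gold labels and predictions are copied from the page.

```python
import json
import re

# Assumed output shape, copied from the traces on this page:
#   <think>free-text reasoning</think>
#   <final>{"label": "sincere", "confidence": 0.9}</final>
FINAL_RE = re.compile(r"<final>\s*(\{.*?\})\s*</final>", re.DOTALL)
VALID_LABELS = {"sarcastic", "sincere"}


def parse_final(completion):
    """Return (label, confidence), or None if the completion is malformed."""
    match = FINAL_RE.search(completion)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    label = payload.get("label")
    confidence = payload.get("confidence")
    if not isinstance(label, str) or label not in VALID_LABELS:
        return None
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    return label, float(confidence)


def accuracy(pairs):
    """Fraction of (gold, completion) pairs whose parsed label equals gold.

    Malformed completions count as wrong.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = 0
    for gold, completion in pairs:
        parsed = parse_final(completion)
        if parsed is not None and parsed[0] == gold:
            hits += 1
    return hits / len(pairs)


def fake_completion(label):
    """Hypothetical helper: rebuild a completion string from a label."""
    body = json.dumps({"label": label, "confidence": 0.9})
    return f"<think>...</think>\n<final>{body}</final>"


# (clip id, gold, baseline prediction, trained prediction), as shown on the page.
CLIPS = [
    ("1_8136", "sarcastic", "sincere", "sincere"),
    ("1_5581", "sincere", "sincere", "sincere"),
    ("1_2616", "sarcastic", "sincere", "sincere"),
    ("1_3660", "sarcastic", "sincere", "sarcastic"),
    ("1_12331", "sincere", "sincere", "sincere"),
]

baseline_acc = accuracy((gold, fake_completion(base)) for _, gold, base, _ in CLIPS)
trained_acc = accuracy((gold, fake_completion(trained)) for _, gold, _, trained in CLIPS)
print(f"baseline {baseline_acc:.2f}, trained {trained_acc:.2f}")
```

On these five hand-picked clips the tally is 2 of 5 for the baseline and 3 of 5 for the trained checkpoint. The two columns differ only on clip 1_3660, so the sample is far too small to support a general claim.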