Lzy01241010 Claude Opus 4.7 commited on
Commit
9b0c194
ยท
1 Parent(s): 770e96d

ui: add "See it in action" walkthrough (real M2W2 trace, collapsed by default)

Browse files

A folded gr.Accordion below the demo, no impact on the default page.
When expanded, walks through one real Mind2Web 2 task that QUEST-35B
ran end-to-end, showcasing the five stages of a research run:

Question -> think+tool x N -> context management -> think+tool x N -> answer

All numbers (62 turns, 80,824 tokens, 11 trusted facts, 122->2 message
compression) and the example trusted fact come verbatim from the source
trajectories.jsonl + condenser_call_1_*.json in zilus inference dump.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1) hide show
  1. app.py +285 -0
app.py CHANGED
@@ -1121,6 +1121,151 @@ gradio-app > div {
1121
  text-transform: none;
1122
  line-height: 1.45;
1123
  }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1124
  .memory-help {
1125
  color: var(--q-muted);
1126
  font-size: 12.5px;
@@ -2448,6 +2593,139 @@ EXAMPLES = [
2448
  ]
2449
 
2450
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2451
  def _example_label(ex: Dict[str, str]) -> str:
2452
  return f"{ex['icon']} {ex['category']} โ€” {ex['text']}"
2453
 
@@ -2588,6 +2866,13 @@ with gr.Blocks(
2588
  elem_id="quest-temperature",
2589
  )
2590
 
 
 
 
 
 
 
 
2591
  gr.HTML(
2592
  """
2593
  <footer class="quest-footer">
 
1121
  text-transform: none;
1122
  line-height: 1.45;
1123
  }
1124
+
1125
+ /* โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
1126
+ Walkthrough: a curated trace from one real Mind2Web 2 task. Shown inside
1127
+ a collapsed Accordion below the main demo so the default view is
1128
+ unchanged; expanded view illustrates the 5 stages of a research run:
1129
+ Question โ†’ think+toolร—N โ†’ context-management โ†’ think+toolร—N โ†’ answer.
1130
+ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ */
1131
+ .walkthrough { font-size: 0.9rem; color: var(--q-text); }
1132
+ .walkthrough .wt-intro { color: var(--q-muted); margin: 0 0 14px; line-height: 1.55; }
1133
+ .walkthrough .wt-block {
1134
+ background: var(--q-paper);
1135
+ border: 1px solid var(--q-line);
1136
+ border-radius: 12px;
1137
+ padding: 14px 16px;
1138
+ margin: 0 0 10px;
1139
+ }
1140
+ .walkthrough .wt-phase-tag {
1141
+ font-size: 0.72rem;
1142
+ font-weight: 800;
1143
+ letter-spacing: 0.12em;
1144
+ text-transform: uppercase;
1145
+ color: var(--q-accent);
1146
+ margin: 0 0 10px;
1147
+ display: flex;
1148
+ align-items: baseline;
1149
+ gap: 10px;
1150
+ flex-wrap: wrap;
1151
+ }
1152
+ .walkthrough .wt-rounds {
1153
+ font-size: 0.68rem;
1154
+ font-weight: 600;
1155
+ letter-spacing: 0.04em;
1156
+ text-transform: none;
1157
+ color: var(--q-muted);
1158
+ }
1159
+ .walkthrough .wt-question blockquote {
1160
+ margin: 0;
1161
+ font-family: "Source Serif 4", "Source Serif Pro", ui-serif, Georgia, serif;
1162
+ font-size: 0.92rem;
1163
+ line-height: 1.55;
1164
+ color: var(--q-text);
1165
+ border-left: 3px solid var(--q-accent-line, var(--q-accent));
1166
+ padding: 2px 0 2px 12px;
1167
+ }
1168
+ .walkthrough .wt-stats {
1169
+ font-size: 0.82rem;
1170
+ color: var(--q-muted);
1171
+ line-height: 1.55;
1172
+ margin: 4px 0;
1173
+ }
1174
+ .walkthrough .wt-stats code { padding: 1px 5px; background: var(--q-surface-alt); border-radius: 4px; font-size: 0.8em; }
1175
+ .walkthrough .wt-turn {
1176
+ background: var(--q-surface-alt);
1177
+ border-radius: 8px;
1178
+ padding: 10px 12px;
1179
+ margin: 10px 0 0;
1180
+ }
1181
+ .walkthrough .wt-turn-label {
1182
+ display: inline-block;
1183
+ font-size: 0.66rem;
1184
+ font-weight: 800;
1185
+ letter-spacing: 0.1em;
1186
+ text-transform: uppercase;
1187
+ color: var(--q-muted);
1188
+ margin-bottom: 8px;
1189
+ }
1190
+ .walkthrough .wt-step { margin: 6px 0; display: flex; gap: 10px; align-items: flex-start; flex-wrap: wrap; }
1191
+ .walkthrough .wt-step-tag {
1192
+ font-size: 0.72rem;
1193
+ font-weight: 700;
1194
+ color: var(--q-accent);
1195
+ flex: 0 0 76px;
1196
+ padding-top: 2px;
1197
+ }
1198
+ .walkthrough .wt-step-body { flex: 1 1 0; min-width: 0; font-size: 0.85rem; line-height: 1.55; }
1199
+ .walkthrough .wt-step pre {
1200
+ margin: 0;
1201
+ padding: 8px 10px;
1202
+ background: #0D1117;
1203
+ color: #E6EDF3;
1204
+ border-radius: 6px;
1205
+ font-size: 0.76rem;
1206
+ line-height: 1.5;
1207
+ overflow-x: auto;
1208
+ white-space: pre;
1209
+ }
1210
+ .walkthrough .wt-ellipsis {
1211
+ color: var(--q-muted);
1212
+ font-style: italic;
1213
+ font-size: 0.8rem;
1214
+ padding: 8px 4px 0 8px;
1215
+ border-left: 2px dotted var(--q-line-strong);
1216
+ margin-left: 8px;
1217
+ }
1218
+ .walkthrough .wt-condenser {
1219
+ border-color: var(--q-accent-line, var(--q-accent));
1220
+ background: var(--q-accent-soft);
1221
+ }
1222
+ .walkthrough .wt-condenser details { margin: 8px 0 0; }
1223
+ .walkthrough .wt-condenser summary {
1224
+ cursor: pointer;
1225
+ color: var(--q-accent);
1226
+ font-size: 0.78rem;
1227
+ font-weight: 600;
1228
+ user-select: none;
1229
+ }
1230
+ .walkthrough .wt-condenser details pre {
1231
+ margin: 8px 0 0;
1232
+ padding: 10px;
1233
+ background: #0D1117;
1234
+ color: #E6EDF3;
1235
+ border-radius: 6px;
1236
+ font-size: 0.75rem;
1237
+ line-height: 1.5;
1238
+ overflow-x: auto;
1239
+ }
1240
+ .walkthrough .wt-effect {
1241
+ margin-top: 10px;
1242
+ padding-top: 10px;
1243
+ border-top: 1px dashed var(--q-line-strong);
1244
+ font-size: 0.82rem;
1245
+ color: var(--q-text);
1246
+ }
1247
+ .walkthrough .wt-table-wrap { overflow-x: auto; margin-top: 10px; }
1248
+ .walkthrough .wt-table { width: 100%; border-collapse: collapse; font-size: 0.82rem; }
1249
+ .walkthrough .wt-table th, .walkthrough .wt-table td {
1250
+ padding: 8px 10px;
1251
+ border-bottom: 1px solid var(--q-line);
1252
+ text-align: left;
1253
+ vertical-align: top;
1254
+ }
1255
+ .walkthrough .wt-table th {
1256
+ background: var(--q-surface-alt);
1257
+ font-weight: 700;
1258
+ font-size: 0.7rem;
1259
+ text-transform: uppercase;
1260
+ letter-spacing: 0.06em;
1261
+ color: var(--q-muted);
1262
+ }
1263
+ .walkthrough .wt-table code { padding: 1px 5px; background: var(--q-surface-alt); border-radius: 4px; font-size: 0.78rem; }
1264
+ .walkthrough .wt-table a { color: var(--q-accent); text-decoration: none; }
1265
+ .walkthrough .wt-table a:hover { text-decoration: underline; }
1266
+ @media (max-width: 600px) {
1267
+ .walkthrough .wt-step-tag { flex-basis: 100%; }
1268
+ }
1269
  .memory-help {
1270
  color: var(--q-muted);
1271
  font-size: 12.5px;
 
2593
  ]
2594
 
2595
 
2596
+ # Curated walkthrough of one real Mind2Web 2 task that QUEST-35B ran end-to-end.
2597
+ # All numbers, sample turns, the condenser trigger and the trusted-facts table
2598
+ # are taken verbatim from inference/.../task_idx_95/iter1 (trajectories.jsonl +
2599
+ # condenser_call_1_*.json). Edited only for length.
2600
+ WALKTHROUGH_HTML = """
2601
+ <div class="walkthrough">
2602
+ <p class="wt-intro">
2603
+ The trace below is from a single Mind2Web 2 task that QUEST-35B solved end-to-end.
2604
+ It shows how the agent loops through <strong>think โ†’ tool</strong>, hits its 80k-token
2605
+ context budget, condenses everything to a structured memory state, and then closes out
2606
+ with a synthesized answer.
2607
+ </p>
2608
+
2609
+ <!-- Question -->
2610
+ <div class="wt-block wt-question">
2611
+ <div class="wt-phase-tag">Question</div>
2612
+ <blockquote>
2613
+ I am interested in Retrieval-Augmented Generation (RAG) and would like to explore
2614
+ popular and easy-to-use repositories on GitHub. Please identify the 10 most-starred
2615
+ repositories listed under the GitHub topic <em>"retrieval-augmented-generation"</em>.
2616
+ For each repository, please provide its name, a direct link to its GitHub page, and
2617
+ indicate whether its README explicitly states that it can be installed using a single
2618
+ <code>pip install &lt;package-name&gt;</code> command (without additional steps). If it
2619
+ does, please also provide a link to its PyPI page; if not, explicitly note that this
2620
+ installation method isn't mentioned.
2621
+ </blockquote>
2622
+ </div>
2623
+
2624
+ <!-- Phase 1 -->
2625
+ <div class="wt-block">
2626
+ <div class="wt-phase-tag">Phase 1 ยท Exploration <span class="wt-rounds">rounds 1 โ†’ 62</span></div>
2627
+ <div class="wt-stats">
2628
+ ๐Ÿง  think + ๐Ÿ”ง tool ร— <strong>62 turns</strong> &nbsp;ยท&nbsp;
2629
+ 22 visits + 2 searches &nbsp;ยท&nbsp;
2630
+ context grew to <strong>80,824 tokens</strong> / 122 messages
2631
+ </div>
2632
+
2633
+ <div class="wt-turn">
2634
+ <div class="wt-turn-label">Sample ยท round 1</div>
2635
+ <div class="wt-step">
2636
+ <span class="wt-step-tag">๐Ÿง  Think</span>
2637
+ <div class="wt-step-body">
2638
+ "We need to produce a list of the 10 most-starred repositories under GitHub topic
2639
+ 'retrieval-augmented-generation' &hellip; standard web search may not know about
2640
+ topics; let's try several queries and see what sticks."
2641
+ </div>
2642
+ </div>
2643
+ <div class="wt-step">
2644
+ <span class="wt-step-tag">๐Ÿ”ง Tool</span>
2645
+ <pre>search([
2646
+ "retrieval-augmented-generation github topic",
2647
+ "GitHub topic 'retrieval-augmented-generation' popular repos",
2648
+ "retrieval augmented generation github repositories star count",
2649
+ "retrieval-augmented-generation python library"
2650
+ ])</pre>
2651
+ </div>
2652
+ <div class="wt-step">
2653
+ <span class="wt-step-tag">๐Ÿ“ฅ Result</span>
2654
+ <div class="wt-step-body">
2655
+ 10 hits including the topic page itself
2656
+ <code>github.com/topics/retrieval-augmented-generation</code> โ€” the agent now
2657
+ has a starting URL to visit and crawl.
2658
+ </div>
2659
+ </div>
2660
+ </div>
2661
+
2662
+ <div class="wt-ellipsis">โ‹ฎ 61 more think โ†’ tool turns: visit the topic page, then each candidate repo, fetch READMEs, check PyPI metadata one repo at a time.</div>
2663
+ </div>
2664
+
2665
+ <!-- Context Management -->
2666
+ <div class="wt-block wt-condenser">
2667
+ <div class="wt-phase-tag">๐Ÿ—œ๏ธ Context Management <span class="wt-rounds">round 62</span></div>
2668
+ <div class="wt-stats">trigger: <code>CONTEXT_THRESHOLD</code> (80,000-token budget hit) &nbsp;ยท&nbsp; condenser LLM fires</div>
2669
+ <div class="wt-stats">output state: <strong>11 trusted</strong> facts ยท 1 uncertain ยท 0 untrusted &nbsp;ยท&nbsp; 21 visited sources ยท 8 search queries (deduplicated)</div>
2670
+ <details>
2671
+ <summary>example trusted fact (1 of 11)</summary>
2672
+ <pre>{
2673
+ "id": "T1",
2674
+ "claim": "Langchain-Chatchat README includes a pip install line:
2675
+ 'pip install langchain-chatchat -U'.",
2676
+ "sources": [
2677
+ "https://github.com/chatchat-space/Langchain-Chatchat",
2678
+ "https://pypi.org/project/langchain-chatchat/"
2679
+ ],
2680
+ "reason": "README's Installation section shows the pip command;
2681
+ PyPI confirms the package."
2682
+ }</pre>
2683
+ </details>
2684
+ <div class="wt-effect">
2685
+ โ†’ history reset to <code>[system, question, state_summary]</code>
2686
+ &nbsp;ยท&nbsp; <strong>122 messages โ†’ 2</strong>
2687
+ &nbsp;ยท&nbsp; ~80k tokens of headroom reclaimed.
2688
+ </div>
2689
+ </div>
2690
+
2691
+ <!-- Phase 2 -->
2692
+ <div class="wt-block">
2693
+ <div class="wt-phase-tag">Phase 2 ยท Resolve &amp; Synthesize <span class="wt-rounds">rounds 63 โ†’ 87</span></div>
2694
+ <div class="wt-stats">
2695
+ model reads <code>prev_state</code>, sees the one <em>uncertain</em> claim
2696
+ (PageIndex's pip availability), re-visits the missing PyPI page to settle it,
2697
+ and then synthesizes the final markdown table from the 11 trusted facts.
2698
+ </div>
2699
+ </div>
2700
+
2701
+ <!-- Answer -->
2702
+ <div class="wt-block">
2703
+ <div class="wt-phase-tag">โœ๏ธ Answer <span class="wt-rounds">round 87</span></div>
2704
+ <div class="wt-stats">final structured markdown built directly from the consolidated trusted state</div>
2705
+ <div class="wt-table-wrap">
2706
+ <table class="wt-table">
2707
+ <thead>
2708
+ <tr><th>Repository</th><th>One-line <code>pip install</code>?</th><th>PyPI</th></tr>
2709
+ </thead>
2710
+ <tbody>
2711
+ <tr><td><a href="https://github.com/chatchat-space/Langchain-Chatchat" target="_blank" rel="noopener">chatchat-space/Langchain-Chatchat</a></td><td>โœ… <code>pip install langchain-chatchat -U</code></td><td><a href="https://pypi.org/project/langchain-chatchat/" target="_blank" rel="noopener">langchain-chatchat</a></td></tr>
2712
+ <tr><td><a href="https://github.com/HKUDS/LightRAG" target="_blank" rel="noopener">HKUDS/LightRAG</a></td><td>โœ… <code>pip install lightrag-hku</code></td><td><a href="https://pypi.org/project/lightrag-hku/" target="_blank" rel="noopener">lightrag-hku</a></td></tr>
2713
+ <tr><td><a href="https://github.com/stanford-oval/storm" target="_blank" rel="noopener">stanford-oval/storm</a></td><td>โœ… <code>pip install knowledge-storm</code></td><td><a href="https://pypi.org/project/knowledge-storm/" target="_blank" rel="noopener">knowledge-storm</a></td></tr>
2714
+ <tr><td><a href="https://github.com/deepset-ai/haystack" target="_blank" rel="noopener">deepset-ai/haystack</a></td><td>โœ… <code>pip install haystack-ai</code></td><td><a href="https://pypi.org/project/haystack-ai/" target="_blank" rel="noopener">haystack-ai</a></td></tr>
2715
+ <tr><td><a href="https://github.com/HKUDS/RAG-Anything" target="_blank" rel="noopener">HKUDS/RAG-Anything</a></td><td>โœ… <code>pip install raganything</code></td><td><a href="https://pypi.org/project/raganything/" target="_blank" rel="noopener">raganything</a></td></tr>
2716
+ <tr><td><a href="https://github.com/memvid/memvid" target="_blank" rel="noopener">memvid/memvid</a></td><td>โœ… <code>pip install memvid-sdk</code></td><td><a href="https://pypi.org/project/memvid-sdk/" target="_blank" rel="noopener">memvid-sdk</a></td></tr>
2717
+ <tr><td><a href="https://github.com/llmware-ai/llmware" target="_blank" rel="noopener">llmware-ai/llmware</a></td><td>โœ… <code>pip3 install llmware</code></td><td><a href="https://pypi.org/project/llmware/" target="_blank" rel="noopener">llmware</a></td></tr>
2718
+ <tr><td><a href="https://github.com/VectifyAI/PageIndex" target="_blank" rel="noopener">VectifyAI/PageIndex</a></td><td>โŒ not mentioned (README uses <code>pip install -r requirements.txt</code>)</td><td>โ€”</td></tr>
2719
+ <tr><td><a href="https://github.com/pathwaycom/llm-app" target="_blank" rel="noopener">pathwaycom/llm-app</a></td><td>โŒ not mentioned (Docker / poetry / templates)</td><td>โ€”</td></tr>
2720
+ <tr><td><a href="https://github.com/NirDiamant/RAG_Techniques" target="_blank" rel="noopener">NirDiamant/RAG_Techniques</a></td><td>โŒ not mentioned (clone + run notebooks)</td><td>โ€”</td></tr>
2721
+ </tbody>
2722
+ </table>
2723
+ </div>
2724
+ </div>
2725
+ </div>
2726
+ """
2727
+
2728
+
2729
  def _example_label(ex: Dict[str, str]) -> str:
2730
  return f"{ex['icon']} {ex['category']} โ€” {ex['text']}"
2731
 
 
2866
  elem_id="quest-temperature",
2867
  )
2868
 
2869
+ # โ”€โ”€ Walkthrough: curated trace from one real Mind2Web 2 run โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
2870
+ with gr.Accordion(
2871
+ label="See it in action โ€” a real Mind2Web 2 research run",
2872
+ open=False,
2873
+ ):
2874
+ gr.HTML(WALKTHROUGH_HTML)
2875
+
2876
  gr.HTML(
2877
  """
2878
  <footer class="quest-footer">