TomLii committed on
Commit 8e8119b · 1 Parent(s): 3fd8fc1

Speed up Quest-4B research: add Serper backend and stream live progress

Two user-perceivable wins on the Quest endpoint, which was taking 60+ s per
question on the Space and left the UI blank the whole time:

1. Wire up Google Serper as the primary search backend. When
SERPER_API_KEY (or SERPER_KEY_ID, matching the research repo's env
name) is set in Space secrets, `_run_search_single` now hits Serper
first and falls back to DuckDuckGo only if Serper errors. Serper
responds in under 1 s and is not subject to the 202 Ratelimit that
shared HF Space IPs routinely trip on html.duckduckgo.com. Preferring
it both cuts latency and eliminates the "Error: 202 Ratelimit"
failures users were hitting on comparison-table queries.

2. Convert build_research_agent and run_ui into Gradio generators that
emit a live progress panel between turns: "turn N: thinking…",
"turn N: searching `...`", "got 5 hit(s) via serper", "writing final
answer". The total wall-clock time of a Quest run is unchanged but
the user now sees what the agent is doing instead of staring at an
empty Result pane for a minute.
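A minimal sketch of the backend order described in item 1. It uses injected callables so it runs offline; `search_with_fallback` and the fake backends are hypothetical names, not app.py's API, and the result keys (`ok`, `results`, `backend`, `error`) are chosen to mirror what the regression tests assert on:

```python
from typing import Any, Callable, Dict, List, Optional

# A backend takes (query, max_results) and returns a dict with at least "ok".
Backend = Callable[[str, int], Dict[str, Any]]


def search_with_fallback(
    query: str,
    max_results: int,
    serper: Optional[Backend],
    ddg: Backend,
) -> Dict[str, Any]:
    """Try Serper first when configured, then DuckDuckGo; never raise."""
    errors: List[str] = []
    for name, backend in (("serper", serper), ("duckduckgo", ddg)):
        if backend is None:  # e.g. no SERPER_API_KEY / SERPER_KEY_ID set
            continue
        try:
            out = backend(query, max_results)
        except Exception as exc:  # a raising backend counts as a failure
            out = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        if out.get("ok"):
            return {**out, "backend": name}
        errors.append(f"{name}: {out.get('error', 'unknown error')}")
    # Both backends failed: return a structured error the agent can read.
    return {"ok": False, "query": query, "results": [], "error": "; ".join(errors)}


def fake_serper_down(query: str, max_results: int) -> Dict[str, Any]:
    raise RuntimeError("429 quota exceeded")


def fake_ddg(query: str, max_results: int) -> Dict[str, Any]:
    return {"ok": True, "results": [{"title": "DDG hit"}][:max_results]}


result = search_with_fallback("iPhone 16 vs 15", 3, fake_serper_down, fake_ddg)
print(result["backend"])  # prints: duckduckgo
```

Treating a raised exception and an `ok: False` payload the same way keeps the caller's contract simple: the agent loop always receives a dict and never has to catch anything.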

Also: lower the default Max Turns slider from 8 to 6 (most research
queries finish in 2-4 turns; going to 8 mostly just burns budget on
dead-end branches), and update .env.example to document SERPER_API_KEY
and QUEST_MAX_NEW_TOKENS. The new .env.example also lists which of the
other research-repo env vars (JINA_API_KEYS, OpenAI keys,
SUMMARY_MODEL_NAME, etc.) are NOT currently wired into the Space
starter, so future deploys are not surprised when setting them has no
effect.

Regression coverage in _test_markdown_fix.py now includes: Serper being
preferred when the key is set, graceful DDG fallback when Serper errors,
graceful error when both fail, and an end-to-end mock run of the
generator verifying multiple progress yields before a final real answer.
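The streaming contract from item 2, which the end-to-end test checks, can be illustrated with a toy generator. When a Gradio event handler is a generator, each yielded value updates the outputs, so the sketch yields `(markdown, trace_json)` pairs: a progress panel per step, then the real answer last. `fake_research`, its scripted steps, and the trace fields are hypothetical stand-ins for build_research_agent; only the `⏳ Researching` marker is taken from the test:

```python
import json
from typing import Iterator, List, Tuple

# (markdown for the Result pane, trace JSON for the trace pane)
Update = Tuple[str, str]


def fake_research(question: str, steps: List[str], answer: str) -> Iterator[Update]:
    """Hypothetical stand-in for a streaming research agent."""
    progress: List[str] = []
    for turn, step in enumerate(steps, start=1):
        progress.append(f"turn {turn}: {step}")
        # Each intermediate yield re-renders the whole progress panel.
        panel = "⏳ Researching...\n\n" + "\n".join(f"- {line}" for line in progress)
        yield panel, json.dumps({"question": question, "progress": progress})
    progress.append("writing final answer")
    # The last yield carries the real answer, not a progress panel.
    yield answer, json.dumps({"question": question, "progress": progress})


updates = list(
    fake_research(
        "How far is Mercury from the sun?",
        ["thinking…", "searching `Mercury distance AU`", "got 2 hit(s) via serper"],
        "| Planet | Distance (AU) |\n|---|---|\n| Mercury | 0.39 |",
    )
)
final_answer, final_trace = updates[-1]
# Same shape of checks as section 12 of _test_markdown_fix.py.
assert len(updates) >= 3
assert all("⏳ Researching" in md for md, _ in updates[:-1])
print(final_answer.splitlines()[0])  # prints: | Planet | Distance (AU) |
```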

Made-with: Cursor

Files changed (3)
  1. .env.example +36 -1
  2. _test_markdown_fix.py +466 -0
  3. app.py +242 -40
.env.example CHANGED
@@ -1,4 +1,8 @@
-# Required: personal HF token with read access to osunlp/Quest-4B.
+# =============================================================================
+# Required
+# =============================================================================
+
+# Personal HF token with read access to osunlp/Quest-4B.
 HF_TOKEN=hf_xxx
 
 # Dedicated HF Inference Endpoint URL that serves osunlp/Quest-4B.
@@ -11,3 +15,34 @@ QUEST_ENDPOINT_MODEL=tgi
 
 # Default model preselected in the dropdown.
 DEFAULT_MODEL=osunlp/Quest-4B
+
+# =============================================================================
+# Recommended: strongly improves latency and reliability
+# =============================================================================
+
+# Google Serper API key. When set, the `search` tool uses Serper first and only
+# falls back to the DuckDuckGo HTML backend if Serper fails. Serper is ~10x
+# faster than scraping DDG and is not subject to the 202 Ratelimit that hits
+# shared HF Space IPs. Get one at https://serper.dev/api-key
+# Either name is accepted to match the research repo's convention:
+SERPER_API_KEY=
+# SERPER_KEY_ID=
+
+# Max tokens the Quest endpoint is allowed to emit per turn. 4096 gives the
+# <think> block enough room; raise to 6144 for very long research reports.
+QUEST_MAX_NEW_TOKENS=4096
+
+# =============================================================================
+# Optional: not currently wired into app.py (listed for reference)
+# =============================================================================
+
+# The research repo (QUEST-main/inference) uses these to plug in Jina Reader
+# for HTML-to-markdown extraction and GPT for condenser/summarization, but the
+# Space starter does not call either of them. Setting them here has no effect
+# today; they are listed only so you know what you'd plug in for the full
+# research pipeline.
+# JINA_API_KEYS=
+# API_KEY= # OpenAI API key
+# SUMMARY_MODEL_NAME=gpt-5-mini
+# MEMORY_MODEL_NAME=gpt-5-mini
+# MEMORY_OPENAI_API_KEY=
_test_markdown_fix.py ADDED
@@ -0,0 +1,466 @@
+"""
+Regression tests for the '<answer>...</answer>' placeholder bug that caused the
+Space to render only a literal `...` instead of the real (often table-shaped)
+final answer.
+
+These tests are plain asserts, runnable with `python _test_markdown_fix.py`.
+They import the fixed helpers directly from `app.py` without booting Gradio.
+"""
+
+import os
+import sys
+from pathlib import Path
+
+# Do not start the Gradio UI when importing app.py.
+os.environ.setdefault("GRADIO_SERVER_PORT", "0")
+
+HERE = Path(__file__).resolve().parent
+sys.path.insert(0, str(HERE))
+
+from unittest import mock
+
+from app import (
+    extract_answer,
+    strip_think_blocks,
+    ensure_markdown_table_blank_lines,
+    decode_escaped_whitespace,
+    _is_placeholder_answer,
+    parse_tool_call,
+)
+
+
+def _check(name: str, actual, expected) -> None:
+    ok = actual == expected
+    status = "PASS" if ok else "FAIL"
+    print(f"[{status}] {name}")
+    if not ok:
+        print(f" expected: {expected!r}")
+        print(f" actual : {actual!r}")
+    assert ok, name
+
+
+# -------------------------------------------------------------------------
+# 1. The original bug: Quest-4B echoes the template literally.
+# -------------------------------------------------------------------------
+_check(
+    "echoed placeholder `<answer>...</answer>` is rejected",
+    extract_answer("<answer>...</answer>"),
+    None,
+)
+
+_check(
+    "echoed unicode ellipsis `<answer>…</answer>` is rejected",
+    extract_answer("<answer>…</answer>"),
+    None,
+)
+
+_check(
+    "whitespace-only `<answer> </answer>` is rejected",
+    extract_answer("<answer> </answer>"),
+    None,
+)
+
+_check(
+    "placeholder detector recognises ASCII dots",
+    _is_placeholder_answer("..."),
+    True,
+)
+_check(
+    "placeholder detector recognises unicode ellipsis",
+    _is_placeholder_answer("…"),
+    True,
+)
+_check(
+    "placeholder detector recognises interpunct",
+    _is_placeholder_answer("·"),
+    True,
+)
+_check(
+    "placeholder detector accepts real text",
+    _is_placeholder_answer("The answer is 3..."),
+    False,
+)
+
+
+# -------------------------------------------------------------------------
+# 2. A real Markdown table inside <answer> survives round-trip.
+# -------------------------------------------------------------------------
+table_body = "| Color | Hex |\n|---|---|\n| Red | #ff0000 |\n| Green | #00ff00 |"
+_check(
+    "Markdown table inside <answer> is returned intact",
+    extract_answer(f"<answer>\n{table_body}\n</answer>"),
+    table_body,
+)
+
+
+# -------------------------------------------------------------------------
+# 3. <think> block is stripped before extracting the answer.
+# -------------------------------------------------------------------------
+_check(
+    "<think>...</think> is removed from answer content",
+    extract_answer("<think>reasoning goes here</think><answer>real answer</answer>"),
+    "real answer",
+)
+
+_check(
+    "multi-line <think> is removed",
+    extract_answer(
+        "<think>line 1\nline 2\nline 3</think>\n<answer>the truth</answer>"
+    ),
+    "the truth",
+)
+
+_check(
+    "strip_think_blocks leaves non-think content alone",
+    strip_think_blocks("plain text"),
+    "plain text",
+)
+
+
+# -------------------------------------------------------------------------
+# 4. Truncated output: <answer> opened, never closed.
+# -------------------------------------------------------------------------
+_check(
+    "truncated `<answer>` with real text is still extracted",
+    extract_answer("<answer>Here is the partial answer"),
+    "Here is the partial answer",
+)
+
+_check(
+    "truncated `<answer>` that is just dots is still rejected",
+    extract_answer("<answer>..."),
+    None,
+)
+
+
+# -------------------------------------------------------------------------
+# 5. ensure_markdown_table_blank_lines inserts the required break.
+# -------------------------------------------------------------------------
+glued = "Here is the comparison:\n| Col | Val |\n|---|---|\n| a | b |"
+fixed = ensure_markdown_table_blank_lines(glued)
+assert "\n\n| Col | Val |" in fixed, f"blank line was not inserted: {fixed!r}"
+print("[PASS] ensure_markdown_table_blank_lines inserts break before table")
+
+already_ok = "Here is the comparison:\n\n| Col | Val |\n|---|---|\n| a | b |"
+_check(
+    "ensure_markdown_table_blank_lines is a no-op when blank line already exists",
+    ensure_markdown_table_blank_lines(already_ok),
+    already_ok,
+)
+
+table_at_start = "| Col | Val |\n|---|---|\n| a | b |"
+_check(
+    "ensure_markdown_table_blank_lines leaves a table at the very start alone",
+    ensure_markdown_table_blank_lines(table_at_start),
+    table_at_start,
+)
+
+
+# -------------------------------------------------------------------------
+# 6. parse_tool_call still works after the <think>-stripping refactor.
+# -------------------------------------------------------------------------
+tool_out = (
+    "<think>I should search for this</think>\n"
+    '<tool_call>{"name": "search", "arguments": {"query": ["hello"]}}</tool_call>'
+)
+name, args, err = parse_tool_call(tool_out)
+assert err is None, f"unexpected parse error: {err}"
+_check("parse_tool_call extracts name", name, "search")
+_check("parse_tool_call extracts arguments", args, {"query": ["hello"]})
+
+
+# -------------------------------------------------------------------------
+# 7. Escaped-whitespace decoding (the 2nd reported bug):
+#    the endpoint returned `\n` as literal 2-char sequences, so the
+#    pipe table rendered as a one-line sentence of `| a | b |\n...`.
+# -------------------------------------------------------------------------
+user_reported_payload = (
+    "\\n| Color | Hex |\\n|---|---|\\n| Red | #FF0000 |"
+    "\\n| Green | #00FF00 |\\n| Blue | #0000FF |\\n"
+)
+decoded_user_payload = decode_escaped_whitespace(user_reported_payload)
+assert "\n| Color | Hex |" in decoded_user_payload, decoded_user_payload
+assert "\\n" not in decoded_user_payload, decoded_user_payload
+print("[PASS] decode_escaped_whitespace converts the user-reported payload")
+
+# Extract from a full <answer> block whose content is escape-encoded.
+escape_encoded_answer = f"<answer>{user_reported_payload}</answer>"
+extracted_escape = extract_answer(escape_encoded_answer)
+assert extracted_escape is not None
+assert "| Red | #FF0000 |" in extracted_escape
+assert "\\n" not in extracted_escape
+# And the separator must be on its own line so GFM recognises the table.
+assert "|---|---|" in extracted_escape
+print("[PASS] extract_answer decodes escape-encoded <answer> into real newlines")
+
+# Heuristic: do NOT decode when escapes are rare (a real code example).
+code_example = 'Some prose with a single \\n in a code example.'
+_check(
+    "decode_escaped_whitespace leaves lightly-escaped prose alone",
+    decode_escaped_whitespace(code_example),
+    code_example,
+)
+
+# Heuristic: do NOT decode when real newlines already dominate.
+mostly_real = "real\nnewlines\nhere\nwith\\none escape"
+_check(
+    "decode_escaped_whitespace leaves mostly-real-newline text alone",
+    decode_escaped_whitespace(mostly_real),
+    mostly_real,
+)
+
+# Heuristic: DO decode when escapes clearly dominate.
+mostly_escaped = "one real\n then \\na \\nb \\nc \\nd"
+decoded_ok = decode_escaped_whitespace(mostly_escaped)
+assert decoded_ok.count("\n") > mostly_escaped.count("\n"), decoded_ok
+assert decoded_ok.count("\\n") == 0, decoded_ok
+print("[PASS] decode_escaped_whitespace decodes when escapes dominate")
+
+
+# -------------------------------------------------------------------------
+# 8. End-to-end: the originally-reported scenario now renders a real table.
+# -------------------------------------------------------------------------
+buggy_output = "<answer>...</answer>"
+good_output = (
+    "<think>let me build the table</think>\n"
+    "<answer>\n"
+    "Here is the table:\n"
+    "| Planet | Distance (AU) |\n"
+    "|---|---|\n"
+    "| Mercury | 0.39 |\n"
+    "| Venus | 0.72 |\n"
+    "| Earth | 1.00 |\n"
+    "</answer>"
+)
+
+# The buggy case must no longer be accepted as an answer.
+assert extract_answer(buggy_output) is None
+# The good case must round-trip AND come out table-ready.
+extracted = extract_answer(good_output)
+assert extracted is not None
+rendered_ready = ensure_markdown_table_blank_lines(extracted)
+assert "\n\n| Planet | Distance (AU) |" in rendered_ready, rendered_ready
+print("[PASS] end-to-end: placeholder rejected, real table rendered with blank line")
+
+# -------------------------------------------------------------------------
+# 9. Search backend rate-limit no longer crashes the whole agent.
+#    Simulates the DuckDuckGo 202 Ratelimit error the user reported.
+# -------------------------------------------------------------------------
+import app as _app
+
+class _FakeRatelimit(Exception):
+    pass
+
+
+class _RatelimitedDDGS:
+    """Stand-in for DDGS that always raises the way ddgs does on 202."""
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, exc_type, exc, tb):
+        return False
+
+    def text(self, *args, **kwargs):
+        raise _FakeRatelimit("https://html.duckduckgo.com/html 202 Ratelimit")
+
+
+# Clear in-memory cache so the mock is actually exercised.
+_app.SEARCH_CACHE.clear()
+
+with mock.patch.object(_app, "DDGS", _RatelimitedDDGS), \
+     mock.patch.object(_app.time, "sleep", lambda *_a, **_k: None):
+    out = _app._run_search_single("iPhone 15 vs iPhone 16 features", max_results=3)
+
+assert out["ok"] is False, out
+assert "Ratelimit" in out["error"], out
+assert out["results"] == []
+assert "hint" in out and "training knowledge" in out["hint"], out
+print("[PASS] _run_search_single converts DDG rate-limit into a graceful tool error")
+
+# The caller that invokes build_research_agent wraps tool responses into a
+# user message; the important thing is that _run_search_single NEVER raises,
+# so the agent loop can continue and let the model produce an <answer>.
+_app.SEARCH_CACHE.clear()
+with mock.patch.object(_app, "DDGS", _RatelimitedDDGS), \
+     mock.patch.object(_app.time, "sleep", lambda *_a, **_k: None):
+    try:
+        _ = _app.run_search(["q1", "q2"], max_results=3)
+        raised = False
+    except Exception:
+        raised = True
+assert not raised, "run_search should not raise when DDG rate-limits"
+print("[PASS] run_search swallows backend errors across multi-query calls")
+
+
+# -------------------------------------------------------------------------
+# 10. Serper backend is preferred when SERPER_API_KEY is set, and DDG is
+#     used as a fallback. Verifies the latency fix for the iPhone query.
+# -------------------------------------------------------------------------
+class _FakeResponse:
+    def __init__(self, payload):
+        self._payload = payload
+
+    def raise_for_status(self):
+        return None
+
+    def json(self):
+        return self._payload
+
+
+def _fake_serper_ok(url, headers, json, timeout):  # noqa: A002 - gradio-style arg
+    assert headers.get("X-API-KEY") == "test-serper-key"
+    return _FakeResponse(
+        {
+            "answerBox": {
+                "title": "iPhone 16 vs 15",
+                "link": "https://example.com/answer",
+                "snippet": "Apple replaced the mute switch with an action button.",
+            },
+            "organic": [
+                {
+                    "title": "iPhone 16 Specs",
+                    "link": "https://example.com/iphone-16",
+                    "snippet": "A18 chip, 48 MP camera, ...",
+                },
+                {
+                    "title": "iPhone 15 Specs",
+                    "link": "https://example.com/iphone-15",
+                    "snippet": "A16 Bionic, Dynamic Island...",
+                },
+            ],
+        }
+    )
+
+
+_app.SEARCH_CACHE.clear()
+with mock.patch.object(_app, "SERPER_API_KEY", "test-serper-key"), \
+     mock.patch.object(_app.requests, "post", side_effect=_fake_serper_ok):
+    serper_out = _app._run_search_single("iPhone 16 vs iPhone 15", max_results=5)
+
+assert serper_out["ok"] is True, serper_out
+assert serper_out.get("backend") == "serper", serper_out
+assert serper_out["results"][0]["title"] == "iPhone 16 vs 15", serper_out  # answer box first
+assert len(serper_out["results"]) == 3, serper_out
+print("[PASS] Serper backend is preferred when SERPER_API_KEY is set")
+
+
+def _fake_serper_fail(url, headers, json, timeout):  # noqa: A002
+    raise RuntimeError("serper: 429 quota exceeded")
+
+
+class _WorkingDDGS:
+    def __enter__(self):
+        return self
+
+    def __exit__(self, exc_type, exc, tb):
+        return False
+
+    def text(self, *args, **kwargs):
+        yield {
+            "title": "DDG result",
+            "href": "https://example.org/ddg",
+            "body": "ddg fallback body",
+        }
+
+
+_app.SEARCH_CACHE.clear()
+with mock.patch.object(_app, "SERPER_API_KEY", "test-serper-key"), \
+     mock.patch.object(_app.requests, "post", side_effect=_fake_serper_fail), \
+     mock.patch.object(_app, "DDGS", _WorkingDDGS):
+    fallback_out = _app._run_search_single("anything", max_results=2)
+
+assert fallback_out["ok"] is True, fallback_out
+assert fallback_out.get("backend") == "duckduckgo", fallback_out
+assert fallback_out["results"][0]["href"] == "https://example.org/ddg"
+print("[PASS] Falls back to DuckDuckGo when Serper errors out")
+
+
+_app.SEARCH_CACHE.clear()
+with mock.patch.object(_app, "SERPER_API_KEY", "test-serper-key"), \
+     mock.patch.object(_app.requests, "post", side_effect=_fake_serper_fail), \
+     mock.patch.object(_app, "DDGS", _RatelimitedDDGS), \
+     mock.patch.object(_app.time, "sleep", lambda *_a, **_k: None):
+    both_fail = _app._run_search_single("anything", max_results=2)
+
+assert both_fail["ok"] is False, both_fail
+assert "serper" in both_fail["error"].lower(), both_fail
+assert "duckduckgo" in both_fail["error"].lower(), both_fail
+assert "hint" in both_fail
+print("[PASS] Returns graceful error when both Serper and DDG fail")
+
+
+# -------------------------------------------------------------------------
+# 11. build_research_agent streams progress (is a generator).
+# -------------------------------------------------------------------------
+import inspect as _inspect
+
+assert _inspect.isgeneratorfunction(_app.build_research_agent), (
+    "build_research_agent should be a generator so run_ui can stream progress"
+)
+assert _inspect.isgeneratorfunction(_app.run_ui), (
+    "run_ui should be a generator so Gradio streams per-turn status to the UI"
+)
+print("[PASS] build_research_agent and run_ui are streaming generators")
+
+
+# -------------------------------------------------------------------------
+# 12. End-to-end dry run of the generator: verify at least one progress
+#     tuple is yielded BEFORE the final answer, and that the final yield
+#     is a real answer (not a placeholder).
+# -------------------------------------------------------------------------
+_fake_model_script = [
+    (
+        "<think>I should search the web for Mercury distance.</think>"
+        '<tool_call>{"name": "search", "arguments": {"query": ["Mercury distance AU"]}}</tool_call>',
+        "fake-model",
+    ),
+    (
+        "<answer>\n"
+        "Here is the table:\n"
+        "| Planet | Distance (AU) |\n"
+        "|---|---|\n"
+        "| Mercury | 0.39 |\n"
+        "</answer>",
+        "fake-model",
+    ),
+]
+
+
+def _fake_call_model(*args, **kwargs):
+    return _fake_model_script.pop(0)
+
+
+class _FakeInferenceClient:
+    def __init__(self, *a, **k):
+        pass
+
+
+_app.SEARCH_CACHE.clear()
+with mock.patch.object(_app, "call_model", side_effect=_fake_call_model), \
+     mock.patch.object(_app, "_build_client_for_model",
+                       return_value=(_FakeInferenceClient(), "fake-model", [])), \
+     mock.patch.object(_app, "SERPER_API_KEY", "test-serper-key"), \
+     mock.patch.object(_app.requests, "post", side_effect=_fake_serper_ok):
+    gen = _app.build_research_agent(
+        question="How far is Mercury from the sun?",
+        model="fake-model",
+        max_turns=4,
+        max_search_results=3,
+        temperature=0.0,
+    )
+    emitted = list(gen)
+
+assert len(emitted) >= 3, f"expected multiple progress yields, got {len(emitted)}"
+final_answer, final_trace = emitted[-1]
+assert "Mercury" in final_answer, final_answer
+assert "| Planet |" in final_answer, final_answer
+assert "...</answer>" not in final_answer
+# Intermediate yields should have progress scaffolding.
+assert any("⏳ Researching" in ans for ans, _ in emitted[:-1]), (
+    "no intermediate progress yield detected"
+)
+print("[PASS] build_research_agent streams progress then a real final answer")
+
+print()
+print("All markdown-fix regression tests passed.")
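`_is_placeholder_answer` and `decode_escaped_whitespace` live outside this diff, so sections 1 and 7 only pin down their behaviour. One minimal way to satisfy those assertions is sketched below; the `_sketch` names, the accepted placeholder characters, and the thresholds (at least two literal `\n` escapes, and more escapes than real newlines) are assumptions rather than app.py's implementation:

```python
import re

# Empty, whitespace-only, or made only of dots, ellipses, or interpuncts.
_PLACEHOLDER_RE = re.compile(r"[\s.…·]*")


def is_placeholder_answer_sketch(text: str) -> bool:
    """True when the text looks like an echoed `<answer>...</answer>` template."""
    return _PLACEHOLDER_RE.fullmatch(text) is not None


def decode_escaped_whitespace_sketch(text: str) -> str:
    """Decode literal backslash-n / backslash-t sequences into real whitespace,
    but only when escaped newlines clearly dominate the real ones."""
    escaped = text.count("\\n")
    real = text.count("\n")
    if escaped >= 2 and escaped > real:
        return text.replace("\\n", "\n").replace("\\t", "\t")
    return text


payload = "\\n| Color | Hex |\\n|---|---|\\n| Red | #FF0000 |"
print(decode_escaped_whitespace_sketch(payload).splitlines()[1])  # prints: | Color | Hex |
```

Requiring escapes to outnumber real newlines is what keeps a code sample that legitimately mentions a single `\n` intact, while a payload that arrives fully escape-encoded is decoded.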
app.py CHANGED
@@ -960,22 +960,85 @@ _SEARCH_UNAVAILABLE_HINT = (
     "retry later if the question truly requires a fresh web lookup."
 )
 
 
-def _run_search_single(query: str, max_results: int) -> Dict[str, Any]:
-    """Run one DuckDuckGo query.
 
-    Returns a structured dict on both success and failure, never raises. If
-    the search backend rate-limits us (Space IPs share outbound NAT and
-    often trip DuckDuckGo's anti-scraping throttle), we return an
-    `ok: False` payload with a hint that lets the agent fall back to its
-    own knowledge instead of aborting the whole research run.
     """
-    if not query.strip():
-        return {"ok": False, "error": "Search query cannot be empty."}
-    cache_key = f"{query.strip().lower()}::{max_results}"
-    if cache_key in SEARCH_CACHE:
-        return {**SEARCH_CACHE[cache_key], "cached": True}
 
     last_exc: Optional[BaseException] = None
     for attempt in range(2):
         try:
@@ -989,14 +1052,15 @@ def _run_search_single(query: str, max_results: int) -> Dict[str, Any]:
                         "body": item.get("body", ""),
                     }
                 )
-            payload = {"ok": True, "query": query, "results": rows, "cached": False}
-            SEARCH_CACHE[cache_key] = payload
-            return payload
         except Exception as exc:
             last_exc = exc
-            # One retry with a small backoff covers most transient 202
-            # Ratelimit / transient network hiccups; on the second failure
-            # we give up and return a graceful error to the agent.
             if attempt == 0:
                 time.sleep(1.5)
                 continue
@@ -1005,7 +1069,53 @@ def _run_search_single(query: str, max_results: int) -> Dict[str, Any]:
     return {
         "ok": False,
         "query": query,
-        "error": f"Search backend unavailable ({err}).",
         "results": [],
         "hint": _SEARCH_UNAVAILABLE_HINT,
     }
@@ -1126,18 +1236,67 @@ def call_model(
     raise RuntimeError(f"All model candidates failed. Last error: {last_error}")
 
 
 def build_research_agent(
     question: str,
     model: str,
     max_turns: int,
     max_search_results: int,
     temperature: float,
-) -> Tuple[str, str]:
     client, primary_model, fallback_models = _build_client_for_model(model)
     # Display label: the real HF repo id is nicer than the TGI shim name.
     display_primary = model if (model == QUEST_MODEL_ID) else primary_model
     state = AgentState()
     used_model = display_primary
 
     messages: List[Dict[str, str]] = [
         {"role": "system", "content": build_system_prompt()},
@@ -1146,6 +1305,9 @@ def build_research_agent(
 
     final_answer: Optional[str] = None
 
     for turn in range(1, max_turns + 1):
         if state.trusted_notes and turn > 1 and turn % 3 == 0:
             summary_lines = "\n".join(f"- {n}" for n in state.trusted_notes[-6:])
@@ -1156,6 +1318,10 @@ def build_research_agent(
                 }
             )
 
         raw_output, endpoint_model = call_model(
             client=client,
             messages=messages,
@@ -1164,21 +1330,28 @@ def build_research_agent(
             temperature=temperature,
             max_new_tokens=int(os.getenv("QUEST_MAX_NEW_TOKENS", "4096")),
         )
         model_output = raw_output
         # Preserve the human-friendly model id for the trace even if the
         # endpoint ignores the "model" param and returns the TGI shim name.
         used_model = display_primary if endpoint_model == primary_model == QUEST_ENDPOINT_MODEL else endpoint_model
         messages.append({"role": "assistant", "content": model_output})
-        state.trace.append({"turn": turn, "assistant": model_output})
 
         extracted_answer = extract_answer(model_output)
         if extracted_answer:
             final_answer = extracted_answer
             break
 
         tool_name, tool_args, tool_err = parse_tool_call(model_output)
         if tool_err:
             tool_response = {"ok": False, "error": tool_err}
         elif not tool_name:
             # No explicit tool call and no final answer: force finalization.
             # IMPORTANT: do not write the literal characters `<answer>...</answer>`
@@ -1202,6 +1375,8 @@ def build_research_agent(
                     ),
                 }
             )
             continue
         else:
             if tool_name == "search":
@@ -1214,7 +1389,13 @@ def build_research_agent(
                 max_results = int(tool_args.get("max_results", max_search_results))
                 max_results = max(1, min(max_results, 10))
 
                 per_query: List[Dict[str, Any]] = []
                 for q in queries:
                     if q in state.searched_query_set:
                         per_query.append({
@@ -1224,22 +1405,36 @@ def build_research_agent(
                             "note": "Already searched; reusing cached result.",
                             "results": [],
                         })
                         continue
                     state.searched_queries.append(q)
                     state.searched_query_set.add(q)
                     single = _run_search_single(q, max_results)
                     per_query.append(single)
                     if single.get("ok"):
                         first_titles = [r.get("title", "") for r in single.get("results", [])[:2]]
                         if first_titles:
                             state.trusted_notes.append(
                                 f"Searched '{q}' and found leads: {', '.join(t for t in first_titles if t)}"
                             )
                 tool_response = (
                     per_query[0]
                     if len(per_query) == 1
                     else {"ok": True, "queries": queries, "results": per_query}
                 )
             elif tool_name == "visit":
                 raw_url = tool_args.get("url", "")
                 urls: List[str]
@@ -1251,7 +1446,12 @@ def build_research_agent(
                 max_chars = int(tool_args.get("max_chars", 6000))
                 max_chars = max(500, min(max_chars, 20000))
 
                 per_url: List[Dict[str, Any]] = []
                 for u in urls:
                     if u in state.visited_url_set:
                         per_url.append({
@@ -1260,12 +1460,14 @@ def build_research_agent(
                             "cached": True,
                             "note": "Already visited; reusing cached result.",
                         })
                         continue
                     state.visited_urls.append(u)
                     state.visited_url_set.add(u)
                     single = _run_visit_single(u, max_chars, goal)
                     per_url.append(single)
                     if single.get("ok"):
                         snippet = str(single.get("content", ""))[:180]
                         if snippet:
                             state.trusted_notes.append(
@@ -1276,8 +1478,14 @@ def build_research_agent(
                     if len(per_url) == 1
                     else {"ok": True, "goal": goal, "results": per_url}
                 )
             else:
                 tool_response = {"ok": False, "error": f"Unknown tool: {tool_name}"}
 
         state.trace.append({"turn": turn, "tool": tool_name, "tool_response": tool_response})
         messages.append(
@@ -1302,18 +1510,8 @@ def build_research_agent(
     if citations:
         final_answer = f"{final_answer}\n\n### Visited Sources\n{citations}"
 
-    trace_text = json.dumps(
-        {
-            "used_model": used_model,
-            "searched_queries": state.searched_queries,
-            "visited_urls": state.visited_urls,
-            "trusted_notes": state.trusted_notes[-10:],
-            "trace": state.trace,
-        },
-        ensure_ascii=False,
-        indent=2,
-    )
-    return final_answer, trace_text
 
 
 def run_ui(
@@ -1324,13 +1522,15 @@ def run_ui(
     temperature: float,
 ):
     if not question.strip():
-        return "Please input a question.", "{}"
     if not os.getenv("HF_TOKEN"):
         warning = (
             "HF_TOKEN is not configured in Space Secrets. "
             "Go to Settings -> Secrets -> add `HF_TOKEN`, then retry."
         )
-        return warning, json.dumps({"error": warning}, ensure_ascii=False, indent=2)
     if model == QUEST_MODEL_ID and not QUEST_BASE_URL:
         warning = (
             f"`{QUEST_MODEL_ID}` is private and not available via the free HF Inference API. "
@@ -1338,17 +1538,19 @@ def run_ui(
             "then set `QUEST_BASE_URL` in Space Secrets to the endpoint's `/v1/` URL. "
             "In the meantime you can pick one of the open-weights models in the dropdown."
         )
-        return warning, json.dumps({"error": warning}, ensure_ascii=False, indent=2)
     try:
-        return build_research_agent(
             question=question,
             model=model,
             max_turns=max_turns,
             max_search_results=max_search_results,
             temperature=temperature,
-        )
     except Exception as exc:
-        return f"Error: {exc}", json.dumps({"error": str(exc)}, ensure_ascii=False, indent=2)
 
 
 EXAMPLES = [
@@ -1470,7 +1672,7 @@ with gr.Blocks(
1470
  label="Max Turns",
1471
  minimum=2,
1472
  maximum=20,
1473
- value=8,
1474
  step=1,
1475
  )
1476
  max_search_results = gr.Slider(
 
960
  "retry later if the question truly requires a fresh web lookup."
961
  )
962
 
963
+ # Google Serper API key. Either SERPER_API_KEY or SERPER_KEY_ID is accepted
964
+ # so that the Space matches the env-var name used by the research repo.
965
+ SERPER_API_KEY = (
966
+ os.getenv("SERPER_API_KEY") or os.getenv("SERPER_KEY_ID") or ""
967
+ ).strip()
968
+ SERPER_ENDPOINT = os.getenv("SERPER_ENDPOINT", "https://google.serper.dev/search")
969
 
 
 
970
 
971
+ def _serper_search(query: str, max_results: int) -> Dict[str, Any]:
972
+ """Hit the Google Serper API. Returns the same shape as `_ddg_search`.
973
+
974
+ Serper responds in well under a second and is not subject to the 202
975
+ Ratelimit we get from html.duckduckgo.com, so preferring it when the
976
+ key is set cuts latency dramatically and eliminates most search
977
+ failures on shared Space IPs.
978
  """
979
+ try:
980
+ resp = requests.post(
981
+ SERPER_ENDPOINT,
982
+ headers={
983
+ "X-API-KEY": SERPER_API_KEY,
984
+ "Content-Type": "application/json",
985
+ },
986
+ json={"q": query, "num": max_results},
987
+ timeout=15,
988
+ )
989
+ resp.raise_for_status()
990
+ data = resp.json()
991
+ except Exception as exc:
992
+ return {
993
+ "ok": False,
994
+ "query": query,
995
+ "error": f"Serper error: {type(exc).__name__}: {exc}",
996
+ "results": [],
997
+ "backend": "serper",
998
+ }
999
+
1000
+ rows: List[Dict[str, str]] = []
1001
+ for item in (data.get("organic") or [])[:max_results]:
1002
+ rows.append(
1003
+ {
1004
+ "title": item.get("title", ""),
1005
+ "href": item.get("link", ""),
1006
+ "body": item.get("snippet", ""),
1007
+ }
1008
+ )
1009
+ # Fold in the answer box when present; it often
1010
+ # carries the exact fact the model is looking for in a compact form.
1011
+ answer_box = data.get("answerBox") or {}
1012
+ if answer_box:
1013
+ rows.insert(
1014
+ 0,
1015
+ {
1016
+ "title": answer_box.get("title", "Answer box"),
1017
+ "href": answer_box.get("link", ""),
1018
+ "body": answer_box.get("snippet")
1019
+ or answer_box.get("answer")
1020
+ or "",
1021
+ },
1022
+ )
1023
+ if not rows:
1024
+ return {
1025
+ "ok": False,
1026
+ "query": query,
1027
+ "error": "Serper returned no organic results",
1028
+ "results": [],
1029
+ "backend": "serper",
1030
+ }
1031
+ return {
1032
+ "ok": True,
1033
+ "query": query,
1034
+ "results": rows,
1035
+ "cached": False,
1036
+ "backend": "serper",
1037
+ }
1038
+
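The response mapping in `_serper_search` can be checked offline. The sketch below uses a hypothetical `normalize_serper_payload` helper that copies that mapping, so it runs without a network call or API key. The payload is made-up data shaped only by the fields the diff reads (`organic` items with `title`/`link`/`snippet`, plus an optional `answerBox`); it is not a captured Serper response.

```python
from typing import Any, Dict, List


def normalize_serper_payload(data: Dict[str, Any], max_results: int) -> List[Dict[str, str]]:
    """Hypothetical helper mirroring the row mapping in `_serper_search`."""
    rows: List[Dict[str, str]] = []
    # Organic hits: rename Serper's `link`/`snippet` to the DDG-style `href`/`body`.
    for item in (data.get("organic") or [])[:max_results]:
        rows.append(
            {
                "title": item.get("title", ""),
                "href": item.get("link", ""),
                "body": item.get("snippet", ""),
            }
        )
    # The answer box, if any, goes first because it is usually the most compact fact.
    answer_box = data.get("answerBox") or {}
    if answer_box:
        rows.insert(
            0,
            {
                "title": answer_box.get("title", "Answer box"),
                "href": answer_box.get("link", ""),
                "body": answer_box.get("snippet") or answer_box.get("answer") or "",
            },
        )
    return rows


# Made-up payload for illustration only.
sample = {
    "answerBox": {"answer": "1889"},
    "organic": [
        {"title": "Eiffel Tower", "link": "https://example.org/a", "snippet": "Completed in 1889."},
        {"title": "Paris landmarks", "link": "https://example.org/b", "snippet": "..."},
    ],
}
rows = normalize_serper_payload(sample, max_results=1)
print(len(rows), rows[0]["title"], rows[0]["body"])  # 2 Answer box 1889
```

As in the diff, the answer box is inserted after the `max_results` slice, so the returned list can hold one row more than requested.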
1039
 
1040
+ def _ddg_search(query: str, max_results: int) -> Dict[str, Any]:
1041
+ """Fallback path: scrape DuckDuckGo, which often rate-limits shared Space IPs."""
1042
  last_exc: Optional[BaseException] = None
1043
  for attempt in range(2):
1044
  try:
 
1052
  "body": item.get("body", ""),
1053
  }
1054
  )
1055
+ return {
1056
+ "ok": True,
1057
+ "query": query,
1058
+ "results": rows,
1059
+ "cached": False,
1060
+ "backend": "duckduckgo",
1061
+ }
1062
  except Exception as exc:
1063
  last_exc = exc
 
 
 
1064
  if attempt == 0:
1065
  time.sleep(1.5)
1066
  continue
 
1069
  return {
1070
  "ok": False,
1071
  "query": query,
1072
+ "error": f"DuckDuckGo unavailable ({err}).",
1073
+ "results": [],
1074
+ "backend": "duckduckgo",
1075
+ }
1076
+
1077
+
1078
+ def _run_search_single(query: str, max_results: int) -> Dict[str, Any]:
1079
+ """Run one search query, preferring Serper when the key is set.
1080
+
1081
+ Returns a structured dict on both success and failure; never raises.
1082
+ Order of preference:
1083
+
1084
+ 1. Google Serper (fast, no scraping, requires `SERPER_API_KEY` /
1085
+ `SERPER_KEY_ID`).
1086
+ 2. DuckDuckGo HTML backend (free, but rate-limits on shared Space IPs).
1087
+ 3. Graceful `ok: False` payload with a hint that tells the agent to
1088
+ answer from its own knowledge if it reasonably can.
1089
+ """
1090
+ if not query.strip():
1091
+ return {"ok": False, "error": "Search query cannot be empty."}
1092
+ cache_key = f"{query.strip().lower()}::{max_results}"
1093
+ if cache_key in SEARCH_CACHE:
1094
+ return {**SEARCH_CACHE[cache_key], "cached": True}
1095
+
1096
+ tried: List[Dict[str, Any]] = []
1097
+ if SERPER_API_KEY:
1098
+ serper_result = _serper_search(query, max_results)
1099
+ if serper_result.get("ok"):
1100
+ SEARCH_CACHE[cache_key] = serper_result
1101
+ return serper_result
1102
+ tried.append(serper_result)
1103
+
1104
+ ddg_result = _ddg_search(query, max_results)
1105
+ if ddg_result.get("ok"):
1106
+ SEARCH_CACHE[cache_key] = ddg_result
1107
+ return ddg_result
1108
+ tried.append(ddg_result)
1109
+
1110
+ # Both backends failed (or no Serper key and DDG rate-limited).
1111
+ errors = "; ".join(
1112
+ f"{r.get('backend', 'unknown')}: {r.get('error', 'no results')}"
1113
+ for r in tried
1114
+ )
1115
+ return {
1116
+ "ok": False,
1117
+ "query": query,
1118
+ "error": f"All search backends failed ({errors}).",
1119
  "results": [],
1120
  "hint": _SEARCH_UNAVAILABLE_HINT,
1121
  }
 
1236
  raise RuntimeError(f"All model candidates failed. Last error: {last_error}")
1237
 
1238
 
1239
+ def _render_progress(
1240
+ lines: List[str],
1241
+ used_model: str,
1242
+ question: str,
1243
+ ) -> str:
1244
+ """Render the in-progress status view that replaces the Markdown panel
1245
+ while the agent is still running, so the user is not staring at a blank
1246
+ box for the 20-60 seconds a full Quest-4B research run can take."""
1247
+ header = (
1248
+ f"### ⏳ Researching…\n\n"
1249
+ f"**Model:** `{used_model}` \n"
1250
+ f"**Question:** {question.strip()[:200]}"
1251
+ )
1252
+ if not lines:
1253
+ body = "_Starting agent…_"
1254
+ else:
1255
+ body = "\n".join(f"- {line}" for line in lines)
1256
+ return f"{header}\n\n{body}"
1257
+
1258
+
1259
+ def _trace_to_json(state: "AgentState", used_model: str) -> str:
1260
+ return json.dumps(
1261
+ {
1262
+ "used_model": used_model,
1263
+ "searched_queries": state.searched_queries,
1264
+ "visited_urls": state.visited_urls,
1265
+ "trusted_notes": state.trusted_notes[-10:],
1266
+ "trace": state.trace,
1267
+ },
1268
+ ensure_ascii=False,
1269
+ indent=2,
1270
+ )
1271
+
1272
+
1273
  def build_research_agent(
1274
  question: str,
1275
  model: str,
1276
  max_turns: int,
1277
  max_search_results: int,
1278
  temperature: float,
1279
+ ):
1280
+ """Run the ReAct research loop as a generator.
1281
+
1282
+ Each `yield` emits a `(markdown_for_answer_panel, json_for_record_panel)`
1283
+ tuple. Intermediate yields show progress so that Gradio streams the
1284
+ status lines into the UI as work happens. The last yield contains the
1285
+ final answer and the final trace.
1286
+ """
1287
  client, primary_model, fallback_models = _build_client_for_model(model)
1288
  # Display label: the real HF repo id is nicer than the TGI shim name.
1289
  display_primary = model if (model == QUEST_MODEL_ID) else primary_model
1290
  state = AgentState()
1291
  used_model = display_primary
1292
+ status_lines: List[str] = []
1293
+
1294
+ def _emit():
1295
+ """Build the current progress snapshot for the caller to yield to Gradio."""
1296
+ return (
1297
+ _render_progress(status_lines, used_model, question),
1298
+ _trace_to_json(state, used_model),
1299
+ )
1300
 
1301
  messages: List[Dict[str, str]] = [
1302
  {"role": "system", "content": build_system_prompt()},
 
1305
 
1306
  final_answer: Optional[str] = None
1307
 
1308
+ status_lines.append("🚀 Starting research agent")
1309
+ yield _emit()
1310
+
1311
  for turn in range(1, max_turns + 1):
1312
  if state.trusted_notes and turn > 1 and turn % 3 == 0:
1313
  summary_lines = "\n".join(f"- {n}" for n in state.trusted_notes[-6:])
 
1318
  }
1319
  )
1320
 
1321
+ status_lines.append(f"🧠 turn {turn}: thinking…")
1322
+ yield _emit()
1323
+
1324
+ t0 = time.time()
1325
  raw_output, endpoint_model = call_model(
1326
  client=client,
1327
  messages=messages,
 
1330
  temperature=temperature,
1331
  max_new_tokens=int(os.getenv("QUEST_MAX_NEW_TOKENS", "4096")),
1332
  )
1333
+ dt = time.time() - t0
1334
  model_output = raw_output
1335
  # Preserve the human-friendly model id for the trace even if the
1336
  # endpoint ignores the "model" param and returns the TGI shim name.
1337
  used_model = display_primary if endpoint_model == primary_model == QUEST_ENDPOINT_MODEL else endpoint_model
1338
  messages.append({"role": "assistant", "content": model_output})
1339
+ state.trace.append({"turn": turn, "assistant": model_output, "elapsed_s": round(dt, 2)})
1340
+ status_lines[-1] = f"🧠 turn {turn}: model reply in {dt:.1f}s"
1341
+ yield _emit()
1342
 
1343
  extracted_answer = extract_answer(model_output)
1344
  if extracted_answer:
1345
  final_answer = extracted_answer
1346
+ status_lines.append("✍️ writing final answer")
1347
+ yield _emit()
1348
  break
1349
 
1350
  tool_name, tool_args, tool_err = parse_tool_call(model_output)
1351
  if tool_err:
1352
  tool_response = {"ok": False, "error": tool_err}
1353
+ status_lines.append(f"⚠️ turn {turn}: malformed tool call - {tool_err}")
1354
+ yield _emit()
1355
  elif not tool_name:
1356
  # No explicit tool call and no final answer: force finalization.
1357
  # IMPORTANT: do not write the literal characters `<answer>...</answer>`
 
1375
  ),
1376
  }
1377
  )
1378
+ status_lines.append(f"🙃 turn {turn}: model stalled; asking for an answer")
1379
+ yield _emit()
1380
  continue
1381
  else:
1382
  if tool_name == "search":
 
1389
  max_results = int(tool_args.get("max_results", max_search_results))
1390
  max_results = max(1, min(max_results, 10))
1391
 
1392
+ queries_preview = ", ".join(f"`{q}`" for q in queries) or "_(empty)_"
1393
+ status_lines.append(f"🔍 turn {turn}: searching {queries_preview}")
1394
+ yield _emit()
1395
+
1396
  per_query: List[Dict[str, Any]] = []
1397
+ backend_labels: List[str] = []
1398
+ hits_total = 0
1399
  for q in queries:
1400
  if q in state.searched_query_set:
1401
  per_query.append({
 
1405
  "note": "Already searched; reusing cached result.",
1406
  "results": [],
1407
  })
1408
+ backend_labels.append("cache")
1409
  continue
1410
  state.searched_queries.append(q)
1411
  state.searched_query_set.add(q)
1412
  single = _run_search_single(q, max_results)
1413
  per_query.append(single)
1414
+ backend_labels.append(single.get("backend", "unknown"))
1415
  if single.get("ok"):
1416
+ hits_total += len(single.get("results", []))
1417
  first_titles = [r.get("title", "") for r in single.get("results", [])[:2]]
1418
  if first_titles:
1419
  state.trusted_notes.append(
1420
  f"Searched '{q}' and found leads: {', '.join(t for t in first_titles if t)}"
1421
  )
1422
+ else:
1423
+ status_lines.append(
1424
+ f"⚠️ search failed on `{q}` via {single.get('backend', 'unknown')}: "
1425
+ f"{single.get('error', 'no results')}"
1426
+ )
1427
  tool_response = (
1428
  per_query[0]
1429
  if len(per_query) == 1
1430
  else {"ok": True, "queries": queries, "results": per_query}
1431
  )
1432
+ unique_backends = sorted(set(backend_labels))
1433
+ backend_str = "/".join(unique_backends) if unique_backends else "?"
1434
+ status_lines.append(
1435
+ f"✅ turn {turn}: got {hits_total} hit(s) via {backend_str}"
1436
+ )
1437
+ yield _emit()
1438
  elif tool_name == "visit":
1439
  raw_url = tool_args.get("url", "")
1440
  urls: List[str]
 
1446
  max_chars = int(tool_args.get("max_chars", 6000))
1447
  max_chars = max(500, min(max_chars, 20000))
1448
 
1449
+ urls_preview = ", ".join(f"`{u[:60]}`" for u in urls) or "_(empty)_"
1450
+ status_lines.append(f"🌐 turn {turn}: visiting {urls_preview}")
1451
+ yield _emit()
1452
+
1453
  per_url: List[Dict[str, Any]] = []
1454
+ visit_ok = 0
1455
  for u in urls:
1456
  if u in state.visited_url_set:
1457
  per_url.append({
 
1460
  "cached": True,
1461
  "note": "Already visited; reusing cached result.",
1462
  })
1463
+ visit_ok += 1
1464
  continue
1465
  state.visited_urls.append(u)
1466
  state.visited_url_set.add(u)
1467
  single = _run_visit_single(u, max_chars, goal)
1468
  per_url.append(single)
1469
  if single.get("ok"):
1470
+ visit_ok += 1
1471
  snippet = str(single.get("content", ""))[:180]
1472
  if snippet:
1473
  state.trusted_notes.append(
 
1478
  if len(per_url) == 1
1479
  else {"ok": True, "goal": goal, "results": per_url}
1480
  )
1481
+ status_lines.append(
1482
+ f"✅ turn {turn}: read {visit_ok}/{len(urls)} page(s)"
1483
+ )
1484
+ yield _emit()
1485
  else:
1486
  tool_response = {"ok": False, "error": f"Unknown tool: {tool_name}"}
1487
+ status_lines.append(f"⚠️ turn {turn}: unknown tool `{tool_name}`")
1488
+ yield _emit()
1489
 
1490
  state.trace.append({"turn": turn, "tool": tool_name, "tool_response": tool_response})
1491
  messages.append(
 
1510
  if citations:
1511
  final_answer = f"{final_answer}\n\n### Visited Sources\n{citations}"
1512
 
1513
+ trace_text = _trace_to_json(state, used_model)
1514
+ yield (final_answer, trace_text)
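The streaming contract of the generator version of `build_research_agent` can be illustrated with a toy generator: every `yield` is a complete `(markdown, trace_json)` snapshot, and the last one is what stays on screen. `toy_research_agent` and its stubbed model replies are hypothetical; only the snapshot and in-place status overwrite pattern mirrors the diff.

```python
import json
import time
from typing import Iterator, List, Tuple


def toy_research_agent(question: str, max_turns: int) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for the streaming pattern in `build_research_agent`.

    Every yield is a full (answer_markdown, trace_json) snapshot; the UI simply
    replaces what it showed before, so the last yield is what the user keeps.
    """
    status_lines: List[str] = []
    trace: List[dict] = []

    def snapshot() -> Tuple[str, str]:
        body = "\n".join(f"- {line}" for line in status_lines)
        return f"### Researching…\n\n{body}", json.dumps({"trace": trace})

    status_lines.append("starting")
    yield snapshot()
    for turn in range(1, max_turns + 1):
        status_lines.append(f"turn {turn}: thinking…")
        yield snapshot()
        t0 = time.time()
        # Stubbed model call: answers on turn 2, otherwise "calls a tool".
        reply = f"<answer>stub answer to {question!r}</answer>" if turn == 2 else "<tool_call>…"
        dt = time.time() - t0
        trace.append({"turn": turn, "assistant": reply, "elapsed_s": round(dt, 2)})
        # Overwrite the "thinking…" line instead of appending a new one.
        status_lines[-1] = f"turn {turn}: model reply in {dt:.1f}s"
        yield snapshot()
        if reply.startswith("<answer>"):
            break
    yield "stub final answer", json.dumps({"trace": trace})


snapshots = list(toy_research_agent("What is Quest-4B?", max_turns=6))
print(len(snapshots), snapshots[-1][0])  # 6 stub final answer
```

Because each snapshot is complete rather than a delta, a consumer that skips intermediate values, or keeps only the last one, still ends up with a consistent final answer and trace.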
 
 
 
 
 
 
 
 
 
 
1515
 
1516
 
1517
  def run_ui(
 
1522
  temperature: float,
1523
  ):
1524
  if not question.strip():
1525
+ yield "Please input a question.", "{}"
1526
+ return
1527
  if not os.getenv("HF_TOKEN"):
1528
  warning = (
1529
  "HF_TOKEN is not configured in Space Secrets. "
1530
  "Go to Settings -> Secrets -> add `HF_TOKEN`, then retry."
1531
  )
1532
+ yield warning, json.dumps({"error": warning}, ensure_ascii=False, indent=2)
1533
+ return
1534
  if model == QUEST_MODEL_ID and not QUEST_BASE_URL:
1535
  warning = (
1536
  f"`{QUEST_MODEL_ID}` is private and not available via the free HF Inference API. "
 
1538
  "then set `QUEST_BASE_URL` in Space Secrets to the endpoint's `/v1/` URL. "
1539
  "In the meantime you can pick one of the open-weights models in the dropdown."
1540
  )
1541
+ yield warning, json.dumps({"error": warning}, ensure_ascii=False, indent=2)
1542
+ return
1543
  try:
1544
+ for partial_answer, partial_trace in build_research_agent(
1545
  question=question,
1546
  model=model,
1547
  max_turns=max_turns,
1548
  max_search_results=max_search_results,
1549
  temperature=temperature,
1550
+ ):
1551
+ yield partial_answer, partial_trace
1552
  except Exception as exc:
1553
+ yield f"Error: {exc}", json.dumps({"error": str(exc)}, ensure_ascii=False, indent=2)
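Turning `run_ui` into a generator is why each early-exit guard changed from `return value` to `yield value` followed by a bare `return`. In a generator, `return value` only attaches the value to `StopIteration`, so ordinary iteration never sees it. A minimal sketch with hypothetical names (`guarded_with_return`, `guarded_with_yield`, `flaky_agent`) shows the guard difference and how the `try`/`except` still reports an error raised after progress has already been streamed.

```python
import json
from typing import Iterator, Tuple


def guarded_with_return(question: str) -> Iterator[Tuple[str, str]]:
    # Buggy variant: in a generator, `return value` only sets StopIteration.value,
    # so a consumer iterating with `for`/`list()` never sees the message.
    if not question.strip():
        return "Please input a question.", "{}"
    yield "answer", "{}"


def flaky_agent(question: str) -> Iterator[Tuple[str, str]]:
    # Hypothetical agent that fails after one progress snapshot.
    yield "working…", "{}"
    raise RuntimeError("endpoint timed out")


def guarded_with_yield(question: str) -> Iterator[Tuple[str, str]]:
    # Pattern used in `run_ui`: emit the message, then stop.
    if not question.strip():
        yield "Please input a question.", "{}"
        return
    try:
        for partial_answer, partial_trace in flaky_agent(question):
            yield partial_answer, partial_trace
    except Exception as exc:
        # Errors raised mid-stream still surface as a final snapshot.
        yield f"Error: {exc}", json.dumps({"error": str(exc)})


print(list(guarded_with_return("   ")))  # []
print(list(guarded_with_yield("   ")))   # [('Please input a question.', '{}')]
```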
1554
 
1555
 
1556
  EXAMPLES = [
 
1672
  label="Max Turns",
1673
  minimum=2,
1674
  maximum=20,
1675
+ value=6,
1676
  step=1,
1677
  )
1678
  max_search_results = gr.Slider(