Commit 5228496
Parent(s): f7d0ad7
agent: bump default MEMORY_TOKEN_THRESHOLD 16k -> 80k
QUEST-35B serves with max_model_len=160000, so 16k was triggering the
condenser way earlier than needed and hurting throughput. 80k gives the
agent more headroom to accumulate tool responses before paying the
summarizer LLM round-trip; condenser still kicks in well before the
context window cap.
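For scale, the arithmetic behind this change can be sketched as follows. The 160000-token window and the two thresholds come from the message above; the names `CONTEXT_WINDOW` and `threshold_share` are illustrative assumptions and do not appear in app.py.

```python
# Illustrative only: these names are hypothetical, not taken from app.py.
CONTEXT_WINDOW = 160_000  # max_model_len quoted in the commit message


def threshold_share(threshold: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window consumed when the condenser triggers."""
    return threshold / window


for label, threshold in (("old", 16_000), ("new", 80_000)):
    share = threshold_share(threshold)
    headroom = CONTEXT_WINDOW - threshold
    print(f"{label}: {threshold} tokens = {share:.0%} of window, {headroom} tokens left")
```

Under these numbers the old default fired at 10% of the window and the new one fires at 50%, which still leaves half the window free when condensation starts.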
app.py CHANGED
@@ -1564,7 +1564,7 @@ MEMORY_TOKEN_THRESHOLD = int(
     os.getenv("MEMORY_THRESHOLD")
     or os.getenv("MEMORY_CONTEXT_THRESHOLD")
     or os.getenv("MEMORY_TOKEN_THRESHOLD")
-    or "
+    or "80000"
 )
 
 # Azure OpenAI support — mirrors inference/tool_visit.py logic. When
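The fallback chain in this hunk can be exercised in isolation with a sketch like the one below. The helper `resolve_memory_threshold` and its `env` parameter are hypothetical, added here only so the lookup can be tried without touching the real process environment; app.py inlines the chain directly inside `int(...)`.

```python
import os
from typing import Mapping, Optional


def resolve_memory_threshold(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper mirroring the env-var fallback chain in the diff."""
    env = os.environ if env is None else env
    raw = (
        env.get("MEMORY_THRESHOLD")
        or env.get("MEMORY_CONTEXT_THRESHOLD")
        or env.get("MEMORY_TOKEN_THRESHOLD")
        or "80000"  # new default introduced by this commit
    )
    return int(raw)


# No variables set: the new default applies.
print(resolve_memory_threshold({}))
# An empty string is falsy, so the chain falls through to the next variable.
print(resolve_memory_threshold({"MEMORY_THRESHOLD": "", "MEMORY_TOKEN_THRESHOLD": "32000"}))
```

Because the chain uses `or`, a variable set to an empty string is treated the same as an unset one, and the first non-empty value wins in the order shown.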