Lzy01241010 committed on
Commit 5228496 · 1 Parent(s): f7d0ad7

agent: bump default MEMORY_TOKEN_THRESHOLD 16k -> 80k

QUEST-35B serves with max_model_len=160000, so 16k was triggering the
condenser way earlier than needed and hurting throughput. 80k gives the
agent more headroom to accumulate tool responses before paying the
summarizer LLM round-trip; condenser still kicks in well before the
context window cap.
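
The headroom argument can be checked with quick arithmetic. The following is a minimal sketch using only the figures quoted in this message; the names are illustrative and not taken from app.py, and it assumes the threshold and max_model_len count the same kind of tokens.

```python
# Figures quoted in the commit message; names are illustrative, not from app.py.
MAX_MODEL_LEN = 160_000  # serving context window for QUEST-35B
OLD_THRESHOLD = 16_000   # previous default
NEW_THRESHOLD = 80_000   # new default


def headroom(threshold: int, window: int = MAX_MODEL_LEN) -> tuple[float, int]:
    """Return (fraction of the window used when the condenser fires, tokens left)."""
    return threshold / window, window - threshold


old_frac, old_left = headroom(OLD_THRESHOLD)
new_frac, new_left = headroom(NEW_THRESHOLD)
print(f"old: condense at {old_frac:.0%} of the window, {old_left} tokens unused")
print(f"new: condense at {new_frac:.0%} of the window, {new_left} tokens unused")
```

Under these assumptions the old default fired the condenser at a tenth of the window, while the new one fires at half and still leaves 80k tokens before the cap.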

Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -1564,7 +1564,7 @@ MEMORY_TOKEN_THRESHOLD = int(
     os.getenv("MEMORY_THRESHOLD")
     or os.getenv("MEMORY_CONTEXT_THRESHOLD")
     or os.getenv("MEMORY_TOKEN_THRESHOLD")
-    or "16000"
+    or "80000"
 )
 
 # Azure OpenAI support — mirrors inference/tool_visit.py logic. When
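
The hunk only changes the fallback at the end of an `or` chain, so any of the three environment variables still overrides the new default. The sketch below mimics that resolution order in isolation. The helper name `resolve_memory_threshold` and its `env` parameter are hypothetical, introduced only so the chain can be exercised without touching the real environment.

```python
import os


def resolve_memory_threshold(env=None, default="80000"):
    """Mimic the or-chain in the diff: the first non-empty variable wins.

    Hypothetical helper, not part of app.py. `env` is any mapping and
    defaults to the process environment.
    """
    env = os.environ if env is None else env
    return int(
        env.get("MEMORY_THRESHOLD")
        or env.get("MEMORY_CONTEXT_THRESHOLD")
        or env.get("MEMORY_TOKEN_THRESHOLD")
        or default
    )


# No variables set: the new default applies.
print(resolve_memory_threshold({}))
# An empty string is falsy, so the chain falls through to the next variable.
print(resolve_memory_threshold({"MEMORY_THRESHOLD": "", "MEMORY_CONTEXT_THRESHOLD": "24000"}))
```

A deployment that depends on the earlier behaviour could presumably restore it by exporting `MEMORY_TOKEN_THRESHOLD=16000`, since the environment is consulted before the default.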