Spaces: osunlp/QUEST

Lzy01241010 committed 8 days ago

Commit 5228496

1 Parent(s): f7d0ad7

agent: bump default MEMORY_TOKEN_THRESHOLD 16k -> 80k

QUEST-35B serves with max_model_len=160000, so 16k was triggering the
condenser way earlier than needed and hurting throughput. 80k gives the
agent more headroom to accumulate tool responses before paying the
summarizer LLM round-trip; condenser still kicks in well before the
context window cap.

Files changed (1)

app.py +1 -1

@@ -1564,7 +1564,7 @@ MEMORY_TOKEN_THRESHOLD = int(
     os.getenv("MEMORY_THRESHOLD")
     or os.getenv("MEMORY_CONTEXT_THRESHOLD")
     or os.getenv("MEMORY_TOKEN_THRESHOLD")
-    or "16000"
+    or "80000"
 )
 # Azure OpenAI support — mirrors inference/tool_visit.py logic. When
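
For illustration, here is a minimal sketch of how the fallback chain in the hunk resolves and how much headroom the new default leaves below the 160000-token context window named in the commit message. `resolve_threshold`, `should_condense`, and the `>=` trigger rule are hypothetical names and assumptions made for this example; the condenser logic in app.py is not shown in the diff and may differ.

```python
import os

MAX_MODEL_LEN = 160_000      # max_model_len quoted in the commit message
DEFAULT_THRESHOLD = "80000"  # new default introduced by this commit


def resolve_threshold(env=None):
    """Mirror the `or` chain from the diff: the first non-empty variable wins."""
    env = os.environ if env is None else env
    return int(
        env.get("MEMORY_THRESHOLD")
        or env.get("MEMORY_CONTEXT_THRESHOLD")
        or env.get("MEMORY_TOKEN_THRESHOLD")
        or DEFAULT_THRESHOLD
    )


def should_condense(context_tokens, threshold):
    """Assumed trigger rule: condense once the context reaches the threshold."""
    return context_tokens >= threshold


if __name__ == "__main__":
    threshold = resolve_threshold({})  # no overrides set, so the default applies
    print("threshold:", threshold)
    print("headroom below context cap:", MAX_MODEL_LEN - threshold)
```

Because the chain uses `or`, an environment variable set to an empty string is skipped and the next candidate is tried, so deployments that still want the old behavior can export `MEMORY_TOKEN_THRESHOLD=16000` (or either of the two higher-priority names) without a code change.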