vaibhav12332112312 committed on
Commit
4299c91
·
1 Parent(s): 9fac734

Mandate tool calls in system prompt to debug zero-tool collapse


All 180 logged responses (before / round1 / after) had tool_calls = [].
The previous prompt only listed the tool catalog and hinted at posting hours,
giving the model no reason to call any query_* tool. Add an explicit tool
policy: query_* calls are required on day 1, and predict_engagement plus a
fresh query_* on later days. Concrete argument examples (segment_id,
competitor_id) keep validation from rejecting the calls.

Made-with: Cursor
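The zero-tool collapse described above (all 180 logged responses with `tool_calls = []`) can be spotted with a quick pass over the episode logs. A minimal sketch, assuming a JSONL-style log where each record carries a `tag` (before / round1 / after) and the model's `tool_calls` list; the field names here are assumptions, not the notebook's actual schema:

```python
import json

# Hypothetical log schema: one JSON object per line with a "tag"
# ("before" / "round1" / "after") and the model's "tool_calls" list.
def count_empty_tool_calls(lines):
    """Return {tag: (empty_count, total_count)} over the logged responses."""
    totals, empties = {}, {}
    for line in lines:
        rec = json.loads(line)
        tag = rec.get("tag", "unknown")
        totals[tag] = totals.get(tag, 0) + 1
        if not rec.get("tool_calls"):  # [] or missing -> zero-tool response
            empties[tag] = empties.get(tag, 0) + 1
    return {t: (empties.get(t, 0), totals[t]) for t in totals}

logs = [
    '{"tag": "before", "tool_calls": []}',
    '{"tag": "before", "tool_calls": [{"name": "query_trends"}]}',
    '{"tag": "after", "tool_calls": []}',
]
print(count_empty_tool_calls(logs))  # {'before': (1, 2), 'after': (1, 1)}
```

A collapse like the one in this commit shows up as the empty count equaling the total for every tag.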

Files changed (1)
  1. training/train_grpo.ipynb +14 -5
training/train_grpo.ipynb CHANGED
@@ -441,9 +441,18 @@
     "- topic: free-form string\n",
     "- empty scheduled_actions = full day rest\"\"\")\n",
     "\n",
-    "SYSTEM_PROMPT = _SYSTEM_BASE + textwrap.dedent(\"\"\"\n",
-    "\n",
-    "HINT: schedule posts during/just before the audience_active_hours window that is when your target users are online.\"\"\")\n",
+    "SYSTEM_PROMPT = _SYSTEM_BASE + textwrap.dedent(\"\"\"\n",
+    "\n",
+    "TOOL POLICY (MANDATORY; empty tool_calls on day 1 = wasted day):\n",
+    "- days_elapsed == 0 -> call AT LEAST these in tool_calls:\n",
+    "  {\"name\": \"query_trends\", \"arguments\": {\"niche\": \"<one of TOPIC_CATEGORIES keys>\"}}\n",
+    "  {\"name\": \"query_audience\", \"arguments\": {\"segment_id\": \"young_professionals\"}}\n",
+    "  {\"name\": \"query_creator_pool\", \"arguments\": {}}\n",
+    "  {\"name\": \"query_competitor\", \"arguments\": {\"competitor_id\": \"niche_expert\", \"window_days\": 7}}\n",
+    "- days_elapsed >= 1 -> before scheduling, call:\n",
+    "  {\"name\": \"predict_engagement\", \"arguments\": {\"scheduled_actions\": [...]}}\n",
+    "  and at least one query_* tool whose result you don't already have in Tool results.\n",
+    "- audience_active_hours in the observation is a coarse hint; query_audience returns ranked topic affinities you cannot get otherwise.\"\"\")\n",
     "SYSTEM_PROMPT_EVAL = SYSTEM_PROMPT\n",
     "SYSTEM_PROMPT_TRAIN = SYSTEM_PROMPT\n",
     "\n",
@@ -840,8 +849,8 @@
     "\n",
     "peft_model.eval()\n",
     "t0 = time.time()\n",
-    "results = run_llm_episodes_batched(peft_model, tokenizer, [(t, 42) for t in TASKS], verbose=True, eval=True, log_tag=\"after\")\n",
-    "after_results = {r[\"task\"]: r for r in results}\n",
+    "results = run_llm_episodes_batched(peft_model, tokenizer, [(t, 42) for t in TASKS], verbose=True, eval=True, log_tag=\"after\")\n",
+    "after_results = {r[\"task\"]: r for r in results}\n",
     "\n",
     "print(\"\\n\" + \"=\" * 60)\n",
     "print(f\"AFTER TRAINING (took {time.time()-t0:.1f}s):\")\n",