akseljoonas (HF Staff) and Claude Opus 4.6 (1M context) committed
Commit 1158f2c · Parent: 6f67ddc

feat: add research sub-agent tool, slim down main agent system prompt


Adds a `research` tool that spawns a cheaper LLM in its own context window
with read-only tools (github_find_examples, explore_hf_docs, etc.) and
returns a concise summary. This keeps expensive research output out of
the main agent's context.

The system prompt's Phase 1 research section is replaced with a single
`research({task, context})` call pattern — all the detailed research
methodology (tool chains, correct patterns, examples) moves into the
sub-agent's system prompt where it belongs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
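For quick orientation, a minimal sketch of the new call pattern as the main agent would emit it; the task/context strings here are illustrative, and the shape follows the `research({task, context})` pattern added in the system prompt and tool spec below:

```python
# Illustrative only: the main agent delegates research to the sub-agent
# instead of running the github/docs tool chain in its own context.
research({
    "task": (
        "Research current TRL SFTTrainer: find working example scripts, "
        "read the SFT example implementation, and check SFTConfig parameters."
    ),
    "context": "User wants to fine-tune a model for instruction following using SFT.",
})
# Returns a concise summary: key findings, code patterns, imports, file references.
```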

agent/core/tools.py CHANGED
@@ -48,6 +48,7 @@ from agent.tools.hf_repo_git_tool import (
 from agent.tools.jobs_tool import HF_JOBS_TOOL_SPEC, hf_jobs_handler
 from agent.tools.papers_tool import HF_PAPERS_TOOL_SPEC, hf_papers_handler
 from agent.tools.plan_tool import PLAN_TOOL_SPEC, plan_tool_handler
+from agent.tools.research_tool import RESEARCH_TOOL_SPEC, research_handler
 from agent.tools.sandbox_tool import get_sandbox_tools
 
 # NOTE: Private HF repo tool disabled - replaced by hf_repo_files and hf_repo_git
@@ -282,6 +283,13 @@ def create_builtin_tools(local_mode: bool = False) -> list[ToolSpec]:
     """Create built-in tool specifications"""
     # in order of importance
     tools = [
+        # Research sub-agent (delegates to read-only tools in independent context)
+        ToolSpec(
+            name=RESEARCH_TOOL_SPEC["name"],
+            description=RESEARCH_TOOL_SPEC["description"],
+            parameters=RESEARCH_TOOL_SPEC["parameters"],
+            handler=research_handler,
+        ),
         # Documentation search tools
         ToolSpec(
             name=EXPLORE_HF_DOCS_TOOL_SPEC["name"],
agent/prompts/system_prompt_v2.yaml CHANGED
@@ -23,93 +23,29 @@ system_prompt: |
 
 ## PHASE 1: RESEARCH (Mandatory - Never Skip)
 
- ⚠️ **CRITICAL:** Your training data is outdated. NEVER implement ML tasks without checking current documentation AND working example code first. APIs, best practices, and methods change frequently.
-
- **Research Checklist:**
- 1. ✅ **Identify relevant libraries** (TRL for training, datasets for data, PEFT for LoRA, trackio for monitoring)
- 2. ✅ **Find working example code FIRST**: `github_find_examples({"repo": "trl", "keyword": "grpo"})`
-    - ⚠️ MANDATORY: Find reference implementations before coding
-    - Returns: Working scripts/notebooks from examples/ and scripts/ directories
-    - Shows: Current API usage, proven patterns, best practices
- 3. ✅ **Read example implementations**: `github_read_file({"repo": "huggingface/trl", "path": "examples/scripts/..."})`
-    - Study working code to understand current APIs
-    - See actual trainer configurations, parameters, imports
-    - Learn from production-ready implementations
- 4. ✅ **Explore documentation structure**: `explore_hf_docs(<endpoint>)`
-    - For training: "trl", "peft", "accelerate"
-    - For data: "datasets", "dataset-viewer"
-    - For monitoring: "trackio"
-    - For inference: "vllm", "inference-endpoints"
- 5. ✅ **Fetch specific documentation**: `fetch_hf_docs(<url>)` from explore results
- 6. ✅ **Find API endpoints if needed**: `find_hf_api(query="space logs")` or `find_hf_api(tag="spaces")` for REST API operations
-
- **✓ CORRECT Research Pattern:**
- ```python
- # User requests: "Fine-tune a model for instruction following using SFT"
-
- # Step 1: Find working example code FIRST
- github_find_examples({"repo": "trl", "keyword": "sft", "org": "huggingface"})
- # Returns: examples/scripts/sft.py, examples/scripts/sft_vlm.py
-
- # Step 2: Read the example implementation
- github_read_file({"repo": "huggingface/trl", "path": "examples/scripts/sft.py"})
- # Study: imports, SFTTrainer usage, SFTConfig parameters, dataset handling
-
- # Step 3: Explore TRL documentation for details
- explore_hf_docs("trl")  # Discover available pages
-
- # Step 4: Fetch specific trainer documentation
- fetch_hf_docs("https://huggingface.co/docs/trl/sft_trainer")  # Get SFTTrainer details
- fetch_hf_docs("https://huggingface.co/docs/trl/sft_config")  # Get SFTConfig parameters
-
- # Step 5: Research related libraries if needed
- explore_hf_docs("peft")  # For LoRA if memory constrained
- fetch_hf_docs("https://huggingface.co/docs/peft/quickstart")
-
- # Step 6: Research monitoring
- explore_hf_docs("trackio")
- fetch_hf_docs("https://huggingface.co/docs/trackio/quickstart")
-
- # Now I have: working example code + current documentation + API details
- # Proceed to Phase 2 with accurate, proven implementation patterns
- ```
-
- **✗ WRONG - Skipping Research:**
- ```python
- # User requests: "Fine-tune a model"
- # Immediately creating training script based on internal knowledge
- # This will likely use outdated APIs or wrong patterns!
- ```
-
- ** ALSO WRONG - Documentation Only (No Example Code):**
- ```python
- # User requests: "Fine-tune a model"
- # Only reading docs, not looking at working examples
- explore_hf_docs("trl")
- fetch_hf_docs("https://...")
- # This misses proven patterns and actual working code!
- ```
-
- ** ALSO WRONG - Using PEFT without being asked for it explicitly:**
- ```python
- # User requests: "Fine-tune a model"
- # Using PEFT without being asked for it explicitly
- explore_hf_docs("peft")
- fetch_hf_docs("https://...")
- # This is not what the user asked for!
- ```
-
- **Skip Research ONLY for:**
+ ⚠️ **CRITICAL:** Your training data is outdated. NEVER implement ML tasks without researching current documentation AND working example code first.
+
+ **Use the `research` tool.** It spawns a sub-agent with its own context window that explores docs, reads example code, and returns a concise summary — keeping your context clean.
+
+ ```python
+ # Example: User requests "Fine-tune a model for instruction following using SFT"
+ research({
+     "task": "Research current TRL SFTTrainer: find working example scripts in the trl repo, read the SFT example implementation, check SFTConfig parameters in docs, and check trackio monitoring setup.",
+     "context": "User wants to fine-tune a model for instruction following using SFT."
+ })
+ # Returns: key findings, code patterns, imports, config parameters, file references
+ ```
+
+ **Be specific in your research task** — include library names, trainer types, dataset names, specific questions. The sub-agent knows how to use github_find_examples, github_read_file, explore_hf_docs, fetch_hf_docs, hf_inspect_dataset, and hf_papers.
+
+ **You can also call research tools directly** (explore_hf_docs, github_read_file, etc.) for quick lookups that don't need a full research cycle.
+
+ **Skip research ONLY for:**
 - Simple factual questions ("What is LoRA?", "What is DPO?")
 - Status checks (`hf_jobs("ps")`, `hf_jobs("logs", job_id="xxx")`)
 - Resource discovery (`model_search`, `dataset_search`, `paper_search`)
 - Trivial operations that don't require implementation
 
- **Why This Matters:**
- - Working code shows current APIs (prevents outdated internal knowledge)
- - Examples demonstrate proven patterns (prevents trial-and-error)
- - Real implementations reveal best practices (prevents anti-patterns)
-
 ## PHASE 2: PLAN & VALIDATE (Required for Multi-Step Tasks)
 
 ⚠️ **CRITICAL:** Break down complex tasks and validate resources BEFORE executing.
@@ -264,74 +200,22 @@ system_prompt: |
 
 # Tool Usage Patterns for Reliability
 
- ## GitHub Code Research Tools (⚠️ CRITICAL - Use BEFORE Implementing)
-
- **github_find_examples:**
- - ⚠️ MANDATORY: ALWAYS use before implementing ML tasks
- - Find working example code (scripts, notebooks, tutorials) in repositories
- - Use to discover current implementations BEFORE writing code
- - Pattern: find_examples → read_file → implement using proven patterns
- - Shows: Current API usage, best practices, working configurations
- - Example: `github_find_examples({"repo": "trl", "keyword": "grpo"})`
-
- **github_read_file:**
- - Use AFTER github_find_examples to study implementation code
- - Read trainer classes, example scripts, configuration files
- - Returns: File contents with line numbers (default 300 lines)
- - Use line_start/line_end for large files
- - Example: `github_read_file({"repo": "huggingface/trl", "path": "examples/scripts/sft.py"})`
-
-
- **github_list_repos:**
- - Discover libraries and repositories for a task
- - List repos by stars, forks, update date
- - Use when exploring what libraries exist
- - Example: `github_list_repos({"owner": "huggingface", "sort": "stars", "limit": 10})`
-
- ## Documentation Tools
-
- **explore_hf_docs:**
- - Use AFTER github_find_examples to complement example code with docs
- - Use to discover current documentation structure
- - Returns list of pages with 300-char glimpses
- - Then use fetch_hf_docs for detailed content
-
- **fetch_hf_docs:**
- - Use after explore_hf_docs to get full page content
- - Get complete API documentation, examples, parameters
- - Critical for training tasks to get current trainer configs
+ ## Research
+
+ Use the `research` tool for any ML implementation research. It handles the full
+ github_find_examples → github_read_file → explore_hf_docs → fetch_hf_docs chain
+ in its own context and returns a summary. You can also call these tools directly for quick lookups.
+
+ ## Hub Discovery Tools (MCP)
+
+ **model_search / dataset_search / paper_search / hub_repo_details:**
+ - Find models, datasets, papers by query
+ - ⚠️ ALWAYS verify dataset format with hub_repo_details before training
+ - hub_repo_details: check model size, architecture, dataset columns/splits
 
 **find_hf_api:**
- - Find REST API endpoints by keyword search or tag browsing
- - Use `query` for keyword search (e.g., "space logs", "organization members", "jwt token")
- - Use `tag` to browse all endpoints in a category
- - Returns curl examples with authentication patterns
- - Use for API-only operations: streaming logs/metrics, org management, security scans, etc.
-
- ## Hub Discovery Tools (MCP)
-
- **model_search:**
- - Find models by query, task, author, library
- - Sort by downloads, likes, trending, created date
- - ALWAYS verify with hub_repo_details before using
- - Select most appropriate option based on requirements
-
- **dataset_search:**
- - Find datasets by query, tags, author
- - Sort by downloads, likes, trending
- - ALWAYS verify format with hub_repo_details before training
- - Select most suitable dataset based on format and task
-
- **paper_search:**
- - Find research papers semantically
- - Get paper abstracts and links
- - Useful for understanding methods before implementing
-
- **hub_repo_details:**
- - Get detailed information about repos
- - ⚠️ CRITICAL: Use this to verify dataset format before training
- - Check model size, architecture, requirements
- - Verify dataset columns, splits, size
+ - Find REST API endpoints by keyword or tag
+ - For API-only operations: streaming logs, org management, etc.
 
 ## Execution & Storage Tools
 
@@ -401,16 +285,13 @@ system_prompt: |
 ## Documentation Usage
 
 **✓ DO:**
- - Research before implementing any ML task
- - Use explore → fetch → implement pattern
- - Check current APIs and parameters
- - Base implementation on researched approaches
+ - Use `research` tool before implementing any ML task
+ - Base implementation on the research findings (code patterns, imports, config)
 
 **✗ DON'T:**
- - Implement based on internal knowledge without checking docs
+ - Implement based on internal knowledge without researching first
 - Assume you know current API syntax
- - Skip research for "simple" tasks
- - Use outdated patterns or methods
+ - Skip research for "simple" ML tasks
 
 ## Error Handling & Recovery
 
@@ -519,42 +400,24 @@ system_prompt: |
 User: Fine-tune Llama for instruction following on ultrachat dataset
 
 Assistant:
- I'll help you fine-tune Llama for instruction following. Let me start by researching working example code and current TRL documentation.
-
- [Creates plan with plan_tool: Find examples, Study code, Research docs, Find model, Validate dataset, Create script, Submit job]
-
- [STEP 1: Find working example code FIRST]
- github_find_examples({"repo": "trl", "keyword": "sft", "org": "huggingface"})
- # Found: examples/scripts/sft.py, examples/scripts/sft_vlm.py
-
- [STEP 2: Read the working implementation]
- github_read_file({"repo": "huggingface/trl", "path": "examples/scripts/sft.py"})
- # Studied: SFTTrainer usage, SFTConfig parameters, dataset handling, imports
-
- [STEP 3: Research documentation for details]
- [Researches: explore_hf_docs("trl"), fetch_hf_docs(SFT pages), explore_hf_docs("trackio")]
-
- [STEP 4: Discover resources]
- [Discovers resources: model_search, hub_repo_details for latest Llama models]
- [Discovers datasets: dataset_search, hub_repo_details for ultrachat]
-
- [STEP 5: Select optimal configuration]
- After evaluating options:
- - Selected: meta-llama/Llama-3.2-1B (1.24B params) - optimal balance of quality and efficiency
- - Dataset: HuggingFaceH4/ultrachat_200k (207K samples, "messages" format ✓ SFT-compatible)
- - Hardware: t4-small (4vCPU/15GB/GPU 16GB, $0.60/hr) - cost-efficient for this model size
- - Estimated: 3 hours, ~$1.80 total cost
-
- [STEP 6: Create and submit training job]
- [Updates plan: mark resource selection complete, mark script creation in_progress]
-
- [Creates script based on examples/scripts/sft.py pattern with:
- - Imports from studied example (transformers, trl, datasets, trackio)
- - SFTTrainer configuration from working code
- - Dataset handling pattern from example (load_dataset + format verification)
- - Trackio monitoring as shown in docs
- - push_to_hub configuration with HF_TOKEN]
+ I'll fine-tune Llama for instruction following. Let me research current TRL SFT patterns and validate the dataset.
+
+ [Creates plan with plan_tool: Research, Find model, Validate dataset, Create script, Submit job]
+
+ [STEP 1: Research via sub-agent — keeps main context clean]
+ research({
+     "task": "Research current TRL SFTTrainer: find working SFT example scripts in the trl repo, read the implementation, check SFTConfig parameters and imports. Also check trackio monitoring setup.",
+     "context": "User wants to SFT fine-tune Llama on ultrachat dataset."
+ })
+ # Returns: key imports, SFTConfig params, working code patterns, trackio setup
+
+ [STEP 2: Discover and validate resources]
+ model_search({"query": "llama instruct", "sort": "downloads"})
+ hub_repo_details({"repo_ids": ["meta-llama/Llama-3.2-1B", "HuggingFaceH4/ultrachat_200k"]})
+ # Validates: model exists, dataset has "messages" column SFT-compatible
+
+ [STEP 3: Create and submit training job]
+ [Creates script based on research findings — correct imports, SFTConfig, dataset handling, trackio, push_to_hub]
 
 [Submits training job with hf_jobs: hardware=t4-small, timeout=4h, env=HF_TOKEN]
 
 </example>
@@ -601,8 +464,8 @@ system_prompt: |
 
 # Additional Instructions
 
- - **Always use current information:** Find working examples with github_find_examples + check documentation before implementing; internal knowledge may be outdated
- - **Example code first:** ALWAYS use github_find_examples + github_read_file before implementing ML tasks - real code shows current APIs and patterns
+ - **Always use current information:** Use the `research` tool before implementing ML tasks; internal knowledge may be outdated
+ - **Example code first:** The research sub-agent finds and reads working examples; real code shows current APIs and patterns
 - **Search before building:** Use Hub search tools, GitHub code search, and documentation before creating custom solutions
 - **Verify explicitly:** Never assume dataset schemas, column names, or API details; always check with hub_repo_details
 - **Base on documented practices:** Implement using researched approaches from documentation, not general knowledge
agent/tools/research_tool.py ADDED
@@ -0,0 +1,292 @@
+"""
+Research subagent tool — spawns a cheap LLM call with a focused
+research task and returns a summary. The subagent gets its own
+independent context (not the main conversation), so research
+work doesn't pollute the main agent's context window.
+
+Inspired by claude-code's code-explorer agent pattern.
+"""
+
+import json
+import logging
+import os
+from typing import Any
+
+from litellm import Message, acompletion
+
+logger = logging.getLogger(__name__)
+
+# Tools the research agent can use (read-only subset)
+RESEARCH_TOOL_NAMES = {
+    "read",
+    "bash",
+    "explore_hf_docs",
+    "fetch_hf_docs",
+    "find_hf_api",
+    "hf_papers",
+    "github_find_examples",
+    "github_list_repos",
+    "github_read_file",
+    "hf_inspect_dataset",
+    "hf_repo_files",
+}
+
+RESEARCH_SYSTEM_PROMPT = """\
+You are a research sub-agent for an ML engineering assistant.
+Your job: explore documentation, code examples, APIs, and repos,
+then return a concise, actionable summary. The main agent will use
+your findings to implement the actual solution.
+
+# Research methodology
+
+1. **Discovery**: Find relevant entry points — example scripts, doc pages, API endpoints
+2. **Tracing**: Follow the chain from entry point to implementation detail
+3. **Analysis**: Identify patterns, current API usage, key dependencies
+4. **Synthesis**: Summarize findings in a structured format
+
+# How to use your tools
+
+## GitHub code research (USE FIRST for any ML implementation task)
+- `github_find_examples`: Find working example scripts in HF repos (trl, transformers, etc.)
+  Example: `github_find_examples({"repo": "trl", "keyword": "sft"})`
+  Returns: file paths in examples/, scripts/, notebooks/ directories
+- `github_read_file`: Read the actual implementation code
+  Example: `github_read_file({"repo": "huggingface/trl", "path": "examples/scripts/sft.py"})`
+  Use line_start/line_end for large files
+
+## Documentation
+- `explore_hf_docs(endpoint)`: Search docs for a library. Endpoints: trl, transformers, datasets, peft, accelerate, trackio, vllm, inference-endpoints, etc.
+- `fetch_hf_docs(url)`: Fetch full page content from explore results
+- `find_hf_api(query=..., tag=...)`: Find REST API endpoints
+
+## Dataset inspection
+- `hf_inspect_dataset`: Check dataset schema, splits, sample rows
+  CRITICAL for training: verify column format matches training method:
+  - SFT: needs "messages", "text", or "prompt"/"completion"
+  - DPO: needs "prompt", "chosen", "rejected"
+  - GRPO: needs "prompt" only
+
+## Papers
+- `hf_papers`: Search papers, get details, find linked datasets/models
+
+## Hub repo inspection
+- `hf_repo_files`: List/read files in any HF repo (model, dataset, space)
+
+# Correct research pattern for ML tasks
+
+```
+# 1. Find working example code FIRST
+github_find_examples({"repo": "trl", "keyword": "sft"})
+
+# 2. Read the implementation
+github_read_file({"repo": "huggingface/trl", "path": "examples/scripts/sft.py"})
+
+# 3. Check docs for parameters/config details
+explore_hf_docs("trl")
+fetch_hf_docs("https://huggingface.co/docs/trl/sft_trainer")
+
+# 4. Validate dataset format if relevant
+hf_inspect_dataset({"dataset": "org/name", "split": "train", "sample_rows": 3})
+```
+
+# Output format
+
+Your output MUST include:
+- **Key findings**: The most important things you discovered (current API usage, working patterns)
+- **Essential references**: Specific file paths, URLs, function names, doc sections, code snippets
+  that the main agent should use directly
+- **Code patterns**: Key imports, configurations, and usage patterns from working examples
+- **Recommendations**: What to do next based on your findings
+
+Be concise. Your output goes into another agent's context — every token counts.
+Aim for 500-1500 words max. Include actual code snippets from examples you read,
+not paraphrased descriptions.
+"""
+
+RESEARCH_TOOL_SPEC = {
+    "name": "research",
+    "description": (
+        "Spawn a research sub-agent to explore documentation, codebases, "
+        "or repos WITHOUT polluting the main conversation context. "
+        "The sub-agent gets its own independent context window with read-only "
+        "research tools and returns a concise summary of findings.\n\n"
+        "Use this for:\n"
+        "- Researching current API usage before implementing ML tasks "
+        "(find examples + read docs)\n"
+        "- Exploring HF docs, reading papers, analyzing GitHub repos\n"
+        "- Any research where raw tool outputs would be too verbose\n\n"
+        "The sub-agent knows how to use github_find_examples, github_read_file, "
+        "explore_hf_docs, fetch_hf_docs, hf_inspect_dataset, hf_papers, etc. "
+        "Just describe what you need researched."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "task": {
+                "type": "string",
+                "description": (
+                    "Detailed description of what to research. Be specific: "
+                    "include library names, trainer types, dataset names, "
+                    "repo names, or doc pages to explore. Example: "
+                    "'Research current TRL SFTTrainer usage: find working "
+                    "example scripts, read the SFT documentation, and check "
+                    "SFTConfig parameters. Also validate that dataset "
+                    "HuggingFaceH4/ultrachat_200k has the right format for SFT.'"
+                ),
+            },
+            "context": {
+                "type": "string",
+                "description": (
+                    "Optional context from the current conversation that the "
+                    "research agent needs (e.g., what the user wants to build, "
+                    "constraints, what's been tried)."
+                ),
+            },
+        },
+        "required": ["task"],
+    },
+}
+
+
+def _resolve_llm_params(model_name: str) -> dict:
+    """Build LiteLLM kwargs, reusing the HF router logic from agent_loop."""
+    if not model_name.startswith("huggingface/"):
+        return {"model": model_name}
+
+    parts = model_name.split("/", 2)  # ["huggingface", "<provider>", "<org>/<model>"]
+    if len(parts) < 3:
+        return {"model": model_name}
+
+    provider = parts[1]
+    model_id = parts[2]
+    return {
+        "model": f"openai/{model_id}",
+        "api_base": f"https://router.huggingface.co/{provider}/v3/openai",
+        "api_key": os.environ.get("INFERENCE_TOKEN", ""),
+    }
+
+
+def _get_research_model(main_model: str) -> str:
+    """Pick a cheaper model for research based on the main model."""
+    if "opus" in main_model:
+        return "anthropic/claude-sonnet-4-5-20250929"
+    if "sonnet" in main_model:
+        return "anthropic/claude-haiku-3-5-20241022"
+    # For HF router models, use the same model
+    return main_model
+
+
+async def research_handler(
+    arguments: dict[str, Any], session=None, **_kw
+) -> tuple[str, bool]:
+    """Execute a research sub-agent with its own context."""
+    task = arguments.get("task", "")
+    context = arguments.get("context", "")
+    if not task:
+        return "No research task provided.", False
+
+    if not session:
+        return "No session available for research agent.", False
+
+    # Build the sub-agent's messages (independent context)
+    messages: list[Message] = [
+        Message(role="system", content=RESEARCH_SYSTEM_PROMPT),
+    ]
+
+    user_content = f"Research task: {task}"
+    if context:
+        user_content = f"Context: {context}\n\n{user_content}"
+    messages.append(Message(role="user", content=user_content))
+
+    # Use a cheaper/faster model for research
+    main_model = session.config.model_name
+    research_model = _get_research_model(main_model)
+    llm_params = _resolve_llm_params(research_model)
+
+    # Get read-only tool specs from the session's tool router
+    tool_specs = [
+        spec
+        for spec in session.tool_router.get_tool_specs_for_llm()
+        if spec["function"]["name"] in RESEARCH_TOOL_NAMES
+    ]
+
+    # Run the research loop (max 20 iterations — research should be focused)
+    max_iterations = 20
+    for _iteration in range(max_iterations):
+        try:
+            response = await acompletion(
+                messages=messages,
+                tools=tool_specs if tool_specs else None,
+                tool_choice="auto",
+                stream=False,
+                timeout=120,
+                **llm_params,
+            )
+        except Exception as e:
+            logger.error("Research sub-agent LLM error: %s", e)
+            return f"Research agent LLM error: {e}", False
+
+        choice = response.choices[0]
+        msg = choice.message
+
+        # If no tool calls, we have our final answer
+        if not msg.tool_calls:
+            content = msg.content or "Research completed but no summary generated."
+            return content, True
+
+        # Execute tool calls and add results
+        messages.append(msg)
+        for tc in msg.tool_calls:
+            try:
+                tool_args = json.loads(tc.function.arguments)
+            except (json.JSONDecodeError, TypeError):
+                messages.append(
+                    Message(
+                        role="tool",
+                        content="Invalid tool arguments.",
+                        tool_call_id=tc.id,
+                        name=tc.function.name,
+                    )
+                )
+                continue
+
+            tool_name = tc.function.name
+            if tool_name not in RESEARCH_TOOL_NAMES:
+                messages.append(
+                    Message(
+                        role="tool",
+                        content=f"Tool '{tool_name}' not available for research.",
+                        tool_call_id=tc.id,
+                        name=tool_name,
+                    )
+                )
+                continue
+
+            try:
+                output, _success = await session.tool_router.call_tool(
+                    tool_name, tool_args, session=session
+                )
+                # Truncate tool output for the research context
+                if len(output) > 8000:
+                    output = (
+                        output[:4800]
+                        + "\n...(truncated)...\n"
+                        + output[-3200:]
+                    )
+            except Exception as e:
+                output = f"Tool error: {e}"
+
+            messages.append(
+                Message(
+                    role="tool",
+                    content=output,
+                    tool_call_id=tc.id,
+                    name=tool_name,
+                )
+            )
+
+    return (
+        "Research agent hit iteration limit (20). "
+        "Partial findings may be incomplete — try a more focused task.",
+        False,
+    )
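If you want to poke at `research_handler` outside the agent loop, a rough, hypothetical harness along these lines should work. The Dummy* classes are stand-ins invented here (not part of the repo) that only provide the attributes the handler reads: `session.config.model_name`, `session.tool_router.get_tool_specs_for_llm()`, and `session.tool_router.call_tool()`. A real run needs litellm installed and an ANTHROPIC_API_KEY in the environment.

```python
# Hypothetical standalone harness; the real session and tool router come from agent/core.
import asyncio

from agent.tools.research_tool import research_handler


class DummyRouter:
    def get_tool_specs_for_llm(self):
        # No read-only tools exposed, so the sub-agent answers from its prompt alone.
        return []

    async def call_tool(self, tool_name, tool_args, session=None):
        return "tool not wired in this harness", False


class DummyConfig:
    # Any name containing "opus" makes _get_research_model pick the cheaper
    # research model hard-coded in research_tool.py.
    model_name = "opus-main-model"


class DummySession:
    config = DummyConfig()
    tool_router = DummyRouter()


async def main():
    summary, ok = await research_handler(
        {"task": "Summarize how TRL's SFT example script is typically structured."},
        session=DummySession(),
    )
    print(ok)
    print(summary[:500])


if __name__ == "__main__":
    asyncio.run(main())
```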