
WhipStudio Examples

Example agent implementations for the WhipStudio ML Debugging environment.

Prerequisites

# Set environment variables
export HF_TOKEN="your_huggingface_token"
export API_BASE_URL="https://api-inference.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-Coder-32B-Instruct"

Available Examples

1. Simple Agent (simple_agent.py)

A minimal agent that directly generates and submits fixes without using tools. Good for understanding the basic API.

# Run on localhost
python examples/simple_agent.py --env-url http://localhost:7860

# Run on HF Space
python examples/simple_agent.py --env-url https://your-space.hf.space

# Run specific tasks
python examples/simple_agent.py --tasks task1 task2 task3 --max-attempts 3

Features:

  • Direct code submission
  • Multiple retry attempts
  • Uses LLM to generate fixes
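The core loop of such an agent can be sketched as below, using the `/reset` and `/step` endpoints documented in the API Reference. The `generate_fix` callable stands in for the LLM call, and `post` is an injectable transport (both are illustrative, not part of the actual `simple_agent.py`):

```python
import json
import urllib.request

def call_env(env_url, path, payload):
    """POST a JSON payload to the environment and return the decoded reply."""
    req = urllib.request.Request(
        env_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_simple_agent(env_url, task_id, generate_fix, max_attempts=3, post=None):
    """Reset the task, ask the LLM for a fix, and submit; retry on low reward."""
    post = post or (lambda path, payload: call_env(env_url, path, payload))
    best = 0.0
    for _ in range(max_attempts):
        # submit_fix ends the episode, so each attempt starts with a fresh reset.
        obs = post("/reset", {"task_id": task_id})
        fixed = generate_fix(obs.get("buggy_code", ""), obs.get("task_description", ""))
        result = post("/step", {"action": {"action_type": "submit_fix", "fixed_code": fixed}})
        best = max(best, result.get("reward", 0.0))
        if best >= 1.0:
            break
    return best
```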

2. Tool-Using Agent (tool_agent.py)

An advanced agent that uses debugging tools to iteratively analyze bugs before submitting a fix. Demonstrates the full tool-calling API.

# Run single task
python examples/tool_agent.py --env-url http://localhost:7860 --task task1

# Run all tasks with more turns
python examples/tool_agent.py --env-url http://localhost:7860 --all-tasks --max-turns 15

# Run a hard task
python examples/tool_agent.py --task task6 --max-turns 15

Features:

  • Uses execute_snippet to test hypotheses
  • Uses inspect_tensor to check shapes/gradients
  • Uses get_variable_state to inspect variables
  • Uses run_training_probe to test fixes
  • Uses inspect_diff to review changes
  • Iterative debugging before submission

Tool Usage Guide

execute_snippet

Run a quick Python snippet to test a hypothesis:

action = {
    "action_type": "execute_snippet",
    "code": "import torch; print(torch.__version__)"
}

inspect_tensor

Check tensor properties:

action = {
    "action_type": "inspect_tensor",
    "setup_code": "import torch; t = torch.randn(3, 4, requires_grad=True)",
    "target_expression": "t"
}
# Returns: shape, dtype, requires_grad, grad_is_none, min/max/mean, is_nan, is_inf

get_variable_state

Evaluate multiple expressions:

action = {
    "action_type": "get_variable_state",
    "setup_code": "import torch; model = torch.nn.Linear(10, 2)",
    "expressions": ["model.weight.shape", "model.training", "list(model.parameters())"]
}

run_training_probe

Test if code trains properly:

action = {
    "action_type": "run_training_probe",
    "code": "import torch\n# Full training script",
    "steps": 5
}
# Returns: losses per step, gradient norms, optimizer param count

inspect_diff

Preview changes before submitting:

action = {
    "action_type": "inspect_diff",
    "proposed_code": "import torch\n# Your fixed code"
}
# Returns: unified diff, lines_changed, additions, deletions

submit_fix

Submit the final solution (this ends the episode):

action = {
    "action_type": "submit_fix",
    "fixed_code": "import torch\n# Complete fixed script"
}
# Returns: reward (0.0-1.0), episode_done=True
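The two actions above chain naturally: preview the diff first, and only spend the submission when the proposed code actually changes something. A small sketch (the `post` callable and the `error`-free response shapes are assumptions based on the return fields listed above):

```python
def review_and_submit(post, proposed_code):
    """Preview the diff for `proposed_code`; submit only if something changed.

    `post` is any callable that sends one action dict to the /step endpoint
    and returns the decoded response (a stub works for testing).
    """
    diff = post({"action_type": "inspect_diff", "proposed_code": proposed_code})
    if diff.get("lines_changed", 0) == 0:
        # Identical to the buggy code; don't waste the one-shot submission.
        return None
    return post({"action_type": "submit_fix", "fixed_code": proposed_code})
```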

Debugging Tips

  1. Start with inspection: Use get_variable_state to understand the code structure
  2. Check shapes: Use inspect_tensor to verify tensor dimensions match
  3. Test incrementally: Use run_training_probe with steps=2 to quickly verify fixes
  4. Review before submit: Always use inspect_diff to catch mistakes
  5. Handle errors gracefully: Tool outputs include error messages if execution fails
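The tips above compose into one debugging turn. A sketch of that flow, assuming tool responses carry an `error` field on failure (per tip 5) and that the candidate fix comes from an LLM call not shown here:

```python
def debug_then_submit(post, candidate_code):
    """Inspect, probe cheaply, review the diff, then submit (tips 1-5 in order).

    `post` sends one action dict to /step and returns the decoded response.
    """
    # Tips 1-2: inspect the candidate before committing to it.
    state = post({"action_type": "get_variable_state",
                  "setup_code": candidate_code,
                  "expressions": ["'ok'"]})
    if state.get("error"):
        return {"reward": 0.0, "reason": "candidate fails on import"}
    # Tip 3: a two-step probe is enough to see whether the loss moves at all.
    probe = post({"action_type": "run_training_probe", "code": candidate_code, "steps": 2})
    if probe.get("error"):
        return {"reward": 0.0, "reason": "training probe failed"}
    # Tip 4: review the diff before the submission.
    post({"action_type": "inspect_diff", "proposed_code": candidate_code})
    # Tip 5: tool errors above were handled instead of crashing the agent.
    return post({"action_type": "submit_fix", "fixed_code": candidate_code})
```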

Expected Scores

| Task  | Difficulty | Expected Score (good fix) |
|-------|------------|---------------------------|
| task1 | Easy       | 0.85-1.0                  |
| task2 | Medium     | 0.75-1.0                  |
| task3 | Hard       | 0.70-0.95                 |
| task4 | Medium     | 0.75-1.0                  |
| task5 | Medium     | 0.80-1.0                  |
| task6 | Hard       | 0.65-0.95                 |

API Reference

Reset Endpoint

POST /reset
{"task_id": "task1"}
# Returns observation with buggy_code, task_description

Step Endpoint

POST /step
{"action": {"action_type": "...", ...}}
# Returns observation, reward, done

Tools Endpoint

GET /tools
# Returns list of available tools with schemas

State Endpoint

GET /state
# Returns current session state (turn, task_id, submitted)
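The four endpoints above can be wrapped in a thin client. A sketch (the `WhipStudioClient` name and the injectable `transport` are illustrative conveniences, not part of the environment's API):

```python
import json
import urllib.request

class WhipStudioClient:
    """Thin wrapper over the four endpoints; `transport` is injectable for tests."""

    def __init__(self, env_url, transport=None):
        self.env_url = env_url.rstrip("/")
        self.transport = transport or self._http

    def _http(self, method, path, payload=None):
        data = None if payload is None else json.dumps(payload).encode()
        req = urllib.request.Request(self.env_url + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def reset(self, task_id):
        return self.transport("POST", "/reset", {"task_id": task_id})

    def step(self, action):
        return self.transport("POST", "/step", {"action": action})

    def tools(self):
        return self.transport("GET", "/tools")

    def state(self):
        return self.transport("GET", "/state")
```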

Environment Variables

| Variable       | Description             | Default                                  |
|----------------|-------------------------|------------------------------------------|
| API_BASE_URL   | LLM API endpoint        | https://api-inference.huggingface.co/v1  |
| MODEL_NAME     | Model identifier        | Qwen/Qwen2.5-Coder-32B-Instruct          |
| HF_TOKEN       | HuggingFace token       | Required                                 |
| OPENAI_API_KEY | Alternative to HF_TOKEN | Optional                                 |