# WhipStudio Debugging Tools Guide
This guide explains how to use WhipStudio's debugging tools effectively.
## Overview
WhipStudio provides six tools for iterative debugging:
| Tool | Purpose | When to Use |
|------|---------|-------------|
| `execute_snippet` | Run quick code tests | Verify imports, check versions, test small fixes |
| `inspect_tensor` | Examine tensor properties | Debug shape mismatches, gradient issues, NaN/Inf |
| `run_training_probe` | Test training loop | Verify loss decreases, check gradient flow |
| `get_variable_state` | Inspect multiple values | Check model state, optimizer config, data properties |
| `inspect_diff` | Preview your changes | Review before submission, catch mistakes |
| `submit_fix` | Submit final solution | When confident in your fix |
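Every tool call in this guide shares the same envelope: a plain dict whose `action_type` names the tool and whose remaining keys are tool-specific parameters. A minimal sketch (how the dict is dispatched is up to your harness and not shown here):
```python
# The common action envelope used by all six tools below.
action = {
    "action_type": "execute_snippet",  # any of the six tool names above
    "code": "import torch; print(torch.__version__)",
}
```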
## Tool Usage Workflow
### Recommended Debugging Strategy
```
1. Analyze buggy code (read carefully)
   ↓
2. Form hypothesis about bug(s)
   ↓
3. Use tools to verify hypothesis
   ├── execute_snippet: Test specific behavior
   ├── inspect_tensor: Check shapes/gradients
   └── get_variable_state: Check configuration
   ↓
4. Develop fix based on findings
   ↓
5. run_training_probe: Test if fix works
   ↓
6. inspect_diff: Review your changes
   ↓
7. submit_fix: Submit when confident
```
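As a concrete illustration of steps 3-7, a hypothetical episode for a learning-rate bug might queue up actions like this. `buggy_code` and `fixed_code` are placeholders (the same convention the patterns later in this guide use), and dispatching each dict is left to the harness:
```python
# Hypothetical action sequence for a "loss explodes" bug (steps 3-7 above).
buggy_code = "..."  # placeholder: the original script
fixed_code = "..."  # placeholder: your repaired script

plan = [
    # Step 3: verify the hypothesis (is the learning rate absurd?)
    {"action_type": "get_variable_state",
     "setup_code": buggy_code,
     "expressions": ["optimizer.param_groups[0]['lr']"]},
    # Step 5: probe a few steps with the candidate fix
    {"action_type": "run_training_probe", "code": fixed_code, "steps": 3},
    # Step 6: review the change
    {"action_type": "inspect_diff", "proposed_code": fixed_code},
    # Step 7: submit once confident
    {"action_type": "submit_fix",
     "fixed_code": fixed_code,
     "explanation": "Lowered lr from 10.0 to 0.01"},
]
```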
---
## Tool Details
### 1. execute_snippet
Run a short Python code snippet to test specific behaviors.
**Best for:**
- Testing if specific code runs without error
- Checking library versions and availability
- Verifying small code transformations
- Quick experiments
**Example:**
```python
action = {
    "action_type": "execute_snippet",
    "code": """
import torch
import torch.nn as nn
# Test if softmax + log is the issue
pred = torch.tensor([0.0, 1.0])
print("log(0):", torch.log(pred[0]))  # Should be -inf
print("log(1):", torch.log(pred[1]))  # Should be 0
# Test fix: clamp before log
pred_safe = pred.clamp(min=1e-7)
print("log(clamped 0):", torch.log(pred_safe[0]))
""",
}
```
**Returns:**
- `stdout`: Printed output
- `stderr`: Error messages
- `exit_code`: 0 for success, non-zero for errors
- `timed_out`: True if execution exceeded 30 seconds
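For illustration, a successful run of the example above might come back shaped like this (field names from the list above; the values are invented):
```python
# Illustrative execute_snippet result; values invented for illustration.
result = {
    "stdout": "log(0): tensor(-inf)\nlog(1): tensor(0.)\n"
              "log(clamped 0): tensor(-16.1181)\n",
    "stderr": "",
    "exit_code": 0,      # snippet ran without error
    "timed_out": False,  # finished within the 30-second limit
}
```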
---
### 2. inspect_tensor
Examine a tensor's properties in detail.
**Best for:**
- Debugging shape mismatches ("Expected [N, 10] got [N, 10, 1]")
- Checking gradient flow (is grad None? is requires_grad set?)
- Finding NaN/Inf values in tensors
- Verifying data types
**Example:**
```python
action = {
    "action_type": "inspect_tensor",
    "setup_code": """
import torch
import torch.nn as nn
# Simulate the training setup
model = nn.Linear(10, 2)
x = torch.randn(32, 10)
y = model(x)
loss = y.sum()
loss.backward()
""",
    "target_expression": "model.weight.grad"
}
```
**Returns:**
- `shape`: List of dimensions, e.g., `[2, 10]`
- `dtype`: Data type, e.g., `"torch.float32"`
- `requires_grad`: Whether gradients are tracked
- `grad_is_none`: True if `.grad` is None (no backward pass)
- `min_val`, `max_val`, `mean_val`: Statistics
- `is_nan`, `is_inf`: True if any NaN/Inf values found
**Pro Tips:**
- If `grad_is_none` is true → backward() wasn't called, or requires_grad=False
- If `is_nan` is true → numerical instability (log(0), division by zero, etc.)
- Compare shapes between layers to catch mismatches; see the sketch below
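A small sketch of applying these checks to the returned fields; the example `result` dict is invented for illustration:
```python
# Hypothetical triage of an inspect_tensor result (field names as documented).
result = {"shape": [2, 10], "requires_grad": True,
          "grad_is_none": False, "is_nan": False, "is_inf": False}

if result["grad_is_none"]:
    print("No gradient: backward() wasn't called, or requires_grad=False")
elif result["is_nan"] or result["is_inf"]:
    print("Numerical instability: look for log(0) or division by zero")
elif result["shape"] != [2, 10]:  # expected shape of nn.Linear(10, 2).weight
    print("Shape mismatch between layers")
else:
    print("Gradient looks healthy")
```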
---
### 3. run_training_probe
Run a few training steps to observe the loss curve and gradients.
**Best for:**
- Verifying that loss decreases (training works)
- Checking if gradients flow to all layers
- Testing a potential fix before submission
- Detecting exploding/vanishing gradients
**Example:**
```python
action = {
    "action_type": "run_training_probe",
    "code": """
import torch
import torch.nn as nn
torch.manual_seed(42)
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
losses = []
for epoch in range(10):
    optimizer.zero_grad()
    out = model(X)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(f"LOSSES:{losses}")
""",
    "steps": 5,  # will capture the first 5 steps
}
```
**Returns:**
- `losses`: List of loss values per step
- `grad_norms`: Dict of layer name → gradient norm
- `optimizer_param_count`: Number of parameters in optimizer
- `final_loss`: Last loss value
- `loss_is_nan`, `loss_is_inf`: True if loss became NaN/Inf
- `timed_out`: True if exceeded timeout
**Pro Tips:**
- If `losses` are flat or increasing → the fix isn't working
- If `loss_is_nan` → numerical instability remains
- If `grad_norms` contains zeros → frozen layers or detached tensors
- Compare `grad_norms` across layers to spot vanishing or exploding gradients; see the sketch below
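A sketch of turning these tips into checks; the `result` dict is invented for illustration:
```python
# Hypothetical checks on a run_training_probe result (fields as documented).
result = {"losses": [0.81, 0.64, 0.52], "loss_is_nan": False,
          "grad_norms": {"weight": 0.9, "bias": 0.0}}

if result["loss_is_nan"]:
    print("Loss went NaN: numerical instability remains")
elif result["losses"][-1] >= result["losses"][0]:
    print("Loss flat or increasing: the fix isn't working yet")

# Zero gradient norms point at frozen layers or detached tensors.
dead = [name for name, norm in result["grad_norms"].items() if norm == 0.0]
if dead:
    print("No gradient reaching:", dead)
```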
---
### 4. get_variable_state
Evaluate multiple expressions and see their values.
**Best for:**
- Checking model configuration (training mode, layer count)
- Inspecting optimizer settings (learning rate, param groups)
- Verifying data shapes and types
- Debugging complex state
**Example:**
```python
action = {
    "action_type": "get_variable_state",
    "setup_code": """
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 2)
)
model[0].requires_grad_(False)  # Freeze first layer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
""",
    "expressions": [
        "model.training",
        "model[0].weight.requires_grad",
        "model[2].weight.requires_grad",
        "optimizer.param_groups[0]['lr']",
        "len(list(model.parameters()))",
        "sum(p.numel() for p in model.parameters() if p.requires_grad)"
    ]
}
```
**Returns:**
- `results`: Dict mapping expression → result info, where each entry contains:
  - `repr`: String representation
  - `type`: Python type name
  - `value`: Actual value (for scalars)
  - `shape`: Shape (for tensors/arrays)
  - `error`: Error message if evaluation failed
**Pro Tips:**
- Check `model.training` → should be True during training
- Check `requires_grad` on layers you expect to train
- Verify `lr` is reasonable (not 10.0, not 1e-10)
- Count trainable params vs. total params; see the sketch below
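A sketch of walking the returned mapping; the `result` dict is invented for illustration:
```python
# Hypothetical pass over a get_variable_state result (fields as documented).
result = {"results": {
    "model.training": {"repr": "True", "type": "bool", "value": True},
    "optimizer.param_groups[0]['lr']":
        {"repr": "0.01", "type": "float", "value": 0.01},
    "val_acc": {"error": "NameError: name 'val_acc' is not defined"},
}}

for expr, info in result["results"].items():
    if "error" in info:
        print(f"{expr} failed to evaluate: {info['error']}")
    else:
        print(f"{expr} = {info['repr']} ({info['type']})")
```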
---
### 5. inspect_diff
Compare your proposed fix against the original buggy code.
**Best for:**
- Reviewing your changes before submission
- Catching unintended modifications
- Verifying you fixed all identified bugs
- Counting lines changed
**Example:**
```python
action = {
    "action_type": "inspect_diff",
    "proposed_code": """
import torch
import torch.nn as nn
# Fixed: Changed lr from 10.0 to 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Fixed: Correct order - backward before step
loss.backward()
optimizer.step()
""",
}
```
**Returns:**
- `diff`: Unified diff format (like `git diff`)
- `lines_changed`: Total lines modified
- `additions`: Lines added (prefixed with `+`)
- `deletions`: Lines removed (prefixed with `-`)
**Pro Tips:**
- Review the diff for unintended changes (typos, a removed seed)
- Verify all bug fixes are visible in the diff
- Keep changes minimal - don't refactor unrelated code (see the sketch below)
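One way to act on these tips, with an invented `result` for illustration (the 20-line threshold is an arbitrary example):
```python
# Hypothetical sanity check on an inspect_diff result (fields as documented).
result = {"diff": "-... lr=10.0 ...\n+... lr=0.01 ...\n",
          "lines_changed": 2, "additions": 1, "deletions": 1}

print(result["diff"])             # read every hunk before submitting
if result["lines_changed"] > 20:  # arbitrary example threshold
    print("Large diff: check for accidental refactors or a removed seed")
```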
---
### 6. submit_fix
Submit your final solution for grading.
**This is a terminal action** - after calling it, the episode ends.
**Example:**
```python
action = {
    "action_type": "submit_fix",
    "fixed_code": """
import torch
import torch.nn as nn
torch.manual_seed(42)
# Complete fixed training script...
# Must print LOSSES:[v1, v2, ...]
# For some tasks: VAL_ACC:X.XX
""",
    "explanation": "Fixed two bugs: 1) Changed lr from 10.0 to 0.01, 2) Moved step() after backward()"
}
```
**Returns:**
- `reward`: Score from 0.0 to 1.0
- `episode_done`: Always True
- `error_log`: stdout/stderr from execution
- `grader_details`: Task-specific grading info
---
## Common Debugging Patterns
### Pattern 1: Shape Mismatch Debugging
```python
# Step 1: Check input and output shapes
action1 = {
    "action_type": "get_variable_state",
    "setup_code": buggy_code,
    "expressions": ["X.shape", "y.shape", "model(X[:1]).shape"]
}
# Step 2: Inspect a specific layer
action2 = {
    "action_type": "inspect_tensor",
    "setup_code": buggy_code,
    "target_expression": "model.fc.weight"
}
```
### Pattern 2: Gradient Flow Debugging
```python
# Step 1: Check if gradients exist
action1 = {
    "action_type": "run_training_probe",
    "code": buggy_code,
    "steps": 3
}
# Look at grad_norms - any zeros?
# Step 2: Check a specific layer
action2 = {
    "action_type": "inspect_tensor",
    "setup_code": buggy_code + "\nloss.backward()",
    "target_expression": "backbone[0].weight.grad"
}
```
### Pattern 3: NaN Loss Debugging
```python
# Step 1: Find where the numbers go bad. Note log(0) is -inf rather
# than NaN, so check for any non-finite value, not just NaN.
action1 = {
    "action_type": "execute_snippet",
    "code": """
import torch
pred = torch.tensor([0.0, 0.5, 1.0])
print("log(pred):", torch.log(pred))
print("Any non-finite?:", (~torch.isfinite(torch.log(pred))).any())
"""
}
# Step 2: Test the fix
action2 = {
    "action_type": "execute_snippet",
    "code": """
import torch
pred = torch.tensor([0.0, 0.5, 1.0])
pred_safe = pred.clamp(min=1e-7)
print("log(pred_safe):", torch.log(pred_safe))
print("Any non-finite?:", (~torch.isfinite(torch.log(pred_safe))).any())
"""
}
```
### Pattern 4: Loss Function Debugging
```python
# Check what the loss function expects vs. what the model outputs
action = {
    "action_type": "get_variable_state",
    "setup_code": buggy_code,
    "expressions": [
        "criterion",  # What loss is being used
        "out.shape",  # Model output shape
        "y.shape",    # Label shape
        "y.dtype",    # Label type (long vs. float)
        "y[:3]"       # Sample labels
    ]
}
```
---
## Tips for Efficient Tool Use
1. **Start broad, then narrow**: Use `get_variable_state` first to understand the code, then `inspect_tensor` for specific issues.
2. **Limit turns**: You have at most 10 turns per episode. Plan your debugging strategy.
3. **Test fixes early**: Use `run_training_probe` with `steps` set to 2 or 3 to quickly verify whether a fix works.
4. **Always inspect_diff**: Before `submit_fix`, always review your changes.
5. **Read error messages**: Tool outputs include stderr - read it carefully.
6. **Keep setup_code minimal**: Don't include the entire script - just what's needed to evaluate the expression.
7. **Use multiple expressions**: `get_variable_state` can evaluate up to 10 expressions at once - use it! A sketch combining tips 6 and 7 follows this list.
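A sketch combining tips 6 and 7: one `get_variable_state` call batches several checks, and `setup_code` builds only the state those expressions need (the model and optimizer here are illustrative):
```python
action = {
    "action_type": "get_variable_state",
    "setup_code": """
import torch
import torch.nn as nn
model = nn.Linear(10, 2)  # just enough state for the expressions below
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
""",
    "expressions": [
        "model.training",
        "model.weight.requires_grad",
        "optimizer.param_groups[0]['lr']",
    ],
}
```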
---
## Security Restrictions
Tools run in a sandboxed environment with these restrictions:
**Allowed imports:**
- torch, torch.nn, torch.optim, torch.utils.data
- numpy, sklearn, pandas, matplotlib, scipy
- math, random, os (read-only), sys
- collections, itertools, functools
- json, re, typing, copy, dataclasses
**Blocked imports:**
- socket, requests, httpx, urllib (no network)
- subprocess, shutil (no shell access)
**Other restrictions:**
- 30-second timeout per tool call
- File writes only to /tmp
- No GPU access (CPU only)
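If you want to confirm the sandbox behaves as described, a hypothetical probe via `execute_snippet` (whether a blocked import surfaces as `ImportError` is an assumption, not documented behavior):
```python
action = {
    "action_type": "execute_snippet",
    "code": """
import numpy  # allowed
try:
    import socket  # blocked: no network access
except ImportError as e:
    print("blocked as expected:", e)
with open("/tmp/probe.txt", "w") as f:  # file writes allowed only under /tmp
    f.write("ok")
""",
}
```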