# WhipStudio Debugging Tools Guide
This guide explains how to use WhipStudio's debugging tools effectively.
## Overview
WhipStudio provides 6 tools for iterative debugging:
| Tool | Purpose | When to Use |
|------|---------|-------------|
| `execute_snippet` | Run quick code tests | Verify imports, check versions, test small fixes |
| `inspect_tensor` | Examine tensor properties | Debug shape mismatches, gradient issues, NaN/Inf |
| `run_training_probe` | Test training loop | Verify loss decreases, check gradient flow |
| `get_variable_state` | Inspect multiple values | Check model state, optimizer config, data properties |
| `inspect_diff` | Preview your changes | Review before submission, catch mistakes |
| `submit_fix` | Submit final solution | When confident in your fix |
## Tool Usage Workflow
### Recommended Debugging Strategy
```
1. Analyze buggy code (read carefully)
2. Form hypothesis about bug(s)
3. Use tools to verify hypothesis
   ├── execute_snippet: Test specific behavior
   ├── inspect_tensor: Check shapes/gradients
   └── get_variable_state: Check configuration
4. Develop fix based on findings
5. run_training_probe: Test if fix works
6. inspect_diff: Review your changes
7. submit_fix: Submit when confident
```
---
## Tool Details
### 1. execute_snippet
Run a short Python code snippet to test specific behaviors.
**Best for:**
- Testing if specific code runs without error
- Checking library versions and availability
- Verifying small code transformations
- Quick experiments
**Example:**
```python
action = {
    "action_type": "execute_snippet",
    "code": """
import torch

# Test if softmax + log is the issue
pred = torch.tensor([0.0, 1.0])
print("log(0):", torch.log(pred[0]))  # Should be -inf
print("log(1):", torch.log(pred[1]))  # Should be 0

# Test fix: clamp before log
pred_safe = pred.clamp(min=1e-7)
print("log(clamped 0):", torch.log(pred_safe[0]))
"""
}
```
**Returns:**
- `stdout`: Printed output
- `stderr`: Error messages
- `exit_code`: 0 for success, non-zero for errors
- `timed_out`: True if execution exceeded 30 seconds
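A quick way to triage the result - a minimal sketch, assuming the observation arrives as a Python dict with the fields above (the sample values are invented):
```python
# Hypothetical execute_snippet observation; field names follow the list above
result = {"stdout": "log(0): tensor(-inf)\n", "stderr": "", "exit_code": 0, "timed_out": False}

if result["timed_out"]:
    print("Snippet hit the 30s limit - simplify the test")
elif result["exit_code"] != 0:
    print("Snippet failed:", result["stderr"])
else:
    print("Snippet output:", result["stdout"])
```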
---
### 2. inspect_tensor
Examine a tensor's properties in detail.
**Best for:**
- Debugging shape mismatches ("Expected [N, 10] got [N, 10, 1]")
- Checking gradient flow (is grad None? is requires_grad set?)
- Finding NaN/Inf values in tensors
- Verifying data types
**Example:**
```python
action = {
    "action_type": "inspect_tensor",
    "setup_code": """
import torch
import torch.nn as nn

# Simulate the training setup
model = nn.Linear(10, 2)
x = torch.randn(32, 10)
y = model(x)
loss = y.sum()
loss.backward()
""",
    "target_expression": "model.weight.grad"
}
```
**Returns:**
- `shape`: List of dimensions, e.g., `[2, 10]`
- `dtype`: Data type, e.g., `"torch.float32"`
- `requires_grad`: Whether gradients are tracked
- `grad_is_none`: True if `.grad` is None (no backward pass)
- `min_val`, `max_val`, `mean_val`: Statistics
- `is_nan`, `is_inf`: True if any NaN/Inf values found
**Pro Tips:**
- `grad_is_none: true` → `backward()` wasn't called, or `requires_grad` is False
- `is_nan: true` → numerical instability (log(0), division by zero, etc.)
- Check shape mismatches between layers
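Both failure signatures are easy to reproduce locally - a minimal sketch in plain PyTorch (not a tool call):
```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# Signature 1: backward() never ran, so .grad is still None
# (inspect_tensor would report grad_is_none: true)
print(model.weight.grad is None)  # True

# Signature 2: numerical instability - log of zero/negative values
# (inspect_tensor would report is_inf / is_nan: true)
bad = torch.log(torch.tensor([0.0, -1.0]))
print(torch.isinf(bad).any().item(), torch.isnan(bad).any().item())  # True True
```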
---
### 3. run_training_probe
Run a few training steps to observe the loss curve and gradients.
**Best for:**
- Verifying that loss decreases (training works)
- Checking if gradients flow to all layers
- Testing a potential fix before submission
- Detecting exploding/vanishing gradients
**Example:**
```python
action = {
    "action_type": "run_training_probe",
    "code": """
import torch
import torch.nn as nn

torch.manual_seed(42)
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))

losses = []
for epoch in range(10):
    optimizer.zero_grad()
    out = model(X)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(f"LOSSES:{losses}")
""",
    "steps": 5  # Captures the first 5 steps
}
```
**Returns:**
- `losses`: List of loss values per step
- `grad_norms`: Dict of layer name → gradient norm
- `optimizer_param_count`: Number of parameters in optimizer
- `final_loss`: Last loss value
- `loss_is_nan`, `loss_is_inf`: True if loss became NaN/Inf
- `timed_out`: True if exceeded timeout
**Pro Tips:**
- If `losses` are flat or increasing → fix not working
- If `loss_is_nan` → numerical instability remains
- If `grad_norms` has zeros → frozen layers or detached tensors
- Compare `grad_norms` between layers to find problems
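Those checks are easy to automate - a minimal sketch, assuming the observation is a dict shaped like the fields above (values are illustrative):
```python
# Hypothetical run_training_probe observation; field names follow the list above
probe = {
    "losses": [0.71, 0.70, 0.71],
    "grad_norms": {"0.weight": 0.42, "0.bias": 0.0},
    "loss_is_nan": False,
}

if probe["loss_is_nan"]:
    print("Loss went NaN - numerical instability remains")
elif probe["losses"][-1] >= probe["losses"][0]:
    print("Loss is flat or increasing - the fix is likely incomplete")

# Zero gradient norms point at frozen layers or detached tensors
frozen = [name for name, norm in probe["grad_norms"].items() if norm == 0.0]
print("Layers with zero grad norm:", frozen)
```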
---
### 4. get_variable_state
Evaluate multiple expressions and see their values.
**Best for:**
- Checking model configuration (training mode, layer count)
- Inspecting optimizer settings (learning rate, param groups)
- Verifying data shapes and types
- Debugging complex state
**Example:**
```python
action = {
    "action_type": "get_variable_state",
    "setup_code": """
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 2)
)
model[0].requires_grad_(False)  # Freeze first layer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
""",
    "expressions": [
        "model.training",
        "model[0].weight.requires_grad",
        "model[2].weight.requires_grad",
        "optimizer.param_groups[0]['lr']",
        "len(list(model.parameters()))",
        "sum(p.numel() for p in model.parameters() if p.requires_grad)"
    ]
}
```
**Returns:**
- `results`: Dict mapping expression → result info
- `repr`: String representation
- `type`: Python type name
- `value`: Actual value (for scalars)
- `shape`: Shape (for tensors/arrays)
- `error`: Error message if evaluation failed
**Pro Tips:**
- Check `model.training` → should be True during training
- Check `requires_grad` on layers you expect to train
- Verify `lr` is reasonable (not 10.0, not 1e-10)
- Count trainable params vs total params
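A sketch of scanning the results for those red flags, assuming each entry carries the fields listed above (sample values invented):
```python
# Hypothetical get_variable_state observation; structure follows the list above
results = {
    "model.training": {"repr": "False", "type": "bool", "value": False},
    "optimizer.param_groups[0]['lr']": {"repr": "10.0", "type": "float", "value": 10.0},
}

if results["model.training"]["value"] is False:
    print("Model is in eval mode during training - check for a stray model.eval()")

lr = results["optimizer.param_groups[0]['lr']"]["value"]
if not (1e-6 <= lr <= 1.0):
    print(f"Suspicious learning rate: {lr}")
```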
---
### 5. inspect_diff
Compare your proposed fix against the original buggy code.
**Best for:**
- Reviewing your changes before submission
- Catching unintended modifications
- Verifying you fixed all identified bugs
- Counting lines changed
**Example:**
```python
action = {
    "action_type": "inspect_diff",
    "proposed_code": """
import torch
import torch.nn as nn

# Fixed: Changed lr from 10.0 to 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Fixed: Correct order - backward before step
loss.backward()
optimizer.step()
"""
}
```
**Returns:**
- `diff`: Unified diff format (like `git diff`)
- `lines_changed`: Total lines modified
- `additions`: Lines added (prefixed with +)
- `deletions`: Lines removed (prefixed with -)
**Pro Tips:**
- Review diff for unintended changes (typos, removed seed)
- Verify all bug fixes are visible in diff
- Keep changes minimal - don't refactor unrelated code
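A sketch of a final sanity check on the observation, assuming the fields above (diff content invented):
```python
# Hypothetical inspect_diff observation; field names follow the list above
diff_result = {
    "diff": "-optimizer = torch.optim.Adam(model.parameters(), lr=10.0)\n"
            "+optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n",
    "lines_changed": 2,
}

# A large diff usually means accidental edits beyond the intended fix
if diff_result["lines_changed"] > 10:
    print("Diff is large - re-check for unintended changes")
print(diff_result["diff"])
```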
---
### 6. submit_fix
Submit your final solution for grading.
**This is a terminal action** - after calling this, the episode ends.
**Example:**
```python
action = {
    "action_type": "submit_fix",
    "fixed_code": """
import torch
import torch.nn as nn

torch.manual_seed(42)
# Complete fixed training script...
# Must print LOSSES:[v1, v2, ...]
# For some tasks: VAL_ACC:X.XX
""",
    "explanation": "Fixed two bugs: 1) Changed lr from 10.0 to 0.01, 2) Moved step() after backward()"
}
```
**Returns:**
- `reward`: Score from 0.0 to 1.0
- `episode_done`: Always True
- `error_log`: stdout/stderr from execution
- `grader_details`: Task-specific grading info
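A sketch of reading the terminal observation, assuming the fields above (values invented):
```python
# Hypothetical submit_fix observation; field names follow the list above
result = {"reward": 0.5, "episode_done": True, "error_log": "", "grader_details": {}}

print("Score:", result["reward"])  # 0.0-1.0 per the spec above
if result["reward"] < 1.0 and result["error_log"]:
    print("Grader output:", result["error_log"])  # episode is over, but useful post-mortem
```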
---
## Common Debugging Patterns
### Pattern 1: Shape Mismatch Debugging
```python
# Step 1: Check input shapes
action1 = {
    "action_type": "get_variable_state",
    "setup_code": buggy_code,
    "expressions": ["X.shape", "y.shape", "model(X[:1]).shape"]
}

# Step 2: Inspect a specific layer
action2 = {
    "action_type": "inspect_tensor",
    "setup_code": buggy_code,
    "target_expression": "model.fc.weight"
}
```
### Pattern 2: Gradient Flow Debugging
```python
# Step 1: Check if gradients exist
action1 = {
    "action_type": "run_training_probe",
    "code": buggy_code,
    "steps": 3
}
# Look at grad_norms - any zeros?

# Step 2: Check a specific layer
action2 = {
    "action_type": "inspect_tensor",
    "setup_code": buggy_code + "\nloss.backward()",
    "target_expression": "backbone[0].weight.grad"
}
```
### Pattern 3: NaN Loss Debugging
```python
# Step 1: Find where the bad values appear
action1 = {
    "action_type": "execute_snippet",
    "code": """
import torch
pred = torch.tensor([0.0, 0.5, 1.0])
logp = torch.log(pred)
print("log(pred):", logp)  # log(0) is -inf, which becomes NaN downstream
print("Any NaN/Inf?:", torch.isnan(logp).any(), torch.isinf(logp).any())
"""
}

# Step 2: Test fix
action2 = {
    "action_type": "execute_snippet",
    "code": """
import torch
pred = torch.tensor([0.0, 0.5, 1.0])
pred_safe = pred.clamp(min=1e-7)
logp = torch.log(pred_safe)
print("log(pred_safe):", logp)
print("Any NaN/Inf?:", torch.isnan(logp).any(), torch.isinf(logp).any())
"""
}
```
### Pattern 4: Loss Function Debugging
```python
# Check what the loss function expects vs what the model outputs
action = {
    "action_type": "get_variable_state",
    "setup_code": buggy_code,
    "expressions": [
        "criterion",   # Which loss is being used
        "out.shape",   # Model output shape
        "y.shape",     # Label shape
        "y.dtype",     # Label type (long vs float)
        "y[:3]"        # Sample labels
    ]
}
```
---
## Tips for Efficient Tool Use
1. **Start broad, then narrow**: Use `get_variable_state` first to understand the code, then `inspect_tensor` for specific issues.
2. **Limit turns**: You have at most 10 turns per episode, so plan your debugging strategy.
3. **Test fixes early**: Use `run_training_probe` with a small `steps` value (2-3) to quickly verify whether a fix works.
4. **Always inspect_diff**: Review your changes with `inspect_diff` before calling `submit_fix`.
5. **Read error messages**: Tool outputs include stderr - read it carefully.
6. **Keep setup_code minimal**: Don't include the entire script - just what's needed to evaluate the expression (see the sketch after this list).
7. **Use multiple expressions**: `get_variable_state` can evaluate up to 10 expressions at once - use it!
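For tip 6, a minimal sketch of a trimmed `setup_code` - only the piece under suspicion is rebuilt, rather than pasting the whole buggy script:
```python
action = {
    "action_type": "inspect_tensor",
    "setup_code": """
import torch
import torch.nn as nn
model = nn.Linear(10, 2)  # just the layer needed for the expression
""",
    "target_expression": "model.weight"
}
```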
---
## Security Restrictions
Tools run in a sandboxed environment with these restrictions:
**Allowed imports:**
- torch, torch.nn, torch.optim, torch.utils.data
- numpy, sklearn, pandas, matplotlib, scipy
- math, random, os (read-only), sys
- collections, itertools, functools
- json, re, typing, copy, dataclasses
**Blocked imports:**
- socket, requests, httpx, urllib (no network)
- subprocess, shutil (no shell access)
**Other restrictions:**
- 30 second timeout per tool call
- File writes only to /tmp
- No GPU access (CPU only)
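A restriction can be confirmed empirically with a quick probe - a minimal sketch, assuming blocked imports simply fail inside the snippet:
```python
action = {
    "action_type": "execute_snippet",
    "code": """
import socket  # expected to be rejected by the sandbox
print("should not reach here")
"""
}
# A non-zero exit_code with an import error in stderr confirms the block.
```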