# Test Fixes Applied for v3.0.0

## Issue 1: Trajectory None guards (FIXED)
- File: `purpose_agent/types.py` (updated)
- Changed: the `cumulative_reward`, `total_delta`, and `success_rate` properties now check both `s.score is not None` and `s.score.delta is not None`
- Added a docstring note that `sre_patches.py` replaces these properties at import time
- The baseline and SRE-patched versions are now equivalent

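The double guard can be sketched as follows. This is a minimal illustration, not the actual `purpose_agent/types.py` code: the `Score`, `Step`, and `Trajectory` dataclasses and the `_scored_deltas` helper are hypothetical stand-ins, and only `total_delta` is shown because the exact definitions of `cumulative_reward` and `success_rate` are not given here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Score:
    delta: Optional[float] = None  # hypothetical: None when scoring produced no delta


@dataclass
class Step:
    score: Optional[Score] = None  # hypothetical: None when the step was never scored


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def _scored_deltas(self) -> List[float]:
        # Guard both levels: the step may have no score, and the score may have no delta.
        return [
            s.score.delta
            for s in self.steps
            if s.score is not None and s.score.delta is not None
        ]

    @property
    def total_delta(self) -> float:
        return sum(self._scored_deltas())


t = Trajectory([Step(Score(0.5)), Step(None), Step(Score(None)), Step(Score(0.25))])
print(t.total_delta)  # 0.75
```

Steps with a missing score or a missing delta are skipped instead of raising `AttributeError` or `TypeError`.
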
| 9 |
+
## Issue 2: Backpressure test flakiness (NEEDS MANUAL FIX)
|
| 10 |
+
- File: `tests/test_sprint1_events.py` — T1.6 section
|
| 11 |
+
- Problem: async consumer may not start before flooding; terminal event might not arrive
|
| 12 |
+
- Fix: Replace the test_backpressure() function with this more robust version:
|
| 13 |
+
|
| 14 |
+
```python
async def test_backpressure():
    # asyncio, EventBus, EventKind and create_event are assumed to be
    # imported at the top of the test module.
    bus6 = EventBus(max_queue_size=3)
    received = []
    consumer_started = asyncio.Event()

    async def consumer():
        consumer_started.set()
        try:
            async for event in bus6.subscribe():
                received.append(event)
                await asyncio.sleep(0.01)  # slow consumer so the queue fills up
        except asyncio.CancelledError:
            pass

    task = asyncio.create_task(consumer())
    await consumer_started.wait()
    # Extra grace period so the subscription is active before flooding.
    await asyncio.sleep(0.05)

    for i in range(20):
        bus6.emit(create_event("r6", EventKind.TEXT_DELTA, seq=i, text=f"w{i}"))

    bus6.emit(create_event("r6", EventKind.RUN_FINISHED, seq=99, result="done"))

    await asyncio.sleep(1.0)
    bus6.close()
    task.cancel()
    try:
        await asyncio.wait_for(task, timeout=2.0)
    except (asyncio.CancelledError, asyncio.TimeoutError):
        pass

    has_terminal = any(e.kind == EventKind.RUN_FINISHED for e in received)
    return has_terminal
```

Key changes:
- Added a `consumer_started` event so the flood begins only after the consumer task is running
- Increased the final wait from 0.5 s to 1.0 s
- Wrapped task cleanup in `asyncio.wait_for` with a 2 s timeout

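Why the start handshake plus a drop-oldest policy keeps the terminal event can be shown with a self-contained sketch. `DropOldestBus` is a hypothetical stand-in built on `asyncio.Queue`, with plain strings as events; the real `EventBus` backpressure policy is not described here and may differ.

```python
import asyncio
from typing import AsyncIterator, List


class DropOldestBus:
    """Hypothetical bounded bus: when full, discard the oldest queued event."""

    def __init__(self, max_queue_size: int) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue_size)

    def emit(self, event: str) -> None:
        if self._queue.full():
            self._queue.get_nowait()  # drop the oldest event to make room
        self._queue.put_nowait(event)

    async def subscribe(self) -> AsyncIterator[str]:
        while True:
            event = await self._queue.get()
            yield event
            if event == "RUN_FINISHED":  # terminal event ends the stream
                return


async def run_backpressure_demo() -> List[str]:
    bus = DropOldestBus(max_queue_size=3)
    received: List[str] = []
    started = asyncio.Event()

    async def consumer() -> None:
        started.set()
        async for event in bus.subscribe():
            received.append(event)
            await asyncio.sleep(0.01)  # deliberately slow consumer

    task = asyncio.create_task(consumer())
    await started.wait()  # do not flood until the consumer task is running
    for i in range(20):
        bus.emit(f"TEXT_DELTA {i}")
    bus.emit("RUN_FINISHED")
    await asyncio.wait_for(task, timeout=2.0)
    return received


received = asyncio.run(run_backpressure_demo())
print(received)  # ['TEXT_DELTA 18', 'TEXT_DELTA 19', 'RUN_FINISHED']
```

Because the bus discards the oldest entry when full, the last deltas and `RUN_FINISHED` survive the flood. A drop-newest policy would instead discard the terminal event, which is exactly what the `has_terminal` check catches.
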
## Issue 3: prod_test.py API timeout (NEEDS MANUAL FIX)
- File: `tests/prod_test.py`
- Problem: OpenRouter API calls have no timeout, so tests can hang indefinitely
- Fix: put a timeout around the API calls, plus retry logic for transient failures:

After the line `b = resolve_backend(...)`, add:
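```python
import signal


class ApiTimeoutError(Exception):  # avoid shadowing the builtin TimeoutError
    """Raised when an API call exceeds the alarm deadline."""


def timeout_handler(signum, frame):
    raise ApiTimeoutError("API call timed out")


# Install the handler, then arm a 60 s alarm around the API calls.
# SIGALRM is Unix-only and only works in the main thread.
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(60)
try:
    ...  # API calls using `b` go here
finally:
    signal.alarm(0)  # cancel the pending alarm
```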

Or, more simply, give the OpenAI client behind `resolve_backend` a timeout:
```python
# In llm_backend.py, OpenAICompatibleBackend.__init__
# (assumes `import os` and `from openai import OpenAI` at the top of that module):
self.client = OpenAI(
    base_url=base_url,
    api_key=api_key or os.environ.get("OPENAI_API_KEY"),
    timeout=60.0,  # 60-second timeout on every API call
    max_retries=2,  # retry transient errors; 2 is also the openai client's default
)
```

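`SIGALRM` is unavailable on Windows and outside the main thread. A portable alternative, swapped in here and not part of the original fix, is to run the call in a worker thread and wait on it with a deadline. `call_with_timeout` is a hypothetical helper; the worker thread is not killed on timeout and keeps running until the underlying call returns.

```python
import concurrent.futures
import time
from typing import Any, Callable


def call_with_timeout(fn: Callable[..., Any], *args: Any, timeout: float = 60.0, **kwargs: Any) -> Any:
    """Run fn(*args, **kwargs) in a worker thread; raise TimeoutError after `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError as exc:
        # Normalize to the builtin TimeoutError (the same class on Python 3.11+).
        raise TimeoutError(f"call exceeded {timeout} s") from exc
    finally:
        # Do not block on a hung call; the worker thread finishes on its own.
        pool.shutdown(wait=False)


print(call_with_timeout(lambda x: x * 2, 21, timeout=1.0))  # 42
try:
    call_with_timeout(time.sleep, 0.5, timeout=0.05)
except TimeoutError as err:
    print(err)  # call exceeded 0.05 s
```

Unlike the client-level `timeout=`, this bounds the total wall-clock time of the wrapped call, including any retries the client performs internally.
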
## Issue 4: validate.py mock resilience (NEEDS MANUAL FIX)
- File: `benchmarks/validate.py`
- Problem: the mock detects heuristics by matching "Learned Strategies" and "None yet" in the prompt text, which breaks if the prompt format changes
- Fix: in `make_mock()`, make the heuristic detection more resilient:

Change: `has_h = "Learned Strategies" in text and "None yet" not in text`

To: `has_h = "Learned Strategies" in text and "None yet" not in text and "heuristics" in text.lower()`

Or, better, detect heuristic entries directly by their `When:`/`Do:` markers:
```python
has_h = any("When:" in line or "Do:" in line for line in text.split("\n"))
```

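The marker-based check can be exercised against sample prompts. The prompt layout below (a heading followed by `- When: ...` / `Do: ...` lines) is a hypothetical example, not the exact format `validate.py` receives, and `count_heuristics` / `has_heuristics` are illustrative helpers. Counting only `When:` lines makes each entry count once.

```python
def count_heuristics(text: str) -> int:
    # Each heuristic entry is assumed to start with a "When:" line.
    return sum(1 for line in text.split("\n") if "When:" in line)


def has_heuristics(text: str) -> bool:
    # The heading wording does not matter; only the entry markers do.
    return count_heuristics(text) > 0


empty = "## Learned Strategies\nNone yet\n"
filled = "## Learned Strategies\n- When: the task mentions files\n  Do: list the directory first\n"
renamed = "### Strategies so far\n- When: a tool call fails\n  Do: retry once\n"

print(has_heuristics(empty), has_heuristics(filled), has_heuristics(renamed))  # False True True
```

The `renamed` prompt shows the point of the change: a reworded heading no longer flips the mock's behavior.
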
## Issue 5: CalculatorTool `__import__` blocking (VERIFIED WORKING)
- File: `purpose_agent/tools.py`
- `CalculatorTool.execute()` validates the expression with `if re.search(r'[a-zA-Z_]', tokens)`
- After known function names (`abs`, `round`, `sqrt`, etc.) are removed, any remaining letters or underscores are rejected
- `__import__("os")`: after known functions are removed, `__import__` and `os` remain, so the expression is rejected ✓
- The AST walker also checks `Call` nodes and rejects unknown function names
- `eval()` runs with `{"__builtins__": {}}`, so no builtins are available
- The test in `benchmark_v3.py` is correct: `check("tools.calc_blocks_import", "Error" in calc.run(expression='__import__("os")').output)`
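
The three layers described above can be sketched in self-contained form. `safe_calc` and its `ALLOWED_FUNCS` allow-list are hypothetical; the real `CalculatorTool` may strip names, walk the AST, and format errors differently.

```python
import ast
import math
import re

# Hypothetical allow-list; the real CalculatorTool's list may differ.
ALLOWED_FUNCS = {"abs": abs, "round": round, "sqrt": math.sqrt}


def safe_calc(expression: str) -> str:
    # Layer 1: strip known function names, then reject any remaining letters/underscores.
    stripped = expression
    for name in ALLOWED_FUNCS:
        stripped = re.sub(rf"\b{name}\b", "", stripped)
    if re.search(r"[a-zA-Z_]", stripped):
        return "Error: disallowed identifier"

    # Layer 2: walk the AST and reject calls to anything not on the allow-list.
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return "Error: invalid expression"
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCS):
                return "Error: disallowed function call"

    # Layer 3: evaluate with no builtins; only the allow-listed names are visible.
    try:
        result = eval(compile(tree, "<calc>", "eval"), {"__builtins__": {}}, dict(ALLOWED_FUNCS))
    except Exception as exc:
        return f"Error: {exc}"
    return str(result)


print(safe_calc('__import__("os")'))    # Error: disallowed identifier
print(safe_calc("sqrt(16) + abs(-3)"))  # 7.0
```

An empty `__builtins__` on its own is not a sandbox (attribute chains such as `().__class__` can still reach object internals), which is why the identifier check runs before `eval()` is ever reached.
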