Rohan03 committed · Commit 2f097f7 · verified · 1 Parent(s): 2259ebe

Document all 5 test fixes for v3.0.0

Files changed (1)
  1. TEST_FIXES.md +104 -0
TEST_FIXES.md ADDED
# Test Fixes Applied for v3.0.0

## Issue 1: Trajectory None guards (FIXED)
- File: `purpose_agent/types.py` (UPDATED)
- Changed: the `cumulative_reward`, `total_delta`, and `success_rate` properties now check both `s.score is not None` and `s.score.delta is not None`
- Added a docstring note that `sre_patches.py` replaces these properties at import time
- The baseline and SRE-patched versions are now equivalent

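The double guard described above can be sketched as follows. This is a minimal illustration, not the code in `purpose_agent/types.py`: the `Score`, `Step`, and `Trajectory` classes here are hypothetical stand-ins, and the definition of `success_rate` (share of scored steps with a positive delta) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Score:
    # Hypothetical stand-in: delta may be missing when a step was not evaluated.
    delta: Optional[float] = None


@dataclass
class Step:
    # Hypothetical stand-in: score stays None until the step has been scored.
    score: Optional[Score] = None


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def _deltas(self) -> List[float]:
        # Apply both guards: the score exists AND its delta exists.
        return [
            s.score.delta
            for s in self.steps
            if s.score is not None and s.score.delta is not None
        ]

    @property
    def total_delta(self) -> float:
        return sum(self._deltas())

    @property
    def success_rate(self) -> float:
        # Assumed definition: share of scored steps with a positive delta.
        deltas = self._deltas()
        return sum(d > 0 for d in deltas) / len(deltas) if deltas else 0.0
```

With both checks in one helper, an unscored step and a step whose score has no delta are skipped the same way, so neither can raise `AttributeError` or `TypeError` during aggregation.
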
## Issue 2: Backpressure test flakiness (NEEDS MANUAL FIX)
- File: `tests/test_sprint1_events.py`, T1.6 section
- Problem: the async consumer may not have started before the flood of events begins, so the terminal event might never arrive
- Fix: replace the `test_backpressure()` function with this more robust version:

```python
# Assumes asyncio, EventBus, EventKind, and create_event are already imported in the test module.
async def test_backpressure():
    bus6 = EventBus(max_queue_size=3)
    received = []
    consumer_started = asyncio.Event()

    async def consumer():
        # Signal that the consumer task is running before it subscribes.
        consumer_started.set()
        try:
            async for event in bus6.subscribe():
                received.append(event)
                await asyncio.sleep(0.01)  # simulate a slow consumer
        except asyncio.CancelledError:
            pass

    task = asyncio.create_task(consumer())
    await consumer_started.wait()
    await asyncio.sleep(0.05)  # give the consumer time to reach subscribe()

    # Flood the bus with more events than the queue can hold.
    for i in range(20):
        bus6.emit(create_event("r6", EventKind.TEXT_DELTA, seq=i, text=f"w{i}"))

    # Emit the terminal event that the test checks for.
    bus6.emit(create_event("r6", EventKind.RUN_FINISHED, seq=99, result="done"))

    await asyncio.sleep(1.0)
    bus6.close()
    task.cancel()
    try:
        # Bound the cleanup so a stuck consumer cannot hang the test.
        await asyncio.wait_for(task, timeout=2.0)
    except (asyncio.CancelledError, asyncio.TimeoutError):
        pass

    has_terminal = any(e.kind == EventKind.RUN_FINISHED for e in received)
    return has_terminal
```

Key changes:
- Added a `consumer_started` event so the consumer is running before the flood begins
- Increased the final wait from 0.5s to 1.0s
- Added an `asyncio.wait_for` timeout on task cleanup

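The start-up handshake and bounded cleanup listed above can be tried in isolation. The sketch below uses a plain `asyncio.Queue` in place of `EventBus`, so it runs without the project code; the function name `run_handshake_demo` and the `None` sentinel are illustrative choices, not part of the test suite.

```python
import asyncio


async def run_handshake_demo(n_items: int = 5) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    received = []
    consumer_started = asyncio.Event()

    async def consumer():
        consumer_started.set()  # signal readiness before the first await
        while True:
            item = await queue.get()
            if item is None:  # sentinel: the producer is done
                break
            received.append(item)

    task = asyncio.create_task(consumer())
    await consumer_started.wait()  # do not produce until the consumer runs

    for i in range(n_items):
        queue.put_nowait(i)
    queue.put_nowait(None)

    # Bound the cleanup so a stuck consumer cannot hang the caller.
    await asyncio.wait_for(task, timeout=2.0)
    return received
```

Calling `asyncio.run(run_handshake_demo())` returns the items in the order they were produced. The producer only starts once the event is set, which removes the dependence on task scheduling order.
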
## Issue 3: prod_test.py API timeout (NEEDS MANUAL FIX)
- File: `tests/prod_test.py`
- Problem: OpenRouter API calls have no timeout, so tests can hang
- Fix: wrap the backend creation with a timeout and add retry logic:

After the line `b = resolve_backend(...)`, add:
```python
import signal

class APITimeoutError(Exception):
    """Raised when an API call exceeds the alarm deadline."""

def timeout_handler(signum, frame):
    raise APITimeoutError("API call timed out")

# Set a 60s alarm for API calls (SIGALRM is Unix-only and works only in the main thread)
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(60)  # arm the timer; call signal.alarm(0) after the API call to cancel it
```

Or, more simply, pass a timeout to the OpenAI client that the backend constructs:
```python
# In llm_backend.py OpenAICompatibleBackend.__init__, add:
self.client = OpenAI(
    base_url=base_url,
    api_key=api_key or os.environ.get("OPENAI_API_KEY"),
    timeout=60.0,  # 60 second timeout on all API calls
)
```

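The fix above mentions retry logic without showing it. One possible shape is a small wrapper with exponential backoff, sketched below. The helper name `call_with_retry` and all of its parameters are hypothetical; which exception the client raises on timeout depends on the client library, so it is passed in through `retry_on` (the built-in `TimeoutError` is only a placeholder default).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")
```

A test would wrap the API call in a zero-argument callable, for example `call_with_retry(lambda: make_request())`, where `make_request` stands for whatever call the test makes. The `sleep` parameter is injectable so the wrapper can be exercised without real delays.
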
## Issue 4: validate.py mock resilience (NEEDS MANUAL FIX)
- File: `benchmarks/validate.py`
- Problem: the mock matches on the literal "Learned Strategies" and "None yet" text, which is fragile if the prompt format changes
- Fix: in `make_mock()`, make the heuristic detection more resilient:

Change: `has_h = "Learned Strategies" in text and "None yet" not in text`
To: `has_h = "Learned Strategies" in text and "None yet" not in text and "heuristics" in text.lower()`

Or better: check directly for heuristic entries:
```python
has_h = any("When:" in line or "Do:" in line for line in text.split("\n"))
```

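To see why the entry-based check is less fragile, it helps to wrap it in a function and run it against a few prompt variants. The function name `has_heuristics` and the sample prompts are illustrative; the only assumption carried over from the fix above is that heuristic entries contain `When:` or `Do:` markers.

```python
def has_heuristics(text: str) -> bool:
    """Return True if any line of the prompt looks like a heuristic entry."""
    # Assumed entry format: lines containing "When:" or "Do:" markers.
    return any("When:" in line or "Do:" in line for line in text.splitlines())


# The check no longer depends on the section heading or the placeholder text.
with_entries = "## Learned Strategies\n- When: stuck\n  Do: retry"
empty_section = "## Learned Strategies\nNone yet"
renamed_heading = "## Strategies (renamed)\nWhen: blocked"
```

Here `has_heuristics(with_entries)` and `has_heuristics(renamed_heading)` are true while `has_heuristics(empty_section)` is false, so renaming the heading does not change the result. The check would still break if the entry markers themselves changed.
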
## Issue 5: CalculatorTool __import__ blocking (VERIFIED WORKING)
- File: `purpose_agent/tools.py`
- `CalculatorTool.execute()` validates tokens with: `if re.search(r'[a-zA-Z_]', tokens)`
- After known function names (`abs`, `round`, `sqrt`, etc.) are removed, any remaining letters are rejected
- `__import__("os")` → after removing known functions, `__import__` and `os` remain → rejected ✓
- Also: an AST walker checks `Call` nodes and rejects unknown function names
- `eval()` uses `{"__builtins__": {}}`, so no builtins are available
- Test in `benchmark_v3.py`: `check("tools.calc_blocks_import", "Error" in calc.run(expression='__import__("os")').output)` (CORRECT)
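
The three layers listed above (token check, AST walk, empty builtins) can be sketched together as follows. This is not the code in `purpose_agent/tools.py`: the function name `safe_calc`, the `ALLOWED` table, and the error strings are assumptions made for the example, and the real tool's list of known functions may differ.

```python
import ast
import math
import re

# Assumed allow-list; the real tool's list of known functions may differ.
ALLOWED = {"abs": abs, "round": round, "sqrt": math.sqrt}


def safe_calc(expression: str) -> str:
    # Layer 1: strip allow-listed names, then reject any leftover letters.
    stripped = re.sub(r"\b(?:%s)\b" % "|".join(ALLOWED), "", expression)
    if re.search(r"[a-zA-Z_]", stripped):
        return "Error: disallowed identifier"
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return "Error: invalid syntax"
    # Layer 2: every call must target a plain allow-listed name.
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED):
                return "Error: disallowed call"
    # Layer 3: evaluate with no builtins; only allow-listed names resolve.
    try:
        code = compile(tree, "<calc>", "eval")
        return str(eval(code, {"__builtins__": {}}, dict(ALLOWED)))
    except Exception as exc:
        return f"Error: {exc}"
```

For `__import__("os")` the first layer already returns an error, because `__import__` and `os` survive the stripping step. The later layers act as a backstop in case an expression slips past the token check.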