Rohan03/purpose-agent
+# Test Fixes Applied for v3.0.0
+## Issue 1: Trajectory None guards (FIXED)
+- File: `purpose_agent/types.py` (updated)
+- Changed: the `cumulative_reward`, `total_delta`, and `success_rate` properties now check both `s.score is not None` and `s.score.delta is not None` before using a step's delta
+- Added a docstring note that `sre_patches.py` replaces these properties at import time
+- The baseline and SRE-patched versions are now equivalent
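A minimal sketch of the two-level guard follows. `Score`, `Step`, and this `Trajectory` are hypothetical stand-ins for the real classes in `purpose_agent/types.py`. The `success_rate` definition used here (share of scored steps with a positive delta) is an assumption, and `cumulative_reward` is left out because its exact definition isn't stated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Score:
    # Hypothetical stand-in for the real score type; `delta` may be missing.
    delta: Optional[float] = None


@dataclass
class Step:
    score: Optional[Score] = None


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def _deltas(self) -> List[float]:
        # Guard both levels: a step may have no score, and a score may have no delta.
        return [
            s.score.delta
            for s in self.steps
            if s.score is not None and s.score.delta is not None
        ]

    @property
    def total_delta(self) -> float:
        return sum(self._deltas())

    @property
    def success_rate(self) -> float:
        # Assumed definition: fraction of scored steps with a positive delta.
        deltas = self._deltas()
        if not deltas:
            return 0.0
        return sum(1 for d in deltas if d > 0) / len(deltas)
```

Steps with a missing score or a missing delta are simply skipped, so a partially scored trajectory no longer raises `AttributeError` or `TypeError`.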
+## Issue 2: Backpressure test flakiness (NEEDS MANUAL FIX)
+- File: `tests/test_sprint1_events.py` (T1.6 section)
+- Problem: the async consumer may not be running before the event flood starts, and the terminal event may not arrive before the test stops waiting
+- Fix: replace `test_backpressure()` with this more robust version:
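+```python
+import asyncio
+# EventBus, EventKind and create_event come from the test module's existing imports.
+async def test_backpressure():
+    bus6 = EventBus(max_queue_size=3)  # tiny queue to force backpressure
+    received = []
+    consumer_started = asyncio.Event()
+    async def consumer():
+        # Signals only that the task is running; the subscription is created on
+        # the next line, so the short sleep after the wait below covers that gap.
+        consumer_started.set()
+        try:
+            async for event in bus6.subscribe():
+                received.append(event)
+                await asyncio.sleep(0.01)  # deliberately slow consumer
+        except asyncio.CancelledError:
+            pass
+    task = asyncio.create_task(consumer())
+    await consumer_started.wait()
+    await asyncio.sleep(0.05)
+    # Flood the bus well past max_queue_size, then emit the terminal event.
+    for i in range(20):
+        bus6.emit(create_event("r6", EventKind.TEXT_DELTA, seq=i, text=f"w{i}"))
+    bus6.emit(create_event("r6", EventKind.RUN_FINISHED, seq=99, result="done"))
+    await asyncio.sleep(1.0)  # give the slow consumer time to drain
+    bus6.close()
+    task.cancel()
+    try:
+        await asyncio.wait_for(task, timeout=2.0)
+    except (asyncio.CancelledError, asyncio.TimeoutError):
+        pass
+    # The test only requires that the terminal event got through.
+    has_terminal = any(e.kind == EventKind.RUN_FINISHED for e in received)
+    return has_terminal
+```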
+Key changes:
+- Added a `consumer_started` event so the consumer task is running before the flood begins
+- Increased the final wait from 0.5 s to 1.0 s
+- Added an `asyncio.wait_for` timeout to the task cleanup
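The test above still depends on fixed sleeps. A sketch of an alternative that waits on the terminal event itself, with a deadline, is below. It runs against a hypothetical `ToyEventBus`, not the real `EventBus`, and assumes two behaviours: `subscribe()` registers its queue as soon as it is called (so subscribing before emitting removes the start-up race), and a full queue drops non-terminal events but evicts the oldest queued event to make room for a terminal one.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List

TEXT_DELTA, RUN_FINISHED = "text_delta", "run_finished"  # hypothetical event kinds


@dataclass
class Event:
    kind: str
    seq: int


class ToyEventBus:
    """Hypothetical stand-in for EventBus: one bounded queue per subscriber."""

    def __init__(self, max_queue_size: int) -> None:
        self._max = max_queue_size
        self._queues: List[asyncio.Queue] = []

    def subscribe(self) -> AsyncIterator[Event]:
        # Register eagerly, so events emitted after this call are never missed.
        q: asyncio.Queue = asyncio.Queue(maxsize=self._max)
        self._queues.append(q)

        async def gen() -> AsyncIterator[Event]:
            while True:
                yield await q.get()

        return gen()

    def emit(self, event: Event) -> None:
        for q in self._queues:
            if q.full():
                if event.kind != RUN_FINISHED:
                    continue  # assumed policy: drop non-terminal events under backpressure
                q.get_nowait()  # evict the oldest event so the terminal one fits
            q.put_nowait(event)


async def backpressure_delivers_terminal(bus: ToyEventBus) -> List[Event]:
    received: List[Event] = []
    finished = asyncio.Event()
    stream = bus.subscribe()  # subscribe before emitting: no start-up race

    async def consumer() -> None:
        async for event in stream:
            received.append(event)
            if event.kind == RUN_FINISHED:
                finished.set()
                return
            await asyncio.sleep(0.01)  # slow consumer to create backpressure

    task = asyncio.create_task(consumer())
    for i in range(20):
        bus.emit(Event(TEXT_DELTA, i))
    bus.emit(Event(RUN_FINISHED, 99))
    try:
        # Wait for the terminal event itself instead of sleeping a fixed time.
        await asyncio.wait_for(finished.wait(), timeout=2.0)
    finally:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
    return received
```

The test finishes as soon as the terminal event arrives and fails with `TimeoutError` only if it never does, so it is neither slower than needed nor sensitive to scheduler timing.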
+## Issue 3: prod_test.py API timeout (NEEDS MANUAL FIX)
+- File: `tests/prod_test.py`
+- Problem: OpenRouter API calls have no timeout, so tests can hang indefinitely
+- Fix: put a timeout on the API calls made through the backend (and retry transient failures). One option is a SIGALRM deadline.
+After the line `b = resolve_backend(...)`, add:
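+```python
+import signal
+def timeout_handler(signum, frame):
+    # Raise the builtin TimeoutError instead of shadowing it with a local class.
+    raise TimeoutError("API call timed out")
+# SIGALRM is Unix-only, and the handler must be installed from the main thread.
+signal.signal(signal.SIGALRM, timeout_handler)
+signal.alarm(60)  # arm a 60 s deadline for the API calls that follow
+try:
+    ...  # API calls that use `b`
+finally:
+    signal.alarm(0)  # disarm the alarm once the calls return
+```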
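+Or, more simply, set the timeout on the OpenAI client inside the backend, so every backend built by `resolve_backend(...)` from this class inherits it:
+```python
+# In llm_backend.py, OpenAICompatibleBackend.__init__ (the module needs
+# `import os` and `from openai import OpenAI`):
+self.client = OpenAI(
+    base_url=base_url,
+    api_key=api_key or os.environ.get("OPENAI_API_KEY"),
+    timeout=60.0,  # 60-second timeout on every API call
+    max_retries=2,  # retry transient failures (2 is also the client's default)
+)
+```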
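SIGALRM is unavailable on Windows and outside the main thread. A portable sketch of a per-call deadline with retries follows; `call_with_timeout` is a hypothetical helper, not part of the project. It can only abandon a hung call, not kill it, and the interpreter still joins abandoned worker threads at exit, so it complements the client-level `timeout` rather than replacing it.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(fn: Callable[[], T], timeout: float, retries: int = 2) -> T:
    """Hypothetical helper: run fn in a worker thread, retrying when it times out."""
    # One worker per attempt, so a hung attempt never blocks the next one.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=retries + 1)
    try:
        for attempt in range(retries + 1):
            future = pool.submit(fn)
            try:
                return future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                if attempt == retries:
                    raise TimeoutError(f"call timed out after {retries + 1} attempts")
    finally:
        # Don't wait for a hung worker; it is abandoned, not killed.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    # A call that overruns the deadline on every attempt.
    try:
        call_with_timeout(lambda: time.sleep(0.3), timeout=0.05, retries=1)
    except TimeoutError as exc:
        print(exc)  # prints: call timed out after 2 attempts
```

In `tests/prod_test.py`, each API call would be wrapped as `call_with_timeout(lambda: ..., timeout=60.0)`.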
+## Issue 4: validate.py mock resilience (NEEDS MANUAL FIX)
+- File: `benchmarks/validate.py`
+- Problem: the mock detects heuristics by matching the literal text "Learned Strategies" and "None yet", which breaks if the prompt format changes
+- Fix: in `make_mock()`, make the heuristic detection more resilient.
+Change: `has_h = "Learned Strategies" in text and "None yet" not in text`
+To: `has_h = "learned strategies" in text.lower() and "none yet" not in text.lower() and "heuristics" in text.lower()`
+Or better: check for the heuristic fields directly:
+```python
+has_h = any("When:" in line or "Do:" in line for line in text.split("\n"))
+```
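To check the heuristic count rather than mere presence, a sketch follows. It assumes each learned heuristic renders as its own line beginning with `When:`, optionally as a `-` or `*` bullet; `count_heuristics` and `has_heuristics` are hypothetical names, not existing helpers in `benchmarks/validate.py`.

```python
import re

# Assumed prompt format: one "When:" line per heuristic, e.g. "- When: the door is locked".
_WHEN_LINE = re.compile(r"^[ \t]*(?:[-*][ \t]*)?When:", re.MULTILINE)


def count_heuristics(prompt: str) -> int:
    """Count heuristic entries by their leading 'When:' lines, ignoring headings."""
    return len(_WHEN_LINE.findall(prompt))


def has_heuristics(prompt: str) -> bool:
    return count_heuristics(prompt) > 0
```

Anchoring on the start of the line means a stray "When:" in the middle of other prompt text is not counted, and the heading wording no longer matters at all.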
+## Issue 5: CalculatorTool __import__ blocking (VERIFIED WORKING)
+- File: `purpose_agent/tools.py`
+- CalculatorTool.execute() validates tokens with: `if re.search(r'[a-zA-Z_]', tokens)`
+- After removing known function names (`abs`, `round`, `sqrt`, etc.), any remaining letters are rejected
+- `__import__("os")` → after removing known functions, `__import__` and `os` remain → rejected ✓
+- Also: an AST walker visits `Call` nodes and rejects calls to unknown function names
+- `eval()` runs with `{"__builtins__": {}}` as globals, so no builtins are reachable by name
+- Test in benchmark_v3.py: `check("tools.calc_blocks_import", "Error" in calc.run(expression='__import__("os")').output)` — CORRECT
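To make the three layers concrete, a minimal sketch follows. `safe_calc` and its `ALLOWED_FUNCS` whitelist are hypothetical stand-ins, not the actual `CalculatorTool` code, and the sketch returns a plain string rather than an object with an `.output` field.

```python
import ast
import math
import re

# Hypothetical whitelist; the real CalculatorTool's list may differ.
ALLOWED_FUNCS = {"abs": abs, "round": round, "sqrt": math.sqrt}


def safe_calc(expression: str) -> str:
    # Layer 1: drop whitelisted names, then reject any letters or underscores left over.
    tokens = re.sub(r"\b(?:" + "|".join(ALLOWED_FUNCS) + r")\b", "", expression)
    if re.search(r"[a-zA-Z_]", tokens):
        return "Error: disallowed identifier"
    # Layer 2: walk the AST and reject calls to anything but whitelisted names.
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return "Error: invalid syntax"
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCS):
                return "Error: disallowed call"
    # Layer 3: evaluate with no builtins; only the whitelisted functions are visible.
    try:
        code = compile(tree, "<calc>", "eval")
        return str(eval(code, {"__builtins__": {}}, dict(ALLOWED_FUNCS)))
    except Exception as exc:
        return f"Error: {exc}"
```

`safe_calc('__import__("os")')` is stopped at the first layer, since `__import__` and `os` are left behind after whitelist removal; the AST and empty-builtins layers only matter if an earlier check is later loosened.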