| # Test Fixes Applied for v3.0.0 |
|
|
| ## Issue 1: Trajectory None guards (FIXED) |
| - File: `purpose_agent/types.py` (UPDATED) |
| - Changed: the `cumulative_reward`, `total_delta`, and `success_rate` properties now check both `s.score is not None` and `s.score.delta is not None` |
| - Added a docstring note that `sre_patches.py` replaces these properties at import time |
| - The baseline and SRE-patched versions are now equivalent |
|
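The double guard described in Issue 1 can be sketched as follows. This is a minimal illustration, not the code in `purpose_agent/types.py`: the `Score`, `Step`, and `Trajectory` dataclasses and the `steps` field are assumed shapes, and only `total_delta` is shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Score:
    # Hypothetical score object; delta may be missing.
    delta: Optional[float] = None


@dataclass
class Step:
    # Hypothetical step; score may be missing entirely.
    score: Optional[Score] = None


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    @property
    def total_delta(self) -> float:
        # Skip steps with no score AND steps whose score has no delta.
        return sum(
            s.score.delta
            for s in self.steps
            if s.score is not None and s.score.delta is not None
        )


traj = Trajectory(steps=[Step(), Step(Score()), Step(Score(0.5)), Step(Score(-0.2))])
total = traj.total_delta
```

Checking only `s.score is not None` would fail on the second step, because `sum` cannot add a `None` delta.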
|
| ## Issue 2: Backpressure test flakiness (NEEDS MANUAL FIX) |
| - File: `tests/test_sprint1_events.py`, T1.6 section |
| - Problem: the async consumer may not have started before the bus is flooded, so the terminal event might not arrive |
| - Fix: Replace the `test_backpressure()` function with this more robust version: |
| |
| ```python |
| async def test_backpressure(): |
|     bus6 = EventBus(max_queue_size=3) |
|     received = [] |
|     consumer_started = asyncio.Event() |
| |
|     async def consumer(): |
|         # Signal that the consumer task is running before the producer floods the bus. |
|         consumer_started.set() |
|         try: |
|             async for event in bus6.subscribe(): |
|                 received.append(event) |
|                 await asyncio.sleep(0.01)  # consume slowly so the queue fills up |
|         except asyncio.CancelledError: |
|             pass |
| |
|     task = asyncio.create_task(consumer()) |
|     await consumer_started.wait() |
|     await asyncio.sleep(0.05)  # give the consumer time to reach subscribe() |
| |
|     # Flood the bus with far more events than max_queue_size. |
|     for i in range(20): |
|         bus6.emit(create_event("r6", EventKind.TEXT_DELTA, seq=i, text=f"w{i}")) |
| |
|     # The terminal event must still be delivered. |
|     bus6.emit(create_event("r6", EventKind.RUN_FINISHED, seq=99, result="done")) |
| |
|     await asyncio.sleep(1.0) |
|     bus6.close() |
|     task.cancel() |
|     try: |
|         await asyncio.wait_for(task, timeout=2.0) |
|     except (asyncio.CancelledError, asyncio.TimeoutError): |
|         pass |
| |
|     has_terminal = any(e.kind == EventKind.RUN_FINISHED for e in received) |
|     return has_terminal |
| ``` |
| |
| Key changes: |
| - Added `consumer_started` Event to ensure consumer is running before flooding |
| - Increased final wait from 0.5s to 1.0s |
| - Added `asyncio.wait_for` timeout on task cleanup |
|
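Fixed sleeps such as the 1.0s wait can still race on a slow machine. One alternative, sketched here as an assumption rather than as part of the applied fix, is to poll for the terminal event with a deadline. `wait_until` is a hypothetical helper, and the demo uses a plain list in place of `EventBus`.

```python
import asyncio


async def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate until it returns True or the timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if predicate():
            return True
        await asyncio.sleep(interval)
    # One last check at the deadline.
    return bool(predicate())


async def demo():
    received = []

    async def producer():
        # Stand-in for the bus delivering the terminal event after a delay.
        await asyncio.sleep(0.05)
        received.append("RUN_FINISHED")

    task = asyncio.create_task(producer())
    ok = await wait_until(lambda: "RUN_FINISHED" in received)
    await task
    return ok
```

In the test, the `await asyncio.sleep(1.0)` line could in principle be replaced by a call like `await wait_until(lambda: any(e.kind == EventKind.RUN_FINISHED for e in received))`, which returns as soon as the event arrives.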
|
| ## Issue 3: prod_test.py API timeout (NEEDS MANUAL FIX) |
| - File: `tests/prod_test.py` |
| - Problem: No timeout on OpenRouter API calls; tests could hang |
| - Fix: Wrap the backend creation with a timeout, add retry logic: |
|
|
| After line `b = resolve_backend(...)`, add: |
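| ```python |
| import signal |
| |
| # Named to avoid shadowing the builtin TimeoutError. |
| class APICallTimeout(Exception): |
|     pass |
| |
| def timeout_handler(signum, frame): |
|     raise APICallTimeout("API call timed out") |
| |
| # Set a 60s alarm for API calls (SIGALRM is Unix-only and works only in the main thread) |
| signal.signal(signal.SIGALRM, timeout_handler) |
| signal.alarm(60) |
| # ... make the API calls here ... |
| signal.alarm(0)  # cancel the pending alarm once the calls return |
| ``` |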
|
|
| Or simpler: in the resolve_backend call, add timeout to the OpenAI client: |
| ```python |
| # In llm_backend.py OpenAICompatibleBackend.__init__, add: |
| self.client = OpenAI( |
|     base_url=base_url, |
|     api_key=api_key or os.environ.get("OPENAI_API_KEY"), |
|     timeout=60.0,  # 60 second timeout on all API calls |
| ) |
| ``` |
| |
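The retry logic mentioned in the fix for Issue 3 is not spelled out in these notes. A minimal sketch, assuming the API call is wrapped in a zero-argument callable and that timeouts surface as the builtin `TimeoutError`; `call_with_retry` and its parameters are hypothetical names, not part of `prod_test.py`:

```python
import time


def call_with_retry(fn, attempts=3, base_delay=1.0,
                    retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Usage with a stand-in for a flaky API call (fails twice, then succeeds).
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retry(flaky, sleep=delays.append)
```

Passing `sleep=delays.append` keeps the example fast and lets a test inspect the backoff schedule without actually waiting.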
| ## Issue 4: validate.py mock resilience (NEEDS MANUAL FIX) |
| - File: `benchmarks/validate.py` |
| - Problem: Mock matches on "Learned Strategies" + "None yet" text; fragile if prompt format changes |
| - Fix: In make_mock(), make the heuristic detection more resilient: |
| |
| Change: `has_h = "Learned Strategies" in text and "None yet" not in text` |
| To: `has_h = "Learned Strategies" in text and "None yet" not in text and "heuristics" in text.lower()` |
|
|
| Or better: check for heuristic entries directly instead of matching the section header: |
| ```python |
| has_h = any("When:" in line or "Do:" in line for line in text.split("\n")) |
| ``` |
|
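To see why the line-based check is less tied to the prompt wording, here is a small comparison. The sample prompts are invented for illustration; the real prompt format used in `benchmarks/validate.py` may differ.

```python
def has_heuristics_by_header(text: str) -> bool:
    # Original check: depends on the exact section header and placeholder text.
    return "Learned Strategies" in text and "None yet" not in text


def has_heuristics_by_lines(text: str) -> bool:
    # Line-based check: looks for the heuristic entries themselves.
    return any("When:" in line or "Do:" in line for line in text.split("\n"))


# Hypothetical prompts (assumed format).
EMPTY_PROMPT = "## Learned Strategies\nNone yet\n"
FILLED_PROMPT = "## Learned Strategies\n- When: input is empty\n  Do: return early\n"
RENAMED_PROMPT = "## Strategies\n- When: input is empty\n  Do: return early\n"

for name, prompt in [("empty", EMPTY_PROMPT), ("filled", FILLED_PROMPT), ("renamed", RENAMED_PROMPT)]:
    print(name, has_heuristics_by_header(prompt), has_heuristics_by_lines(prompt))
```

With the renamed header, the header-based check misses the heuristics while the line-based check still finds them. The line-based check has its own dependency: it assumes entries keep the `When:` / `Do:` labels.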
|
| ## Issue 5: CalculatorTool __import__ blocking (VERIFIED WORKING) |
| - File: `purpose_agent/tools.py` |
| - CalculatorTool.execute() validates tokens with: `if re.search(r'[a-zA-Z_]', tokens)` |
| - After removing known function names (abs, round, sqrt, etc.), any remaining letters are rejected |
| - `__import__("os")`: after removing known functions, `__import__` and `os` remain, so the expression is rejected |
| - Also: AST walker checks Call nodes and rejects unknown function names |
| - eval() uses `{"__builtins__": {}}`, so no builtins are available |
| - Test in benchmark_v3.py: `check("tools.calc_blocks_import", "Error" in calc.run(expression='__import__("os")').output)` (CORRECT) |
| |
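The three layers listed under Issue 5 can be sketched together as below. This is an illustration under assumptions, not the actual `CalculatorTool` code: `safe_calc`, the `ALLOWED_FUNCS` allow-list, and the error strings are hypothetical.

```python
import ast
import math
import re

# Hypothetical allow-list; the real tool's list of known functions may differ.
ALLOWED_FUNCS = {"abs": abs, "round": round, "sqrt": math.sqrt}


def safe_calc(expression: str) -> str:
    # Layer 1: strip known function names, then reject any leftover letters or underscores.
    tokens = expression
    for name in ALLOWED_FUNCS:
        tokens = re.sub(rf"\b{name}\b", "", tokens)
    if re.search(r"[a-zA-Z_]", tokens):
        return "Error: disallowed identifier"

    # Layer 2: walk the AST and reject calls to anything outside the allow-list.
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return "Error: invalid syntax"
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCS):
                return "Error: unknown function"

    # Layer 3: evaluate with no builtins; only the allow-listed functions are visible.
    try:
        code = compile(tree, "<calc>", "eval")
        return str(eval(code, {"__builtins__": {}}, dict(ALLOWED_FUNCS)))
    except Exception as exc:
        return f"Error: {exc}"


blocked = safe_calc('__import__("os")')
allowed = safe_calc("sqrt(16) + abs(-2)")
```

Here `__import__("os")` is stopped at the first layer, because `__import__` and `os` survive the removal of known function names.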