Test Fixes Applied for v3.0.0

Issue 1: Trajectory None guards (FIXED)

  • File: purpose_agent/types.py — UPDATED
  • Changed: the cumulative_reward, total_delta, and success_rate properties now check that both s.score and s.score.delta are not None
  • Added a docstring note that sre_patches.py replaces these properties at import time
  • The baseline and SRE-patched versions are now equivalent
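
The guard described above can be sketched as follows. This is a minimal illustration, not the code in purpose_agent/types.py: the Score, Step, and Trajectory layouts are hypothetical stand-ins, and the definition of success_rate (share of scored steps with a positive delta) is an assumption. cumulative_reward would presumably follow the same filtering pattern.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Score:
    delta: Optional[float] = None  # hypothetical field layout


@dataclass
class Step:
    score: Optional[Score] = None


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def _deltas(self) -> List[float]:
        # Keep only steps where both the score and its delta are present.
        return [
            s.score.delta
            for s in self.steps
            if s.score is not None and s.score.delta is not None
        ]

    @property
    def total_delta(self) -> float:
        return sum(self._deltas())

    @property
    def success_rate(self) -> float:
        # Assumption: a step counts as a success when its delta is positive.
        deltas = self._deltas()
        if not deltas:
            return 0.0
        return sum(1 for d in deltas if d > 0) / len(deltas)


# Unscored steps and scores without a delta are skipped instead of raising.
traj = Trajectory([Step(), Step(Score()), Step(Score(1.0)), Step(Score(-0.5))])
```

With this shape, a trajectory that mixes unscored and scored steps no longer raises AttributeError or TypeError; the missing values are simply left out of the aggregates.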

Issue 2: Backpressure test flakiness (NEEDS MANUAL FIX)

  • File: tests/test_sprint1_events.py — T1.6 section
  • Problem: async consumer may not start before flooding; terminal event might not arrive
  • Fix: Replace the test_backpressure() function with this more robust version:
async def test_backpressure():
    bus6 = EventBus(max_queue_size=3)
    received = []
    consumer_started = asyncio.Event()

    async def consumer():
        consumer_started.set()
        try:
            async for event in bus6.subscribe():
                received.append(event)
                await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            pass

    task = asyncio.create_task(consumer())
    await consumer_started.wait()
    await asyncio.sleep(0.05)

    for i in range(20):
        bus6.emit(create_event("r6", EventKind.TEXT_DELTA, seq=i, text=f"w{i}"))

    bus6.emit(create_event("r6", EventKind.RUN_FINISHED, seq=99, result="done"))

    await asyncio.sleep(1.0)
    bus6.close()
    task.cancel()
    try:
        await asyncio.wait_for(task, timeout=2.0)
    except (asyncio.CancelledError, asyncio.TimeoutError):
        pass

    has_terminal = any(e.kind == EventKind.RUN_FINISHED for e in received)
    return has_terminal

Key changes:

  • Added consumer_started Event to ensure consumer is running before flooding
  • Increased final wait from 0.5s to 1.0s
  • Added asyncio.wait_for timeout on task cleanup

Issue 3: prod_test.py API timeout (NEEDS MANUAL FIX)

  • File: tests/prod_test.py
  • Problem: No timeout on OpenRouter API calls; tests could hang
  • Fix: Wrap the backend creation with a timeout, add retry logic:

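After the line b = resolve_backend(...), add:

import signal

class APITimeoutError(Exception):
    """Raised when an API call exceeds the alarm deadline."""

def timeout_handler(signum, frame):
    raise APITimeoutError("API call timed out")

# Install the handler, then arm a 60s alarm before the API calls (Unix only)
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(60)
# ... API calls go here ...
signal.alarm(0)  # cancel the pending alarm once the calls return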

Or, more simply, pass a timeout to the OpenAI client itself:

# In llm_backend.py OpenAICompatibleBackend.__init__, add:
self.client = OpenAI(
    base_url=base_url,
    api_key=api_key or os.environ.get("OPENAI_API_KEY"),
    timeout=60.0,  # 60 second timeout on all API calls
)
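
The fix for Issue 3 also mentions retry logic, which neither snippet above shows. A minimal sketch of one way to do it is a small wrapper with exponential backoff. The helper name call_with_retry, its parameters, and the default exception types are all hypothetical; the exception classes raised by the real client would need to be listed in retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the listed exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo with a fake call that times out twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retry(flaky, sleep=delays.append)
```

Passing sleep as a parameter keeps the test fast, since the demo records the delays rather than waiting. In tests/prod_test.py the wrapped callable would be whatever performs the API request.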

Issue 4: validate.py mock resilience (NEEDS MANUAL FIX)

  • File: benchmarks/validate.py
  • Problem: Mock matches on "Learned Strategies" + "None yet" text; fragile if prompt format changes
  • Fix: In make_mock(), make the heuristic detection more resilient:

Change:
has_h = "Learned Strategies" in text and "None yet" not in text
To:
has_h = "Learned Strategies" in text and "None yet" not in text and "heuristics" in text.lower()

Or better: detect heuristic entries directly by their When:/Do: markers:

has_h = any("When:" in line or "Do:" in line for line in text.split("\n"))
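
To see why the marker-based check is less fragile, here is a small sketch that runs it against three prompt shapes. The prompt strings are made up for illustration and are not taken from the real prompt builder; the function name has_heuristics is also hypothetical.

```python
def has_heuristics(text: str) -> bool:
    """Return True if any line carries a 'When:' or 'Do:' marker."""
    return any("When:" in line or "Do:" in line for line in text.splitlines())


# Hypothetical prompt fragments, for illustration only.
empty_prompt = "## Learned Strategies\nNone yet\n"
filled_prompt = (
    "## Learned Strategies\n"
    "- When: the door is locked\n"
    "  Do: look for a key\n"
)
# The section heading was renamed, but the entries keep their markers.
renamed_prompt = "## Strategies So Far\n- When: stuck\n  Do: backtrack\n"
```

The check no longer depends on the section heading or the "None yet" placeholder, so renaming either leaves the mock working. It can still misfire if the task text itself happens to contain "When:" or "Do:".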

Issue 5: CalculatorTool import blocking (VERIFIED WORKING)

  • File: purpose_agent/tools.py
  • CalculatorTool.execute() validates tokens with: if re.search(r'[a-zA-Z_]', tokens)
  • After removing known function names (abs, round, sqrt, etc.), any remaining letters are rejected
  • __import__("os") → after removing known functions, __import__ and os remain → rejected ✓
  • Also: AST walker checks Call nodes and rejects unknown function names
  • eval() uses {"__builtins__": {}} — no builtins available
  • Test in benchmark_v3.py: check("tools.calc_blocks_import", "Error" in calc.run(expression='__import__("os")').output) — CORRECT
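
The three layers listed above can be sketched in a few lines. This is an illustration of the described approach, not the code in purpose_agent/tools.py: the function name safe_calc, the whitelist contents, and the error messages are assumptions.

```python
import ast
import math
import re

ALLOWED = {"abs": abs, "round": round, "sqrt": math.sqrt}  # assumed whitelist


def safe_calc(expression: str) -> str:
    # Layer 1: strip whitelisted names, then reject any leftover letters.
    stripped = re.sub(r"\b(?:%s)\b" % "|".join(ALLOWED), "", expression)
    if re.search(r"[a-zA-Z_]", stripped):
        return "Error: disallowed name in expression"
    # Layer 2: every call must target a whitelisted function by plain name.
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return "Error: invalid syntax"
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED):
                return "Error: unknown function"
    # Layer 3: evaluate with no builtins; only whitelisted functions are visible.
    try:
        code = compile(tree, "<calc>", "eval")
        return str(eval(code, {"__builtins__": {}}, dict(ALLOWED)))
    except Exception as exc:
        return f"Error: {exc}"


blocked = safe_calc('__import__("os")')
allowed = safe_calc("sqrt(16) + abs(-2)")
```

Here the import attempt is stopped at the first layer, because __import__ and os survive the whitelist stripping and still contain letters. The later layers only matter if an expression gets past that check.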