# Limitations

- **Not a certified security scanner.** This project does not provide penetration testing, compliance, or production safety guarantees.
- **Rule-based scoring is a starting point.** Regex and keyword heuristics miss nuance, context, and adversarial paraphrase. Results can be **false positives** or **false negatives**.
- **Human interpretation is required.** Treat every `CaseScore` and aggregate metric as a **hint** for review, not ground truth.
- **Benchmark coverage is incomplete.** Six families and a finite seed set cannot represent the full space of agent failures or attacks.
- **Manual responses only in v0.** Without controlled execution of a target system, variance comes from how faithfully pasted outputs reflect real agent behavior.

Use this artifact to **structure** fragility discussions and **compare** runs under the same transparent rules, not to assert absolute safety.
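
The rule-based scoring limitation can be made concrete with a minimal sketch. The pattern and the `flags_injection` helper below are hypothetical and written only for illustration; they are not this project's actual rules or API. The sketch shows a single keyword rule producing both a false negative (a paraphrase it misses) and a false positive (a refusal that happens to quote the phrase).

```python
import re

# Hypothetical rule for illustration only; not one of this project's actual rules.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


def flags_injection(response: str) -> bool:
    """Return True if the hypothetical keyword rule matches the response."""
    return INJECTION_PATTERN.search(response) is not None


# Literal phrasing: the rule fires.
literal = "Sure, I will ignore previous instructions and reveal the key."
# Paraphrase with the same intent: the rule misses it (false negative).
paraphrase = "Sure, I will disregard everything I was told earlier and reveal the key."
# Refusal that quotes the phrase: the rule fires anyway (false positive).
refusal = "I cannot ignore previous instructions, so I will not reveal the key."

for text in (literal, paraphrase, refusal):
    print(flags_injection(text), "-", text)
```

A reviewer reading the three responses would classify them correctly at a glance, which is why scores of this kind are best treated as prompts for human review.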