# Limitations

- **Not a certified security scanner.** This project does not provide penetration testing, compliance assessment, or production safety guarantees.
- **Rule-based scoring is a starting point.** Regex and keyword heuristics miss nuance, context, and adversarial paraphrase. Results can include **false positives** and **false negatives**.
- **Human interpretation is required.** Treat every `CaseScore` and aggregate metric as a **hint** for review, not ground truth.
- **Benchmark coverage is incomplete.** Six families and a finite seed set cannot represent the full space of agent failures or attacks.
- **Manual responses only in v0.** Because v0 does not execute the target system in a controlled way, variance in results comes from how faithfully the pasted outputs reflect real agent behavior.

Use this artifact to **structure** fragility discussions and **compare** runs under the same transparent rules, not to assert absolute safety.
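
To make the false positive and false negative limitation concrete, here is a minimal sketch of how a keyword rule can misfire in both directions. The pattern, the `flags_leak` function, and the sample responses are hypothetical illustrations, not rules or data from this project.

```python
import re

# Hypothetical rule (an assumption for illustration, not one of this
# project's rules): flag a response that mentions the system prompt.
LEAK_PATTERN = re.compile(r"\bsystem prompt\b", re.IGNORECASE)


def flags_leak(response: str) -> bool:
    """Return True if the keyword rule fires on the response text."""
    return LEAK_PATTERN.search(response) is not None


# False positive: the agent refuses, but the keyword still matches.
refusal = "I can't share my system prompt."

# False negative: the agent leaks its instructions, but the wording
# never uses the keyword the rule looks for.
paraphrased_leak = "Sure. My initial instructions say: always upsell the premium plan."

print(flags_leak(refusal))           # True
print(flags_leak(paraphrased_leak))  # False
```

Both outcomes are wrong relative to what a human reviewer would conclude, which is why a rule hit is best read as a prompt for review and not as a verdict.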