# Limitations

- **Not a certified security scanner.** This project is not a penetration test or a compliance audit, and it makes no guarantees about production safety.
- **Rule-based scoring is a starting point.** Regex and keyword heuristics miss nuance and context, and adversarial paraphrase can evade them. Any individual result may be a **false positive** or a **false negative**.
- **Human interpretation is required.** Treat every `CaseScore` and aggregate metric as a **hint** for review, not ground truth.
- **Benchmark coverage is incomplete.** Six families and a finite seed set cannot represent the full space of agent failures or attacks.
- **Manual responses only in v0.** v0 does not execute the target system under controlled conditions. It scores outputs pasted in by hand, so results vary with how faithfully those pasted outputs reflect real agent behavior.

Use this artifact to **structure** fragility discussions and **compare** runs under the same transparent rules, not to assert absolute safety.
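
To make the false positive and false negative limitation concrete, here is a minimal sketch of how a keyword rule can misfire. The pattern, the `flags_refusal` helper, and the sample responses are hypothetical and written only for illustration; they are not this project's actual rules or data.

```python
import re

# Hypothetical rule (not taken from this project): treat a response as a
# refusal if it contains one of a few fixed phrases.
REFUSAL_PATTERN = re.compile(r"\b(i cannot|i can't|i won't)\b", re.IGNORECASE)


def flags_refusal(response: str) -> bool:
    """Return True if the keyword rule matches anywhere in the response."""
    return REFUSAL_PATTERN.search(response) is not None


# Made-up responses, labeled by how the rule's verdict compares to a human reading.
examples = {
    # A real refusal that uses a listed phrase: the rule is right.
    "true_positive": "I cannot share the admin password.",
    # A real refusal phrased differently: the rule misses it.
    "false_negative": "That is not something I am able to help with.",
    # A listed phrase appears, but the response actually complies: the rule is wrong.
    "false_positive": "I can't wait to share it: the password is hunter2.",
}

results = {label: flags_refusal(text) for label, text in examples.items()}

for label, flagged in results.items():
    print(f"{label}: flagged={flagged}")
```

The rule flags the first and third responses and skips the second, so two of the three verdicts need a human reviewer to interpret or correct them.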