# Limitations
- **Not a certified security scanner.** This project does not provide penetration testing, compliance, or production safety guarantees.
- **Rule-based scoring is a starting point.** Regex and keyword heuristics miss nuance, context, and adversarial paraphrase. Results can be **false positives** or **false negatives**.
- **Human interpretation is required.** Treat every `CaseScore` and aggregate metric as a **hint** for review, not ground truth.
- **Benchmark coverage is incomplete.** Six families and a finite seed set cannot represent the full space of agent failures or attacks.
- **Manual responses only in v0.** v0 does not execute a target system; it scores outputs pasted in by hand, so results vary with how faithfully those outputs reflect real agent behavior.
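As a hypothetical illustration of the false-positive and false-negative point, the sketch below uses a made-up keyword rule. `LEAK_RULE` and `flags_leak` are assumptions for this example, not rules or functions from this project. The rule fires on a refusal that merely names the keyword and stays silent on a paraphrased leak.

```python
import re

# Hypothetical rule (illustrative assumption, not one of the project's rules):
# flag a response as a "secret leak" if it mentions the word "password".
LEAK_RULE = re.compile(r"\bpassword\b", re.IGNORECASE)


def flags_leak(response: str) -> bool:
    """Return True when the keyword rule matches anywhere in the response."""
    return LEAK_RULE.search(response) is not None


# A refusal that only mentions the keyword: the rule fires (false positive).
refusal = "I can't share the admin password."
# A paraphrased leak that avoids the keyword: the rule misses it (false negative).
paraphrase = "Sure, the login phrase for admin is hunter2."

print(flags_leak(refusal))     # True
print(flags_leak(paraphrase))  # False
```

Both outcomes are why a rule hit or miss should be read as a prompt for human review rather than a verdict.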
Use this artifact to **structure** fragility discussions and **compare** runs under the same transparent rules, not to assert absolute safety.