| title: Bio Over-Refusal Explorer | |
| emoji: 🧬 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.9.1 | |
| python_version: "3.11" | |
| app_file: app.py | |
| pinned: false | |
| license: cc-by-nc-sa-4.0 | |
| short_description: Browse 201 expert-annotated biology queries + 9-model FPR | |
| tags: | |
| - ai-safety | |
| - biosafety | |
| - llm-evaluation | |
| - over-refusal | |
| - calibration | |
| # Bio Over-Refusal Explorer | |
| Static data browser for the [Bio Over-Refusal Dataset v0.1.0](https://huggingface.co/datasets/jang1563/bio-overrefusal-v0.1): 201 domain-expert-authored biology research queries stratified by sensitivity tier, with false-positive refusal rates (FPR) for 9 models and Wilson 95% confidence intervals. | |
| **No model API calls happen at runtime.** This Space loads pre-computed evaluation results from the dataset and lets you browse them by tier, subdomain, and legitimacy. Provider names are reported as observed; numbers should be read as a slice-level calibration signal for this specific biology-research benchmark, not as a global model-quality ranking. | |
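For readers unfamiliar with the Wilson intervals reported alongside each FPR, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion. The function name `wilson_interval` and its arguments are hypothetical and chosen for illustration; this is not taken from the project's evaluation code, which may compute the interval differently (for example, through a statistics library).

```python
import math


def wilson_interval(refusals: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a proportion refusals/total.

    z defaults to the two-sided 95% normal quantile.
    Names and signature are illustrative assumptions, not the project's API.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= refusals <= total:
        raise ValueError("refusals must be between 0 and total")

    p_hat = refusals / total
    z2 = z * z
    denom = 1 + z2 / total
    # Centre of the interval is pulled toward 0.5 relative to p_hat.
    center = (p_hat + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Example with made-up counts: 5 refusals out of 40 legitimate queries.
    low, high = wilson_interval(5, 40)
    print(f"FPR = {5 / 40:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed rate is 0 or 1, which matters for small per-tier sample sizes.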
| ## What you can do here | |
| 1. **Browse queries**: Filter the 201 queries by tier (1–5), subdomain (10), and legitimacy. Click a row to see the full record (biological reasoning, legitimate contexts, citations, danger-shift contexts). | |
| 2. **Compare models**: See the 9-model FPR table with Wilson 95% CIs. Switch between strict and broad FPR. | |
| 3. **Per-tier breakdown**: See how each model's FPR varies across the 5 sensitivity tiers. | |
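The filtering and per-tier views listed above can be approximated offline. The sketch below runs on a handful of made-up records; the field names (`tier`, `subdomain`, `legitimacy`, `refused`), the label values, and the helper functions are all assumptions for illustration and may not match the dataset's actual schema or the Space's `app.py`.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical toy records; field names and values are illustrative only.
RECORDS = [
    {"id": "q1", "tier": 1, "subdomain": "genomics", "legitimacy": "legitimate", "refused": False},
    {"id": "q2", "tier": 1, "subdomain": "virology", "legitimacy": "legitimate", "refused": True},
    {"id": "q3", "tier": 3, "subdomain": "virology", "legitimacy": "legitimate", "refused": True},
    {"id": "q4", "tier": 3, "subdomain": "virology", "legitimacy": "ambiguous", "refused": True},
    {"id": "q5", "tier": 5, "subdomain": "toxicology", "legitimacy": "legitimate", "refused": False},
]


def filter_records(
    records: list[dict],
    tier: Optional[int] = None,
    subdomain: Optional[str] = None,
    legitimacy: Optional[str] = None,
) -> list[dict]:
    """Keep records matching every filter that is not None."""
    return [
        r for r in records
        if (tier is None or r["tier"] == tier)
        and (subdomain is None or r["subdomain"] == subdomain)
        and (legitimacy is None or r["legitimacy"] == legitimacy)
    ]


def fpr_by_tier(records: list[dict]) -> dict[int, float]:
    """False-positive refusal rate per tier: refusals / legitimate queries."""
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # tier -> [refusals, total]
    for r in records:
        if r["legitimacy"] != "legitimate":
            continue  # only refusals of legitimate queries count as false positives
        counts[r["tier"]][0] += int(r["refused"])
        counts[r["tier"]][1] += 1
    return {t: refusals / total for t, (refusals, total) in sorted(counts.items())}


if __name__ == "__main__":
    print([r["id"] for r in filter_records(RECORDS, subdomain="virology", legitimacy="legitimate")])
    print(fpr_by_tier(RECORDS))
```

This sketch counts a single refusal flag per query; it does not attempt to reproduce the strict versus broad FPR distinction, whose definitions live in the dataset documentation.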
| ## Source artifacts | |
| - Dataset: [jang1563/bio-overrefusal-v0.1](https://huggingface.co/datasets/jang1563/bio-overrefusal-v0.1) | |
| - Code + reproducibility: [github.com/jang1563/bio-overrefusal-v0.1](https://github.com/jang1563/bio-overrefusal-v0.1) | |
| - Safety scope: [SAFETY.md](https://github.com/jang1563/bio-overrefusal-v0.1/blob/main/SAFETY.md) | |
| ## Position in the safety stack | |
| This dataset is a **calibration measurement**, not a deployed mitigation. It complements rather than replaces capability evaluations (e.g. WMDP, biothreat-eval), constitutional/classifier safeguards, and red-team work. This work is independent and does not represent any provider's internal evaluation pipeline. | |