---
title: Bio Over-Refusal Explorer
emoji: 🧬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
python_version: '3.11'
app_file: app.py
pinned: false
license: cc-by-nc-sa-4.0
short_description: Browse 201 expert-annotated biology queries + 9-model FPR
tags:
- ai-safety
- biosafety
- llm-evaluation
- over-refusal
- calibration
---
# Bio Over-Refusal Explorer

Static data browser for the Bio Over-Refusal Dataset v0.1.0 – 201 domain-expert-authored biology research queries stratified by sensitivity tier, with 9-model false-positive refusal rates and Wilson 95% confidence intervals.
No model API calls happen at runtime. This Space loads pre-computed evaluation results from the dataset and lets you browse them by tier, subdomain, and legitimacy. Provider names are reported as observed; numbers should be read as a slice-level calibration signal for this specific biology-research benchmark, not as a global model-quality ranking.
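The Wilson score interval reported for each model's FPR can be computed directly from the refusal count and the number of queries. A minimal sketch (the function name and signature are illustrative, not this project's actual code):

```python
from math import sqrt

def wilson_ci(refusals: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a refusal rate.

    refusals: number of false-positive refusals observed
    n: total number of queries in the slice
    z: normal quantile (1.96 for a 95% interval)
    """
    if n == 0:
        return (0.0, 0.0)
    p = refusals / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - margin), min(1.0, center + margin))
```

Unlike the naive normal interval, the Wilson interval stays inside [0, 1] and behaves sensibly at low counts, which matters for slices with few refusals.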
## What you can do here
- Browse queries – Filter the 201 queries by tier (1–5), subdomain (10), and legitimacy. Click a row to see the full record (biological reasoning, legitimate contexts, citations, danger-shift contexts).
- Compare models – See the 9-model FPR table with Wilson 95% CIs. Switch between strict and broad FPR.
- Per-tier breakdown – See how each model's FPR varies across the 5 sensitivity tiers.
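The tier/subdomain/legitimacy filtering above amounts to matching records against any combination of fields. A minimal sketch with a hypothetical record shape (the field names and example values are assumptions, not the dataset's actual schema):

```python
# Hypothetical records; field names/values are illustrative only.
queries = [
    {"id": "Q001", "tier": 3, "subdomain": "virology", "legitimate": True},
    {"id": "Q002", "tier": 5, "subdomain": "synthetic-biology", "legitimate": True},
]

def filter_queries(records, tier=None, subdomain=None, legitimate=None):
    """Return records matching every filter that is not None."""
    out = []
    for r in records:
        if tier is not None and r["tier"] != tier:
            continue
        if subdomain is not None and r["subdomain"] != subdomain:
            continue
        if legitimate is not None and r["legitimate"] != legitimate:
            continue
        out.append(r)
    return out

# filter_queries(queries, tier=3) keeps only the tier-3 record Q001
```

Leaving a filter as `None` means "any value", so the three dropdowns compose without special cases.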
## Source artifacts
- 📊 Dataset: jang1563/bio-overrefusal-v0.1
- 💻 Code + reproducibility: github.com/jang1563/bio-overrefusal-v0.1
- 🔒 Safety scope: SAFETY.md
## Position in the safety stack
This dataset is a calibration measurement, not a deployed mitigation. It complements rather than replaces capability evaluations (e.g. WMDP, biothreat-eval), constitutional/classifier safeguards, and red-team work. This work is independent and does not represent any provider's internal evaluation pipeline.