	---
	title: Bio Over-Refusal Explorer
	emoji: 🧬
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.9.1
	python_version: "3.11"
	app_file: app.py
	pinned: false
	license: cc-by-nc-sa-4.0
	short_description: Browse 201 expert-annotated biology queries + 9-model FPR
	tags:
	- ai-safety
	- biosafety
	- llm-evaluation
	- over-refusal
	- calibration
	---

	# Bio Over-Refusal Explorer

	Static data browser for the [Bio Over-Refusal Dataset v0.1.0](https://huggingface.co/datasets/jang1563/bio-overrefusal-v0.1): 201 domain-expert-authored biology research queries stratified by sensitivity tier, with false-positive refusal rates (FPR) for 9 models and Wilson 95% confidence intervals.
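For readers unfamiliar with the Wilson interval mentioned above, here is a minimal sketch of how such an interval can be computed from a refusal count. The function name `wilson_interval` and the example counts are illustrative assumptions; this is not the dataset's own code, and the numbers are not results from the benchmark.

```python
import math


def wilson_interval(refusals: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    refusals: number of legitimate queries that were refused (hypothetical count)
    total:    number of legitimate queries evaluated
    z:        normal quantile; 1.959964 corresponds to a two-sided 95% interval
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= refusals <= total:
        raise ValueError("refusals must be between 0 and total")

    p = refusals / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom

    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Illustrative counts only, not taken from the dataset.
low, high = wilson_interval(refusals=5, total=10)
print(f"FPR = 0.50, Wilson 95% CI = [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed rate is 0 or 1, which is useful for small per-tier samples.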

	No model API calls happen at runtime. This Space loads pre-computed evaluation results from the dataset and lets you browse them by tier, subdomain, and legitimacy. Provider names are reported as observed; numbers should be read as a slice-level calibration signal for this specific biology-research benchmark, not as a global model-quality ranking.

	## What you can do here

	1. Browse queries — Filter the 201 queries by tier (1–5), subdomain (10), and legitimacy. Click a row to see the full record (biological reasoning, legitimate contexts, citations, danger-shift contexts).
	2. Compare models — See the 9-model FPR table with Wilson 95% CIs. Switch between strict and broad FPR.
	3. Per-tier breakdown — See how each model's FPR varies across the 5 sensitivity tiers.
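
The browsing and per-tier views listed above can be approximated offline with a few lines of Python. This is a rough sketch under stated assumptions: the field names (`tier`, `subdomain`, `legitimate`, `refused`), the subdomain labels, and the sample records are hypothetical and do not reflect the dataset's actual schema or results. How a response gets labeled as a refusal (strict or broad) is decided upstream and is not modeled here.

```python
from collections import defaultdict

# Hypothetical records for illustration; not real dataset rows.
records = [
    {"tier": 1, "subdomain": "subdomain_a", "legitimate": True, "refused": False},
    {"tier": 1, "subdomain": "subdomain_b", "legitimate": True, "refused": False},
    {"tier": 3, "subdomain": "subdomain_a", "legitimate": True, "refused": True},
    {"tier": 3, "subdomain": "subdomain_b", "legitimate": True, "refused": False},
    {"tier": 3, "subdomain": "subdomain_a", "legitimate": False, "refused": True},
]


def filter_records(rows, tier=None, subdomain=None, legitimate=None):
    """Keep rows matching every filter that is not None."""
    return [
        r for r in rows
        if (tier is None or r["tier"] == tier)
        and (subdomain is None or r["subdomain"] == subdomain)
        and (legitimate is None or r["legitimate"] == legitimate)
    ]


def fpr_by_tier(rows):
    """False-positive refusal rate per tier: refused / total over legitimate rows only."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [refused, total]
    for r in rows:
        if not r["legitimate"]:
            continue  # refusing a non-legitimate query is not a false positive
        counts[r["tier"]][0] += int(r["refused"])
        counts[r["tier"]][1] += 1
    return {tier: refused / total for tier, (refused, total) in sorted(counts.items())}


tier3 = filter_records(records, tier=3)
per_tier = fpr_by_tier(records)
print(len(tier3), per_tier)
```

Restricting the denominator to legitimate queries is what makes this a false-positive rate: refusals of non-legitimate queries are excluded rather than counted as errors.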

	## Source artifacts

	- 📊 Dataset: [jang1563/bio-overrefusal-v0.1](https://huggingface.co/datasets/jang1563/bio-overrefusal-v0.1)
	- 💻 Code + reproducibility: [github.com/jang1563/bio-overrefusal-v0.1](https://github.com/jang1563/bio-overrefusal-v0.1)
	- 📋 Safety scope: [SAFETY.md](https://github.com/jang1563/bio-overrefusal-v0.1/blob/main/SAFETY.md)

	## Position in the safety stack

	This dataset is a calibration measurement, not a deployed mitigation. It complements rather than replaces capability evaluations (e.g. WMDP, biothreat-eval), constitutional/classifier safeguards, and red-team work. This work is independent and does not represent any provider's internal evaluation pipeline.