---
title: >-
BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol
Understanding and Reasoning
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- biology
- protocol
- benchmark
- ai4science
---
# BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
[Paper (arXiv)](https://arxiv.org/pdf/2505.07889) | [Hugging Face](https://huggingface.co/BioProBench) | [GitHub](https://github.com/YuyangSunshine/bioprotocolbench) | [Project Page](https://yuyangsunshine.github.io/BioPro-Project/) | [License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
---
## 📢 Latest News
* ✨ **[2026-03-31]** **Data Split Update!** We have officially released the **Train/Test splits** for each task (PQA, ORD, ERR, GEN, REA), making it easier for the community to train and evaluate models consistently.
* 🔥 **[2026-03-18]** Our **BioProAgent** is now live on AI4S LAB! [Try it out and order wet-lab experiments here](https://yuyangsunshine.github.io/BioPro-Project/).
* 🎉 **[2026-03-03]** Our BioProAgent has been accepted by the **ICLR 2026 LLA Workshop!**
* 📝 **[2026-01-21]** The BioProBench paper has been updated with new experimental results: [arXiv](https://arxiv.org/pdf/2505.07889).
* 🚀 **[2025-12-01]** Code and dataset (v1.0) are released on GitHub.
---
## 🌟 Introduction
**BioProBench** is the first large-scale, integrated multi-task benchmark for biological protocol understanding and reasoning, specifically designed for Large Language Models (LLMs). It moves beyond simple QA to encompass a comprehensive suite of tasks critical for procedural text comprehension in life sciences.
### Key Features:
* 📚 **Large-scale Data:** Built upon **27K original biological protocols**, yielding nearly **556K high-quality structured instances**.
* 🎯 **Comprehensive Tasks:** 5 core tasks: **PQA** (Question Answering), **ORD** (Step Ordering), **ERR** (Error Correction), **GEN** (Generation), and **REA** (Reasoning).
* 🧬 **Broad Domain Coverage:** Covers **16 biological subdomains** from 6 major repositories.
* 🔬 **Standardized Evaluation:** A robust framework combining NLP metrics with novel domain-specific measures.
---
## 📊 Dataset Structure & Tasks
We provide standardized JSON files for each task, now including **Train** and **Test** splits:
| Task | Description | Files |
| :--- | :--- | :--- |
| **PQA** | Protocol Question Answering | `PQA_train.json`, `PQA_test.json` |
| **ORD** | Step Ordering | `ORD_train.json`, `ORD_test.json` |
| **ERR** | Error Correction | `ERR_train.json`, `ERR_test.json` |
| **GEN** | Protocol Generation | `GEN_train.json`, `GEN_test.json` |
| **REA** | Protocol Reasoning | `REA_train.json`, `REA_test.json` |
| **Raw** | Full Protocol Corpus | `Bio-protocol.json`, `Protocol-io.json`, etc. |
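Each split file can be read with the standard `json` module. As a minimal sketch (the field names in the mock instance below are illustrative assumptions, not the actual BioProBench schema; check a real file for the exact keys), loading a task split might look like:

```python
import json
import os
import tempfile

def load_split(path):
    """Load a BioProBench task split, assumed to be a JSON array of instances."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Demo with a mock split file; in practice, point load_split at the
# downloaded files, e.g. PQA_train.json or PQA_test.json.
mock = [{"question": "Which buffer is used in step 3?", "answer": "PBS"}]  # hypothetical fields
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "PQA_test.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mock, f)
    data = load_split(path)
    print(len(data))  # number of instances in the split
```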
### 🔗 Useful Links
* **Official Website:** [BioPro-Project Page](https://yuyangsunshine.github.io/BioPro-Project/)
* **GitHub Repository:** [bioprotocolbench](https://github.com/YuyangSunshine/bioprotocolbench) (Code for evaluation & training)
---
## 🔬 Key Findings
We evaluated 12 mainstream LLMs. Our findings reveal:
* **Surface vs. Deep Understanding:** Models perform well on surface-level QA (~70% accuracy) but struggle with deep procedural logic.
* **Reasoning Bottleneck:** Performance drops significantly on **Step Ordering** and **Protocol Generation** (BLEU < 15%), highlighting the difficulty of managing temporal dependencies.
* **Bio-specific Models:** Interestingly, some bio-specific models lag behind general LLMs in capturing intricate procedural dependencies, suggesting a need for larger reasoning capacity.
---
## 🤝 Contributing & Contact
We welcome contributions such as new protocol sources, additional domains, or novel tasks!
- **Email:** sunshineliuyuyang@gmail.com
- **Issues:** Feel free to open an issue on our [GitHub](https://github.com/YuyangSunshine/bioprotocolbench).
## 📜 Citation
```bibtex
@misc{bioprotocolbench2025,
  title={BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning},
  author={Yuyang Liu and Liuzhenghao Lv and Xiancheng Zhang and Jingya Wang and Li Yuan and Yonghong Tian},
  year={2025},
  url={https://arxiv.org/pdf/2505.07889}
}
```