---
title: RPC-Bench Leaderboard
emoji: 🏆
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
python_version: 3.12
app_file: app.py
pinned: false
license: mit
---
<p align="center">
🌐 <a href="https://rpc-bench.github.io/" target="_blank">Project Page</a> •
💻 <a href="https://github.com/RPC-Bench/PRC-Bench" target="_blank">GitHub</a> •
📄 <a href="https://arxiv.org/abs/2601.14289" target="_blank">Paper</a> •
🤗 <a href="https://huggingface.co" target="_blank">Hugging Face</a> •
🔧 <a href="https://community.modelscope.cn/" target="_blank">ModelScope</a>
</p>
# RPC-Bench Leaderboard
RPC-Bench is a benchmark for research paper comprehension. This Space serves two purposes:
- a public leaderboard for published submissions
- a submission entry for uploading new evaluation files
## Expected repository layout
The Space is designed to work with a separate submission dataset repository.
```text
space/
├── app.py
├── constants.py
├── eval.py
├── requirements.txt
└── benchmark/
    ├── dev.json
    └── test.json
```
If `benchmark/dev.json` and `benchmark/test.json` are not bundled in the Space repo, set `RPC_BENCH_GOLD_DIR` or `RPC_BENCH_GOLD_PATH` through Space secrets / variables.
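
Below is a minimal sketch of how that resolution order might look, assuming `RPC_BENCH_GOLD_PATH` points at a single gold file and `RPC_BENCH_GOLD_DIR` at a directory containing `dev.json` and `test.json`; the actual logic in `eval.py` may differ:

```python
import os
from pathlib import Path

def resolve_gold_file(split: str) -> Path:
    """Locate the gold file for a split ("dev" or "test")."""
    explicit = os.environ.get("RPC_BENCH_GOLD_PATH")
    if explicit:
        # A single-file override wins regardless of the split name.
        return Path(explicit)
    # Otherwise look in RPC_BENCH_GOLD_DIR, falling back to the bundled benchmark/ directory.
    gold_dir = Path(os.environ.get("RPC_BENCH_GOLD_DIR", "benchmark"))
    return gold_dir / f"{split}.json"
```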
The static leaderboard seed is stored in `leaderboard_seed.csv`. `index.html` is only used locally to generate that CSV and should not be uploaded to the Space repository.
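
As an illustration, the leaderboard view could be seeded directly from that CSV. This is only a sketch assuming the file sits next to `app.py`; the real app may merge it with rows pulled from the submission repository:

```python
import gradio as gr
import pandas as pd

# Load the static seed; newly evaluated submissions would be appended elsewhere.
seed_df = pd.read_csv("leaderboard_seed.csv")

with gr.Blocks() as demo:
    gr.Markdown("# RPC-Bench Leaderboard")
    gr.Dataframe(value=seed_df, interactive=False)

if __name__ == "__main__":
    demo.launch()
```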
## Submission format
Uploaded files should be JSONL with one answer per line:
```json
{"id":"...", "part_idx":1, "question":"...", "gen_answer":"...", "category":"..."}
```
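
For example, a submission file could be produced like this; the ids, questions, and categories below are placeholders, not real benchmark entries:

```python
import json

# Placeholder rows illustrating the expected fields only.
answers = [
    {
        "id": "example-0001",
        "part_idx": 1,
        "question": "What is the main contribution of the paper?",
        "gen_answer": "The paper introduces ...",
        "category": "overview",
    },
]

with open("submission.jsonl", "w", encoding="utf-8") as f:
    for row in answers:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```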
## Required environment variables
- `HF_TOKEN`: token for cloning and pushing the submission repository
- `SUBMISSION_REPO_ID`: dataset repo used to store leaderboard results
- `RPC_BENCH_GOLD_DIR`: optional directory containing `dev.json` and `test.json`
- `OPENAI_API_KEY`: optional; required only if the Space should run LLM-based judging inline
- `OPENAI_BASE_URL`: optional; set it when using an OpenAI-compatible endpoint
The Space can still accept uploads when the judge variables are missing, but evaluation will be marked as pending.
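
A rough sketch of that behaviour (the helper name is hypothetical, not part of the Space's actual code): judging only runs when `OPENAI_API_KEY` is present, and `OPENAI_BASE_URL` is passed through when set.

```python
import os

def make_judge_client():
    """Return an OpenAI client for inline judging, or None to leave results pending."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        return None  # uploads are still stored; evaluation stays "pending"
    from openai import OpenAI
    # base_url is only needed for OpenAI-compatible endpoints.
    return OpenAI(api_key=api_key, base_url=os.environ.get("OPENAI_BASE_URL") or None)
```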