---
title: RPC-Bench Leaderboard
emoji: 📊
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
python_version: 3.12
app_file: app.py
pinned: false
license: mit
---

<p align="center">
🌐 <a href="https://rpc-bench.github.io/" target="_blank">Project Page</a> •
💻 <a href="https://github.com/RPC-Bench/PRC-Bench" target="_blank">GitHub</a> •
📄 <a href="https://arxiv.org/abs/2601.14289" target="_blank">Paper</a> •
🤗 <a href="https://huggingface.co" target="_blank">Hugging Face</a> •
🔧 <a href="https://community.modelscope.cn/" target="_blank">ModelScope</a>
</p>

# RPC-Bench Leaderboard

RPC-Bench is a benchmark for research paper comprehension. This Space provides two functions:

- a public leaderboard for published submissions
- a submission entry point for uploading new evaluation files

## Expected repository layout

The Space is designed to work with a separate submission dataset repository.

```text
space/
├── app.py
├── constants.py
├── eval.py
├── requirements.txt
└── benchmark/
    ├── dev.json
    └── test.json
```

If `benchmark/dev.json` and `benchmark/test.json` are not bundled in the Space repo, set `RPC_BENCH_GOLD_DIR` or `RPC_BENCH_GOLD_PATH` through Space secrets / variables.

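The lookup order described above can be sketched as follows. This is a minimal illustration, not the Space's actual code: the function name `resolve_gold_file` and the precedence (explicit `RPC_BENCH_GOLD_PATH` first, then `RPC_BENCH_GOLD_DIR`, then the bundled `benchmark/` directory) are assumptions based on the variable names.

```python
import os
from pathlib import Path


def resolve_gold_file(split: str) -> Path:
    """Locate the gold answer file for a split ("dev" or "test").

    Assumed precedence: RPC_BENCH_GOLD_PATH (a full file path) wins,
    then RPC_BENCH_GOLD_DIR, then the bundled benchmark/ directory.
    """
    explicit = os.environ.get("RPC_BENCH_GOLD_PATH")
    if explicit:
        return Path(explicit)
    gold_dir = os.environ.get("RPC_BENCH_GOLD_DIR", "benchmark")
    return Path(gold_dir) / f"{split}.json"
```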
The static leaderboard seed is stored in `leaderboard_seed.csv`. `index.html` is only used locally to generate that CSV and should not be uploaded to the Space repository.

## Submission format

Uploaded files should be JSONL with one answer per line:

```json
{"id": "...", "part_idx": 1, "question": "...", "gen_answer": "...", "category": "..."}
```

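A submission in this format can be checked before upload with a short validator. This is a hypothetical sketch (the `validate_submission` helper is not part of the Space's public API); it only enforces the five fields shown above.

```python
import json

# The five fields every answer record must carry, per the format above.
REQUIRED_FIELDS = {"id", "part_idx", "question", "gen_answer", "category"}


def validate_submission(text: str) -> list[dict]:
    """Parse a JSONL submission and verify each record has the required fields."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc})") from exc
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        records.append(record)
    return records
```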
## Required environment variables

- `HF_TOKEN`: token for cloning and pushing the submission repository
- `SUBMISSION_REPO_ID`: dataset repo used to store leaderboard results
- `RPC_BENCH_GOLD_DIR`: optional directory containing `dev.json` and `test.json`
- `OPENAI_API_KEY`: optional; required if you want the Space to run LLM-based judging inline
- `OPENAI_BASE_URL`: optional, for OpenAI-compatible endpoints

The Space can still accept uploads when the judge variables are missing, but evaluation will be marked as pending.
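The pending fallback might be implemented along these lines. This is an assumed sketch, not the Space's actual logic: the helper names `judge_available` and `submission_status` are illustrative, and the only fact taken from the source is that inline judging needs `OPENAI_API_KEY`.

```python
import os


def judge_available() -> bool:
    """Inline LLM judging can run only when an API key is configured."""
    return bool(os.environ.get("OPENAI_API_KEY"))


def submission_status() -> str:
    """Return the status a fresh upload would receive (illustrative)."""
    return "evaluated" if judge_available() else "pending"
```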
|
|