---
title: RPC-Bench Leaderboard
emoji: 📊
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.1
python_version: 3.12
app_file: app.py
pinned: false
license: mit
---

🌐 Project Page • 💻 GitHub • 📖 Paper • 🤗 Hugging Face • 🧭 ModelScope

# RPC-Bench Leaderboard

RPC-Bench is a benchmark for research paper comprehension. This Space provides two functions:

- a public leaderboard for published submissions
- a submission entry for uploading new evaluation files

## Expected repository layout

The Space works together with a separate submission dataset repository (see `SUBMISSION_REPO_ID` below). The Space repository itself is expected to look like this:

```
space/
├── app.py
├── constants.py
├── eval.py
├── requirements.txt
└── benchmark/
    ├── dev.json
    └── test.json
```

If `benchmark/dev.json` and `benchmark/test.json` are not bundled in the Space repo, set `RPC_BENCH_GOLD_DIR` or `RPC_BENCH_GOLD_PATH` through Space secrets / variables.
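
For illustration, gold-file resolution could look like the sketch below. The helper name, the fallback order, and the interpretation of `RPC_BENCH_GOLD_PATH` as a single file are assumptions, not the Space's actual code:

```python
import os
from pathlib import Path

def resolve_gold_file(split: str) -> Path:
    """Locate the gold file for a split ("dev" or "test").

    Assumed precedence: RPC_BENCH_GOLD_PATH (a single file), then
    RPC_BENCH_GOLD_DIR (a directory holding dev.json/test.json),
    then the bundled benchmark/ directory.
    """
    single = os.getenv("RPC_BENCH_GOLD_PATH")
    if single:
        return Path(single)
    gold_dir = Path(os.getenv("RPC_BENCH_GOLD_DIR", "benchmark"))
    candidate = gold_dir / f"{split}.json"
    if not candidate.exists():
        raise FileNotFoundError(
            f"Gold file {candidate} not found; set RPC_BENCH_GOLD_DIR "
            "or RPC_BENCH_GOLD_PATH in the Space settings."
        )
    return candidate
```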

The static leaderboard seed is stored in `leaderboard_seed.csv`. `index.html` is only used locally to generate that CSV and should not be uploaded to the Space repository.

## Submission format

Uploaded files should be JSONL, with one answer per line:

```json
{"id":"...", "part_idx":1, "question":"...", "gen_answer":"...", "category":"..."}
```

## Required environment variables

- `HF_TOKEN`: token for cloning and pushing the submission repository
- `SUBMISSION_REPO_ID`: dataset repo used to store leaderboard results (see the upload sketch after this list)
- `RPC_BENCH_GOLD_DIR`: optional; directory containing `dev.json` and `test.json`
- `OPENAI_API_KEY`: optional; required only if you want the Space to run LLM-based judging inline
- `OPENAI_BASE_URL`: optional; for OpenAI-compatible endpoints
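
As a sketch of how the first two variables are used, a submission file can be pushed to the dataset repo with `huggingface_hub` (the file name and the `submissions/` path in the repo are assumptions):

```python
import os

from huggingface_hub import HfApi

# HF_TOKEN and SUBMISSION_REPO_ID come from the Space settings
# described in the list above.
api = HfApi(token=os.environ["HF_TOKEN"])
api.upload_file(
    path_or_fileobj="submission.jsonl",
    path_in_repo="submissions/submission.jsonl",
    repo_id=os.environ["SUBMISSION_REPO_ID"],
    repo_type="dataset",
)
```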

The Space can still accept uploads when the judge variables are missing, but evaluation will be marked as pending.
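
The check behind this behavior could be as simple as the following; the status strings are assumptions about how the Space records submissions, not confirmed values:

```python
import os

def judge_available() -> bool:
    # LLM-based judging needs an API key; OPENAI_BASE_URL is optional
    # and only redirects requests to an OpenAI-compatible endpoint.
    return bool(os.getenv("OPENAI_API_KEY"))

def initial_status() -> str:
    # Hypothetical status strings: score inline when a judge is
    # configured, otherwise leave the submission for a later pass.
    return "scored" if judge_available() else "pending"
```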