Add daily ChemGraph eval pipeline and redesign leaderboard

#4
by tdphamm - opened
Autonomous Scientific Agents org

Summary

  • Add daily evaluation pipeline (scripts/daily_eval.sh + scripts/chemgraph_to_leaderboard.py) that runs ChemGraph benchmarks, transforms results into leaderboard format, and pushes to HF Hub datasets
  • Redesign leaderboard tasks: Replace 15 old experiment-based tasks with 12 category-based tasks (SMILES Lookup, Optimization, Vibrations, Thermochemistry, Dipole, Energy, Reaction Energy)
  • Add Trends tab with Plotly chart showing model performance over time and 1-day/3-day/7-day rolling average summary table
  • Add local development mode (--local flag) to skip HF Hub downloads and scheduler for local testing
  • Fix Python 3.13+ compatibility by replacing make_dataclass with a plain class for AutoEvalColumn
  • Handle API-only models (OpenAI, Anthropic, etc.) gracefully by skipping HF Hub model checks
  • Update citation to published Communications Chemistry paper (2026)
  • Add plotly to requirements.txt; add dataset/model_map.json for model name mapping
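The `make_dataclass` fix above can be sketched as follows. The exact failure and field names are not shown in this diff, so the `ColumnContent` fields and column list here are illustrative assumptions; the pattern is simply to define `AutoEvalColumn` as a plain class with class-level attributes instead of building it dynamically with `dataclasses.make_dataclass`, which newer Python versions reject for some dynamically supplied defaults:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnContent:
    """One leaderboard column definition (field names are illustrative)."""
    name: str
    type: str
    displayed_by_default: bool = True
    hidden: bool = False


# A plain class instead of dataclasses.make_dataclass: attributes are bound
# once at class-definition time, so no dynamic dataclass machinery (and none
# of its stricter default-value validation on Python 3.13+) is involved.
class AutoEvalColumn:
    model = ColumnContent("Model", "markdown")
    average = ColumnContent("Average", "number")
    eval_date = ColumnContent("Eval Date", "str", displayed_by_default=False)
```

Callers can keep referencing columns as `AutoEvalColumn.model.name`, so the rest of the codebase does not need to change.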
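The 1-day/3-day/7-day rolling averages in the Trends tab can be computed with pandas time-based windows. This is a minimal sketch under assumed column names (`eval_date`, `model`, `average`); the real data comes from the results dataset on the HF Hub:

```python
import pandas as pd

# Hypothetical per-run scores; in the app these rows come from the
# leaderboard results dataset, one row per model per eval date.
runs = pd.DataFrame(
    {
        "eval_date": pd.to_datetime(
            ["2026-01-01", "2026-01-02", "2026-01-03", "2026-01-04"]
        ),
        "model": ["gpt-4o"] * 4,
        "average": [0.80, 0.82, 0.78, 0.84],
    }
)


def rolling_summary(df: pd.DataFrame, days: int) -> pd.Series:
    """Per-model rolling mean of `average` over the trailing `days` days."""
    return (
        df.sort_values("eval_date")
        .set_index("eval_date")
        .groupby("model")["average"]
        # A time-based window ("3D", "7D", ...) tolerates missing days,
        # unlike a fixed row count.
        .apply(lambda s: s.rolling(f"{days}D").mean())
    )
```

Computing the same frame for windows of 1, 3, and 7 days yields the columns of the summary table.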

Files changed (10)

app.py: Trends tab, local mode, graceful error handling
dataset/model_map.json: New model name mapping
requirements.txt: Add plotly
scripts/chemgraph_to_leaderboard.py: New ETL script
scripts/daily_eval.sh: New daily eval orchestration
src/about.py: 12 new task categories, updated docs/citation
src/display/utils.py: Plain-class AutoEvalColumn, trend columns
src/leaderboard/aggregate.py: New trend aggregation module
src/leaderboard/read_evals.py: Category-based scoring, eval_date support
src/populate.py: Trend data loading, multi-date result handling

Closing - created accidentally while setting up the PR.

tdphamm changed discussion status to closed
