Add daily ChemGraph eval pipeline and redesign leaderboard tasks

#1 · opened by tdphamm (Autonomous Scientific Agents org)

Summary

  • Redesign leaderboard tasks: Replace 15 paper experiments (exp1-exp15) with 8 task categories derived from ChemGraph's 14 ground-truth evaluation queries
  • Add automated eval pipeline: New scripts/chemgraph_to_leaderboard.py transforms ChemGraph benchmark results into leaderboard-compatible JSON files + pushes to HF Hub
  • Add cron wrapper: scripts/daily_eval.sh runs daily evals and updates the leaderboard automatically
  • Fix API model support: API-only models (OpenAI, Anthropic) no longer fail the HF Hub existence check
  • Add --local flag: python app.py --local skips HF Hub downloads for local testing
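As a rough sketch of the `--local` flag described above (the flag name comes from this PR; the parser structure and help text here are assumptions, not the actual `app.py` code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI sketch: only --local is taken from the PR description.
    parser = argparse.ArgumentParser(description="ChemGraph leaderboard app")
    parser.add_argument(
        "--local",
        action="store_true",
        help="Skip HF Hub downloads and read eval results from local disk",
    )
    return parser


args = build_parser().parse_args(["--local"])
print(args.local)  # True when --local is passed
```

With `action="store_true"` the flag defaults to `False`, so the normal (Hub-backed) path stays the default and local testing is strictly opt-in.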

Task Categories (8 groups from 14 queries)

| Task | Queries | Description |
|---|---|---|
| SMILES Lookup | q1-q2 | Name to SMILES string |
| Coordinate Gen | q3-q4 | SMILES to 3D coordinates |
| Geometry Opt | q5 | Geometry optimization |
| Vib Frequency | q6 | Vibrational frequency analysis |
| Thermochem | q7 | Thermochemical properties |
| Dipole | q8 | Dipole moment calculation |
| Energy | q9-q11 | Single-point energy + JSON extraction |
| Reaction Gibbs | q12-q14 | Reaction Gibbs free energy |
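The grouping above can be expressed as a simple lookup plus a per-category average. This is an illustrative sketch only: the dict name, task keys, and the assumption that per-query results are 0/1 pass scores are all invented, not taken from the repo.

```python
# Hypothetical mapping from ChemGraph query IDs to the 8 task categories,
# mirroring the table above. Key/value names are invented for illustration.
QUERY_TO_TASK = {
    "q1": "smiles_lookup", "q2": "smiles_lookup",
    "q3": "coordinate_gen", "q4": "coordinate_gen",
    "q5": "geometry_opt",
    "q6": "vib_frequency",
    "q7": "thermochem",
    "q8": "dipole",
    "q9": "energy", "q10": "energy", "q11": "energy",
    "q12": "reaction_gibbs", "q13": "reaction_gibbs", "q14": "reaction_gibbs",
}


def task_scores(per_query_pass: dict) -> dict:
    """Average per-query pass/fail (0 or 1) into one score per task category."""
    totals: dict = {}
    counts: dict = {}
    for query, passed in per_query_pass.items():
        task = QUERY_TO_TASK[query]
        totals[task] = totals.get(task, 0) + passed
        counts[task] = counts.get(task, 0) + 1
    return {task: totals[task] / counts[task] for task in totals}
```

For example, `task_scores({"q1": 1, "q2": 0, "q5": 1})` yields `0.5` for `smiles_lookup` and `1.0` for `geometry_opt`, with untested categories simply absent.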

Files Changed

| File | Change |
|---|---|
| `scripts/chemgraph_to_leaderboard.py` | New - core transform script |
| `scripts/daily_eval.sh` | New - cron wrapper for daily eval |
| `dataset/model_map.json` | Updated model name mappings |
| `src/about.py` | Redesigned task definitions + updated UI text |
| `src/leaderboard/read_evals.py` | Fixed hub check, request matching, status handling |
| `src/populate.py` | Graceful empty-DataFrame handling |
| `app.py` | Added `--local` flag, empty-DF guard |
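The core job of `scripts/chemgraph_to_leaderboard.py` - emitting leaderboard-compatible JSON per model - might look roughly like the sketch below. The function name and JSON schema (`config`/`results` keys) follow the common HF leaderboard layout but are assumptions about this repo, not its actual code.

```python
import json
from pathlib import Path


def write_leaderboard_result(model_name: str, scores: dict, out_dir: Path) -> Path:
    """Write one leaderboard-compatible result file for a model.

    `scores` maps task-category name -> numeric score. The schema and the
    results_<model>.json filename convention are illustrative assumptions.
    """
    payload = {
        "config": {"model_name": model_name},
        "results": {task: {"score": score} for task, score in scores.items()},
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    # Slashes in org/model names cannot appear in filenames, so flatten them.
    path = out_dir / f"results_{model_name.replace('/', '_')}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path
```

Writing one self-describing JSON file per model keeps the transform idempotent: a daily rerun simply overwrites each model's file before the push to the Hub.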

Testing

Tested locally with `python app.py --local` against ChemGraph eval results: all 5 models load correctly with all 8 task categories populated, and the Gradio UI serves HTTP 200.

Next Steps (after merge)

  1. Push new results/request data to HF Hub datasets with --push-to-hub
  2. Set up crontab for daily evaluation runs
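For step 2, a crontab entry along these lines would run the wrapper once a day (the schedule, repo path, and log path are placeholders for the actual deployment, not values from this PR):

```shell
# Hypothetical crontab line: run the daily eval wrapper at 02:00 and
# append stdout/stderr to a log file.
0 2 * * * /path/to/repo/scripts/daily_eval.sh >> /var/log/chemgraph_eval.log 2>&1
```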
tdphamm changed discussion status to closed
