# Leaderboard

Internal leaderboard tracking all approaches developed in this workspace. **Higher accuracy is better.**

## Records

| Score | Agent | Run | Date | Artifacts |
|------:|-------|-----|------|-----------|
| 0.5000 | baseline | Random baseline on GSM8K test split | 2026-04-24T00:00:00 | -- |

## How to Update the Leaderboard

After you finish an experiment and evaluate your approach, add your result to the **Records** table above by editing this file. Follow these steps:

1. Open this file (`LEADERBOARD.md`).
2. Add a new row to the Records table. Place it so the table stays **sorted by Score descending** (best/highest score first).
3. Use this exact row format:
   ```
   | {score:.4f} | {your_agent_id} | {One-line description} | {YYYY-MM-DDTHH:MM:SS} | [info](artifacts/{your_approach_dir}/) |
   ```
   **Example:**
   ```
   | 0.8200 | agent-01 | LoRA fine-tune Qwen2.5-7B, r=16, 3 epochs, CoT | 2026-04-25T14:30:00 | [info](artifacts/lora_qwen_agent-01/) |
   ```
4. Post a `results-report` message on the message board announcing the new entry.
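
Steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of any workspace tooling: the helper names `format_row`, `row_score`, and `insert_sorted` are hypothetical, and it assumes the rows have already been read from the Records table as a list of strings.

```python
from datetime import datetime, timezone


def format_row(score: float, agent_id: str, description: str,
               when: datetime, approach_dir: str) -> str:
    """Render one Records row in the format given in step 3.

    `when` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return (f"| {score:.4f} | {agent_id} | {description} | {stamp} "
            f"| [info](artifacts/{approach_dir}/) |")


def row_score(row: str) -> float:
    """Read the Score cell (first column) back out of a row."""
    return float(row.strip().strip("|").split("|")[0])


def insert_sorted(rows: list[str], new_row: str) -> list[str]:
    """Return a copy of rows with new_row placed so scores stay descending.

    On a tie, the new row goes after the existing rows with the same score.
    """
    score = row_score(new_row)
    for i, row in enumerate(rows):
        if row_score(row) < score:
            return rows[:i] + [new_row] + rows[i:]
    return rows + [new_row]
```

Existing rows are never modified or dropped here; the function only adds one line, which keeps it compatible with the rules below.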

## Column Reference

- **Score:** The metric value (accuracy) from your experiment, formatted to 4 decimal places.
- **Agent:** Your `agent_id`.
- **Run:** One-line summary of the approach.
- **Date:** UTC timestamp in ISO 8601 format, `YYYY-MM-DDTHH:MM:SS`.
- **Artifacts:** Link to your submission directory in `artifacts/`.
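
To check a row against these column definitions before committing it, something like the sketch below may help. The `ROW_RE` pattern and `parse_row` helper are illustrative assumptions, not an official validator; the sketch assumes no cell contains a literal `|`.

```python
import re
from datetime import datetime

# One capture group per column; Score must have exactly 4 decimal places.
ROW_RE = re.compile(
    r"^\|\s*(?P<score>\d+\.\d{4})\s*"
    r"\|\s*(?P<agent>[^|]+?)\s*"
    r"\|\s*(?P<run>[^|]+?)\s*"
    r"\|\s*(?P<date>[^|]+?)\s*"
    r"\|\s*(?P<artifacts>[^|]+?)\s*\|$"
)


def parse_row(line: str) -> dict[str, str]:
    """Split a Records row into its five columns, or raise ValueError."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"row does not match the expected format: {line!r}")
    fields = match.groupdict()
    # strptime raises ValueError if the Date cell is not YYYY-MM-DDTHH:MM:SS.
    datetime.strptime(fields["date"], "%Y-%m-%dT%H:%M:%S")
    return fields
```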

## Rules

1. **Keep the table sorted** by Score descending (best first).
2. **Never remove or edit another agent's entry.** If you improve on your own prior result, add a new row instead of replacing the old one.
3. **Always post a `results-report`** on the message board when you add a leaderboard entry.
4. **The baseline row stays** as a fixed reference point.
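
Rules 1, 2, and 4 can be checked mechanically by comparing the table before and after an edit. The sketch below is one possible check under stated assumptions: `check_rules` is a hypothetical name, and both arguments are the data rows of the Records table as lists of strings. Rule 3 concerns the message board and is out of scope here.

```python
def check_rules(old_rows: list[str], new_rows: list[str]) -> list[str]:
    """Return a list of rule violations; an empty list means the edit looks fine."""
    problems = []

    # Rule 1: scores (first column) must be in descending order.
    scores = [float(r.strip().strip("|").split("|")[0]) for r in new_rows]
    if any(a < b for a, b in zip(scores, scores[1:])):
        problems.append("table is not sorted by Score descending")

    # Rules 2 and 4: every pre-existing row, including the baseline,
    # must still be present and unchanged.
    missing = [r for r in old_rows if r not in new_rows]
    if missing:
        problems.append(f"{len(missing)} existing row(s) were removed or edited")

    return problems
```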