<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>FreshStack Leaderboard</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&family=Outfit:wght@400;500;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/all.min.css">
<link rel="stylesheet" href="./style.css">
<script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
</head>
<body>
<div class="bg-blobs">
<div class="blob blob-1"></div>
<div class="blob blob-2"></div>
</div>
<header>
<h1>FreshStack Leaderboard</h1>
<p class="subtitle">Realistic Retrieval Benchmarking on Technical Documentation</p>
<p class="intro">
FreshStack is a holistic framework for building realistic and challenging RAG benchmarks from community-asked questions and answers in niche, fast-growing domains. FreshStack evaluates retrieval models on five domains: <b>LangChain</b>, <b>Yolo v7 &amp; v8</b>, <b>Laravel 10 &amp; 11</b>,
<b>Angular 16, 17 &amp; 18</b>, and <b>Godot4</b>. Reported metrics are <b>alpha-nDCG@10</b>, <b>Coverage@20</b>, and <b>Recall@50</b>.
</p>
<div class="top-actions">
<a href="https://openreview.net/forum?id=54TTgXlS2U" target="_blank" class="action-btn"><i class="fa-solid fa-file-lines"></i> Paper</a>
<button class="action-btn" id="toggle-metrics"><i class="fa-solid fa-chart-line"></i> Metric Details</button>
<a href="https://github.com/fresh-stack/freshstack" target="_blank" class="action-btn"><i class="fa-brands fa-github"></i> Code</a>
<a href="https://huggingface.co/freshstack" target="_blank" class="action-btn"><i class="fa-solid fa-database"></i> Dataset</a>
<a href="https://fresh-stack.github.io/" target="_blank" class="action-btn"><i class="fa-solid fa-house"></i> Project Home</a>
<button class="action-btn" id="toggle-submit"><i class="fa-solid fa-paper-plane"></i> Submit Here</button>
</div>
<div id="metrics-panel" class="panel hidden">
<p><b>alpha-nDCG@10 (α@10)</b>: a diversity-aware ranking metric based on nDCG@10 that penalizes redundant documents (i.e., documents supporting a nugget already covered higher in the ranking). The gain for a nugget is discounted geometrically, by a factor of (1 &minus; α) for each earlier document that already supports it. Read more in <a href="https://dl.acm.org/doi/abs/10.1145/1390334.1390446" target="_blank">[Clarke et al. 2008]</a>.</p>
<p><b>Coverage@20 (C@20)</b>: the fraction of unique nuggets supported by the top-20 retrieved documents. Defined in our <a href="https://openreview.net/forum?id=54TTgXlS2U" target="_blank">[paper]</a>.</p>
<p><b>Recall@50 (R@50)</b>: a traditional retrieval metric measuring the fraction of relevant documents retrieved within the top 50.</p>
</div>
<div id="submit-panel" class="panel hidden">
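<p>Submit your results by adding a new entry to <code>leaderboard_data.json</code> in the format below and opening a pull request. Remove the <code>//</code> comments before submitting, since JSON does not allow comments.</p>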
<p><a href="https://github.com/fresh-stack/fresh-stack.github.io/blob/master/leaderboard_data.json" target="_blank">Open leaderboard_data.json</a></p>
<textarea readonly rows="14">{
"info": {
"name": "Your Model Name", // follow the naming format of existing entries
"size": "600M", // parameter count: millions below 1B (e.g., 600M), billions otherwise (e.g., 7B)
"type": "open_source", // either open_source or proprietary
"date": "2026-04-07", // model release date (YYYY-MM-DD)
"link": "https://model-or-paper-link" // link to the model or its documentation
},
"datasets": {
"langchain": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
"yolo": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
"laravel": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
"angular": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
"godot": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000},
"average": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}
}
}</textarea>
</div>
</header>
<main>
<div class="controls">
<input type="text" id="search" placeholder="Search retriever..." />
<div class="types">
<label><input type="checkbox" class="type-filter" value="open_source" checked> Open Source</label>
<label><input type="checkbox" class="type-filter" value="proprietary" checked> Proprietary</label>
<label><input type="checkbox" class="type-filter" value="upper_baseline"> Oracle</label>
</div>
</div>
<div class="table-outer">
<div class="table-wrap">
<table id="leaderboard-table">
<thead>
<tr id="header-row-top"></tr>
<tr id="header-row-sub"></tr>
</thead>
<tbody id="body-row"></tbody>
</table>
</div>
</div>
<section class="plots">
<h3>FreshStack Metrics vs. Model Parameters</h3>
<p class="plot-sub">Average scores across the five domains vs. model size (number of parameters); points are colored by model family.</p>
<div id="plot-avg-alpha10" class="plot-box"></div>
<div id="plot-avg-c20" class="plot-box"></div>
<div id="plot-avg-r50" class="plot-box"></div>
</section>
<section class="plots">
<h3>FreshStack Metrics vs. Model Release Date</h3>
<p class="plot-sub">Average scores across the five domains vs. model release date; points are colored by model family.</p>
<div id="plot-date-avg-alpha10" class="plot-box"></div>
<div id="plot-date-avg-c20" class="plot-box"></div>
<div id="plot-date-avg-r50" class="plot-box"></div>
</section>
<section class="citation">
<div class="citation-head">
<h3>Cite FreshStack</h3>
<button id="copy-citation-btn"><i class="fa-regular fa-copy"></i> Copy</button>
</div>
<pre id="citation-text">@inproceedings{
thakur2025freshstack,
title={FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents},
author={Nandan Thakur and Jimmy Lin and Sam Havens and Michael Carbin and Omar Khattab and Andrew Drozdov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2025},
url={https://openreview.net/forum?id=54TTgXlS2U}
}</pre>
</section>
</main>
<script type="module" src="./main.js"></script>
</body>
</html>