| <!DOCTYPE html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> | |
| <title>FreshStack Leaderboard</title> | |
| <link rel="preconnect" href="https://fonts.googleapis.com"> | |
| <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> | |
| <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&family=Outfit:wght@400;500;700&display=swap" rel="stylesheet"> | |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/all.min.css"> | |
| <link rel="stylesheet" href="./style.css"> | |
| <script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script> | |
| </head> | |
| <body> | |
| <div class="bg-blobs"> | |
| <div class="blob blob-1"></div> | |
| <div class="blob blob-2"></div> | |
| </div> | |
| <header> | |
| <h1>FreshStack Leaderboard</h1> | |
| <p class="subtitle">Realistic Retrieval Benchmarking on Technical Documentation</p> | |
| <p class="intro"> | |
| FreshStack is a holistic framework for building realistic and challenging RAG benchmarks from community-asked questions and answers in niche, fast-growing domains. It evaluates retrieval models on five domains: <b>LangChain</b>, <b>Yolo v7 &amp; v8</b>, <b>Laravel 10 &amp; 11</b>, | |
| <b>Angular 16, 17 &amp; 18</b>, and <b>Godot4</b>. Metrics include <b>alpha-nDCG@10</b>, <b>Coverage@20</b>, and <b>Recall@50</b>. | |
| </p> | |
| <div class="top-actions"> | |
| <a href="https://openreview.net/forum?id=54TTgXlS2U" target="_blank" class="action-btn"><i class="fa-solid fa-file-lines"></i> Paper</a> | |
| <button class="action-btn" id="toggle-metrics"><i class="fa-solid fa-chart-line"></i> Metric Details</button> | |
| <a href="https://github.com/fresh-stack/freshstack" target="_blank" class="action-btn"><i class="fa-brands fa-github"></i> Code</a> | |
| <a href="https://huggingface.co/freshstack" target="_blank" class="action-btn"><i class="fa-solid fa-database"></i> Dataset</a> | |
| <a href="https://fresh-stack.github.io/" target="_blank" class="action-btn"><i class="fa-solid fa-house"></i> Project Home</a> | |
| <button class="action-btn" id="toggle-submit"><i class="fa-solid fa-paper-plane"></i> Submit Here</button> | |
| </div> | |
| <div id="metrics-panel" class="panel hidden"> | |
| <p><b>alpha-nDCG@10 (α@10)</b>: a diversity-aware ranking metric based on nDCG@10 that penalizes redundant documents (i.e., documents supporting the same nugget) with a geometric discount controlled by alpha. Read more in <a href="https://dl.acm.org/doi/abs/10.1145/1390334.1390446" target="_blank">[Clarke et al. 2008]</a>.</p> | |
| <p><b>Coverage@20 (C@20)</b>: the fraction of unique nuggets supported by the top-20 retrieved documents. Defined in our <a href="https://openreview.net/forum?id=54TTgXlS2U" target="_blank">[paper]</a>.</p> | |
| <p><b>Recall@50 (R@50)</b>: a traditional retrieval metric measuring the fraction of relevant documents that appear in the top-50 retrieved documents.</p> | |
| </div> | |
| <div id="submit-panel" class="panel hidden"> | |
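| <p>Submit your results by adding a new entry to <code>leaderboard_data.json</code> and opening a pull request. Use the template below and remove the <code>//</code> comments before submitting, since JSON does not allow comments.</p> | |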
| <p><a href="https://github.com/fresh-stack/fresh-stack.github.io/blob/master/leaderboard_data.json" target="_blank">Open leaderboard_data.json</a></p> | |
| <textarea readonly rows="14">{ | |
| "info": { | |
| "name": "Your Model Name", // try to follow the format of other models | |
| "size": "600M", // in millions (<1B) or billions (7B) | |
| "type": "open_source", // open_source, proprietary | |
| "date": "2026-04-07", // date of model release | |
| "link": "https://model-or-paper-link" // link to model or documentation | |
| }, | |
| "datasets": { | |
| "langchain": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}, | |
| "yolo": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}, | |
| "laravel": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}, | |
| "angular": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}, | |
| "godot": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000}, | |
| "average": {"alpha_ndcg_10": 0.000, "coverage_20": 0.000, "recall_50": 0.000} | |
| } | |
| }</textarea> | |
| </div> | |
| </header> | |
| <main> | |
| <div class="controls"> | |
| <input type="text" id="search" placeholder="Search retriever..." /> | |
| <div class="types"> | |
| <label><input type="checkbox" class="type-filter" value="open_source" checked> Open Source</label> | |
| <label><input type="checkbox" class="type-filter" value="proprietary" checked> Proprietary</label> | |
| <label><input type="checkbox" class="type-filter" value="upper_baseline"> Oracle</label> | |
| </div> | |
| </div> | |
| <div class="table-outer"> | |
| <div class="table-wrap"> | |
| <table id="leaderboard-table"> | |
| <thead> | |
| <tr id="header-row-top"></tr> | |
| <tr id="header-row-sub"></tr> | |
| </thead> | |
| <tbody id="body-row"></tbody> | |
| </table> | |
| </div> | |
| </div> | |
| <section class="plots"> | |
| <h3>FreshStack Metrics vs. Model Parameters</h3> | |
| <p class="plot-sub">Average scores across the five domains vs. model parameter count; points are colored by model family.</p> | |
| <div id="plot-avg-alpha10" class="plot-box"></div> | |
| <div id="plot-avg-c20" class="plot-box"></div> | |
| <div id="plot-avg-r50" class="plot-box"></div> | |
| </section> | |
| <section class="plots"> | |
| <h3>FreshStack Metrics vs. Model Release Date</h3> | |
| <p class="plot-sub">Average scores across the five domains vs. model release date; points are colored by model family.</p> | |
| <div id="plot-date-avg-alpha10" class="plot-box"></div> | |
| <div id="plot-date-avg-c20" class="plot-box"></div> | |
| <div id="plot-date-avg-r50" class="plot-box"></div> | |
| </section> | |
| <section class="citation"> | |
| <div class="citation-head"> | |
| <h3>Cite FreshStack</h3> | |
| <button id="copy-citation-btn"><i class="fa-regular fa-copy"></i> Copy</button> | |
| </div> | |
| <pre id="citation-text">@inproceedings{ | |
| thakur2025freshstack, | |
| title={FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents}, | |
| author={Nandan Thakur and Jimmy Lin and Sam Havens and Michael Carbin and Omar Khattab and Andrew Drozdov}, | |
| booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track}, | |
| year={2025}, | |
| url={https://openreview.net/forum?id=54TTgXlS2U} | |
| }</pre> | |
| </section> | |
| </main> | |
| <script type="module" src="./main.js"></script> | |
| </body> | |
| </html> | |