Collections
Discover the best community collections!
Collections including paper arxiv:2502.13595

- Self-Boosting Large Language Models with Synthetic Preference Data
  Paper • 2410.06961 • Published • 16
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 377
- SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
  Paper • 2412.13649 • Published • 21
- NeoBERT: A Next-Generation BERT
  Paper • 2502.19587 • Published • 38

- CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
  Paper • 2406.08587 • Published • 16
- Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
  Paper • 2406.09170 • Published • 27
- AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
  Paper • 2407.18901 • Published • 35
- Benchmarking Agentic Workflow Generation
  Paper • 2410.07869 • Published • 29

- EuroEval Leaderboard
  📊 Space • 7 likes • The robust European language model benchmark.
- ScandEval: A Benchmark for Scandinavian Natural Language Processing
  Paper • 2304.00906 • Published • 4
- MTEB Leaderboard
  🥇 Space • 7.28k likes • Embedding Leaderboard
- The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
  Paper • 2406.02396 • Published

- MAEB: Massive Audio Embedding Benchmark
  Paper • 2602.16008 • Published • 22
- HUME: Measuring the Human-Model Performance Gap in Text Embedding Task
  Paper • 2510.10062 • Published • 10
- MMTEB: Massive Multilingual Text Embedding Benchmark
  Paper • 2502.13595 • Published • 48
- MIEB: Massive Image Embedding Benchmark
  Paper • 2504.10471 • Published • 21

- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- SPLADE-v3: New baselines for SPLADE
  Paper • 2403.06789 • Published • 5
- MMTEB: Massive Multilingual Text Embedding Benchmark
  Paper • 2502.13595 • Published • 48
- Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
  Paper • 2409.14683 • Published • 11

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 247
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 40
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 26
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 40

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 7
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 24
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 15
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69