AfroBench
Comprehensive benchmark of LLMs on African Languages
computational linguistics, natural language processing
Structured Distillation of Web Agent Capabilities Enables Generalization
LLM2Vec-Gen: Generative Embeddings from Large Language Models
Leaderboard for mSTEB benchmark
Visualize web interaction recordings
Leaderboard for AgentRewardBench
Explore agent trajectories and judgments in web benchmarks
SafeArena Leaderboard