hallucinations-leaderboard

community

https://www.neuralnoise.com

Recent Activity

pingnieuk authored a paper 7 days ago

ClawBench: Can AI Agents Complete Everyday Online Tasks?

pingnieuk authored a paper 12 days ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

pminervini authored a paper about 1 month ago

Agentic Uncertainty Reveals Agentic Overconfidence

authored a paper 7 days ago

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published 12 days ago • 255

authored a paper 12 days ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

Paper • 2604.05117 • Published 15 days ago • 35

authored 3 papers about 1 month ago

Agentic Uncertainty Reveals Agentic Overconfidence

Paper • 2602.06948 • Published Feb 6

Complex Query Answering with Neural Link Predictors

Paper • 2011.03459 • Published Nov 6, 2020

Rethinking the Harmonic Loss via Non-Euclidean Distance Layers

Paper • 2603.10225 • Published Mar 10

authored a paper about 1 month ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published Mar 12 • 65

authored 3 papers 2 months ago

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

Paper • 2412.07067 • Published Dec 10, 2024

TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting

Paper • 2504.09588 • Published Apr 13, 2025

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published Feb 5 • 36

authored 2 papers 2 months ago

Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs

Paper • 2512.05648 • Published Dec 5, 2025

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

Paper • 2601.23045 • Published Jan 30

authored a paper 6 months ago

OpenSIR: Open-Ended Self-Improving Reasoner

Paper • 2511.00602 • Published Nov 1, 2025 • 21

authored a paper 6 months ago

VisCoder2: Building Multi-Language Visualization Coding Agents

Paper • 2510.23642 • Published Oct 24, 2025 • 22

authored 7 papers 6 months ago

Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs

Paper • 2410.15438 • Published Oct 20, 2024

PosterSum: A Multimodal Benchmark for Scientific Poster Summarization

Paper • 2502.17540 • Published Feb 24, 2025 • 3

Self-Training Large Language Models for Tool-Use Without Demonstrations

Paper • 2502.05867 • Published Feb 9, 2025

Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression

Paper • 2503.02812 • Published Mar 4, 2025 • 10

Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

Paper • 2307.03042 • Published Jul 6, 2023

An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering

Paper • 2503.23415 • Published Mar 30, 2025 • 1

MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction

Paper • 2204.04779 • Published Apr 10, 2022