Patronus AI

company

Verified

https://patronus.ai

AI & ML interests

LLM Evaluation

Recent Activity

akkikiki authored a paper 1 day ago

Unlocking Prompt Infilling Capability for Diffusion Language Models

DarshanDeshpande submitted a paper 2 months ago

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

akkikiki authored a paper 6 months ago

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Papers

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments

PatronusAI's Spaces (5)

TRAIL Leaderboard

Trace Reasoning and Agentic Issue Localization Leaderboard

Enterprise Scenarios Leaderboard

BLUR Leaderboard

BLUR leaderboard.

GLIDER

GLIDER: Grading LLM Interactions and Decisions using Explain

LynxDemo

Evaluate answer fidelity to document