Yale University

university

Verified

https://www.yale.edu/

Recent Activity

RTT1 authored a paper about 1 month ago

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

davidlin7777 published a Space about 1 month ago

YaleUniversity/README

qqggez submitted a paper about 1 month ago

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Papers

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs

RTT1

authored a paper about 1 month ago

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

Paper • 2603.13428 • Published Mar 13 • 21

submitted a paper to Daily Papers about 1 month ago

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Paper • 2603.12246 • Published Mar 12 • 5

published a Space about 1 month ago

README

submitted 2 papers to Daily Papers about 1 month ago

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Paper • 2603.02510 • Published Mar 3 • 3

QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs

Paper • 2602.20629 • Published Feb 24 • 4

authored a paper about 2 months ago

References Improve LLM Alignment in Non-Verifiable Domains

Paper • 2602.16802 • Published Feb 18 • 2

submitted a paper to Daily Papers about 2 months ago

References Improve LLM Alignment in Non-Verifiable Domains

Paper • 2602.16802 • Published Feb 18 • 2

submitted a paper to Daily Papers about 2 months ago

ResearchGym: Evaluating Language Model Agents on Real-World AI Research

Paper • 2602.15112 • Published Feb 16 • 21

RTT1

submitted a paper to Daily Papers 2 months ago

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

Paper • 2602.07075 • Published Feb 6 • 19

submitted a paper to Daily Papers 4 months ago

Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

Paper • 2512.20352 • Published Dec 23, 2025 • 3

RTT1

authored a paper 9 months ago

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Paper • 2507.06229 • Published Jul 8, 2025 • 77

authored a paper 11 months ago

MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs

Paper • 2505.24858 • Published May 30, 2025 • 17

authored a paper about 1 year ago

PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving

Paper • 2503.21821 • Published Mar 26, 2025 • 21

RTT1

authored 2 papers about 1 year ago

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Paper • 2503.01935 • Published Mar 3, 2025 • 30

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21, 2025 • 84

authored 2 papers over 1 year ago

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Paper • 2411.04075 • Published Nov 6, 2024 • 16

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published Oct 30, 2024 • 20

authored a paper over 1 year ago

MDCure: A Scalable Pipeline for Multi-Document Instruction-Following

Paper • 2410.23463 • Published Oct 30, 2024 • 3

authored a paper over 1 year ago

Learning Thresholds with Latent Values and Censored Feedback

Paper • 2312.04653 • Published Dec 7, 2023