nlphuji

university

Recent Activity

itay1itzhak authored a paper 1 day ago

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

itay1itzhak authored a paper 1 day ago

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

itay1itzhak authored a paper 1 day ago

Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

authored 3 papers 1 day ago

ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs

Paper • 2510.00857 • Published Oct 1, 2025 • 1

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

Paper • 2510.24081 • Published Oct 28, 2025 • 22

Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

Paper • 2108.11193 • Published Jun 8, 2022

submitted a paper to Daily Papers 5 days ago

ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery

Paper • 2604.09237 • Published 8 days ago • 9

authored a paper 3 months ago

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 176

updated 2 datasets 8 months ago

nlphuji/PromptSuite

Viewer • Updated Aug 17, 2025 • 36.5k • 4 • 1

nlphuji/AI_Regulation

Viewer • Updated Aug 14, 2025 • 41.6k • 3

published a dataset 8 months ago

nlphuji/AI_Regulation

Viewer • Updated Aug 14, 2025 • 41.6k • 3

updated 2 datasets 8 months ago

nlphuji/DOVE_Lite

Viewer • Updated Aug 14, 2025 • 254M • 14 • 3

nlphuji/DOVE

Viewer • Updated Aug 14, 2025 • 254M • 9 • 4

published a dataset 8 months ago

nlphuji/PromptSuite

Viewer • Updated Aug 17, 2025 • 36.5k • 4 • 1

authored a paper 9 months ago

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Paper • 2507.07186 • Published Jul 9, 2025 • 3

in nlphuji/DOVE 10 months ago

Update README.md

#6 opened 10 months ago by

authored a paper about 1 year ago

DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

Paper • 2503.01622 • Published Mar 3, 2025

authored 2 papers about 1 year ago

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 96

WildIFEval: Instruction Following in the Wild

Paper • 2503.06573 • Published Mar 9, 2025 • 14

published 2 datasets about 1 year ago

nlphuji/DOVE_Lite

Viewer • Updated Aug 14, 2025 • 254M • 14 • 3

nlphuji/DOVE

Viewer • Updated Aug 14, 2025 • 254M • 9 • 4

authored a paper about 1 year ago

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

Paper • 2502.08130 • Published Feb 12, 2025 • 9

authored a paper about 1 year ago

Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs

Paper • 2502.12964 • Published Feb 18, 2025 • 3