Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.14028

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94
One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11, 2025 • 32
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 263

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5, 2025 • 135
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26, 2025 • 15

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 32
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 110
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107

Papers-Benchmarks

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12, 2024 • 16
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 27
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10, 2024 • 29

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.49k • 562
nvidia/OpenMathReasoning

Viewer • Updated May 27, 2025 • 5.68M • 17.4k • 453
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 18

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

LLM + Datasets : Finance

MMMU/MMMU

Viewer • Updated Feb 12 • 11.6k • 71.9k • 325
takala/financial_phrasebank

Updated Dec 15, 2025 • 11.3k • 251
zeroshot/twitter-financial-news-sentiment

Viewer • Updated Feb 23, 2024 • 11.9k • 4.93k • 167
yixuantt/FinEntity

Viewer • Updated Jan 24, 2024 • 979 • 23 • 4

Large Language Model (LLM) and NLP related papers.

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19, 2024 • 7
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20, 2024 • 24
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 15
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94
One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11, 2025 • 32
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 263

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5, 2025 • 135
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 94
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26, 2025 • 15

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21, 2025 • 1.15M • 1.49k • 562
nvidia/OpenMathReasoning

Viewer • Updated May 27, 2025 • 5.68M • 17.4k • 453
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5, 2025 • 18

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 172
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 32
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 110
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107

LLM + Datasets : Finance

MMMU/MMMU

Viewer • Updated Feb 12 • 11.6k • 71.9k • 325
takala/financial_phrasebank

Updated Dec 15, 2025 • 11.3k • 251
zeroshot/twitter-financial-news-sentiment

Viewer • Updated Feb 23, 2024 • 11.9k • 4.93k • 167
yixuantt/FinEntity

Viewer • Updated Jan 24, 2024 • 979 • 23 • 4

Papers-Benchmarks

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12, 2024 • 16
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 27
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10, 2024 • 29

Large Language Model (LLM) and NLP related papers.

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19, 2024 • 7
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20, 2024 • 24
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 15
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs