
Collections

Collections including paper arxiv:2511.21750

BAAI/Emu3.5

Any-to-Any • Updated Dec 25, 2025 • 760 • 169
SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Paper • 2511.21750 • Published Nov 23, 2025 • 6
NewBie-AI/NewBie-image-Exp0.1

Text-to-Image • Updated Feb 18 • 303 • 281
deepseek-ai/DeepSeek-OCR-2

Image-Text-to-Text • 3B • Updated Feb 3 • 1.35M • 916

agents-course/notebooks

Updated Feb 10 • 547
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31, 2025 • 305
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 134
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

Paper • 2511.21678 • Published Nov 26, 2025 • 12

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 63
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53

GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond

Paper • 2309.16583 • Published Sep 28, 2023 • 13
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Paper • 2511.21750 • Published Nov 23, 2025 • 6
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Paper • 2512.21010 • Published Dec 24, 2025 • 4

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated Dec 17, 2025 • 15.5k • 1.43k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14, 2025 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17, 2025 • 96 • 17
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15, 2025 • 63

Model Evaluation

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon

Paper • 2502.07445 • Published Feb 11, 2025 • 11
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Paper • 2502.04689 • Published Feb 7, 2025 • 9
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Paper • 2502.03032 • Published Feb 5, 2025 • 60
Preference Leakage: A Contamination Problem in LLM-as-a-judge

Paper • 2502.01534 • Published Feb 3, 2025 • 40

Multimodal Benchmarks

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 47
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16, 2024 • 16
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5, 2024 • 62

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 40
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25
