LatentThinkingPKU

community

Recent Activity

THUdyh authored a paper 10 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

DogNeverSleep authored a paper 11 days ago

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

THUdyh authored a paper 11 days ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

authored a paper 10 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 12 days ago • 233

authored a paper 11 days ago

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Paper • 2604.03016 • Published 15 days ago • 37

authored a paper 11 days ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published 12 days ago • 40

authored a paper 11 days ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published 12 days ago • 200

authored a paper 16 days ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published 21 days ago • 18

submitted a paper to Daily Papers 16 days ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published 21 days ago • 18

authored a paper 25 days ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published about 1 month ago • 12

submitted a paper to Daily Papers 25 days ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published about 1 month ago • 12

authored a paper 26 days ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

submitted a paper to Daily Papers 29 days ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

authored 5 papers about 2 months ago

BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Paper • 2602.12876 • Published Feb 13 • 12

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Paper • 2602.04804 • Published Feb 4 • 50

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Paper • 2602.01630 • Published Feb 2 • 50

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Paper • 2601.19267 • Published Jan 27

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss

Paper • 2512.08374 • Published Dec 9, 2025

authored a paper 2 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

submitted a paper to Daily Papers 2 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

authored a paper 2 months ago

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 264

authored a paper 3 months ago

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Paper • 2601.10061 • Published Jan 15 • 32