Vlm - a ZhitongGao Collection

ZhitongGao 's Collections

FlexTok AR Models

Vlm

updated Feb 9, 2025

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Paper • 2502.00698 • Published Feb 2, 2025 • 24
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

Paper • 2502.01142 • Published Feb 3, 2025 • 25
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3, 2025 • 21
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

Paper • 2502.01081 • Published Feb 3, 2025 • 13
Improved Training Technique for Latent Consistency Models

Paper • 2502.01441 • Published Feb 3, 2025 • 8
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Paper • 2502.01584 • Published Feb 3, 2025 • 9
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation

Paper • 2502.02589 • Published Feb 4, 2025 • 9
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Paper • 2502.03032 • Published Feb 5, 2025 • 60
Great Models Think Alike and this Undermines AI Oversight

Paper • 2502.04313 • Published Feb 6, 2025 • 33
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6, 2025 • 29
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Paper • 2502.04320 • Published Feb 6, 2025 • 36
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

Paper • 2502.03860 • Published Feb 6, 2025 • 25