xiangan

https://anxiangsir.github.io/

anxiangsir

AI & ML interests

None yet

Recent Activity

updated a dataset 1 day ago

mvp-lab/Sisyphus

upvoted a paper 9 days ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

liked a model 9 days ago

InnovatorLab/Innovator-VL-8B-Instruct

upvoted a paper 9 days ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published 10 days ago • 40

upvoted 2 papers about 1 month ago

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Paper • 2603.01068 • Published Mar 1 • 22

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16, 2025 • 69

upvoted an article about 1 month ago

Article

NEO-unify: Building Native Multimodal Unified Models End to End

Mar 5 • 122

upvoted a paper about 1 month ago

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

upvoted a changelog about 1 month ago

Hugging Face Changelog

Public Storage Add-ons

Feb 26 • 167

upvoted a collection about 2 months ago

onevision-encoder

Collection

2 items • Updated Feb 10 • 6

upvoted 3 papers about 2 months ago

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Paper • 2602.12279 • Published Feb 12 • 20

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Paper • 2602.13191 • Published Feb 13 • 31

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

upvoted a paper 2 months ago

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 61

upvoted 3 papers 3 months ago

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Paper • 2601.19325 • Published Jan 27 • 81

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Paper • 2601.10611 • Published Jan 15 • 34

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Paper • 2601.10305 • Published Jan 15 • 36

upvoted a collection 4 months ago

OneVision-Encoder

Collection

HEVC-Style Vision Transformer • 2 items • Updated Feb 10 • 3

upvoted an article 4 months ago

Article

Transformers v5: Simple model definitions powering the AI ecosystem

Dec 1, 2025 • 309

upvoted 2 papers 4 months ago

SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

Paper • 2512.00903 • Published Nov 30, 2025 • 7

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 189

upvoted 2 papers 5 months ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6, 2025 • 39