
Collections

Discover the best community collections!

Collections including paper arxiv:2601.10477

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Paper • 2601.10477 • Published Jan 15 • 156

moonshotai/Kimi-K2-Instruct-0905

Text Generation • 1T • Updated Jan 30 • 368k • 698
Xxx Wacth Videos Fb Id 1000908070605040302010

Space • Running • 1 • Watch videos from a specific Facebook ID
InferenceSupport

Space • Running • 504 • Discussions about the Inference Providers feature on the Hub
facebook/sam3

Mask Generation • 0.9B • Updated Nov 20, 2025 • 2.14M • 1.89k

learning_from_papers

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7, 2025 • 142
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 140
SpatialLM: Training Large Language Models for Structured Indoor Modeling

Paper • 2506.07491 • Published Jun 9, 2025 • 51
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos

Paper • 2508.14041 • Published Aug 19, 2025 • 59

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Paper • 2601.23143 • Published Jan 30 • 39
PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 223
Agentic Reasoning for Large Language Models

Paper • 2601.12538 • Published Jan 18 • 204
BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 201

VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation

Paper • 2601.10124 • Published Jan 15 • 4
Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Paper • 2601.10477 • Published Jan 15 • 156
Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

Paper • 2601.10880 • Published Jan 15 • 15
SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published Jan 22 • 43

segmentation plus report

ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports

Paper • 2507.22030 • Published Jul 29, 2025 • 4
Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decode

Paper • 2508.04107 • Published Aug 6, 2025 • 4
Phrase-grounded Fact-checking for Automatically Generated Chest X-ray Reports

Paper • 2509.21356 • Published Sep 20, 2025
Learning Segmentation from Radiology Reports

Paper • 2507.05582 • Published Jul 8, 2025 • 1

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published Nov 7, 2024 • 29
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Paper • 2411.05005 • Published Nov 7, 2024 • 13
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Paper • 2411.04075 • Published Nov 6, 2024 • 16
Self-Consistency Preference Optimization

Paper • 2411.04109 • Published Nov 6, 2024 • 19

