MFM - Multimodal Foundation Models - a LeafInTheTree Collection

updated Mar 2

Paused

Featured

102

Idefics3

📊

Generate text based on an image and prompt
Runtime error

162

VideoLLaMA2

🎥

Media understanding
Runtime error

54

GroundingDINO ⚔ OWL

🦖

Identify objects in images using text queries
Runtime error

85

Paligemma HF

🤗

Generate text and segment images using PaliGemma
Paused

Featured

315

PaliGemma Demo

🤲

Annotate and describe images with text prompts
Runtime error

Featured

515

Florence2 + SAM2

🔥

Segment and caption objects in images and videos
Running on Zero

11

Florence 2 Vision Model V1

💻

Analyze images to caption, detect objects, and extract text
Build error

2

Marketing Vision

👁
Runtime error

2

Idefics3

📊
Paused

10

Theia

⚡

Generate detailed image analyses and depth predictions
Runtime error

16

XGen MM

💻

Generate detailed descriptions from images and questions
Sleeping

LLaMA 3.1 Vision

🦙
Runtime error

Featured

79

Chameleon 30b

🔥

Chat about images and get instant answers
Running

Featured

513

InternVL

⚡

Chat with AI using text and images for analysis
Running on Zero

Featured

840

Florence 2

📉

Perform image captioning, detection, OCR and more with Florence‑2
Running on Zero

Featured

222

Phi 3.5 Vision

🔥

Generate answers to questions about any image
Runtime error

Featured

886

MiniGPT-4

🚀
Runtime error

40

Mistral Pixtral Demo

👀

Chat with Pixtral 12B using Mistral Inference
Runtime error

Featured

323

Ovis1.6 Gemma2 9B

🐑

Interact with a chatbot that understands text and images

meta-llama/Llama-Guard-3-11B-Vision

Image-Text-to-Text • 11B • Updated Nov 18, 2024 • 2.02k • 71

Running

Featured

103

Owlv2

👀

State-of-the-art Zero-shot Object Detection
Running on Zero

Featured

390

Llama-Vision-11B

🚀

Chat with Llama about images and text
Runtime error

144

SmolVLM

📊

Generate text from images and queries
Paused

7

GLM-Edge-V-5B Space

📷

Generate text responses based on images and chat history
Running on Zero

17

Paligemma2 Detection

😻

Paligemma2 Detection with Supervision
Runtime error

40

Florence Llama

💬

Generate text responses from images and text input
Runtime error

6

Paligemma2 10b Ft Docci 448

📉
Runtime error

5

VisPer-LM

🔍

Visualize image depth, segmentation, and generation
Runtime error

Featured

2.02k

Chat With Janus-Pro-7B

🌍

A unified multimodal understanding and generation model.