DualTowerVLM is a dual-tower Vision-Language Model (VLM) architecture that processes images and text through separate towers before combining their representations.
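The two-tower idea can be illustrated with a toy, stdlib-only sketch. It is not the DualTowerVLM implementation. It assumes, hypothetically, three things: each tower is a single linear projection into a shared width, the text tower starts from an embedding table, and the towers are combined by prepending the image vectors to the text vectors. The class `ToyDualTower` and all of its parameters are invented for illustration. The real model's towers and fusion step may differ.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix) -> Vector:
    """Multiply a length-d_in vector by a (d_out x d_in) weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def random_matrix(d_out: int, d_in: int, rng: random.Random) -> Matrix:
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]


class ToyDualTower:
    """Hypothetical two-tower encoder: each modality has its own weights."""

    def __init__(self, patch_dim: int, vocab_size: int, text_dim: int,
                 shared_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Vision tower (assumption): one linear layer over flattened patch features.
        self.vision_proj = random_matrix(shared_dim, patch_dim, rng)
        # Text tower (assumption): embedding table followed by a linear projection.
        self.token_embed = [[rng.gauss(0.0, 1.0) for _ in range(text_dim)]
                            for _ in range(vocab_size)]
        self.text_proj = random_matrix(shared_dim, text_dim, rng)

    def encode_image(self, patches: List[Vector]) -> List[Vector]:
        # The image tower never sees text; it only maps patches to shared_dim.
        return [linear(p, self.vision_proj) for p in patches]

    def encode_text(self, token_ids: List[int]) -> List[Vector]:
        # The text tower never sees pixels; it only maps token ids to shared_dim.
        return [linear(self.token_embed[t], self.text_proj) for t in token_ids]

    def forward(self, patches: List[Vector], token_ids: List[int]) -> List[Vector]:
        # Fusion step (assumption): concatenate along the sequence axis,
        # image vectors first, then text vectors.
        return self.encode_image(patches) + self.encode_text(token_ids)


model = ToyDualTower(patch_dim=6, vocab_size=10, text_dim=4, shared_dim=8)
fused = model.forward([[0.1] * 6 for _ in range(3)], [1, 2, 5, 7])
print(len(fused), len(fused[0]))  # 7 8
```

Because both towers project into the same `shared_dim`, their outputs can be joined into one sequence. In this sketch, 3 image patches plus 4 text tokens produce 7 fused vectors of width 8.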

For more information, check out the repository.

Usage:

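```python
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig

# Default model configuration (not passed to from_pretrained in this snippet)
cfg = VLMConfig()

# Load the pretrained checkpoint by its Hugging Face Hub repository id
model = DualTowerVLM.from_pretrained("patrickamadeus/dualtower-cauldron-9000")
```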

Weights: Safetensors format, 0.4B parameters, tensor type F32.
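At roughly 0.4B parameters stored as F32 (4 bytes each), the weights alone need about 1.6 GB. A quick sketch of that arithmetic follows. It assumes the rounded "0.4B" figure and counts weights only, so it leaves out activations, optimizer state, and any generation cache. The F16/BF16 rows are hypothetical comparisons, since the checkpoint is published in F32.

```python
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_memory_bytes(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate storage for model weights only (no activations or optimizer state)."""
    return num_params * bytes_per_param


params = 0.4e9  # "0.4B params" as reported; the exact count is rounded
for dtype, nbytes in BYTES_PER_DTYPE.items():
    gib = weight_memory_bytes(params, nbytes) / 2**30
    print(f"{dtype}: ~{gib:.2f} GiB")
# F32: ~1.49 GiB
# F16: ~0.75 GiB
# BF16: ~0.75 GiB
```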

Task: Image-Text-to-Text