DualTowerVLM is a dual-tower Vision-Language Model (VLM) architecture that processes images and text through separate towers before combining their representations.
For more information, check out the repository.
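Usage:
# These imports assume the repository's code is on the Python path.
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig
# Default configuration object; it is not passed to from_pretrained in this snippet.
cfg = VLMConfig()
# Load the pretrained checkpoint by name.
model = DualTowerVLM.from_pretrained("patrickamadeus/dualtower-cauldron-9000")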
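To illustrate the dual-tower idea described at the top (separate towers for images and text, combined afterwards), here is a toy sketch in plain Python. It is not the DualTowerVLM implementation: `ToyDualTower`, the single linear layer per tower, and fusion by concatenation are all assumptions made for illustration only.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Build a random (out_dim x in_dim) weight matrix as nested lists."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vector):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class ToyDualTower:
    """Hypothetical stand-in: one linear 'tower' per modality."""

    def __init__(self, image_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.image_tower = make_linear(image_dim, hidden_dim, rng)
        self.text_tower = make_linear(text_dim, hidden_dim, rng)

    def forward(self, image_features, text_features):
        # Each modality goes through its own tower, independently of the other.
        image_repr = apply_linear(self.image_tower, image_features)
        text_repr = apply_linear(self.text_tower, text_features)
        # Assumption: combine the two representations by concatenation.
        return image_repr + text_repr


toy = ToyDualTower(image_dim=4, text_dim=3, hidden_dim=2)
fused = toy.forward([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0])
print(len(fused))  # image half (2 values) followed by text half (2 values)
```

Because the towers are independent, changing the text input leaves the image half of the fused vector unchanged, and vice versa. The actual model may combine the two representations differently.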