BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Paper • 2308.09936 • Published • 1
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction
Pose Recognition with Cascade Transformers