Visual Instruction Bottleneck Tuning
Paper: arXiv:2505.13946
Vittle (F) is the fixed-prior variant of Vittle (NeurIPS 2025), a visual instruction tuning framework for VLMs that improves robustness to distribution shifts via a variational information bottleneck.
```python
import torch
from transformers import AutoTokenizer

from vittle.model.language_model.vittle_llama import VittleLlamaForCausalLM

# Load the Vittle-7B fixed-prior checkpoint in bfloat16 on a single GPU.
model = VittleLlamaForCausalLM.from_pretrained(
    "changdae/vittle-7b-F",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)
tokenizer = AutoTokenizer.from_pretrained("changdae/vittle-7b-F", use_fast=False)
```
Refer to the evaluation guide for full inference instructions.
| Property | Value |
|---|---|
| Base model | lmsys/vicuna-7b-v1.5 |
| Vision encoder | openai/clip-vit-large-patch14-336 |
| Bottleneck layer | 24 |
| Interpolation coefficient (alpha) | 0.5 |
| KLD strength (beta) | 0.1 |
| Learnable prior | No (fixed) |
| Training dtype | bfloat16 |
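To illustrate how the "Learnable prior: No (fixed)" and "KLD strength (beta)" entries above interact, the sketch below computes the standard closed-form KL term of a variational information bottleneck with a diagonal-Gaussian posterior and a fixed standard-normal prior, scaled by beta = 0.1. This is a minimal illustration of the generic VIB regularizer, not the paper's exact loss; the function name `kl_fixed_prior` is hypothetical.

```python
import math

def kl_fixed_prior(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions.

    Closed form per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    Hypothetical helper for illustration only.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

beta = 0.1  # KLD strength from the table above

# A posterior that exactly matches the fixed N(0, I) prior incurs no penalty.
penalty = beta * kl_fixed_prior([0.0, 0.0], [0.0, 0.0])  # → 0.0
```

With a fixed prior, only the posterior parameters are trained, and the beta-weighted penalty pulls the bottleneck representation toward the standard normal.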
```bibtex
@inproceedings{oh2025visual,
  title     = {Visual Instruction Bottleneck Tuning},
  author    = {Changdae Oh and Jiatong Li and Shawn Im and Sharon Li},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025},
  url       = {https://openreview.net/forum?id=yzHiEmLSk8}
}
```