OSCaR LLaVA v1.5 13B Mixed Adapter

This repository contains the adapter artifact staged for the OSCaR public release.

Artifact Type

  • Local staging directory: llava-v1.5-13b-lora-mixed-adapter
  • Public repo id: ali-vosoughi/oscar-llava-v1.5-13b-mixed-adapter
  • Training data condition: OSCaR plus the upstream LLaVA v1.5 mixed visual-instruction manifest (llava_final.json)

Files

  • adapter_model.bin
  • adapter_config.json
  • config.json
  • non_lora_trainables.bin

Loading

This is an adapter-only release: the LoRA weights are not merged into the base model, so pass --model-base pointing at the matching Vicuna checkpoint when loading it.

Example:

python -m llava.serve.cli --model-path ali-vosoughi/oscar-llava-v1.5-13b-mixed-adapter --model-base lmsys/vicuna-13b-v1.5 --image-file /path/to/image.jpg
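The same invocation can be assembled programmatically, which makes the pairing of --model-path and --model-base explicit. A minimal sketch (the repo ids come from this card; the image path is a placeholder you supply):

```python
import sys

def build_cli_args(image_file: str) -> list[str]:
    """Assemble the llava.serve.cli command shown above."""
    return [
        sys.executable, "-m", "llava.serve.cli",
        "--model-path", "ali-vosoughi/oscar-llava-v1.5-13b-mixed-adapter",
        "--model-base", "lmsys/vicuna-13b-v1.5",  # matching Vicuna base
        "--image-file", image_file,
    ]

args = build_cli_args("/path/to/image.jpg")
```

To actually launch the chat CLI, hand the list to subprocess.run(args, check=True) in an environment with the LLaVA repo installed.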

Training Configuration

  • LLaVA v1.5 stack
  • CLIP ViT-L/14 vision tower at 336 px input resolution
  • LoRA rank 128
  • LoRA alpha 256
  • learning rate 2e-4
  • 1 epoch
  • max sequence length 2048
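As a quick sanity check on the LoRA settings above: in standard LoRA, the low-rank update is scaled by alpha / r, so rank 128 with alpha 256 applies a scaling factor of 2.0 to the adapter contribution.

```python
# LoRA hyperparameters as listed in the training configuration above.
lora_r = 128
lora_alpha = 256

# Standard LoRA scaling: the low-rank update BA is multiplied by alpha / r.
scaling = lora_alpha / lora_r
print(scaling)  # 2.0
```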

Related Resources

  • Code: https://github.com/nguyennm1024/OSCaR
  • Dataset: https://huggingface.co/datasets/ali-vosoughi/oscar-dataset
  • Paper: https://arxiv.org/abs/2402.17128