OSCaR LLaVA v1.5 13B Projector

This repository contains the projector artifact staged for the OSCaR public release.

Artifact Type

  • Local staging directory: llava-v1.5-13b-pretrain-projector
  • Public repo id: ali-vosoughi/oscar-llava-v1.5-13b-projector
  • Training data condition: projector pretraining assets used before OSCaR LoRA fine-tuning

Files

  • config.json
  • mm_projector.bin

Loading

This is a projector-only release. It is intended for the pretraining and fine-tuning workflow documented in the OSCaR code repository.

Example:

bash scripts/train/pretrain_v1_5_13b_projector.sh
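For readers who want to inspect the projector weights directly, below is a minimal sketch that reconstructs the usual LLaVA v1.5 two-layer MLP projector (mlp2x_gelu). The hidden sizes (1024 for CLIP ViT-L/336 features, 5120 for the 13B language model) and the state-dict key prefix are assumptions based on the standard LLaVA v1.5 13B configuration, not confirmed by this repository:

```python
import torch
import torch.nn as nn

# Assumed mlp2x_gelu projector shape for LLaVA v1.5 13B:
# CLIP ViT-L/336 features (1024-d) -> LLM hidden size (5120-d).
projector = nn.Sequential(
    nn.Linear(1024, 5120),
    nn.GELU(),
    nn.Linear(5120, 5120),
)

# Loading the released weights might look like this (key prefix is an
# assumption; check the actual keys in mm_projector.bin first):
# state_dict = torch.load("mm_projector.bin", map_location="cpu")
# projector.load_state_dict(
#     {k.replace("model.mm_projector.", ""): v for k, v in state_dict.items()}
# )

# Shape check with dummy CLIP patch features (576 patches at 336px / patch 14).
features = torch.randn(1, 576, 1024)
out = projector(features)
print(out.shape)  # torch.Size([1, 576, 5120])
```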

Training Configuration

  • LLaVA v1.5 stack
  • CLIP ViT-L/336 vision tower
  • LoRA rank 128
  • LoRA alpha 256
  • learning rate 2e-4
  • 1 epoch
  • max sequence length 2048
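The LoRA settings above imply a scaling factor of alpha / rank = 256 / 128 = 2. As a sketch of what these hyperparameters mean in practice, here is a rank-128 LoRA update wrapping a frozen linear layer in plain PyTorch (an illustration of the technique, not the actual OSCaR training code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a low-rank update, scaled by alpha / rank."""
    def __init__(self, base: nn.Linear, rank: int = 128, alpha: int = 256):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts as a no-op
        self.scaling = alpha / rank  # 256 / 128 = 2.0

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Hypothetical 5120-d layer, matching a 13B-model hidden size.
layer = LoRALinear(nn.Linear(5120, 5120))
x = torch.randn(2, 5120)
# With lora_b zero-initialized, the wrapped layer matches the base exactly.
assert torch.allclose(layer(x), layer.base(x))
```

Only `lora_a` and `lora_b` receive gradients, which is what makes LoRA fine-tuning far cheaper than updating the full 13B-parameter model.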

Related Resources

  • Code: https://github.com/nguyennm1024/OSCaR
  • Dataset: https://huggingface.co/datasets/ali-vosoughi/oscar-dataset
  • Paper: https://arxiv.org/abs/2402.17128