OSCaR LLaVA v1.5 13B Projector

This repository contains the projector artifact staged for the OSCaR public release.

Artifact Type

  • Local staging directory: llava-v1.5-13b-pretrain-projector
  • Public repo id: ali-vosoughi/oscar-llava-v1.5-13b-projector
  • Training data condition: projector pretraining assets used before OSCaR LoRA fine-tuning

Files

  • config.json
  • mm_projector.bin

Loading

This is a projector-only release. It is intended for the pretraining and fine-tuning workflow documented in the OSCaR code repository.

Example:

bash scripts/train/pretrain_v1_5_13b_projector.sh
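For readers who want to inspect the projector weights directly, below is a minimal sketch that reconstructs the usual LLaVA v1.5 two-layer MLP projector (mlp2x_gelu). The hidden sizes (1024 for CLIP ViT-L/336 features, 5120 for the 13B language model) and the state-dict key prefix are assumptions based on the standard LLaVA v1.5 13B configuration, not confirmed by this repository:

```python
import torch
import torch.nn as nn

# Assumed mlp2x_gelu projector shape for LLaVA v1.5 13B:
# CLIP ViT-L/336 features (1024-d) -> LLM hidden size (5120-d).
projector = nn.Sequential(
    nn.Linear(1024, 5120),
    nn.GELU(),
    nn.Linear(5120, 5120),
)

# Loading the released weights might look like this (key prefix is an
# assumption; check the actual keys in mm_projector.bin first):
# state_dict = torch.load("mm_projector.bin", map_location="cpu")
# projector.load_state_dict(
#     {k.replace("model.mm_projector.", ""): v for k, v in state_dict.items()}
# )

# Shape check with dummy CLIP patch features (576 patches at 336px / patch 14).
features = torch.randn(1, 576, 1024)
out = projector(features)
print(out.shape)  # torch.Size([1, 576, 5120])
```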

Training Configuration

  • LLaVA v1.5 stack
  • CLIP ViT-L/336 vision tower
  • LoRA rank 128
  • LoRA alpha 256
  • learning rate 2e-4
  • 1 epoch
  • max sequence length 2048
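The LoRA settings above imply a scaling factor of alpha / rank = 256 / 128 = 2. As a sketch of what these hyperparameters mean in practice, here is a rank-128 LoRA update wrapping a frozen linear layer in plain PyTorch (an illustration of the technique, not the actual OSCaR training code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a low-rank update, scaled by alpha / rank."""
    def __init__(self, base: nn.Linear, rank: int = 128, alpha: int = 256):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts as a no-op
        self.scaling = alpha / rank  # 256 / 128 = 2.0

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Hypothetical 5120-d layer, matching a 13B-model hidden size.
layer = LoRALinear(nn.Linear(5120, 5120))
x = torch.randn(2, 5120)
# With lora_b zero-initialized, the wrapped layer matches the base exactly.
assert torch.allclose(layer(x), layer.base(x))
```

Only `lora_a` and `lora_b` receive gradients, which is what makes LoRA fine-tuning far cheaper than updating the full 13B-parameter model.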

Related Resources

  • Code: https://github.com/nguyennm1024/OSCaR
  • Dataset: https://huggingface.co/datasets/ali-vosoughi/oscar-dataset
  • Paper: https://arxiv.org/abs/2402.17128