ViT-B/16 LoRA Fine-Tuned on Oxford-IIIT Pet

This repository contains a LoRA fine-tuned image classification model based on torchvision.models.vit_b_16, trained on the Oxford-IIIT Pet dataset.

Model Overview

  • Base Model: torchvision.models.vit_b_16 (IMAGENET1K_V1)
  • Task: Image Classification
  • Dataset: Oxford-IIIT Pet
  • Number of Classes: 37 pet breeds
  • Fine-Tuning Method: LoRA (PEFT)

LoRA Configuration

  • Rank (r): 16
  • Alpha: 16
  • Dropout: 0.1
  • Target Modules:
    • out_proj
    • mlp.0
    • mlp.3
  • Modules Saved:
    • heads.head

Training Details

  • Training Split: trainval
  • Evaluation Split: test
  • Optimizer: AdamW
  • Learning Rate: 1e-3
  • Weight Decay: 1e-4
  • Batch Size: 32
  • Epochs: 3
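
A minimal sketch of a training loop matching these hyperparameters. The model and data here are synthetic stand-ins (a dummy linear layer and random tensors instead of the LoRA-wrapped ViT and Oxford-IIIT Pet), so only the optimizer settings and loop structure mirror the actual run:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the LoRA-wrapped ViT and the Oxford-IIIT Pet trainval split.
model = nn.Linear(768, 37)
dataset = TensorDataset(torch.randn(64, 768), torch.randint(0, 37, (64,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # Batch Size: 32

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):  # Epochs: 3
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
```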

Evaluation Result

  • Test Accuracy: approximately 91.5%
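
Test accuracy here means top-1 accuracy over the test split: the fraction of images whose highest-scoring logit matches the ground-truth breed. A small illustration with synthetic logits:

```python
import torch

# Synthetic logits for 3 images over 2 classes; labels are ground truth.
# Rows 0 and 1 are predicted correctly, row 2 is not, so accuracy is 2/3.
logits = torch.tensor([[2.0, 0.1], [0.2, 1.5], [3.0, 0.5]])
labels = torch.tensor([0, 1, 1])
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
```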

Important Note

This is not a standard Hugging Face transformers model.
The adapter was trained on a torchvision ViT model, so inference requires:

  1. loading the base model from torchvision
  2. replacing the classification head for 37 classes
  3. applying the LoRA adapter with PEFT

Usage

1. Install dependencies

pip install torch torchvision peft pillow

2. Load the model

import torch
import torch.nn as nn
import torchvision.models as models
from peft import PeftModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_classes = 37

# Rebuild the base architecture exactly as it was set up for training.
base_model = models.vit_b_16(weights="IMAGENET1K_V1")
# Swap the 1000-class ImageNet head for a 37-class head.
base_model.heads.head = nn.Linear(base_model.heads.head.in_features, num_classes)

# Attach the LoRA adapter weights from this repository.
model = PeftModel.from_pretrained(
    base_model,
    "yeseul0-0/oxfordiiit-pet-vit-b16-lora"
)

model = model.to(device)
model.eval()

3. Run inference on a single image

from PIL import Image
from torchvision import transforms

class_names = [
    "Abyssinian", "Bengal", "Birman", "Bombay", "British_Shorthair",
    "Egyptian_Mau", "Maine_Coon", "Persian", "Ragdoll", "Russian_Blue",
    "Siamese", "Sphynx", "american_bulldog", "american_pit_bull_terrier",
    "basset_hound", "beagle", "boxer", "chihuahua", "english_cocker_spaniel",
    "english_setter", "german_shorthaired", "great_pyrenees", "havanese",
    "japanese_chin", "keeshond", "leonberger", "miniature_pinscher",
    "newfoundland", "pomeranian", "pug", "saint_bernard", "samoyed",
    "scottish_terrier", "shiba_inu", "staffordshire_bull_terrier",
    "wheaten_terrier", "yorkshire_terrier"
]

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # ImageNet mean/std, matching the pretrained ViT's preprocessing.
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Preprocess a single image and run a forward pass without gradients.
image = Image.open("example.jpg").convert("RGB")
image_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    outputs = model(image_tensor)
    pred_idx = torch.argmax(outputs, dim=1).item()

print("Predicted class:", class_names[pred_idx])
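
To report confidence scores or the top few candidate breeds instead of a single label, the logits can be passed through a softmax. The logits below are synthetic stand-ins for `model(image_tensor)`:

```python
import torch

# Synthetic logits for one image over 4 classes, in place of model output.
logits = torch.tensor([[0.1, 2.0, 0.5, 3.0]])
probs = torch.softmax(logits, dim=1)

# Top-3 classes by probability, highest first.
top_probs, top_idx = probs.topk(3, dim=1)
for p, i in zip(top_probs[0], top_idx[0]):
    print(f"class {i.item()}: {p.item():.3f}")
```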

Notes

  • This repository contains a LoRA adapter, not a fully merged standalone model.
  • The base model must be reconstructed with the same architecture before loading the adapter.
  • Since this model is built on torchvision, it cannot be loaded directly with AutoModel.from_pretrained() from Hugging Face Transformers.

License

Please check the Oxford-IIIT Pet dataset license and the corresponding torchvision model usage terms before using this model in production.
