ViT-B/16 LoRA Fine-Tuned on Oxford-IIIT Pet

This repository contains a LoRA fine-tuned image classification model based on torchvision.models.vit_b_16, trained on the Oxford-IIIT Pet dataset.

Model Overview

  • Base Model: torchvision.models.vit_b_16 (IMAGENET1K_V1)
  • Task: Image Classification
  • Dataset: Oxford-IIIT Pet
  • Number of Classes: 37 pet breeds
  • Fine-Tuning Method: LoRA (PEFT)

LoRA Configuration

  • Rank (r): 16
  • Alpha: 16
  • Dropout: 0.1
  • Target Modules:
    • out_proj
    • mlp.0
    • mlp.3
  • Modules Saved:
    • heads.head

Training Details

  • Training Split: trainval
  • Evaluation Split: test
  • Optimizer: AdamW
  • Learning Rate: 1e-3
  • Weight Decay: 1e-4
  • Batch Size: 32
  • Epochs: 3
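
A minimal sketch of a training loop matching these hyperparameters. The model and data here are synthetic stand-ins (a dummy linear layer and random tensors instead of the LoRA-wrapped ViT and Oxford-IIIT Pet), so only the optimizer settings and loop structure mirror the actual run:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the LoRA-wrapped ViT and the Oxford-IIIT Pet trainval split.
model = nn.Linear(768, 37)
dataset = TensorDataset(torch.randn(64, 768), torch.randint(0, 37, (64,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # Batch Size: 32

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):  # Epochs: 3
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
```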

Evaluation Result

  • Test Accuracy: approximately 91.5%
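
Test accuracy here means top-1 accuracy over the test split: the fraction of images whose highest-scoring logit matches the ground-truth breed. A small illustration with synthetic logits:

```python
import torch

# Synthetic logits for 3 images over 2 classes; labels are ground truth.
# Rows 0 and 1 are predicted correctly, row 2 is not, so accuracy is 2/3.
logits = torch.tensor([[2.0, 0.1], [0.2, 1.5], [3.0, 0.5]])
labels = torch.tensor([0, 1, 1])
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
```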

Important Note

This is not a standard Hugging Face transformers model.
The adapter was trained on a torchvision ViT model, so inference requires:

  1. loading the base model from torchvision
  2. replacing the classification head for 37 classes
  3. applying the LoRA adapter with PEFT

Usage

1. Install dependencies

pip install torch torchvision peft pillow

2. Load the model

import torch
import torch.nn as nn
import torchvision.models as models
from peft import PeftModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_classes = 37

# Rebuild the base architecture exactly as it was set up for training.
base_model = models.vit_b_16(weights="IMAGENET1K_V1")
# Swap the 1000-class ImageNet head for a 37-class head.
base_model.heads.head = nn.Linear(base_model.heads.head.in_features, num_classes)

# Attach the LoRA adapter weights from this repository.
model = PeftModel.from_pretrained(
    base_model,
    "yeseul0-0/oxfordiiit-pet-vit-b16-lora"
)

model = model.to(device)
model.eval()

3. Run inference on a single image

from PIL import Image
from torchvision import transforms

class_names = [
    "Abyssinian", "Bengal", "Birman", "Bombay", "British_Shorthair",
    "Egyptian_Mau", "Maine_Coon", "Persian", "Ragdoll", "Russian_Blue",
    "Siamese", "Sphynx", "american_bulldog", "american_pit_bull_terrier",
    "basset_hound", "beagle", "boxer", "chihuahua", "english_cocker_spaniel",
    "english_setter", "german_shorthaired", "great_pyrenees", "havanese",
    "japanese_chin", "keeshond", "leonberger", "miniature_pinscher",
    "newfoundland", "pomeranian", "pug", "saint_bernard", "samoyed",
    "scottish_terrier", "shiba_inu", "staffordshire_bull_terrier",
    "wheaten_terrier", "yorkshire_terrier"
]

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # ImageNet mean/std, matching the pretrained ViT's preprocessing.
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# Preprocess a single image and run a forward pass without gradients.
image = Image.open("example.jpg").convert("RGB")
image_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    outputs = model(image_tensor)
    pred_idx = torch.argmax(outputs, dim=1).item()

print("Predicted class:", class_names[pred_idx])
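
To report confidence scores or the top few candidate breeds instead of a single label, the logits can be passed through a softmax. The logits below are synthetic stand-ins for `model(image_tensor)`:

```python
import torch

# Synthetic logits for one image over 4 classes, in place of model output.
logits = torch.tensor([[0.1, 2.0, 0.5, 3.0]])
probs = torch.softmax(logits, dim=1)

# Top-3 classes by probability, highest first.
top_probs, top_idx = probs.topk(3, dim=1)
for p, i in zip(top_probs[0], top_idx[0]):
    print(f"class {i.item()}: {p.item():.3f}")
```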

Notes

  • This repository contains a LoRA adapter, not a fully merged standalone model.
  • The base model must be reconstructed with the same architecture before loading the adapter.
  • Since this model is built on torchvision, it cannot be loaded directly with AutoModel.from_pretrained() from Hugging Face Transformers.

License

Please check the Oxford-IIIT Pet dataset license and the corresponding torchvision model usage terms before using this model in production.
