# ViT-B/16 LoRA Fine-Tuned on Oxford-IIIT Pet

This repository contains a LoRA fine-tuned image classification model based on `torchvision.models.vit_b_16`, trained on the Oxford-IIIT Pet dataset.
## Model Overview

- Base Model: `torchvision.models.vit_b_16` (`IMAGENET1K_V1` weights)
- Task: Image Classification
- Dataset: Oxford-IIIT Pet
- Number of Classes: 37 pet breeds
- Fine-Tuning Method: LoRA (PEFT)
## LoRA Configuration

- Rank (r): 16
- Alpha: 16
- Dropout: 0.1
- Target Modules: `out_proj`, `mlp.0`, `mlp.3`
- Modules Saved: `heads.head`
## Training Details

- Training Split: `trainval`
- Evaluation Split: `test`
- Optimizer: AdamW
- Learning Rate: 1e-3
- Weight Decay: 1e-4
- Batch Size: 32
- Epochs: 3
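The hyperparameters above can be sketched as a minimal training loop. This is illustrative only: `model` is assumed to be the LoRA-wrapped ViT and `train_loader` a `DataLoader` over the `trainval` split with `batch_size=32`; details such as an LR schedule are not part of this card.

```python
import torch

def train(model, train_loader, device, epochs=3):
    # AdamW with lr=1e-3 and weight_decay=1e-4, as listed above.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```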
## Evaluation Result

- Test Accuracy: approximately 91.5%
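Test accuracy of the kind reported above can be computed with a loop like the following. It is a sketch: `model`, `test_loader`, and `device` are assumed to be set up as in the Usage section.

```python
import torch

@torch.no_grad()
def accuracy(model, test_loader, device):
    # Fraction of correctly classified samples over the test split.
    model.eval()
    correct = total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```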
## Important Note

This is **not** a standard Hugging Face `transformers` model. The adapter was trained on a torchvision ViT model, so inference requires:

- loading the base model from `torchvision`
- replacing the classification head for 37 classes
- applying the LoRA adapter with PEFT
## Usage

### 1. Install dependencies

```bash
pip install torch torchvision peft pillow
```
### 2. Load the model

```python
import torch
import torch.nn as nn
import torchvision.models as models
from peft import PeftModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_classes = 37

# Rebuild the base architecture exactly as it was fine-tuned:
# ImageNet-pretrained ViT-B/16 with a 37-class head.
base_model = models.vit_b_16(weights="IMAGENET1K_V1")
base_model.heads.head = nn.Linear(base_model.heads.head.in_features, num_classes)

# Attach the LoRA adapter from this repository.
model = PeftModel.from_pretrained(
    base_model,
    "yeseul0-0/oxfordiiit-pet-vit-b16-lora"
)
model = model.to(device)
model.eval()
```
### 3. Run inference on a single image

```python
from PIL import Image
from torchvision import transforms

# Class names in the index order used during training
# (capitalized cat breeds first, then lowercase dog breeds).
class_names = [
    "Abyssinian", "Bengal", "Birman", "Bombay", "British_Shorthair",
    "Egyptian_Mau", "Maine_Coon", "Persian", "Ragdoll", "Russian_Blue",
    "Siamese", "Sphynx", "american_bulldog", "american_pit_bull_terrier",
    "basset_hound", "beagle", "boxer", "chihuahua", "english_cocker_spaniel",
    "english_setter", "german_shorthaired", "great_pyrenees", "havanese",
    "japanese_chin", "keeshond", "leonberger", "miniature_pinscher",
    "newfoundland", "pomeranian", "pug", "saint_bernard", "samoyed",
    "scottish_terrier", "shiba_inu", "staffordshire_bull_terrier",
    "wheaten_terrier", "yorkshire_terrier"
]

# Standard ImageNet preprocessing, matching the pretrained backbone.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

image = Image.open("example.jpg").convert("RGB")
image_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    outputs = model(image_tensor)
    pred_idx = torch.argmax(outputs, dim=1).item()

print("Predicted class:", class_names[pred_idx])
```
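If you want ranked scores rather than just the argmax, you can apply softmax to the logits and report the top predictions. This is a small extension of the snippet above, written as a standalone helper; `logits` corresponds to `outputs` and `class_names` is as defined there.

```python
import torch

def top_k_predictions(logits, class_names, k=5):
    # Convert raw logits (shape [1, num_classes]) to probabilities
    # and return the k most likely (class_name, probability) pairs.
    probs = torch.softmax(logits, dim=1)[0]
    values, indices = probs.topk(k)
    return [(class_names[i], v.item()) for i, v in zip(indices, values)]
```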
## Notes

- This repository contains a LoRA adapter, not a fully merged standalone model.
- The base model must be reconstructed with the same architecture before loading the adapter.
- Since this model is built on `torchvision`, it cannot be loaded directly with `AutoModel.from_pretrained()` from Hugging Face Transformers.
## License
Please check the Oxford-IIIT Pet dataset license and the corresponding torchvision model usage terms before using this model in production.