Model Card for Fruit Classification ViT Model

Model Details

Model Description

This model is a Vision Transformer (ViT) fine-tuned for fruit image classification. It classifies images into six categories:

  • Banana
  • Mango
  • Orange
  • Pitaya
  • Pomegranate
  • Tomatoes

The model is based on transfer learning using a pretrained ViT architecture and has been fine-tuned on a subset of the Fruit Recognition dataset.

  • Developed by: Mario Soler Vidal
  • Model type: Vision Transformer (ViT) for image classification
  • Language(s): Not applicable (Computer Vision)
  • License: MIT
  • Finetuned from model: google/vit-base-patch16-224

Uses

Direct Use

This model can be used to classify images of fruits into one of the six supported categories. It is suitable for:

  • Educational demonstrations
  • Image classification tasks
  • Computer vision experiments
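
To illustrate the classification interface, here is a minimal sketch that builds a ViT head with the six fruit labels and runs a forward pass. The model below is randomly initialized so the prediction is meaningless; with the actual fine-tuned checkpoint you would instead call `ViTForImageClassification.from_pretrained(...)` with this model's repo id.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

# The six classes this card describes.
labels = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Tomatoes"]

# Randomly initialized stand-in; swap in from_pretrained("<checkpoint>") for real use.
config = ViTConfig(
    num_labels=len(labels),
    id2label={i: name for i, name in enumerate(labels)},
    label2id={name: i for i, name in enumerate(labels)},
)
model = ViTForImageClassification(config).eval()

pixel_values = torch.randn(1, 3, 224, 224)  # one fake 224x224 RGB image
with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits  # shape (1, 6)
prediction = config.id2label[int(logits.argmax(-1))]
```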

Downstream Use

The model can be integrated into applications such as:

  • Web apps (e.g., Gradio, Streamlit)
  • Retail or inventory systems
  • Automated fruit recognition pipelines

Out-of-Scope Use

  • Classification of fruits outside the six trained classes
  • Complex real-world environments with heavy occlusion
  • Non-fruit image classification

Bias, Risks, and Limitations

The model was trained on images with mostly clean and controlled backgrounds, which may introduce bias.

Potential limitations:

  • Reduced performance in highly cluttered or noisy environments
  • Limited generalization to unseen fruit types
  • Sensitivity to extreme lighting or image distortions

Recommendations

  • Use the model primarily for the six trained fruit classes
  • Validate performance in real-world scenarios before deployment
  • Consider further fine-tuning for more diverse datasets

Training Details

Training Data

The model was trained on a subset of the Fruit Recognition dataset, which contains over 44,000 images.

Key characteristics:

  • Images captured in a controlled lab environment
  • Resolution: 320 × 258 pixels
  • Mostly clean backgrounds
  • Variations in lighting, shadows, and pose

Training Procedure

Preprocessing

  • Images resized to 224 × 224 pixels
  • Normalized using ImageNet statistics
  • Converted to RGB format

Data augmentation:

  • Random horizontal flip
  • Rotation
  • Color jitter

Training Hyperparameters

  • Training regime: fp32
  • Approach: Transfer learning (frozen backbone, trained classification head)
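
The transfer-learning setup (frozen backbone, trainable classification head) can be sketched as follows, using a randomly initialized ViT-base config to stay self-contained; in practice the model would come from `from_pretrained("google/vit-base-patch16-224")`:

```python
from transformers import ViTConfig, ViTForImageClassification

# Stand-in for the pretrained base model, with a six-class head.
model = ViTForImageClassification(ViTConfig(num_labels=6))

# Freeze the ViT backbone; leave only the classification head trainable.
for param in model.vit.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Only the linear classifier's weights and bias remain trainable, so the optimizer updates a tiny fraction of the total parameter count.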

Evaluation

Testing Data, Factors & Metrics

Testing Data

Validation and test splits from the Fruit Recognition dataset.

Metrics

  • Accuracy
  • Macro F1 Score
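
As a reminder of how these two metrics are computed (a generic sketch with toy labels, not the card's evaluation script):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy predictions over the six classes (illustrative only).
y_true = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Tomatoes"]
y_pred = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Mango"]

accuracy = accuracy_score(y_true, y_pred)             # fraction of correct predictions
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
```

On this toy example the accuracy is 5/6, while the macro F1 is 7/9: the "Tomatoes" class has no correct predictions, so its per-class F1 of 0 drags the unweighted mean down more than it affects accuracy.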

Results

  • Accuracy: ~0.999
  • Macro F1 Score: ~0.999

Summary

The model achieves near-perfect accuracy and macro F1 across all six classes on held-out splits of the same controlled-environment dataset; performance on uncontrolled real-world images may be lower.


Technical Specifications

Model Architecture and Objective

  • Vision Transformer (ViT)
  • Image classification objective
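
For the patch-16 ViT at the 224 × 224 input size used here, the token count works out as follows (a quick sanity check, not taken from the training code):

```python
# ViT-base/16 at 224x224: the image is split into 16x16 pixel patches.
image_size, patch_size = 224, 16
patches_per_side = image_size // patch_size  # 14 patches per side
num_patches = patches_per_side ** 2          # 196 patch tokens
sequence_length = num_patches + 1            # plus one [CLS] token -> 197
```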

Compute Infrastructure

Hardware

GPU (training environment)

Software

  • Python
  • Hugging Face Transformers
  • PyTorch

Model Card Authors

Mario Soler Vidal

Model size: 85.8M parameters (F32, safetensors)