Model Card for Fruit Classification ViT Model

Model Details

Model Description

This model is a Vision Transformer (ViT) fine-tuned for fruit image classification. It classifies images into six categories:

  • Banana
  • Mango
  • Orange
  • Pitaya
  • Pomegranate
  • Tomatoes

The model is based on transfer learning using a pretrained ViT architecture and has been fine-tuned on a subset of the Fruit Recognition dataset.

  • Developed by: Mario Soler Vidal
  • Model type: Vision Transformer (ViT) for image classification
  • Language(s): Not applicable (Computer Vision)
  • License: MIT
  • Finetuned from model: google/vit-base-patch16-224

Uses

Direct Use

This model can be used to classify images of fruits into one of the six supported categories. It is suitable for:

  • Educational demonstrations
  • Image classification tasks
  • Computer vision experiments
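
To illustrate the classification interface, here is a minimal sketch that builds a ViT head with the six fruit labels and runs a forward pass. The model below is randomly initialized so the prediction is meaningless; with the actual fine-tuned checkpoint you would instead call `ViTForImageClassification.from_pretrained(...)` with this model's repo id.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

# The six classes this card describes.
labels = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Tomatoes"]

# Randomly initialized stand-in; swap in from_pretrained("<checkpoint>") for real use.
config = ViTConfig(
    num_labels=len(labels),
    id2label={i: name for i, name in enumerate(labels)},
    label2id={name: i for i, name in enumerate(labels)},
)
model = ViTForImageClassification(config).eval()

pixel_values = torch.randn(1, 3, 224, 224)  # one fake 224x224 RGB image
with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits  # shape (1, 6)
prediction = config.id2label[int(logits.argmax(-1))]
```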

Downstream Use

The model can be integrated into applications such as:

  • Web apps (e.g., Gradio, Streamlit)
  • Retail or inventory systems
  • Automated fruit recognition pipelines

Out-of-Scope Use

  • Classification of fruits outside the six trained classes
  • Complex real-world environments with heavy occlusion
  • Non-fruit image classification

Bias, Risks, and Limitations

The model was trained on images with mostly clean and controlled backgrounds, which may introduce bias.

Potential limitations:

  • Reduced performance in highly cluttered or noisy environments
  • Limited generalization to unseen fruit types
  • Sensitivity to extreme lighting or image distortions

Recommendations

  • Use the model primarily for the six trained fruit classes
  • Validate performance in real-world scenarios before deployment
  • Consider further fine-tuning for more diverse datasets

Training Details

Training Data

The model was trained on a subset of the Fruit Recognition dataset, which contains over 44,000 images.

Key characteristics:

  • Images captured in a controlled lab environment
  • Resolution: 320 × 258 pixels
  • Mostly clean backgrounds
  • Variations in lighting, shadows, and pose

Training Procedure

Preprocessing

  • Images resized to 224 × 224 pixels
  • Normalized using ImageNet statistics
  • Converted to RGB format

Data augmentation:

  • Random horizontal flip
  • Rotation
  • Color jitter

Training Hyperparameters

  • Training regime: fp32
  • Approach: Transfer learning (frozen backbone, trained classification head)
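
The transfer-learning setup (frozen backbone, trainable classification head) can be sketched as follows, using a randomly initialized ViT-base config to stay self-contained; in practice the model would come from `from_pretrained("google/vit-base-patch16-224")`:

```python
from transformers import ViTConfig, ViTForImageClassification

# Stand-in for the pretrained base model, with a six-class head.
model = ViTForImageClassification(ViTConfig(num_labels=6))

# Freeze the ViT backbone; leave only the classification head trainable.
for param in model.vit.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Only the linear classifier's weights and bias remain trainable, so the optimizer updates a tiny fraction of the total parameter count.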

Evaluation

Testing Data, Factors & Metrics

Testing Data

Validation and test splits from the Fruit Recognition dataset.

Metrics

  • Accuracy
  • Macro F1 Score
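
As a reminder of how these two metrics are computed (a generic sketch with toy labels, not the card's evaluation script):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy predictions over the six classes (illustrative only).
y_true = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Tomatoes"]
y_pred = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Mango"]

accuracy = accuracy_score(y_true, y_pred)             # fraction of correct predictions
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
```

On this toy example the accuracy is 5/6, while the macro F1 is 7/9: the "Tomatoes" class has no correct predictions, so its per-class F1 of 0 drags the unweighted mean down more than it affects accuracy.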

Results

  • Accuracy: ~0.999
  • Macro F1 Score: ~0.999

Summary

The model achieves near-perfect accuracy and macro F1 across all six classes on held-out splits of the same controlled-environment dataset; performance on uncontrolled real-world images may be lower.


Technical Specifications

Model Architecture and Objective

  • Vision Transformer (ViT)
  • Image classification objective
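
For the patch-16 ViT at the 224 × 224 input size used here, the token count works out as follows (a quick sanity check, not taken from the training code):

```python
# ViT-base/16 at 224x224: the image is split into 16x16 pixel patches.
image_size, patch_size = 224, 16
patches_per_side = image_size // patch_size  # 14 patches per side
num_patches = patches_per_side ** 2          # 196 patch tokens
sequence_length = num_patches + 1            # plus one [CLS] token -> 197
```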

Compute Infrastructure

Hardware

GPU (training environment)

Software

  • Python
  • Hugging Face Transformers
  • PyTorch

Model Card Authors

Mario Soler Vidal

Model size: 85.8M parameters (F32, safetensors)