Model Card for Fruit Classification ViT Model
Model Details
Model Description
This model is a Vision Transformer (ViT) fine-tuned for fruit image classification. It classifies images into six categories:
- Banana
- Mango
- Orange
- Pitaya
- Pomegranate
- Tomatoes
The model was built by transfer learning from a pretrained ViT checkpoint and fine-tuned on a subset of the Fruit Recognition dataset.
- Developed by: Mario Soler Vidal
- Model type: Vision Transformer (ViT) for image classification
- Language(s): Not applicable (Computer Vision)
- License: MIT
- Finetuned from model: google/vit-base-patch16-224
Model Sources
- Repository: https://huggingface.co/Mariosolerzhawhugging/fruit-vit-model
- Demo: https://huggingface.co/spaces/Mariosolerzhawhugging/fruit-classification-app
Uses
Direct Use
This model can be used to classify images of fruits into one of the six supported categories. It is suitable for:
- Educational demonstrations
- Image classification tasks
- Computer vision experiments
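A minimal inference sketch, assuming the repository listed under Model Sources includes an image processor config so that the Transformers `pipeline` helper can load it directly. The helper names `classify_image` and `format_predictions` are hypothetical, and the model is only downloaded when `classify_image` is called.

```python
from typing import Dict, List

MODEL_ID = "Mariosolerzhawhugging/fruit-vit-model"  # repository from Model Sources


def classify_image(image_path: str, top_k: int = 3) -> List[Dict]:
    """Run the fine-tuned ViT on one image and return the top_k predictions."""
    # Imported lazily so the formatting helper works without Transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    # Each result is a dict of the form {"label": str, "score": float}.
    return classifier(image_path, top_k=top_k)


def format_predictions(results: List[Dict]) -> List[str]:
    """Render predictions as 'Label: 12.3%' strings, highest score first."""
    ordered = sorted(results, key=lambda r: r["score"], reverse=True)
    return [f"{r['label']}: {r['score']:.1%}" for r in ordered]
```

For a local image file, `format_predictions(classify_image("path/to/image.jpg"))` would return one line per predicted class; the path here is a placeholder.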
Downstream Use
The model can be integrated into applications such as:
- Web apps (e.g., Gradio, Streamlit)
- Retail or inventory systems
- Automated fruit recognition pipelines
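As one possible integration, the sketch below wraps the classifier in a Gradio interface. It is not the code behind the linked demo Space; the layout, the title, and the function names `to_label_scores` and `build_demo` are assumptions made for illustration.

```python
from typing import Dict, List

MODEL_ID = "Mariosolerzhawhugging/fruit-vit-model"  # repository from Model Sources


def to_label_scores(results: List[Dict]) -> Dict[str, float]:
    """Convert pipeline output into the {label: confidence} mapping gr.Label displays."""
    return {r["label"]: float(r["score"]) for r in results}


def build_demo():
    """Build a Gradio interface around the classifier (assumed app layout)."""
    # Imported lazily: both packages are optional dependencies of this sketch.
    import gradio as gr
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)

    def predict(image):
        # `image` arrives as a PIL image because of type="pil" below.
        return to_label_scores(classifier(image))

    return gr.Interface(
        fn=predict,
        inputs=gr.Image(type="pil"),
        outputs=gr.Label(num_top_classes=3),
        title="Fruit classification",
    )
```

Calling `build_demo().launch()` starts a local web server with an image upload box and a ranked label display.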
Out-of-Scope Use
- Classification of fruits outside the six trained classes
- Complex real-world environments with heavy occlusion
- Non-fruit image classification
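Because the classifier always returns one of its six labels, out-of-scope inputs still receive a prediction. One simple and imperfect guard is to reject low-confidence outputs. The sketch below assumes predictions in the `{"label", "score"}` form returned by the Transformers image-classification pipeline; the threshold of 0.8 and the function name are illustrative and do not come from the model's evaluation. Softmax confidence is not a reliable out-of-distribution detector, so this reduces the risk rather than removing it.

```python
from typing import Dict, List, Optional


def accept_prediction(results: List[Dict], threshold: float = 0.8) -> Optional[str]:
    """Return the top label, or None when the model is not confident enough."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    # Treat anything below the (assumed) threshold as "unknown".
    return best["label"] if best["score"] >= threshold else None
```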
Bias, Risks, and Limitations
The model was trained on images with mostly clean, controlled backgrounds, so its predictions may be biased toward such conditions.
Potential limitations:
- Reduced performance in highly cluttered or noisy environments
- Limited generalization to unseen fruit types
- Sensitivity to extreme lighting or image distortions
Recommendations
- Use the model primarily for the six trained fruit classes
- Validate performance in real-world scenarios before deployment
- Consider further fine-tuning for more diverse datasets
Training Details
Training Data
The model was trained on a subset of the Fruit Recognition dataset, which contains over 44,000 images.
Key characteristics:
- Images captured in a controlled lab environment
- Resolution: 320 × 258 pixels
- Mostly clean backgrounds
- Variations in lighting, shadows, and pose
Training Procedure
Preprocessing
- Images resized to 224 × 224 pixels
- Normalized using ImageNet statistics
- Converted to RGB format
Data augmentation:
- Random horizontal flip
- Rotation
- Color jitter
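The preprocessing and augmentation steps above could be expressed with torchvision roughly as follows. This is a sketch, not the training script: the mean and standard deviation are the commonly used ImageNet values, and the flip probability, rotation range, and jitter strengths are assumptions, since the card does not state them. The base checkpoint's bundled image processor may specify different normalization values, so its config is worth checking.

```python
from typing import Sequence, Tuple

# Commonly used ImageNet channel statistics (RGB order); assumed, not confirmed by the card.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
IMAGE_SIZE = 224


def normalize_pixel(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def build_train_transform():
    """Training-time pipeline: RGB conversion, resize, augment, tensor, normalize."""
    from torchvision import transforms  # imported lazily; optional dependency

    return transforms.Compose([
        transforms.Lambda(lambda img: img.convert("RGB")),
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.RandomHorizontalFlip(p=0.5),  # assumed probability
        transforms.RandomRotation(degrees=15),   # assumed range
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # assumed strengths
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
```

`normalize_pixel` shows the per-channel arithmetic that `Normalize` applies to every pixel. For validation and test images, the three random augmentation steps would normally be left out.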
Training Hyperparameters
- Training regime: fp32
- Approach: Transfer learning (frozen backbone, trained classification head)
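A frozen-backbone setup of this kind can be sketched with Hugging Face Transformers as shown below. The class index order, the function names, and the choice to freeze the whole `vit` encoder are assumptions; the card does not give the optimizer, learning rate, or number of epochs, so none are shown.

```python
CLASS_NAMES = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Tomatoes"]
ID2LABEL = dict(enumerate(CLASS_NAMES))  # index order is an assumption
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
BASE_MODEL = "google/vit-base-patch16-224"


def build_model():
    """Load the pretrained ViT, swap in a 6-class head, and freeze the backbone."""
    from transformers import ViTForImageClassification  # imported lazily

    model = ViTForImageClassification.from_pretrained(
        BASE_MODEL,
        num_labels=len(CLASS_NAMES),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
        ignore_mismatched_sizes=True,  # the checkpoint's 1000-class head is replaced
    )
    # Freeze every encoder parameter; only model.classifier remains trainable.
    for param in model.vit.parameters():
        param.requires_grad = False
    return model


def trainable_parameters(model):
    """Parameters to hand to the optimizer (the classification head only)."""
    return [p for p in model.parameters() if p.requires_grad]
```

With the encoder frozen, only the weight and bias of the final linear layer are updated, which keeps training cheap on a small dataset.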
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on validation and test splits drawn from the Fruit Recognition dataset.
Metrics
- Accuracy
- Macro F1 Score
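For reference, both metrics can be computed from lists of true and predicted labels as in the dependency-free sketch below. The function names are illustrative, and the card does not say which library produced the reported numbers.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro F1 weights every class equally, so a weak class lowers the score even when it has few samples. This makes it a useful complement to accuracy when class sizes differ.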
Results
- Accuracy: ~0.999
- Macro F1 Score: ~0.999
Summary
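On the validation and test splits, accuracy and macro F1 are both approximately 0.999. These splits come from the same controlled lab setting as the training data, so performance on cluttered real-world images may be lower.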
Technical Specifications
Model Architecture and Objective
- Vision Transformer (ViT)
- Image classification objective
Compute Infrastructure
Hardware
GPU (training environment)
Software
- Python
- Hugging Face Transformers
- PyTorch
Model Card Authors
Mario Soler Vidal