---
library_name: transformers
license: apache-2.0
tags:
- image-classification
- dinov2
- vision
- tube-classification
- manufacturing
datasets:
- Siddanna/transparent-tube-dataset
base_model:
- facebook/dinov2-base
pipeline_tag: image-classification
---
# Transparent Tube Classifier
A binary image classifier that distinguishes between:
- **transparent_alone** 🧪: a transparent tube by itself
- **transparent_with_blue** 🧪💙: a transparent tube paired with a blue tube
## Model Details
| Property | Value |
|---|---|
| **Base Model** | [facebook/dinov2-base](https://huggingface.co/facebook/dinov2-base) (ViT-B/14, 86.6M params) |
| **Training Method** | Linear probe (frozen backbone + trained classifier head) |
| **Training Dataset** | [Siddanna/transparent-tube-dataset](https://huggingface.co/datasets/Siddanna/transparent-tube-dataset) |
| **Accuracy** | **100%** on the (synthetic) test set |
| **Loss** | 0.0014 |
| **Image Size** | 256×256 (DINOv2 default) |
| **License** | Apache 2.0 |
## Quick Start
### Using the Pipeline (Easiest)
```python
from transformers import pipeline
classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```
### Manual Inference
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify image (convert to RGB to handle grayscale/RGBA inputs)
image = Image.open("your_tube_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()
print(f"Prediction: {label} (confidence: {confidence:.2%})")
```
## Training Details
### Architecture
- **Base**: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- **Head**: Linear classifier (768 → 2)
- **Method**: Linear probe, i.e. the backbone is frozen and only the classification head is trained (see the sketch below)
- **Why DINOv2?**: DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube
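A minimal sketch of this linear-probe setup (the actual training script may differ); it assumes the transformers `Dinov2ForImageClassification` layout, where the backbone lives under `model.dinov2`:

```python
from transformers import AutoModelForImageClassification

# Load DINOv2-base with a fresh 2-class head; the label mapping matches
# the classes this card describes.
model = AutoModelForImageClassification.from_pretrained(
    "facebook/dinov2-base",
    num_labels=2,
    id2label={0: "transparent_alone", 1: "transparent_with_blue"},
    label2id={"transparent_alone": 0, "transparent_with_blue": 1},
)

# Linear probe: freeze the backbone so only the classification head trains.
for param in model.dinov2.parameters():
    param.requires_grad = False
```

With the backbone frozen, each step updates only the small head, which keeps training fast and stable on small datasets.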
### Hyperparameters
- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)
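Expressed as transformers `TrainingArguments`, this configuration might look like the following sketch (the output path and evaluation cadence are illustrative, not taken from the actual run):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tube-classifier",   # illustrative output path
    learning_rate=1e-3,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    per_device_train_batch_size=16,
    weight_decay=0.01,
    num_train_epochs=4,
    eval_strategy="epoch",          # `evaluation_strategy` on older versions
)
```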
### Data Augmentations
- RandomResizedCrop (scale 0.7-1.0)
- RandomHorizontalFlip
- RandomRotation (±15°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)
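One plausible torchvision implementation of this augmentation pipeline (the actual training code may differ; the 256×256 crop size follows the Image Size row above):

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.2, hue=0.05),
    transforms.ToTensor(),  # normalization with the processor's mean/std would follow
])
```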
### Training Curves
| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | **100%** |
| 2 | 0.011 | 0.002 | **100%** |
| 3 | 0.002 | 0.001 | **100%** |
| 4 | 0.004 | 0.010 | 99.5% |
## For Production Use with Real Images
The model is currently trained on **synthetic data**. For best results with your actual tubes:
### Step 1: Collect Real Photos
Take 50-100+ photos per class of your actual tubes:
```
data/
├── train/
│   ├── transparent_alone/       # Photos of transparent tube alone
│   └── transparent_with_blue/   # Photos of transparent + blue tube
└── test/
    ├── transparent_alone/
    └── transparent_with_blue/
```
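This layout follows the `imagefolder` convention of the `datasets` library, so it can be loaded directly; splits and class labels are inferred from the directory names:

```python
from datasets import load_dataset

# train/ and test/ become splits; subfolder names become class labels.
dataset = load_dataset("imagefolder", data_dir="./data")
print(dataset["train"].features["label"].names)
# ['transparent_alone', 'transparent_with_blue']
```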
### Step 2: Re-train
```bash
# Get the training script (train.py) and run one of:

# Option A: Linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: Full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```
### Tips for Collecting Good Training Data
- **Vary backgrounds**: different surfaces, lighting conditions
- **Vary angles**: slightly different camera positions
- **Vary distances**: close-up and farther away shots
- **Include edge cases**: partially occluded tubes, different orientations
- **Match deployment conditions**: use the same camera/environment you'll deploy in
## Demo
Try the model: [**Transparent Tube Classifier Demo**](https://huggingface.co/spaces/Siddanna/transparent-tube-classifier-demo)
## Citation
```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```