---
library_name: transformers
license: apache-2.0
tags:
  - image-classification
  - dinov2
  - vision
  - tube-classification
  - manufacturing
datasets:
  - Siddanna/transparent-tube-dataset
base_model:
  - facebook/dinov2-base
pipeline_tag: image-classification
---

# Transparent Tube Classifier

A binary image classifier that distinguishes between:
- **transparent_alone** πŸ§ͺ β€” A transparent tube by itself
- **transparent_with_blue** πŸ§ͺπŸ’™ β€” A transparent tube paired with a blue tube

## Model Details

| Property | Value |
|---|---|
| **Base Model** | [facebook/dinov2-base](https://huggingface.co/facebook/dinov2-base) (ViT-B/14, 86.6M params) |
| **Training Method** | Linear probe (frozen backbone + trained classifier head) |
| **Training Dataset** | [Siddanna/transparent-tube-dataset](https://huggingface.co/datasets/Siddanna/transparent-tube-dataset) |
| **Accuracy** | **100%** on test set |
| **Loss** | 0.0014 |
| **Image Size** | 256Γ—256 (DINOv2 default) |
| **License** | Apache 2.0 |

## Quick Start

### Using Pipeline (Easiest)

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```

### Manual Inference

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify image
image = Image.open("your_tube_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()

print(f"Prediction: {label} (confidence: {confidence:.2%})")
```

## Training Details

### Architecture
- **Base**: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- **Head**: Linear classifier (768 β†’ 2)
- **Method**: Linear probe β€” backbone is frozen, only the classification head is trained
- **Why DINOv2?**: DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube
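
The linear-probe setup can be sketched in plain PyTorch: freeze every backbone parameter, then re-enable gradients for the classification head only. The `TubeClassifier` module below is a hypothetical stand-in (a small linear layer in place of the real DINOv2 backbone, kept at the 768-dim feature size); the same freezing pattern applies to the actual `AutoModelForImageClassification` model.

```python
import torch.nn as nn

# Hypothetical stand-in for the real model: a tiny "backbone" producing the
# 768-dim features DINOv2-base emits, plus the 768 -> 2 linear head.
class TubeClassifier(nn.Module):
    def __init__(self, feat_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.backbone = nn.Linear(32, feat_dim)             # placeholder backbone
        self.classifier = nn.Linear(feat_dim, num_classes)  # trained head

    def forward(self, x):
        return self.classifier(self.backbone(x))

model = TubeClassifier()

# Linear probe: freeze everything, then unfreeze only the classification head.
for p in model.parameters():
    p.requires_grad = False
for p in model.classifier.parameters():
    p.requires_grad = True

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the classifier's weight and bias remain trainable
```

Only the 768×2 weight matrix and its bias receive gradient updates, which is why the probe trains in minutes even though the full model has ~86.6M parameters.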

### Hyperparameters
- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)

### Data Augmentations
- RandomResizedCrop (scale 0.7-1.0)
- RandomHorizontalFlip
- RandomRotation (Β±15Β°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)

### Training Curves
| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | **100%** |
| 2 | 0.011 | 0.002 | **100%** |
| 3 | 0.002 | 0.001 | **100%** |
| 4 | 0.004 | 0.010 | 99.5% |

## For Production Use with Real Images

The model is currently trained on **synthetic data**. For best results with your actual tubes:

### Step 1: Collect Real Photos
Take 50-100+ photos per class of your actual tubes:
```
data/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ transparent_alone/     # Photos of transparent tube alone
β”‚   └── transparent_with_blue/ # Photos of transparent + blue tube
└── test/
    β”œβ”€β”€ transparent_alone/
    └── transparent_with_blue/
```

### Step 2: Re-train
```bash
# Clone the training script
# Option A: Linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: Full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```

### Tips for Collecting Good Training Data
- **Vary backgrounds**: different surfaces, lighting conditions
- **Vary angles**: slightly different camera positions
- **Vary distances**: close-up and farther away shots
- **Include edge cases**: partially occluded tubes, different orientations
- **Match deployment conditions**: use the same camera/environment you'll deploy in

## Demo

Try the model: [**Transparent Tube Classifier Demo**](https://huggingface.co/spaces/Siddanna/transparent-tube-classifier-demo)

## Citation

```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```