---
library_name: transformers
license: apache-2.0
tags:
- image-classification
- dinov2
- vision
- tube-classification
- manufacturing
datasets:
- Siddanna/transparent-tube-dataset
base_model:
- facebook/dinov2-base
pipeline_tag: image-classification
---

# Transparent Tube Classifier

A binary image classifier that distinguishes between:

- **transparent_alone** 🧪: a transparent tube by itself
- **transparent_with_blue** 🧪💙: a transparent tube paired with a blue tube

## Model Details

| Property | Value |
|---|---|
| **Base Model** | [facebook/dinov2-base](https://huggingface.co/facebook/dinov2-base) (ViT-B/14, 86.6M params) |
| **Training Method** | Linear probe (frozen backbone + trained classifier head) |
| **Training Dataset** | [Siddanna/transparent-tube-dataset](https://huggingface.co/datasets/Siddanna/transparent-tube-dataset) |
| **Accuracy** | **100%** on the test set |
| **Loss** | 0.0014 |
| **Image Size** | 256×256 (DINOv2 default) |
| **License** | Apache 2.0 |

## Quick Start

### Using Pipeline (Easiest)

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```

### Manual Inference

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify image
image = Image.open("your_tube_image.jpg")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()
print(f"Prediction: {label} (confidence: {confidence:.2%})")
```

## Training Details

### Architecture

- **Base**: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- **Head**: Linear classifier (768 → 2)
- **Method**: Linear probe; the backbone is frozen and only the classification head is trained (see the sketches after the training curves below)
- **Why DINOv2?**: DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube

### Hyperparameters

- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)

### Data Augmentations

- RandomResizedCrop (scale 0.7-1.0)
- RandomHorizontalFlip
- RandomRotation (±15°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)

### Training Curves

| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | **100%** |
| 2 | 0.011 | 0.002 | **100%** |
| 3 | 0.002 | 0.001 | **100%** |
| 4 | 0.004 | 0.010 | 99.5% |
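The linear probe described above amounts to freezing every backbone parameter before training. A minimal sketch of that setup, assuming the standard `transformers` classification head (`model.dinov2` is the backbone attribute of `Dinov2ForImageClassification`; the actual training script may differ):

```python
from transformers import AutoModelForImageClassification

# Load DINOv2-base with a fresh 2-class head using this card's labels
model = AutoModelForImageClassification.from_pretrained(
    "facebook/dinov2-base",
    num_labels=2,
    id2label={0: "transparent_alone", 1: "transparent_with_blue"},
    label2id={"transparent_alone": 0, "transparent_with_blue": 1},
)

# Linear probe: freeze the backbone so only the 768 -> 2 head trains
for param in model.dinov2.parameters():
    param.requires_grad = False

# Sanity check: only the classifier's weight and bias should remain trainable
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 768*2 + 2 = 1,538
```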
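The augmentation list maps directly onto `torchvision.transforms`. A sketch of an equivalent train-time pipeline; the exact composition lives in the training script, so treat this (and the ImageNet normalization stats, which the DINOv2 image processor also uses) as an illustration rather than the author's code:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.7, 1.0)),   # scale 0.7-1.0
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),                         # +/-15 degrees
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.2, hue=0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```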
## For Production Use with Real Images

The model is currently trained on **synthetic data**. For best results with your actual tubes:

### Step 1: Collect Real Photos

Take 50-100+ photos per class of your actual tubes:

```
data/
├── train/
│   ├── transparent_alone/       # Photos of transparent tube alone
│   └── transparent_with_blue/   # Photos of transparent + blue tube
└── test/
    ├── transparent_alone/
    └── transparent_with_blue/
```

### Step 2: Re-train

```bash
# After cloning the training script:

# Option A: Linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: Full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```

### Tips for Collecting Good Training Data

- **Vary backgrounds**: different surfaces, lighting conditions
- **Vary angles**: slightly different camera positions
- **Vary distances**: close-up and farther-away shots
- **Include edge cases**: partially occluded tubes, different orientations
- **Match deployment conditions**: use the same camera/environment you'll deploy in

## Demo

Try the model: [**Transparent Tube Classifier Demo**](https://huggingface.co/spaces/Siddanna/transparent-tube-classifier-demo)

## Citation

```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```