---
library_name: transformers
license: apache-2.0
tags:
  - image-classification
  - dinov2
  - vision
  - tube-classification
  - manufacturing
datasets:
  - Siddanna/transparent-tube-dataset
base_model:
  - facebook/dinov2-base
pipeline_tag: image-classification
---

# Transparent Tube Classifier

A binary image classifier that distinguishes between:
- **transparent_alone** πŸ§ͺ β€” A transparent tube by itself
- **transparent_with_blue** πŸ§ͺπŸ’™ β€” A transparent tube paired with a blue tube

## Model Details

| Property | Value |
|---|---|
| **Base Model** | [facebook/dinov2-base](https://huggingface.co/facebook/dinov2-base) (ViT-B/14, 86.6M params) |
| **Training Method** | Linear probe (frozen backbone + trained classifier head) |
| **Training Dataset** | [Siddanna/transparent-tube-dataset](https://huggingface.co/datasets/Siddanna/transparent-tube-dataset) |
| **Accuracy** | **100%** on test set |
| **Loss** | 0.0014 |
| **Image Size** | 256Γ—256 (DINOv2 default) |
| **License** | Apache 2.0 |

## Quick Start

### Using Pipeline (Easiest)

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```

### Manual Inference

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify image
image = Image.open("your_tube_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()

print(f"Prediction: {label} (confidence: {confidence:.2%})")
```

## Training Details

### Architecture
- **Base**: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- **Head**: Linear classifier (768 β†’ 2)
- **Method**: Linear probe β€” backbone is frozen, only the classification head is trained
- **Why DINOv2?**: DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube
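
The linear-probe setup can be sketched in plain PyTorch: freeze every backbone parameter, then re-enable gradients for the classification head only. The `TubeClassifier` module below is a hypothetical stand-in (a small linear layer in place of the real DINOv2 backbone, kept at the 768-dim feature size); the same freezing pattern applies to the actual `AutoModelForImageClassification` model.

```python
import torch.nn as nn

# Hypothetical stand-in for the real model: a tiny "backbone" producing the
# 768-dim features DINOv2-base emits, plus the 768 -> 2 linear head.
class TubeClassifier(nn.Module):
    def __init__(self, feat_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.backbone = nn.Linear(32, feat_dim)             # placeholder backbone
        self.classifier = nn.Linear(feat_dim, num_classes)  # trained head

    def forward(self, x):
        return self.classifier(self.backbone(x))

model = TubeClassifier()

# Linear probe: freeze everything, then unfreeze only the classification head.
for p in model.parameters():
    p.requires_grad = False
for p in model.classifier.parameters():
    p.requires_grad = True

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the classifier's weight and bias remain trainable
```

Only the 768×2 weight matrix and its bias receive gradient updates, which is why the probe trains in minutes even though the full model has ~86.6M parameters.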

### Hyperparameters
- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)

### Data Augmentations
- RandomResizedCrop (scale 0.7-1.0)
- RandomHorizontalFlip
- RandomRotation (Β±15Β°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)

### Training Curves
| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | **100%** |
| 2 | 0.011 | 0.002 | **100%** |
| 3 | 0.002 | 0.001 | **100%** |
| 4 | 0.004 | 0.010 | 99.5% |

## For Production Use with Real Images

The model is currently trained on **synthetic data**. For best results with your actual tubes:

### Step 1: Collect Real Photos
Take 50-100+ photos per class of your actual tubes:
```
data/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ transparent_alone/     # Photos of transparent tube alone
β”‚   └── transparent_with_blue/ # Photos of transparent + blue tube
└── test/
    β”œβ”€β”€ transparent_alone/
    └── transparent_with_blue/
```

### Step 2: Re-train
```bash
# Clone the training script
# Option A: Linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: Full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```

### Tips for Collecting Good Training Data
- **Vary backgrounds**: different surfaces, lighting conditions
- **Vary angles**: slightly different camera positions
- **Vary distances**: close-up and farther away shots
- **Include edge cases**: partially occluded tubes, different orientations
- **Match deployment conditions**: use the same camera/environment you'll deploy in

## Demo

Try the model: [**Transparent Tube Classifier Demo**](https://huggingface.co/spaces/Siddanna/transparent-tube-classifier-demo)

## Citation

```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```