---
library_name: transformers
license: apache-2.0
tags:
- image-classification
- dinov2
- vision
- tube-classification
- manufacturing
datasets:
- Siddanna/transparent-tube-dataset
base_model:
- facebook/dinov2-base
pipeline_tag: image-classification
---
# Transparent Tube Classifier
A binary image classifier that distinguishes between:
- **transparent_alone** 🧪: a transparent tube by itself
- **transparent_with_blue** 🧪💙: a transparent tube paired with a blue tube
## Model Details
| Property | Value |
|---|---|
| **Base Model** | [facebook/dinov2-base](https://huggingface.co/facebook/dinov2-base) (ViT-B/14, 86.6M params) |
| **Training Method** | Linear probe (frozen backbone + trained classifier head) |
| **Training Dataset** | [Siddanna/transparent-tube-dataset](https://huggingface.co/datasets/Siddanna/transparent-tube-dataset) |
| **Accuracy** | **100%** on the (synthetic) test set |
| **Loss** | 0.0014 |
| **Image Size** | 256×256 (DINOv2 default) |
| **License** | Apache 2.0 |
## Quick Start
### Using the Pipeline (Easiest)
```python
from transformers import pipeline
classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```
### Manual Inference
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify image (convert to RGB to handle grayscale/RGBA inputs)
image = Image.open("your_tube_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()
print(f"Prediction: {label} (confidence: {confidence:.2%})")
```
## Training Details
### Architecture
- **Base**: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- **Head**: Linear classifier (768 → 2)
- **Method**: Linear probe, i.e. the backbone is frozen and only the classification head is trained (see the sketch below)
- **Why DINOv2?**: DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube
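A minimal sketch of this linear-probe setup (the actual training script may differ); it assumes the transformers `Dinov2ForImageClassification` layout, where the backbone lives under `model.dinov2`:

```python
from transformers import AutoModelForImageClassification

# Load DINOv2-base with a fresh 2-class head; the label mapping matches
# the classes this card describes.
model = AutoModelForImageClassification.from_pretrained(
    "facebook/dinov2-base",
    num_labels=2,
    id2label={0: "transparent_alone", 1: "transparent_with_blue"},
    label2id={"transparent_alone": 0, "transparent_with_blue": 1},
)

# Linear probe: freeze the backbone so only the classification head trains.
for param in model.dinov2.parameters():
    param.requires_grad = False
```

With the backbone frozen, each step updates only the small head, which keeps training fast and stable on small datasets.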
### Hyperparameters
- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)
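Expressed as transformers `TrainingArguments`, this configuration might look like the following sketch (the output path and evaluation cadence are illustrative, not taken from the actual run):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tube-classifier",   # illustrative output path
    learning_rate=1e-3,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    per_device_train_batch_size=16,
    weight_decay=0.01,
    num_train_epochs=4,
    eval_strategy="epoch",          # `evaluation_strategy` on older versions
)
```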
### Data Augmentations
- RandomResizedCrop (scale 0.7-1.0)
- RandomHorizontalFlip
- RandomRotation (±15°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)
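One plausible torchvision implementation of this augmentation pipeline (the actual training code may differ; the 256×256 crop size follows the Image Size row above):

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.2, hue=0.05),
    transforms.ToTensor(),  # normalization with the processor's mean/std would follow
])
```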
### Training Curves
| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | **100%** |
| 2 | 0.011 | 0.002 | **100%** |
| 3 | 0.002 | 0.001 | **100%** |
| 4 | 0.004 | 0.010 | 99.5% |
## For Production Use with Real Images
The model is currently trained on **synthetic data**. For best results with your actual tubes:
### Step 1: Collect Real Photos
Take 50-100+ photos per class of your actual tubes:
```
data/
├── train/
│   ├── transparent_alone/       # Photos of transparent tube alone
│   └── transparent_with_blue/   # Photos of transparent + blue tube
└── test/
    ├── transparent_alone/
    └── transparent_with_blue/
```
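This layout follows the `imagefolder` convention of the `datasets` library, so it can be loaded directly; splits and class labels are inferred from the directory names:

```python
from datasets import load_dataset

# train/ and test/ become splits; subfolder names become class labels.
dataset = load_dataset("imagefolder", data_dir="./data")
print(dataset["train"].features["label"].names)
# ['transparent_alone', 'transparent_with_blue']
```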
### Step 2: Re-train
```bash
# Get the training script (train.py) and run one of:

# Option A: Linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: Full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```
### Tips for Collecting Good Training Data
- **Vary backgrounds**: different surfaces, lighting conditions
- **Vary angles**: slightly different camera positions
- **Vary distances**: close-up and farther away shots
- **Include edge cases**: partially occluded tubes, different orientations
- **Match deployment conditions**: use the same camera/environment you'll deploy in
## Demo
Try the model: [**Transparent Tube Classifier Demo**](https://huggingface.co/spaces/Siddanna/transparent-tube-classifier-demo)
## Citation
```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```