---
library_name: transformers
license: apache-2.0
tags:
- image-classification
- dinov2
- vision
- tube-classification
- manufacturing
datasets:
- Siddanna/transparent-tube-dataset
base_model:
- facebook/dinov2-base
pipeline_tag: image-classification
---
# Transparent Tube Classifier

A binary image classifier that distinguishes between:
- `transparent_alone` 🧪 – a transparent tube by itself
- `transparent_with_blue` 🧪🔵 – a transparent tube paired with a blue tube
## Model Details

| Property | Value |
|---|---|
| Base Model | `facebook/dinov2-base` (ViT-B/14, 86.6M params) |
| Training Method | Linear probe (frozen backbone + trained classifier head) |
| Training Dataset | `Siddanna/transparent-tube-dataset` |
| Accuracy | 100% on test set |
| Loss | 0.0014 |
| Image Size | 256×256 (DINOv2 default) |
| License | Apache 2.0 |
## Quick Start

### Using a Pipeline (Easiest)

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```
### Manual Inference

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify an image
image = Image.open("your_tube_image.jpg")
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()
print(f"Prediction: {label} (confidence: {confidence:.2%})")
```
## Training Details

### Architecture

- Base: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- Head: Linear classifier (768 → 2)
- Method: Linear probe – the backbone is frozen; only the classification head is trained
- Why DINOv2? DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube
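The frozen-backbone arrangement can be sketched in plain PyTorch. This is an illustrative reconstruction, not the repo's training script: the small `nn.Linear` below stands in for the real DINOv2 backbone (which would come from `AutoModel.from_pretrained("facebook/dinov2-base")`), and only the 768 → 2 head receives gradients.

```python
import torch
import torch.nn as nn

# Illustrative linear-probe setup: a frozen feature extractor plus a
# trainable 768 -> 2 head, matching DINOv2-base's hidden size and the
# two tube classes.
backbone_dim, num_classes = 768, 2

# Stand-in for the frozen DINOv2 backbone.
backbone = nn.Linear(1024, backbone_dim)
for p in backbone.parameters():
    p.requires_grad = False          # linear probe: backbone stays frozen

head = nn.Linear(backbone_dim, num_classes)  # the only trained parameters

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3, weight_decay=0.01)

x = torch.randn(4, 1024)             # dummy batch of backbone inputs
with torch.no_grad():
    features = backbone(x)           # frozen feature extraction
logits = head(features)              # trainable classification head
print(logits.shape)                  # torch.Size([4, 2])
```

Because the backbone contributes no gradients, each training step only updates the 1,538 parameters of the head, which is why the probe converges in a single epoch.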
### Hyperparameters

- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)
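The optimizer and schedule rows above can be wired up as follows. This is a sketch: `num_training_steps=500` is an assumed placeholder, since the real step count follows from dataset size, batch size 16, and 4 epochs.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# AdamW with lr 1e-3 and weight decay 0.01, plus a cosine schedule with
# 50 warmup steps, mirroring the hyperparameter list above.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-3, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=50, num_training_steps=500)

# The learning rate climbs linearly from 0 to 1e-3 over the warmup
# steps, then decays along a cosine curve toward 0.
lrs = []
for _ in range(500):
    optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])
    scheduler.step()
print(max(lrs))
```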
### Data Augmentations

- RandomResizedCrop (scale 0.7–1.0)
- RandomHorizontalFlip
- RandomRotation (±15°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)
### Training Curves
| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | 100% |
| 2 | 0.011 | 0.002 | 100% |
| 3 | 0.002 | 0.001 | 100% |
| 4 | 0.004 | 0.010 | 99.5% |
## For Production Use with Real Images

The model is currently trained on synthetic data. For best results with your actual tubes:

### Step 1: Collect Real Photos

Take 50–100+ photos per class of your actual tubes:

```
data/
├── train/
│   ├── transparent_alone/       # photos of the transparent tube alone
│   └── transparent_with_blue/   # photos of transparent + blue tube
└── test/
    ├── transparent_alone/
    └── transparent_with_blue/
```
### Step 2: Re-train

```shell
# Clone the training script

# Option A: linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```
### Tips for Collecting Good Training Data
- Vary backgrounds: different surfaces, lighting conditions
- Vary angles: slightly different camera positions
- Vary distances: close-up and farther away shots
- Include edge cases: partially occluded tubes, different orientations
- Match deployment conditions: use the same camera/environment you'll deploy in
## Demo
Try the model: Transparent Tube Classifier Demo
## Citation

```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```