---
library_name: transformers
license: apache-2.0
tags:
- image-classification
- dinov2
- vision
- tube-classification
- manufacturing
datasets:
- Siddanna/transparent-tube-dataset
base_model:
- facebook/dinov2-base
pipeline_tag: image-classification
---

# Transparent Tube Classifier

A binary image classifier that distinguishes between:

- **transparent_alone** 🧪: a transparent tube by itself
- **transparent_with_blue** 🧪💙: a transparent tube paired with a blue tube

## Model Details

| Property | Value |
|---|---|
| **Base Model** | [facebook/dinov2-base](https://huggingface.co/facebook/dinov2-base) (ViT-B/14, 86.6M params) |
| **Training Method** | Linear probe (frozen backbone + trained classifier head) |
| **Training Dataset** | [Siddanna/transparent-tube-dataset](https://huggingface.co/datasets/Siddanna/transparent-tube-dataset) |
| **Accuracy** | **100%** on the test set |
| **Loss** | 0.0014 |
| **Image Size** | 256×256 (DINOv2 default) |
| **License** | Apache 2.0 |

## Quick Start

### Using Pipeline (Easiest)

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="Siddanna/transparent-tube-classifier")
result = classifier("your_tube_image.jpg")
print(result)
# [{'label': 'transparent_with_blue', 'score': 0.99}, {'label': 'transparent_alone', 'score': 0.01}]
```

### Manual Inference

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("Siddanna/transparent-tube-classifier")
processor = AutoImageProcessor.from_pretrained("Siddanna/transparent-tube-classifier")

# Load and classify image
image = Image.open("your_tube_image.jpg")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(-1).item()
label = model.config.id2label[predicted_class]
confidence = torch.softmax(logits, dim=-1)[0][predicted_class].item()
print(f"Prediction: {label} (confidence: {confidence:.2%})")
```

## Training Details

### Architecture

- **Base**: DINOv2-base (Vision Transformer B/14), pretrained on LVD-142M (142M curated images)
- **Head**: Linear classifier (768 → 2)
- **Method**: Linear probe; the backbone is frozen and only the classification head is trained (see the sketches after the training curves below)
- **Why DINOv2?**: DINOv2's global self-attention captures the full image context, which is critical for detecting whether a blue tube is present anywhere in the scene alongside the transparent tube

### Hyperparameters

- Learning rate: `1e-3` (with cosine schedule)
- Warmup steps: 50
- Batch size: 16
- Weight decay: 0.01
- Training epochs: 4 (converged at epoch 1)

### Data Augmentations

- RandomResizedCrop (scale 0.7-1.0)
- RandomHorizontalFlip
- RandomRotation (±15°)
- ColorJitter (brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)

### Training Curves

| Epoch | Train Loss | Eval Loss | Eval Accuracy |
|---|---|---|---|
| 1 | 0.032 | 0.019 | **100%** |
| 2 | 0.011 | 0.002 | **100%** |
| 3 | 0.002 | 0.001 | **100%** |
| 4 | 0.004 | 0.010 | 99.5% |
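The linear probe described above amounts to freezing every backbone parameter before training. A minimal sketch of that setup, assuming the standard `transformers` classification head (`model.dinov2` is the backbone attribute of `Dinov2ForImageClassification`; the actual training script may differ):

```python
from transformers import AutoModelForImageClassification

# Load DINOv2-base with a fresh 2-class head using this card's labels
model = AutoModelForImageClassification.from_pretrained(
    "facebook/dinov2-base",
    num_labels=2,
    id2label={0: "transparent_alone", 1: "transparent_with_blue"},
    label2id={"transparent_alone": 0, "transparent_with_blue": 1},
)

# Linear probe: freeze the backbone so only the 768 -> 2 head trains
for param in model.dinov2.parameters():
    param.requires_grad = False

# Sanity check: only the classifier's weight and bias should remain trainable
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 768*2 + 2 = 1,538
```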
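The augmentation list maps directly onto `torchvision.transforms`. A sketch of an equivalent train-time pipeline; the exact composition lives in the training script, so treat this (and the ImageNet normalization stats, which the DINOv2 image processor also uses) as an illustration rather than the author's code:

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(256, scale=(0.7, 1.0)),   # scale 0.7-1.0
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),                         # +/-15 degrees
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.2, hue=0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```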
## For Production Use with Real Images

The model is currently trained on **synthetic data**. For best results with your actual tubes:

### Step 1: Collect Real Photos

Take 50-100+ photos per class of your actual tubes:

```
data/
├── train/
│   ├── transparent_alone/       # Photos of transparent tube alone
│   └── transparent_with_blue/   # Photos of transparent + blue tube
└── test/
    ├── transparent_alone/
    └── transparent_with_blue/
```

### Step 2: Re-train

```bash
# After cloning the training script:

# Option A: Linear probe (fast, good with 50+ images/class)
python train.py --data_dir ./data --freeze_backbone --hub_model_id your-username/tube-classifier

# Option B: Full fine-tune (better with 200+ images/class)
python train.py --data_dir ./data --learning_rate 5e-5 --hub_model_id your-username/tube-classifier
```

### Tips for Collecting Good Training Data

- **Vary backgrounds**: different surfaces, lighting conditions
- **Vary angles**: slightly different camera positions
- **Vary distances**: close-up and farther-away shots
- **Include edge cases**: partially occluded tubes, different orientations
- **Match deployment conditions**: use the same camera/environment you'll deploy in

## Demo

Try the model: [**Transparent Tube Classifier Demo**](https://huggingface.co/spaces/Siddanna/transparent-tube-classifier-demo)

## Citation

```bibtex
@misc{transparent-tube-classifier,
  title={Transparent Tube Classifier},
  author={Siddanna},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Siddanna/transparent-tube-classifier}
}
```