---
language: en
license: apache-2.0
tags:
- text-classification
- sentiment-analysis
- distilbert
- fine-tuned
datasets:
- imdb
metrics:
- accuracy
- f1
---

# DistilBERT IMDb Sentiment Classifier

A fine-tuned DistilBERT model for binary sentiment analysis on movie reviews.

## Model Description

This model was fine-tuned from `distilbert-base-uncased` on 5,000 IMDb movie reviews for 3 epochs. It classifies text as POSITIVE or NEGATIVE.

## Training Data

- Source: IMDb Large Movie Review Dataset (stored in SQLite, queried with pandas)
- Train: 5,000 samples | Validation: 1,000 samples
- Label balance: approximately 50% positive, 50% negative

## Evaluation Results

| Metric   | Score |
|----------|-------|
| Accuracy | 88.4% |
| F1 Score | 0.893 |

<!-- Replace the Accuracy and F1 Score values above with your actual scores. -->

## Baseline Comparison

| Model                        | Accuracy |
|------------------------------|----------|
| TF-IDF + Logistic Regression | 86.4%    |
| DistilBERT (this model)      | 92.3%    |

## Intended Use

Product review analysis, feedback classification, and general English sentiment tasks.

## Limitations and Bias

- Trained only on English movie reviews; performance on other domains may vary
- May not handle Urdu, Roman Urdu, or code-switched text well
- Sarcasm with no obvious negative words may be misclassified
- Very short texts (under 5 words) tend to receive lower confidence scores

## How to Use

```python
from transformers import pipeline

# Load the fine-tuned model from the Hub (replace YOUR-USERNAME with the actual namespace)
classifier = pipeline('text-classification', model='YOUR-USERNAME/distilbert-imdb-sentiment')

# Classify a single review
result = classifier('This movie was absolutely incredible!')
# Output: [{'label': 'POSITIVE', 'score': 0.997}]
```
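
Because very short or sarcastic inputs may receive lower confidence scores, it can help to flag uncertain predictions for manual review. The following is a minimal sketch, not part of the released model: the helper `flag_uncertain` and the cutoff `CONFIDENCE_THRESHOLD = 0.75` are hypothetical names and values chosen for illustration, and the sample scores are made up. It only assumes the pipeline returns a list of dicts with `label` and `score` keys, as in the output shown above.

```python
from typing import Dict, List

# Hypothetical cutoff, not taken from this model card; tune it on validation data.
CONFIDENCE_THRESHOLD = 0.75


def flag_uncertain(predictions: List[Dict],
                   threshold: float = CONFIDENCE_THRESHOLD) -> List[Dict]:
    """Return copies of the pipeline predictions with a 'needs_review' flag added."""
    flagged = []
    for pred in predictions:
        item = dict(pred)  # copy so the caller's data is left untouched
        item["needs_review"] = pred["score"] < threshold
        flagged.append(item)
    return flagged


# Stand-in for the output of classifier([...]); these scores are illustrative only.
sample_output = [
    {"label": "POSITIVE", "score": 0.997},
    {"label": "NEGATIVE", "score": 0.62},
]

for row in flag_uncertain(sample_output):
    status = "review" if row["needs_review"] else "ok"
    print(row["label"], round(row["score"], 3), status)
```

In practice, `sample_output` would be replaced by the result of calling `classifier` on a list of texts, and the threshold would be chosen by inspecting scores on held-out reviews.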