---
language: en
license: apache-2.0
tags:
  - text-classification
  - sentiment-analysis
  - distilbert
  - fine-tuned
datasets:
  - imdb
metrics:
  - accuracy
  - f1
---

# DistilBERT IMDb Sentiment Classifier

A fine-tuned DistilBERT model for binary sentiment analysis on movie reviews.

## Model Description
This model was fine-tuned from `distilbert-base-uncased` on 5,000 IMDb movie
reviews for 3 epochs. It labels each input text as POSITIVE or NEGATIVE.

## Training Data
- Source: IMDb Large Movie Review Dataset (stored in SQLite, queried with pandas)
- Train: 5,000 samples | Validation: 1,000 samples
- Label balance: approximately 50% positive, 50% negative
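
The loading step described above could look roughly like the sketch below. The card does not give the database schema, so the `reviews` table, its `text`, `label`, and `split` columns, and the `load_split` helper are all hypothetical names used for illustration. The in-memory database only stands in for the real SQLite file.

```python
import sqlite3

import pandas as pd


def load_split(conn, split, limit, seed=42):
    """Return up to `limit` shuffled rows of one split as a DataFrame.

    Assumes a hypothetical `reviews(text, label, split)` table.
    """
    query = "SELECT text, label FROM reviews WHERE split = ?"
    df = pd.read_sql_query(query, conn, params=(split,))
    # Shuffle and cap the sample size (e.g. 5,000 for training).
    return df.sample(n=min(limit, len(df)), random_state=seed).reset_index(drop=True)


# Tiny in-memory demo so the sketch runs without the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (text TEXT, label INTEGER, split TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("Loved it", 1, "train"), ("Terrible", 0, "train"), ("Fine", 1, "test")],
)
train_df = load_split(conn, "train", limit=5000)
positive_share = train_df["label"].mean()  # quick check of the label balance
```

Checking `positive_share` after loading is one simple way to confirm the roughly 50/50 label balance stated above.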

## Evaluation Results
| Metric   | Score  |
|----------|--------|
| Accuracy | 88.4%  |
| F1 Score | 0.893  |

Note: replace the scores above with your actual results.
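
For reference, the two metrics in the table can be computed as in the minimal sketch below. The card does not say how F1 was averaged, so this assumes binary F1 with the positive class encoded as `1`. The `accuracy_and_f1` helper and the toy labels are illustrative only.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```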

## Baseline Comparison
| Model                          | Accuracy |
|--------------------------------|----------|
| TF-IDF + Logistic Regression   | 86.4%    |
| DistilBERT (this model)        | 92.3%    |
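
A TF-IDF + Logistic Regression baseline of the kind listed above can be sketched with scikit-learn as follows. The card does not state the vectorizer or classifier settings, so the default `TfidfVectorizer` and `max_iter=1000` are assumptions, and the four toy reviews stand in for the real training split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the IMDb training split (1 = positive, 0 = negative).
train_texts = [
    "a wonderful, moving film",
    "brilliant acting and a great story",
    "a dull and boring mess",
    "terrible script, awful pacing",
]
train_labels = [1, 1, 0, 0]

# Vectorize the text with TF-IDF, then fit a linear classifier on top.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(train_texts, train_labels)

predictions = baseline.predict(["what a great film"])
train_accuracy = baseline.score(train_texts, train_labels)
```

For a fair comparison, the baseline should be scored on the same validation samples as the fine-tuned model.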

## Intended Use
Intended for product review analysis, feedback classification, and general English sentiment tasks.

## Limitations and Bias
- Trained only on English movie reviews; performance on other domains may vary
- May not handle Urdu, Roman Urdu, or code-switched text well
- Sarcasm with no obvious negative words may be misclassified
- Very short texts (under 5 words) have lower confidence scores

## How to Use
```python
from transformers import pipeline

# Replace YOUR-USERNAME with the account that hosts the model.
classifier = pipeline('text-classification', model='YOUR-USERNAME/distilbert-imdb-sentiment')
result = classifier('This movie was absolutely incredible!')
print(result)
# Example output: [{'label': 'POSITIVE', 'score': 0.997}]
```