TruthLens: Fake News Detection Model (RoBERTa)
TruthLens is a RoBERTa-based Transformer model for fake news detection, fine-tuned by Divyanshu Chauhan.
It analyzes news articles and social media text and classifies each input as REAL or FAKE, along with a confidence score.
It is trained on the WELFake dataset, which contains over 72,000 labeled news samples, balanced and cleaned for text classification.
The model powers the TruthLens Flask application, which performs real-time misinformation detection.
Model Details
Model Developer
- Developed by: Divyanshu Chauhan
- Specialization: Artificial Intelligence & Machine Learning
- Project Purpose: Real-time misinformation detection using modern NLP
Model Type
- Architecture: RoBERTa-base (Transformer)
- Task: Binary Text Classification (Fake vs Real)
- Framework: PyTorch + HuggingFace Transformers
- Fine-tuned Model: divyanshu-chauhan-7786/fake-news-roberta (this model)
Model Capabilities
- Detects whether a news text is REAL or FAKE
- Provides a probability/confidence score
- Works on short and long news text
- Handles social media misinformation patterns
Training Details
Dataset
- Name: WELFake
- Samples: ~72,000
- Balanced: Yes (after synthetic balancing)
- Labels: 0 = Fake, 1 = Real
Data Preprocessing Performed
- Removed null entries
- Merged title + text → combined_text
- Cleaned URLs, numbers, special characters
- Lowercasing and stopword removal
- Train/Test split
- Used the original, uncleaned text for Transformer fine-tuning, since the RoBERTa tokenizer works directly on raw text
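The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the regular expressions, the tiny stopword list, the sample rows, and the function names `combine` and `clean` are all assumptions. The cleaned variant is kept separate from `combined_text`, since the card says the original text is what RoBERTa was fine-tuned on.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
# Tiny illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "on", "for", "at"}


def combine(title, text):
    """Merge title and body into one string, treating None as empty."""
    return f"{title or ''} {text or ''}".strip()


def clean(combined_text):
    """Lowercase, drop URLs, digits, and special characters, then remove stopwords."""
    lowered = combined_text.lower()
    no_urls = URL_RE.sub(" ", lowered)
    letters_only = NON_LETTER_RE.sub(" ", no_urls)
    tokens = [t for t in letters_only.split() if t not in STOPWORDS]
    return " ".join(tokens)


# Made-up rows for illustration only.
rows = [
    {"title": "Budget 2024 passed!", "text": "Read more at https://example.com/news now."},
    {"title": None, "text": None},  # null entry, dropped below
]

# Drop rows where both fields are missing, then build both text variants.
kept = [r for r in rows if r["title"] or r["text"]]
for r in kept:
    r["combined_text"] = combine(r["title"], r["text"])  # original text for the tokenizer
    r["clean_text"] = clean(r["combined_text"])          # cleaned variant

print(kept[0]["clean_text"])
```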
Training Setup
- Base Model: roberta-base
- Epochs: 3
- Max Length: 256 tokens
- Batch Size: 8
- Learning Rate: 2e-5
- Optimizer: AdamW
- Loss: Cross-Entropy
- Hardware: GPU (Google Colab / Tesla T4)
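As a rough sketch, the setup above could be expressed with the Hugging Face `TrainingArguments` class. The card does not say whether `Trainer` or a custom training loop was used, so treat this as one possible arrangement; `output_dir`, the dictionary keys, and the helper names are hypothetical. With `Trainer`, AdamW is the default optimizer, and a sequence classification head with two labels applies cross-entropy loss when integer labels are passed.

```python
# Hyperparameters as listed above; the dictionary keys are illustrative names.
HYPERPARAMS = {
    "base_model": "roberta-base",
    "num_train_epochs": 3,
    "max_length": 256,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
}


def build_training_args(output_dir="truthlens-checkpoints"):
    """Return TrainingArguments mirroring the listed setup (output_dir is hypothetical)."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=HYPERPARAMS["num_train_epochs"],
        per_device_train_batch_size=HYPERPARAMS["per_device_train_batch_size"],
        learning_rate=HYPERPARAMS["learning_rate"],
    )


def steps_per_epoch(num_train_samples, batch_size=HYPERPARAMS["per_device_train_batch_size"]):
    """Optimizer steps per epoch on a single GPU without gradient accumulation."""
    return -(-num_train_samples // batch_size)  # ceiling division


# The train split size is not stated in the card; 57,600 assumes an 80/20 split.
print(steps_per_epoch(57_600))
```

The batch size of 8 with a 256-token limit is a combination that typically fits within the memory of a single Tesla T4.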
Model Performance
| Metric | Score |
|---|---|
| Accuracy | ~95% |
| F1 Score | ~95% |
| Precision | ~95% |
| Recall | ~95% |
With all four metrics at roughly 95%, the model is suitable for production-level applications.
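For reference, the four metrics in the table can be computed from predictions as sketched below. The labels and predictions here are made-up toy values, not the model's evaluation data, and treating label 1 (Real) as the positive class is an assumption, since the card does not say which class the precision and recall refer to.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data: 0 = Fake, 1 = Real.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```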
Usage Example
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model_path = "divyanshu-chauhan-7786/fake-news-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "Breaking: Government announces new education reforms!"
result = clf(text, truncation=True, max_length=256)
print(result)
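The pipeline returns a list of dictionaries with `label` and `score` keys. A small helper like the one below could turn one result into the REAL/FAKE names used in this card. This is a sketch under assumptions: whether this checkpoint reports `LABEL_0`/`LABEL_1` (the Transformers default when no `id2label` mapping is configured) or already uses readable names is not stated, so the helper accepts both. The `readable` function and the sample values are hypothetical.

```python
# Label ids as documented in this card: 0 = Fake, 1 = Real.
ID_TO_NAME = {0: "FAKE", 1: "REAL"}


def readable(prediction):
    """Turn one pipeline result dict into a (name, confidence) tuple."""
    raw = prediction["label"]
    if raw.upper() in ("FAKE", "REAL"):
        # Checkpoint already uses readable label names.
        name = raw.upper()
    else:
        # Default "LABEL_<id>" form: take the numeric suffix and map it.
        name = ID_TO_NAME[int(raw.rsplit("_", 1)[-1])]
    return name, prediction["score"]


# Made-up values for illustration, not real model output.
sample = [{"label": "LABEL_1", "score": 0.9876}]
name, confidence = readable(sample[0])
print(f"{name} ({confidence:.2%})")
```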