---
title: TruthLens
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
python_version: 3.10.13
app_file: app.py
pinned: false
---
# TruthLens: Advanced Fake News Detection Pipeline

TruthLens is an end-to-end fake news detection system that goes beyond reporting a single model probability. It uses a **5-signal weighted scoring framework** built on journalistic standards, combining deep learning models (DistilBERT, RoBERTa), a sequence model (LSTM), a statistical model (Logistic Regression), and heuristic analysis to deliver explainable verdicts.

## 🌟 Key Features

*   **5-Signal Scoring Framework:**
    *   **Source Credibility (30%):** Evaluates outlet reputation, author presence, and source corroboration, including typosquatting checks.
    *   **Claim Verification (30%):** Combines AI probability with spaCy-based Named Entity Recognition (NER) and quote attribution analysis.
    *   **Linguistic Quality (20%):** Detects sensationalism, superlatives, and passive voice, and uses DistilBERT to check whether the headline contradicts the body.
    *   **Freshness (10%):** Contextual and date-based temporal scoring to detect outdated information.
    *   **AI Model Consensus (10%):** Ensemble voting from Logistic Regression, LSTM, DistilBERT, and RoBERTa.
*   **Adversarial Guardrails:** Hard caps and overrides for highly suspicious patterns (Triple Anonymity, Uncited Statistics, Headline Contradictions).
*   **Live Web Corroboration:** RAG (Retrieval-Augmented Generation) pipeline using live search to verify unambiguous claims.
*   **TruthLens UI:** A Streamlit dashboard that adapts to dark and light mode and explains each verdict down to the individual signals and deductions.
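
To make the weighting concrete, here is a minimal sketch of how the five signals and the guardrail caps listed above could be combined. The weights come from the list; the function name `combine_signals`, the 0-100 scale, and the cap values are assumptions for illustration and may not match what `stage4_inference.py` actually does.

```python
# Weights are taken from the feature list above; everything else is illustrative.
WEIGHTS = {
    "source_credibility": 0.30,
    "claim_verification": 0.30,
    "linguistic_quality": 0.20,
    "freshness": 0.10,
    "model_consensus": 0.10,
}

# Hypothetical hard caps: the README names the guardrails but not their values.
GUARDRAIL_CAPS = {
    "triple_anonymity": 40.0,
    "uncited_statistics": 50.0,
    "headline_contradiction": 35.0,
}


def combine_signals(signals, flags=()):
    """Return a 0-100 score from per-signal scores that are each in [0, 100]."""
    missing = set(WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    score = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    # A triggered guardrail can only lower the score, never raise it.
    for flag in flags:
        score = min(score, GUARDRAIL_CAPS[flag])
    return round(score, 2)


example = {
    "source_credibility": 80,
    "claim_verification": 70,
    "linguistic_quality": 60,
    "freshness": 90,
    "model_consensus": 50,
}
print(combine_signals(example))
print(combine_signals(example, flags=["headline_contradiction"]))
```

Applying the caps with `min` after the weighted sum means a strong score on the other signals cannot mask a suspicious pattern, which is the intent of a hard cap.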

---

## 📁 Project Structure

```text
fake_news_detection/
├── app.py                     # Streamlit frontend (TruthLens UI)
├── run_pipeline.py            # Main script to run pipeline stages
├── requirements.txt           # Python dependencies
├── src/
│   ├── stage1_ingestion.py    # Downloads and prepares datasets
│   ├── stage2_preprocessing.py# Cleans text, tokenizes, and saves artifacts
│   ├── stage3_training.py     # Trains models (LR, LSTM, DistilBERT, RoBERTa)
│   ├── stage4_inference.py    # The 5-signal scoring engine and prediction logic
│   └── utils/
│       └── rag_retrieval.py   # Live web search corroboration functions
├── data/                      # Raw and processed datasets (created during execution)
└── models/                    # Trained models and vectorizers (created during execution)
```

---

## 🚀 Getting Started

### 1. Installation

Ensure you have Python 3.8+ installed (the hosted app is pinned to Python 3.10.13). Install the required dependencies and the spaCy English model:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

### 2. Running the Pipeline

The project is divided into stages. You can run the entire pipeline end-to-end, or run specific stages individually using `run_pipeline.py`.

**To run the complete training pipeline (Stages 1 to 3):**
*Note: This will download datasets, preprocess them, and train all models. It may take a significant amount of time depending on your hardware.*

```bash
python run_pipeline.py --stage 1 2 3
```
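
For readers curious how a multi-value `--stage` flag can be handled, the sketch below uses `argparse` with `nargs="+"`. It illustrates the command-line shape shown in this README, not the actual contents of `run_pipeline.py`; the helper names `build_parser` and `plan` are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run TruthLens pipeline stages")
    parser.add_argument(
        "--stage",
        nargs="+",            # accepts one or more values: --stage 1 2 3
        type=int,
        choices=[1, 2, 3],
        default=[],
        help="one or more stage numbers to run",
    )
    parser.add_argument(
        "--eval",
        action="store_true",
        help="evaluate the trained pipeline on the holdout test set",
    )
    return parser


def plan(argv):
    """Return the ordered list of steps the given arguments would run."""
    args = build_parser().parse_args(argv)
    # Deduplicate and sort so stages always run in pipeline order.
    steps = [f"stage{n}" for n in sorted(set(args.stage))]
    if args.eval:
        steps.append("evaluation")
    return steps


print(plan(["--stage", "1", "2", "3"]))
print(plan(["--eval"]))
```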

**To run individual stages:**

*   **Stage 1: Data Ingestion**
    Downloads and formats the necessary datasets (e.g., LIAR, ISOT).
    ```bash
    python run_pipeline.py --stage 1
    ```

*   **Stage 2: Preprocessing**
    Cleans the text, maps verdicts to binary labels, and prepares DataFrames for training.
    ```bash
    python run_pipeline.py --stage 2
    ```

*   **Stage 3: Training**
    Trains the ensemble: Logistic Regression, LSTM, DistilBERT, and RoBERTa. Saves the models to the `models/` directory.
    ```bash
    python run_pipeline.py --stage 3
    ```

*   **Stage 4: Evaluation**
    Evaluates the trained pipeline on the holdout test set using the 5-signal inference framework.
    ```bash
    python run_pipeline.py --eval
    ```
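
As an illustration of the verdict-to-binary mapping mentioned in Stage 2, the sketch below maps the six LIAR verdict labels to a binary target. Where to draw the line between "fake" and "real" is an assumption made here, as are the names `LIAR_TO_BINARY` and `to_binary_label`; the project's own mapping may differ.

```python
# LIAR's six verdict labels. The cut point between fake (1) and real (0)
# is an assumption for this example.
LIAR_TO_BINARY = {
    "pants-fire": 1,
    "false": 1,
    "barely-true": 1,
    "half-true": 0,
    "mostly-true": 0,
    "true": 0,
}


def to_binary_label(verdict):
    """Map a verdict string to 1 (fake) or 0 (real); return None if unknown."""
    return LIAR_TO_BINARY.get(verdict.strip().lower())


rows = [
    {"text": "Claim A", "verdict": "Pants-Fire"},
    {"text": "Claim B", "verdict": "mostly-true"},
    {"text": "Claim C", "verdict": "unrated"},  # unknown verdicts are dropped
]
labelled = [
    {**row, "label": to_binary_label(row["verdict"])}
    for row in rows
    if to_binary_label(row["verdict"]) is not None
]
print(labelled)
```

Dropping rows with unrecognized verdicts, rather than guessing a label, keeps noisy annotations out of the training set.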

---

## 🖥️ Running the Application

Once the models are trained (or if you already have the pre-trained weights in the `models/` directory), you can launch the TruthLens UI.

```bash
python -m streamlit run app.py
```

This will start a local web server (usually at `http://localhost:8501`). 

### Using the App
1.  **Paste text or provide a URL:** You can paste the raw text of an article (with or without a headline) or simply provide a URL for the app to parse automatically.
2.  **Select depth:** Choose Quick, Standard, or Deep analysis.
3.  **View results:** Explore the four-tier verdict (True, Uncertain, Likely False, False), signal breakdown, adversarial flags, and live web corroboration results.
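
The four-tier verdict can be thought of as a set of thresholds on a final score. The sketch below assumes a 0-100 score where higher means more credible; the cut-offs and the name `verdict_for` are hypothetical, since this README does not state the actual thresholds.

```python
# Hypothetical cut-offs on a 0-100 credibility score, highest tier first.
TIERS = [
    (75, "True"),
    (50, "Uncertain"),
    (25, "Likely False"),
    (0, "False"),
]


def verdict_for(score):
    """Return the first tier whose threshold the score meets or exceeds."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for threshold, label in TIERS:
        if score >= threshold:
            return label


for s in (82, 55, 30, 10):
    print(s, verdict_for(s))
```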