---
title: TruthLens
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
python_version: 3.10.13
app_file: app.py
pinned: false
---

# TruthLens: Advanced Fake News Detection Pipeline

TruthLens is an end-to-end fake news detection system that moves beyond simple machine learning probabilities. It employs a robust **5-signal weighted scoring framework** built on journalistic standards, combining deep learning models (DistilBERT, RoBERTa), sequence models (LSTM), statistical models (Logistic Regression), and heuristic analysis to deliver explainable verdicts.

## 🌟 Key Features

* **5-Signal Scoring Framework:**
  * **Source Credibility (30%):** Evaluates outlet reputation, author presence, and source corroboration, including typosquatting checks.
  * **Claim Verification (30%):** Combines AI probability with spaCy-based Named Entity Recognition (NER) and quote attribution analysis.
  * **Linguistic Quality (20%):** Detects sensationalism, superlatives, passive voice, and uses DistilBERT to check if the headline contradicts the body.
  * **Freshness (10%):** Contextual and date-based temporal scoring to detect outdated information.
  * **AI Model Consensus (10%):** Ensemble voting from Logistic Regression, LSTM, DistilBERT, and RoBERTa.
* **Adversarial Guardrails:** Hard caps and overrides for highly suspicious patterns (Triple Anonymity, Uncited Statistics, Headline Contradictions).
* **Live Web Corroboration:** RAG (Retrieval-Augmented Generation) pipeline using live search to verify unambiguous claims.
* **TruthLens UI:** A sleek, dark/light mode adaptable Streamlit dashboard providing detailed explainability down to the specific signals and deductions.
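
To make the weighting concrete, here is a minimal sketch of how the five signals could be combined into one score. Only the percentages come from the feature list above. The function name `weighted_score`, the dictionary keys, the 0-to-1 scale for each signal, and the optional `cap` argument (standing in for a guardrail hard cap) are all assumptions for illustration, not the actual code in `src/stage4_inference.py`.

```python
# Weights taken from the feature list; key names are hypothetical.
SIGNAL_WEIGHTS = {
    "source_credibility": 0.30,
    "claim_verification": 0.30,
    "linguistic_quality": 0.20,
    "freshness": 0.10,
    "model_consensus": 0.10,
}


def weighted_score(signals, cap=None):
    """Combine per-signal scores (assumed to lie in [0, 1]) into one score.

    `cap` is a hypothetical stand-in for an adversarial guardrail: when set,
    the combined score cannot exceed it.
    """
    missing = set(SIGNAL_WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    total = sum(SIGNAL_WEIGHTS[name] * signals[name] for name in SIGNAL_WEIGHTS)
    if cap is not None:
        total = min(total, cap)
    return total


if __name__ == "__main__":
    # Made-up signal values, purely to show the arithmetic.
    example = {
        "source_credibility": 0.9,
        "claim_verification": 0.5,
        "linguistic_quality": 0.8,
        "freshness": 1.0,
        "model_consensus": 0.6,
    }
    print(round(weighted_score(example), 2))           # 0.74
    print(round(weighted_score(example, cap=0.4), 2))  # 0.4
```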

---

## 📁 Project Structure

```text
fake_news_detection/
├── app.py                       # Streamlit frontend (TruthLens UI)
├── run_pipeline.py              # Main script to run pipeline stages
├── requirements.txt             # Python dependencies
├── src/
│   ├── stage1_ingestion.py      # Downloads and prepares datasets
│   ├── stage2_preprocessing.py  # Cleans text, tokenizes, and saves artifacts
│   ├── stage3_training.py       # Trains models (LR, LSTM, DistilBERT, RoBERTa)
│   ├── stage4_inference.py      # The 5-signal scoring engine and prediction logic
│   └── utils/
│       └── rag_retrieval.py     # Live web search corroboration functions
├── data/                        # Raw and processed datasets (created during execution)
└── models/                      # Trained models and vectorizers (created during execution)
```

---

## 🚀 Getting Started

### 1. Installation

Ensure you have Python 3.8+ installed. Install the required dependencies:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

### 2. Running the Pipeline

The project is divided into stages. You can run the entire pipeline end-to-end, or run specific stages individually using `run_pipeline.py`.

**To run the complete training pipeline (Stages 1 to 3):**

*Note: This will download datasets, preprocess them, and train all models. It may take a significant amount of time depending on your hardware.*

```bash
python run_pipeline.py --stage 1 2 3
```

**To run individual stages:**

* **Stage 1: Data Ingestion**
  Downloads and formats the necessary datasets (e.g., LIAR, ISOT).
  ```bash
  python run_pipeline.py --stage 1
  ```

* **Stage 2: Preprocessing**
  Cleans the text, maps verdicts to binary labels, and prepares DataFrames for training.
  ```bash
  python run_pipeline.py --stage 2
  ```

* **Stage 3: Training**
  Trains the ensemble: Logistic Regression, LSTM, DistilBERT, and RoBERTa. Saves the models to the `models/` directory.
  ```bash
  python run_pipeline.py --stage 3
  ```

* **Stage 4: Evaluation**
  Evaluates the trained pipeline on the holdout test set using the 5-signal inference framework.
  ```bash
  python run_pipeline.py --eval
  ```

---

## 🖥️ Running the Application

Once the models are trained (or if you already have the pre-trained weights in the `models/` directory), you can launch the TruthLens UI.

```bash
python -m streamlit run app.py
```

This will start a local web server (usually at `http://localhost:8501`).

### Using the App:

1. **Paste text or provide a URL:** You can paste the raw text of an article (with or without a headline) or simply provide a URL for the app to parse automatically.
2. **Select depth:** Choose Quick, Standard, or Deep analysis.
3. **View Results:** Explore the four-tier verdict (True, Uncertain, Likely False, False), signal breakdown, adversarial flags, and live web corroboration results.
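
As a rough illustration of how a combined score could map onto the four verdict tiers listed above, the sketch below uses evenly spaced cut-offs. The tier names come from this README. The thresholds, the 0-to-1 scale, the convention that a higher score means more credible, and the function name `verdict_for` are hypothetical and may not match what the app does.

```python
# Hypothetical cut-offs: the README names the four tiers but not their thresholds.
VERDICT_TIERS = [
    (0.75, "True"),
    (0.50, "Uncertain"),
    (0.25, "Likely False"),
]


def verdict_for(score):
    """Map a credibility score in [0, 1] to one of the four verdict labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Tiers are ordered from highest threshold to lowest; first match wins.
    for threshold, label in VERDICT_TIERS:
        if score >= threshold:
            return label
    return "False"


if __name__ == "__main__":
    for s in (0.9, 0.6, 0.3, 0.1):
        print(s, verdict_for(s))
```

Keeping the thresholds in a single ordered list makes it easy to adjust the tiers without touching the lookup logic.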