Insider Threats Detection System
A real-time cyber threat detection dashboard that monitors user behavior and uses machine learning to identify insider threats and anomalous activity.
Overview
The Insider Threats Detection System monitors and analyzes user activity in real time to detect potential insider threats. It combines rule-based detection with machine learning models to identify anomalous behavior patterns that may indicate malicious insider activity.
Key Capabilities
- Real-time Monitoring: Live dashboard with real-time threat detection
- Multi-layered Detection: Five detection methods working in parallel
- User Risk Assessment: Individual user risk scoring and blocking
- Interactive Simulation: Live simulation environment for testing
- Explainable AI: SHAP-based model explanations for transparency
Features
Live Dashboard
- Real-time Metrics: Live event monitoring with instant updates
- Interactive Controls: Start/stop simulation with configurable parameters
- Risk Visualization: Dynamic charts showing risk distribution and trends
- User Management: Real-time user risk assessment and blocking
Advanced Detection
- Number-based Detection: Statistical anomaly detection
- Pattern-based Detection: Behavioral pattern analysis
- Relationship-based Detection: User-resource relationship analysis
- VAE-based Detection: Variational Auto-Encoder deep learning
- Temporal Detection: LSTM Auto-Encoder for time series analysis
Analytics & Reporting
- Risk Distribution Charts: Visual representation of threat levels
- Trend Analysis: Historical risk score trends
- User Risk Tables: Detailed user risk assessments
- Detection Type Analysis: Breakdown of detection methods
Tech Stack
Frontend
- Dash: Python web framework for interactive dashboards
- Plotly: Interactive data visualization
- Bootstrap: Responsive UI components
Backend & ML
- Python 3.13: Core programming language
- TensorFlow + Keras: Deep learning frameworks
- Scikit-learn: Traditional ML algorithms
- Pandas + NumPy: Data processing and analysis
- NetworkX: Graph analysis and relationship modeling
Data & Visualization
- SHAP: Model explainability and interpretability
- Plotly: Interactive charts and graphs
- Faker: Synthetic data generation for testing
Installation
Prerequisites
- Python 3.13+
- Windows/Linux/macOS
Quick Start
Clone the repository
    git clone https://github.com/yourusername/insider-threats.git
    cd insider-threats
Install dependencies
    pip install -r requirements.txt
Run the application
    python app_live.py
Access the dashboard: open your browser and navigate to:
http://localhost:8050
Detailed Installation
Create virtual environment (recommended)
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
Install dependencies
    pip install -r requirements.txt
Verify installation
    python -c "import dash, tensorflow, pandas; print('All packages installed successfully!')"
Usage
Starting the Dashboard
python app_live.py
Simulation Controls
1. Start Simulation: Click "Start Simulation" to begin live monitoring
2. Configure Parameters:
   - Simulation Speed: Adjust the speed of event generation (0.1x to 10x)
   - Duration: Set simulation duration in minutes
   - Risk Threshold: Configure blocking threshold (50-100)
3. Monitor Results: View real-time metrics, charts, and user risk assessments
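The Risk Threshold setting could drive user blocking along the following lines. This is a minimal sketch, not the logic in `src/user_blocking_system.py`: the function name `users_to_block`, the per-user 0-100 score, and the choice to block at scores greater than or equal to the threshold are all assumptions.

```python
def users_to_block(risk_scores, threshold=70):
    """Return the users whose risk score meets or exceeds the threshold.

    risk_scores: dict mapping user id -> risk score on a 0-100 scale (assumed).
    threshold:   blocking threshold; the dashboard allows values from 50 to 100.
    """
    if not 50 <= threshold <= 100:
        raise ValueError("threshold must be between 50 and 100")
    return sorted(user for user, score in risk_scores.items() if score >= threshold)


# Hypothetical scores, for illustration only.
scores = {"alice": 12.5, "bob": 83.0, "carol": 70.0, "dave": 69.9}
blocked = users_to_block(scores, threshold=70)
print(blocked)  # ['bob', 'carol']
```

Raising the threshold blocks fewer users; with `threshold=90` none of the hypothetical users above would be blocked.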
Key Dashboard Sections
- Real-time Metrics: Live event counts, blocked users, risk scores
- Live Event Stream: Real-time event feed with risk indicators
- Risk Distribution: Pie charts showing risk level distribution
- Risk Trends: Time-series analysis of risk scores
- User Risk Assessment: Detailed user risk tables with filtering
- Detection Analysis: Breakdown of detection method effectiveness
Project Structure
insider-threats/
├── app_live.py                           # Main dashboard application
├── requirements.txt                      # Python dependencies
├── README.md                             # Project documentation
├── .gitignore                            # Git ignore rules
├── assets/
│   └── style.css                         # Dashboard styling
├── data/
│   ├── raw/                              # Raw LANL authentication logs
│   ├── processed/                        # Processed features and baselines
│   └── synthetic/                        # Generated synthetic data
├── models/                               # Trained ML models
└── src/                                  # Source code
    ├── __init__.py
    ├── adaptive_ml_detector.py           # Traditional ML detection
    ├── anomaly_detector.py               # Anomaly detection models
    ├── config.py                         # Configuration settings
    ├── data_generator.py                 # Synthetic data generation
    ├── data_loader.py                    # Data loading utilities
    ├── enhanced_adaptive_ml_detector.py  # Enhanced ML detection
    ├── explainer.py                      # SHAP model explanations
    ├── feature_engineering.py            # Feature extraction
    ├── graph_analyzer.py                 # Graph-based analysis
    ├── live_simulator.py                 # Live simulation engine
    ├── lstm_autoencoder.py               # LSTM temporal detection
    ├── user_blocking_system.py           # User blocking logic
    └── vae_anomaly_detector.py           # VAE-based detection
Detection Methods
1. Number-based Detection
- Purpose: Statistical anomaly detection
- Method: Isolation Forest, One-Class SVM
- Features: Access frequency, time patterns, resource usage
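The project lists Isolation Forest and One-Class SVM for this layer. As a simpler stand-in that illustrates the same idea (flagging values far from a user's norm), a z-score check over access counts might look like the sketch below. The function name, the sample counts, and the cutoff of 3 standard deviations are assumptions, not taken from the project.

```python
from statistics import mean, stdev


def zscore_anomalies(history, new_values, cutoff=3.0):
    """Flag new values lying more than `cutoff` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Constant history: any value that differs from it is treated as anomalous.
        return [v != mu for v in new_values]
    return [abs(v - mu) / sigma > cutoff for v in new_values]


# Hypothetical daily access counts for one user.
history = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20]
flags = zscore_anomalies(history, [21, 95, 17])
print(flags)  # [False, True, False]
```

Unlike a z-score, Isolation Forest works on several features at once and does not assume roughly normal data, which is one plausible reason to prefer it here.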
2. Pattern-based Detection
- Purpose: Behavioral pattern analysis
- Method: User behavior baselines, pattern matching
- Features: Login patterns, access sequences, time-based anomalies
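A per-user behavior baseline can be sketched as follows. This is only an illustration under stated assumptions: events are reduced to `(user, hour)` pairs, the baseline is the set of hours at which a user has logged in before, and the one-hour tolerance and both function names are hypothetical.

```python
from collections import defaultdict


def build_hour_baselines(events):
    """Map each user to the set of hours (0-23) at which they have logged in before."""
    baselines = defaultdict(set)
    for user, hour in events:
        baselines[user].add(hour)
    return baselines


def is_unusual_login(baselines, user, hour, tolerance=1):
    """Return True if `hour` is more than `tolerance` hours from every known login hour."""
    known = baselines.get(user)
    if not known:
        return True  # No baseline yet: treated as unusual (an assumption).
    # Circular distance, so 23:00 and 00:00 count as one hour apart.
    return all(min(abs(hour - h), 24 - abs(hour - h)) > tolerance for h in known)


# Hypothetical login history.
history = [("alice", 9), ("alice", 10), ("alice", 17), ("bob", 22), ("bob", 23)]
baselines = build_hour_baselines(history)
print(is_unusual_login(baselines, "alice", 3))  # True: far from 9, 10 and 17
print(is_unusual_login(baselines, "bob", 0))    # False: one hour after 23
```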
3. Relationship-based Detection
- Purpose: User-resource relationship analysis
- Method: Graph analysis, relationship modeling
- Features: User-resource connections, access patterns
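The relationship idea can be illustrated with a small bipartite user-resource mapping. The tech stack lists NetworkX for graph analysis; to stay dependency-free, this sketch uses plain dictionaries instead. The rule shown (flag a resource that neither the user nor any peer has touched before), the function names, and the sample events are assumptions.

```python
from collections import defaultdict


def build_access_graph(events):
    """Build a user -> set-of-resources mapping from (user, resource) pairs."""
    graph = defaultdict(set)
    for user, resource in events:
        graph[user].add(resource)
    return graph


def is_new_relationship(graph, peers, user, resource):
    """True if neither the user nor any listed peer has accessed the resource before."""
    if resource in graph.get(user, set()):
        return False
    return all(resource not in graph.get(peer, set()) for peer in peers)


# Hypothetical access history.
events = [("alice", "hr-db"), ("alice", "mail"), ("bob", "hr-db"), ("bob", "payroll")]
graph = build_access_graph(events)
print(is_new_relationship(graph, ["bob"], "alice", "payroll"))       # False: peer bob uses it
print(is_new_relationship(graph, ["bob"], "alice", "build-server"))  # True: unseen by both
```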
4. VAE-based Detection
- Purpose: Deep learning anomaly detection
- Method: Variational Auto-Encoder
- Features: Learned representations, reconstruction error
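Reconstruction error is typically turned into an anomaly decision by comparing it with a cutoff learned from normal data. The sketch below shows only that thresholding step, with the autoencoder's output replaced by hand-written vectors. The 95th-percentile cutoff, the function names, and all numbers are assumptions rather than the project's settings.

```python
from statistics import quantiles


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def error_threshold(training_errors, percentile=95):
    """Use the given percentile of errors seen on normal training data as the cutoff."""
    cuts = quantiles(training_errors, n=100, method="inclusive")  # 99 cut points
    return cuts[percentile - 1]


# Hypothetical errors measured on normal training data: 0.01, 0.02, ..., 1.00.
training_errors = [i / 100 for i in range(1, 101)]
threshold = error_threshold(training_errors)

# A poorly reconstructed event (stand-in for real autoencoder output).
error = reconstruction_error([1, 0, 1, 0], [0, 1, 0, 1])
is_anomaly = error > threshold
print(threshold, error, is_anomaly)
```

A lower percentile flags more events and raises the false-positive rate; a higher one does the opposite.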
5. Temporal Detection
- Purpose: Time series anomaly detection
- Method: LSTM Auto-Encoder
- Features: Sequential patterns, temporal dependencies
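Sequence models such as an LSTM autoencoder usually consume fixed-length windows rather than a raw event stream. A minimal sketch of that windowing step follows; the window size, the step, and the sample counts are assumptions, and the model itself is not shown.

```python
def sliding_windows(sequence, window_size, step=1):
    """Split a sequence into overlapping fixed-length windows."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    return [
        sequence[start:start + window_size]
        for start in range(0, len(sequence) - window_size + 1, step)
    ]


# Hypothetical hourly login counts for one user.
hourly_logins = [3, 4, 2, 5, 40, 3]
windows = sliding_windows(hourly_logins, window_size=3)
print(windows)  # [[3, 4, 2], [4, 2, 5], [2, 5, 40], [5, 40, 3]]
```

Each window would then be reconstructed by the autoencoder, and windows with a high reconstruction error (here, those containing the spike of 40) would be flagged.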
Data Sources
Real Data
- LANL Authentication Logs: Los Alamos National Laboratory dataset
- Format: Authentication events with user, resource, and timestamp information
- Size: 100,000+ authentication events
Synthetic Data
- Generated Users: 150+ simulated users with realistic behavior patterns
- Departments: IT, HR, Finance, Engineering, Marketing
- Roles: Admin, Manager, Employee, Contractor
- Anomaly Ratio: 10% anomalous activities
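A synthetic event stream with these properties could be generated roughly as shown below. The project lists Faker for this purpose; the sketch uses only the standard `random` module. The event fields, the rule that anomalies happen at night, and the function name `generate_events` are assumptions for illustration.

```python
import random

DEPARTMENTS = ["IT", "HR", "Finance", "Engineering", "Marketing"]
ROLES = ["Admin", "Manager", "Employee", "Contractor"]


def generate_events(n_events, n_users=150, anomaly_ratio=0.10, seed=0):
    """Generate labeled login events for a fixed population of simulated users."""
    rng = random.Random(seed)  # Seeded for reproducible runs.
    users = [
        {
            "user_id": f"user_{i:03d}",
            "department": rng.choice(DEPARTMENTS),
            "role": rng.choice(ROLES),
        }
        for i in range(n_users)
    ]
    events = []
    for _ in range(n_events):
        user = rng.choice(users)
        is_anomaly = rng.random() < anomaly_ratio
        # Assumed convention: normal activity in office hours, anomalies at night.
        hour = rng.choice([0, 1, 2, 3, 4]) if is_anomaly else rng.randint(8, 18)
        events.append({**user, "hour": hour, "is_anomaly": is_anomaly})
    return events


events = generate_events(1000)
ratio = sum(e["is_anomaly"] for e in events) / len(events)
print(f"{len(events)} events, anomaly ratio {ratio:.2%}")
```

Because each event is labeled independently, the observed ratio varies around 10% from run to run unless the seed is fixed.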
Configuration
Key Settings (src/config.py)
# Risk thresholds
RISK_THRESHOLDS = {
    'low': (0, 40),
    'medium': (40, 70),
    'high': (70, 100)
}

# Model parameters
ISOLATION_FOREST_PARAMS = {
    'n_estimators': 100,
    'contamination': 0.1,
    'random_state': 42
}

# Dashboard settings
DASHBOARD_PORT = 8050
DASHBOARD_DEBUG = True
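The `RISK_THRESHOLDS` ranges share their boundary values (40 and 70 each appear in two ranges), so any code that maps a score to a level has to pick a tie-breaking rule. The sketch below assumes half-open intervals, with the lower bound included and the upper bound excluded; the function name `risk_level` and that rule are assumptions, not the project's documented behavior.

```python
RISK_THRESHOLDS = {
    "low": (0, 40),
    "medium": (40, 70),
    "high": (70, 100),
}


def risk_level(score):
    """Map a 0-100 risk score to 'low', 'medium', or 'high'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for level, (lower, upper) in RISK_THRESHOLDS.items():
        # Half-open interval [lower, upper): a boundary score such as 40
        # falls into exactly one level (assumed tie-breaking rule).
        if lower <= score < upper:
            return level
    return "high"  # Only reached for score == 100, the closed top of the range.


for score in (39.9, 40, 70, 100):
    print(score, risk_level(score))
```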
Performance Metrics
- Accuracy: 97.7% (Isolation Forest)
- F1-Score: 89.5% (Isolation Forest)
- Precision: 88.2% (Isolation Forest)
- Recall: 91.0% (Isolation Forest)
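For reference, these four metrics relate to confusion-matrix counts as sketched below. The counts are hypothetical and are not the project's evaluation data; the last lines recompute F1 from the reported precision and recall as a consistency check.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 1,000 events, of which 100 are true anomalies.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=890)
print(metrics)

# F1 implied by the reported precision (88.2%) and recall (91.0%).
reported_f1 = 2 * 0.882 * 0.910 / (0.882 + 0.910)
print(round(reported_f1, 3))  # 0.896
```

The recomputed value agrees with the reported 89.5% to within rounding of the inputs. With only about 10% anomalous events, accuracy is dominated by the normal class, so precision and recall are the more informative figures.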
Development Setup
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
# Run linting
flake8 src/
Acknowledgments
- LANL Dataset: Los Alamos National Laboratory for providing authentication logs
- TensorFlow Team: For the excellent deep learning framework
- Scikit-learn Team: For comprehensive ML algorithms
- Dash Team: For the powerful dashboard framework
Disclaimer: This system is designed for educational and research purposes. Always ensure compliance with privacy laws and organizational policies when deploying in production environments.