SynShade: Insider Threat Detection Model
Model Description
SynShade is a fine-tuned BERT model designed to detect insider threat recruitment activity in dark web communications. The model analyzes Telegram chatter to distinguish between general advertising content and active insider recruitment attempts.
Model Type: Text Classification (Binary)
Base Model: Small BERT (small_bert/bert_en_uncased_L-4_H-256_A-4/2)
Framework: TensorFlow
Language: English
Intended Use
Primary Use Cases
- Monitoring dark web forums and channels for insider threat activity
- Early detection of recruitment patterns targeting insiders
- Security intelligence and threat hunting operations
- Cybersecurity research
Out-of-Scope Use
- Real-time production deployment without human oversight
- Sole decision-making tool for legal or enforcement actions
- Analysis of languages other than English
- General-purpose text classification
Model Architecture
- Layers: 4 transformer layers
- Hidden Size: 256
- Attention Heads: 4
- Output: Sigmoid activation for binary classification
- Classes:
  - 0: Advertising/General Content
  - 1: Insider Recruitment Activity
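With a sigmoid output, the model produces a single probability that the input belongs to class 1 (Insider Recruitment Activity). The minimal sketch below shows how such an output could be turned into a label. It assumes a 0.5 decision threshold and a raw logit as input. The threshold, the function names `sigmoid` and `classify`, and the example logits are illustrative assumptions, not part of the published model. If the saved model already applies the sigmoid in its final layer, its output is the probability itself and only the thresholding step applies.

```python
import math

# Class mapping as listed in this model card.
LABELS = {0: "Advertising/General Content", 1: "Insider Recruitment Activity"}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(logit: float, threshold: float = 0.5) -> tuple[int, str, float]:
    """Map one raw output to (class id, class name, probability of class 1).

    The 0.5 threshold is an assumption; raising it trades recall for
    fewer false positives.
    """
    p = sigmoid(logit)
    label = 1 if p >= threshold else 0
    return label, LABELS[label], p


# Hypothetical logits, for illustration only.
for logit in (-2.0, 0.0, 3.0):
    label, name, p = classify(logit)
    print(f"logit={logit:+.1f}  p={p:.3f}  -> {label} ({name})")
```

Because predictions feed human review rather than automated action, the threshold is a tuning knob for analyst workload, not a fixed property of the model.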
Training Data
The model was trained on a custom dataset of Telegram communications collected from dark web sources. The dataset contains labeled examples of:
- General advertising and promotional content
- Insider recruitment attempts and threat actor communications
Dataset Size:
- Training: 6,062 samples
- Validation: 1,299 samples
- Test: 1,300 samples
- Total: 8,661 samples
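The reported counts are consistent with the 70% / 15% / 15% split listed under Training Hyperparameters. One rounding rule that reproduces them exactly is to round the train and validation counts down and assign the remainder to the test set. The sketch below assumes that rule, a random shuffle, and a hypothetical seed of 42. The card does not state the authors' actual rounding, seed, or whether the split was stratified by label.

```python
import random


def split_sizes(total: int, train_frac: float = 0.70, val_frac: float = 0.15) -> tuple[int, int, int]:
    """Return (train, validation, test) counts; the test set takes the remainder."""
    n_train = int(total * train_frac)  # rounded down
    n_val = int(total * val_frac)      # rounded down
    n_test = total - n_train - n_val
    return n_train, n_val, n_test


def split_dataset(samples, train_frac: float = 0.70, val_frac: float = 0.15, seed: int = 42):
    """Shuffle with a fixed (hypothetical) seed, then slice into three disjoint sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train, n_val, _ = split_sizes(len(items), train_frac, val_frac)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


print(split_sizes(8661))  # (6062, 1299, 1300)
```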
Note: Due to the sensitive nature of the data, the training dataset is not publicly available.
Training Procedure
Preprocessing
- Text tokenization using BERT uncased tokenizer (bert-base-uncased)
- Maximum sequence length: 128 tokens
- Truncation and padding applied
- Binary label encoding (Recruiting=1, Other=0)
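The steps above can be sketched as follows, assuming the text has already been split into WordPiece token IDs by a bert-base-uncased tokenizer. A library tokenizer normally does all of this in one call; for example, Hugging Face `transformers` accepts `truncation=True, padding="max_length", max_length=128`. The special-token IDs are those of the bert-base-uncased vocabulary ([PAD]=0, [CLS]=101, [SEP]=102). The function name `encode`, the all-zero segment IDs for single-segment input, and the sample token IDs are illustrative assumptions; inference inputs must be prepared the same way the training inputs were.

```python
# Special-token IDs from the bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102
MAX_LEN = 128  # maximum sequence length from this model card

# Binary label encoding from this model card.
LABEL_IDS = {"Other": 0, "Recruiting": 1}


def encode(wordpiece_ids: list[int], max_len: int = MAX_LEN) -> tuple[list[int], list[int], list[int]]:
    """Add [CLS]/[SEP], truncate, and pad to exactly max_len.

    Returns (input_ids, attention_mask, segment_ids). The mask is 1 for real
    tokens and 0 for padding; segment IDs are all 0 for a single text.
    """
    body = wordpiece_ids[: max_len - 2]  # reserve two slots for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding, [0] * max_len


# Placeholder token IDs, for illustration only.
input_ids, attention_mask, segment_ids = encode([1000, 2000, 3000])
print(input_ids[:6], sum(attention_mask))  # [101, 1000, 2000, 3000, 102, 0] 5
```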
Training Hyperparameters
- Optimizer: Adam
- Learning Rate: 3e-5
- Batch Size: 32
- Epochs: up to 10 (with early stopping)
- Loss Function: Binary Cross-Entropy
- Early Stopping: Patience of 3 epochs on validation loss
- Dataset Split: 70% train, 15% validation, 15% test
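In Keras, these settings would typically correspond to `tf.keras.optimizers.Adam(learning_rate=3e-5)`, `tf.keras.losses.BinaryCrossentropy()`, and `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)`. The card does not say whether the best weights were restored after stopping. The plain-Python sketch below illustrates only the loss and the stopping rule: training halts once validation loss has failed to improve for 3 consecutive epochs, capped at 10 epochs. The function names and the validation-loss curve are hypothetical.

```python
import math


def binary_cross_entropy(y_true: list[int], p_pred: list[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def early_stopping_epoch(val_losses: list[float], patience: int = 3, max_epochs: int = 10) -> tuple[int, int]:
    """Return (epochs actually run, best epoch), both 1-based."""
    best_loss, best_epoch, wait, epochs_run = math.inf, 0, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs in a row
                break
    return epochs_run, best_epoch


# Hypothetical validation-loss curve, for illustration only.
curve = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39, 0.38, 0.37, 0.36]
print(early_stopping_epoch(curve))  # (6, 3)
```

Note that the later improvement at epoch 7 in this curve is never reached: with a patience of 3, training stops at epoch 6.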
Limitations
- The model is trained specifically on dark web Telegram communications and may not generalize well to other platforms or communication styles
- Performance depends heavily on input preprocessing matching the training procedure (same uncased tokenizer and 128-token maximum length)
- Supports English-language communications only
- May produce false positives/negatives and should not be used as the sole decision-making tool
- Requires domain expertise to interpret results in context
Ethical Considerations
- Privacy: This model analyzes communications from public dark web sources. Users must ensure compliance with applicable privacy laws and regulations
- Bias: The model reflects patterns in its training data, which may contain biases
- Misuse Potential: Should only be used for defensive cybersecurity purposes
- Human Oversight: Predictions should be reviewed by qualified security professionals
Citation
@software{synshade2026,
title={SynShade: Insider Threat Detection in Dark Web Communications},
author={Wong Hau Pepelu},
year={2026},
url={https://huggingface.co/wong-hau-pepelu/synshade-insider-threat-detector}
}
Contact
For questions or issues, please open an issue on GitHub or contact ganymede_debian@outlook.com.
Disclaimer
This model is provided for cybersecurity research and defensive purposes only. The authors and distributors are not responsible for any misuse of this tool. Users are responsible for ensuring their use complies with all applicable laws and regulations.