SynShade: Insider Threat Detection Model

Model Description

SynShade is a fine-tuned BERT model that detects insider threat recruitment activity in dark web communications. It classifies Telegram messages as either general advertising content or active insider recruitment attempts.

Model Type: Text Classification (Binary)
Base Model: Small BERT (small_bert/bert_en_uncased_L-4_H-256_A-4/2)
Framework: TensorFlow
Language: English

Intended Use

Primary Use Cases

Monitoring dark web forums and channels for insider threat activity
Early detection of recruitment patterns targeting insiders
Security intelligence and threat hunting operations
Cybersecurity research

Out-of-Scope Use

Real-time production deployment without human oversight
Sole decision-making tool for legal or enforcement actions
Analysis of languages other than English
General-purpose text classification

Model Architecture

Layers: 4 transformer layers
Hidden Size: 256
Attention Heads: 4
Output: Sigmoid activation for binary classification
Classes:
- 0: Advertising/General Content
- 1: Insider Recruitment Activity
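
As a rough illustration of how a single sigmoid output maps onto the two classes above, the sketch below applies a decision threshold to a score. The 0.5 cutoff and the names `LABELS`, `sigmoid`, and `classify` are assumptions made for illustration; this card does not state which threshold is used in practice.

```python
import math

# Class names taken from the list above; the dict name is illustrative.
LABELS = {0: "Advertising/General Content", 1: "Insider Recruitment Activity"}


def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def classify(probability: float, threshold: float = 0.5) -> tuple[int, str]:
    """Turn a sigmoid probability into a class id and label.

    The 0.5 default is an assumed cutoff, not a documented one.
    """
    class_id = 1 if probability >= threshold else 0
    return class_id, LABELS[class_id]


if __name__ == "__main__":
    for logit in (-2.0, 0.3, 3.1):
        p = sigmoid(logit)
        print(f"logit={logit:+.1f}  p={p:.3f}  ->", classify(p))
```

Raising the threshold would trade recall for fewer false positives, which may matter when every positive prediction goes to a human reviewer.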

Training Data

The model was trained on a custom dataset of Telegram communications collected from dark web sources. The dataset contains labeled examples of:

General advertising and promotional content
Insider recruitment attempts and threat actor communications

Dataset Size:

Training: 6,062 samples
Validation: 1,299 samples
Test: 1,300 samples
Total: 8,661 samples
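
The counts above can be checked against the 70% / 15% / 15% split listed under Training Hyperparameters. The sketch below is a sanity check only: the rounding scheme (floor the training share, halve the remainder, give any odd sample to the test set) and the function name `split_sizes` are assumptions, not a description of how the split was actually produced.

```python
def split_sizes(total: int, train_pct: int = 70) -> tuple[int, int, int]:
    """Return (train, validation, test) sizes for a three-way split.

    Assumed rounding: the training share is floored, and the remainder is
    halved with any odd sample assigned to the test set.
    """
    train = total * train_pct // 100
    remainder = total - train
    validation = remainder // 2
    test = remainder - validation
    return train, validation, test


if __name__ == "__main__":
    print(split_sizes(8661))
```

Under these assumptions, 8,661 samples divide into 6,062 / 1,299 / 1,300, which agrees with the figures listed above.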

Note: Due to the sensitive nature of the data, the training dataset is not publicly available.

Training Procedure

Preprocessing

Text tokenization using BERT uncased tokenizer (bert-base-uncased)
Maximum sequence length: 128 tokens
Truncation and padding applied
Binary label encoding (Recruiting=1, Other=0)
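
The truncation, padding, and label-encoding steps above can be sketched in plain Python. This is a minimal illustration, not the original pipeline: it assumes the text has already been converted to WordPiece ids, that the standard `bert-base-uncased` special-token ids apply ([PAD]=0, [CLS]=101, [SEP]=102), and that padding is added on the right. The names `encode_ids` and `encode_label` are hypothetical.

```python
MAX_LEN = 128  # maximum sequence length stated above

# Special-token ids from the standard bert-base-uncased vocabulary
# (assumed to apply here): [PAD]=0, [CLS]=101, [SEP]=102.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def encode_ids(wordpiece_ids: list[int], max_len: int = MAX_LEN) -> dict[str, list[int]]:
    """Wrap, truncate, and pad WordPiece ids to a fixed length."""
    # Reserve two positions for [CLS] and [SEP], then truncate the body.
    body = wordpiece_ids[: max_len - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad to max_len; padded positions get attention mask 0.
    pad = max_len - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }


def encode_label(label: str) -> int:
    """Binary label encoding: Recruiting=1, anything else=0."""
    return 1 if label == "Recruiting" else 0


if __name__ == "__main__":
    encoded = encode_ids([2023, 2003])
    print(len(encoded["input_ids"]), sum(encoded["attention_mask"]))
```

In practice a tokenizer library would normally handle these steps; the point here is only that inference inputs need the same fixed length and special tokens as the training inputs.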

Training Hyperparameters

Optimizer: Adam
Learning Rate: 3e-5
Batch Size: 32
Epochs: Up to 10 (with early stopping)
Loss Function: Binary Cross-Entropy
Early Stopping: Patience of 3 epochs on validation loss
Dataset Split: 70% train, 15% validation, 15% test
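
To make the early-stopping rule concrete, the sketch below walks through a list of per-epoch validation losses and reports the epoch at which training would halt with a patience of 3 and a cap of 10 epochs. It is a plain-Python illustration with a hypothetical function name (`stopping_epoch`) and made-up loss values; it assumes any strict decrease counts as an improvement and does not reproduce the actual training run.

```python
def stopping_epoch(val_losses: list[float], patience: int = 3, max_epochs: int = 10) -> int:
    """Return the 1-based epoch at which training would stop.

    Training halts once validation loss has failed to improve on its best
    value for `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # No early stop triggered: training ran for every available epoch.
    return min(len(val_losses), max_epochs)


if __name__ == "__main__":
    # Illustrative losses only: best value at epoch 3, then three worse epochs.
    losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
    print(stopping_epoch(losses))
```

With the illustrative losses shown, the best value occurs at epoch 3 and the next three epochs fail to beat it, so the function returns 6 and the later improvement at epoch 7 is never seen.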

Limitations

The model is trained specifically on dark web Telegram communications and may not generalize well to other platforms or communication styles
Performance depends on inference-time preprocessing matching the training procedure (same uncased tokenizer, 128-token maximum length, truncation and padding)
Limited to English-language communications
May produce false positives/negatives and should not be used as the sole decision-making tool
Requires domain expertise to interpret results in context

Ethical Considerations

Privacy: This model analyzes communications from public dark web sources. Users must ensure compliance with applicable privacy laws and regulations.
Bias: The model reflects patterns in its training data, which may contain biases
Misuse Potential: Should only be used for defensive cybersecurity purposes
Human Oversight: Predictions should be reviewed by qualified security professionals

Citation

@software{synshade2026,
  title={SynShade: Insider Threat Detection in Dark Web Communications},
  author={Wong Hau Pepelu},
  year={2026},
  url={https://huggingface.co/wong-hau-pepelu/synshade-insider-threat-detector}
}

Contact

For questions or issues, please open an issue on GitHub or contact ganymede_debian@outlook.com.

Disclaimer

This model is provided for cybersecurity research and defensive purposes only. The authors and distributors are not responsible for any misuse of this tool. Users are responsible for ensuring their use complies with all applicable laws and regulations.
