SynShade: Insider Threat Detection Model

Model Description

SynShade is a fine-tuned BERT model designed to detect insider threat recruitment activity in dark web communications. The model analyzes Telegram chatter to distinguish between general advertising content and active insider recruitment attempts.

Model Type: Text Classification (Binary)
Base Model: Small BERT (small_bert/bert_en_uncased_L-4_H-256_A-4/2)
Framework: TensorFlow
Language: English

Intended Use

Primary Use Cases

  • Monitoring dark web forums and channels for insider threat activity
  • Early detection of recruitment patterns targeting insiders
  • Security intelligence and threat hunting operations
  • Cybersecurity research

Out-of-Scope Use

  • Real-time production deployment without human oversight
  • Sole decision-making tool for legal or enforcement actions
  • Analysis of languages other than English
  • General-purpose text classification

Model Architecture

  • Layers: 4 transformer layers
  • Hidden Size: 256
  • Attention Heads: 4
  • Output: Sigmoid activation for binary classification
  • Classes:
    • 0: Advertising/General Content
    • 1: Insider Recruitment Activity
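A minimal sketch of how the single sigmoid output maps to the two classes above. The output layer applies a sigmoid to one pre-activation value z, so the model returns p = σ(z), read as the probability of class 1. The 0.5 decision threshold and the helper names are assumptions for illustration; the card does not state which threshold is used.

```python
import math

# Class ids and names as listed in the model card.
LABELS = {0: "Advertising/General Content", 1: "Insider Recruitment Activity"}

def sigmoid(z):
    """Numerically stable logistic function: maps a pre-activation z to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def label_from_probability(p, threshold=0.5):
    """Return (class id, class name) for a predicted probability of class 1.

    The 0.5 threshold is an assumed default, not a documented setting.
    """
    cls = 1 if p >= threshold else 0
    return cls, LABELS[cls]

for z in (-2.0, 0.3, 3.5):
    p = sigmoid(z)
    cls, name = label_from_probability(p)
    print(f"z={z:+.1f}  p={p:.3f}  ->  {cls} ({name})")
```

Raising the threshold trades recall for precision on class 1, which matters when flagged messages go to a human analyst for review.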

Training Data

The model was trained on a custom dataset of Telegram communications collected from dark web sources. The dataset contains labeled examples of:

  • General advertising and promotional content
  • Insider recruitment attempts and threat actor communications

Dataset Size:

  • Training: 6,062 samples
  • Validation: 1,299 samples
  • Test: 1,300 samples
  • Total: 8,661 samples
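These subset sizes match a 70/15/15 split of the 8,661 samples when the train and validation counts are rounded down and the test set takes the remainder. The sketch below reproduces those counts. The shuffle-then-slice procedure, the seed, and the function names are assumptions; the card does not say whether the split was shuffled or stratified by label.

```python
import random

def split_counts(n, train_frac=0.70, val_frac=0.15):
    """Floor the train/validation sizes; the test set receives the remainder."""
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return n_train, n_val, n - n_train - n_val

def shuffle_split(samples, seed=42, train_frac=0.70, val_frac=0.15):
    """Shuffle a copy of the samples with a fixed (hypothetical) seed, then slice."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train, n_val, _ = split_counts(len(items), train_frac, val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

print(split_counts(8661))  # (6062, 1299, 1300)
```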

Note: Due to the sensitive nature of the data, the training dataset is not publicly available.

Training Procedure

Preprocessing

  • Text tokenization using BERT uncased tokenizer (bert-base-uncased)
  • Maximum sequence length: 128 tokens
  • Truncation and padding applied
  • Binary label encoding (Recruiting=1, Other=0)
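A minimal sketch of the truncation, padding, and label-encoding steps, assuming the text has already been split into WordPiece token IDs by the bert-base-uncased tokenizer. It uses that vocabulary's special-token IDs ([CLS]=101, [SEP]=102, [PAD]=0); the function names and the label dictionary are hypothetical, and the token IDs in the example are dummies.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased special-token IDs
MAX_LEN = 128                          # maximum sequence length from the card

# Label strings as written in the card; the dataset's actual column values may differ.
LABEL_IDS = {"Recruiting": 1, "Other": 0}

def encode(token_ids, max_len=MAX_LEN):
    """Truncate, wrap in [CLS] ... [SEP], and pad to max_len.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    body = list(token_ids)[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad

def encode_label(label):
    """Map a label string to its binary target."""
    return LABEL_IDS[label]

ids, mask = encode(range(1000, 1300))  # 300 dummy IDs, truncated to 126 + 2 specials
print(len(ids), sum(mask))             # 128 128
```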

Training Hyperparameters

  • Optimizer: Adam
  • Learning Rate: 3e-5
  • Batch Size: 32
  • Epochs: up to 10 (early stopping may end training sooner)
  • Loss Function: Binary Cross-Entropy
  • Early Stopping: Patience of 3 epochs on validation loss
  • Dataset Split: 70% train, 15% validation, 15% test
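The loss and stopping rule listed above can be sketched in plain Python: binary cross-entropy averaged over a batch, and a patience counter that halts training after 3 consecutive epochs without a new lowest validation loss, capped at 10 epochs. In Keras the same rule is typically written as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)`; whether the best weights were restored afterwards is not stated in the card. The clipping epsilon and the validation losses below are made up for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean BCE over a batch; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def early_stopping_epoch(val_losses, patience=3, max_epochs=10):
    """Return (epochs_run, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve
    on the lowest validation loss seen so far, or after max_epochs.
    """
    best, best_epoch, wait, epochs_run = math.inf, 0, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return epochs_run, best_epoch

print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # ≈ 0.164
print(early_stopping_epoch([0.52, 0.41, 0.36, 0.37, 0.38, 0.39, 0.35]))  # (6, 3)
```

In the example, epochs 4 to 6 do not beat epoch 3's loss of 0.36, so training stops after epoch 6 even though epoch 7 would have improved.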

Limitations

  • The model is trained specifically on dark web Telegram communications and may not generalize well to other platforms or communication styles
  • Performance depends heavily on inputs being preprocessed exactly as during training (same uncased tokenizer, 128-token maximum length)
  • Supports English-language communications only
  • May produce false positives/negatives and should not be used as the sole decision-making tool
  • Requires domain expertise to interpret results in context

Ethical Considerations

  • Privacy: This model analyzes communications from public dark web sources. Users must ensure compliance with applicable privacy laws and regulations
  • Bias: The model reflects patterns in its training data, which may contain biases
  • Misuse Potential: Should only be used for defensive cybersecurity purposes
  • Human Oversight: Predictions should be reviewed by qualified security professionals

Citation

@software{synshade2026,
  title={SynShade: Insider Threat Detection in Dark Web Communications},
  author={Wong Hau Pepelu},
  year={2026},
  url={https://huggingface.co/wong-hau-pepelu/synshade-insider-threat-detector}
}

Contact

For questions or issues, please open an issue on GitHub or contact ganymede_debian@outlook.com.

Disclaimer

This model is provided for cybersecurity research and defensive purposes only. The authors and distributors are not responsible for any misuse of this tool. Users are responsible for ensuring their use complies with all applicable laws and regulations.