Traffico - Fine-tuned on ATT&CK Data

📋 Model Description

Traffico is a fine-tuned language model specialized in analyzing TCP/IP network traffic and detecting cyberattacks. It maps network flow patterns to the MITRE ATT&CK framework, enabling security teams to understand adversary tactics and techniques from network behavior alone.

The model is trained on synthetic datasets derived from real-world network traffic (CIC-IDS2017 + UNSW-NB15) and enriched with MITRE ATT&CK techniques. It can classify network flows as normal or malicious and provide ATT&CK-mapped threat classifications.

Base Model: Google Gemma 2.7B
Training Data: Synthetic dataset derived from CIC-IDS2017 and UNSW-NB15 network traffic, labeled with ATT&CK® tactics, techniques, and procedures (TTPs)
Fine-tuning Approach: Supervised Fine-Tuning (SFT) using Unsloth for optimization and TRL's SFTTrainer

🎯 Use Cases

  • Network Intrusion Detection: Classify network flows as benign or malicious in real time
  • Threat Intelligence: Map detected attacks to MITRE ATT&CK techniques and tactics
  • Security Monitoring: Analyze TCP/IP flows from network sensors and intrusion detection systems
  • Incident Response: Understand adversary behavior patterns from network telemetry
  • Research: Study attack-to-technique mappings in security datasets

🚀 Quick Start

Installation

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("hypnonyx/Traffico")
model = AutoModelForCausalLM.from_pretrained("hypnonyx/Traffico")

Basic Usage

# Analyze a network traffic flow (the prompt strings below are kept in Italian, as in the original example)
network_flow = "Protocollo: tcp | Porta dst: 80 | Byte src: 480000 | Byte dst: 40 | Pacchetti: 5200 | Durata: 0.015s"

messages = [
    {
        "role": "system",
        "content": "Analizza il seguente flusso di traffico di rete TCP/IP. Classifica se è traffico normale o un attacco e indica la tecnica MITRE ATT&CK corrispondente."
    },
    {"role": "user", "content": network_flow},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

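# Move the inputs to whichever device the model was loaded on (avoids a CPU/GPU mismatch)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)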
print(response)

Expected Output: Classification of the network flow (e.g., "DoS Attack - MITRE ATT&CK: Impact/Denial of Service")
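If the response needs to feed a dashboard or alerting pipeline, it can help to split it into fields. The following is a minimal sketch that assumes the model answers in the shape of the example above ("<label> - MITRE ATT&CK: <tactic>/<technique>"). That shape is inferred from the single example, not a documented output contract, and `Verdict` and `parse_verdict` are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    label: str                 # e.g. "DoS Attack"
    tactic: Optional[str]      # e.g. "Impact"
    technique: Optional[str]   # e.g. "Denial of Service"


# Assumed output shape: "<label> - MITRE ATT&CK: <tactic>/<technique>"
_PATTERN = re.compile(
    r"^(?P<label>.+?)\s*-\s*MITRE ATT&CK:\s*"
    r"(?P<tactic>[^/]+?)\s*(?:/\s*(?P<technique>.+?))?\s*$"
)


def parse_verdict(text: str) -> Verdict:
    """Split a model response into label, tactic, and technique.

    Falls back to returning the raw text as the label when the
    response does not match the assumed shape.
    """
    cleaned = text.strip()
    match = _PATTERN.match(cleaned)
    if match is None:
        return Verdict(label=cleaned, tactic=None, technique=None)
    return Verdict(match["label"], match["tactic"], match["technique"])


verdict = parse_verdict("DoS Attack - MITRE ATT&CK: Impact/Denial of Service")
print(verdict)
```

Because a generative model is free to deviate from this shape, the fallback branch matters: unmatched responses are kept verbatim rather than dropped, so they can be reviewed manually.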

📊 Training Details

| Property | Value |
|---|---|
| Base Model | Google Gemma 2.7B |
| Training Framework | Unsloth + TRL SFTTrainer |
| Training Dataset | Synthetic ATT&CK-derived dataset |
| Dataset Size | 10,000 examples |
| Techniques Covered | Network traffic analysis (CIC-IDS2017 + UNSW-NB15) |
| Training Duration | ~1 hour |
| Hardware | 1x NVIDIA RTX 4090 GPU |
| Learning Rate | 2e-5 |
| Batch Size | 16 effective (4 per device × 4 gradient accumulation steps) |
| LoRA Rank | 64 |
| Max Sequence Length | 512 tokens |
| Training Steps | 500 steps |
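The batch size, step count, and dataset size above can be combined to estimate how much of the dataset the model saw during training. The short calculation below is a sketch under two assumptions not stated in the card: that all 10,000 examples were used for training (no held-out split) and that each example occupies one slot in a batch (no sequence packing).

```python
# Values taken from the training details above
per_device_batch = 4
grad_accum_steps = 4
num_gpus = 1
training_steps = 500
dataset_size = 10_000

# Effective batch size: examples contributing to each optimizer step
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Total examples processed over the whole run
examples_seen = effective_batch * training_steps

# Fraction of one pass over the dataset (assumes no held-out split, no packing)
epochs = examples_seen / dataset_size

print(effective_batch)  # 16
print(examples_seen)    # 8000
print(epochs)           # 0.8
```

Under these assumptions the run covers roughly 0.8 of an epoch, meaning about a fifth of the examples would not have been seen at all.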

📝 Dataset Information

The training dataset was created synthetically using data derived from the MITRE ATT&CK framework and network traffic analysis datasets (CIC-IDS2017 + UNSW-NB15). It includes:

  • Network Traffic Features: Protocol type, destination port, source/destination bytes, packet count, flow duration
  • Attack Classification: Binary and multi-class classification of normal vs. malicious traffic
  • MITRE ATT&CK Mapping: Techniques mapped to network-based attacks:
    • Reconnaissance: Port scanning, network sniffing
    • Initial Access: Brute force attacks on SSH, FTP, Telnet
    • Lateral Movement: Data exfiltration, command & control traffic
    • Impact: DoS/DDoS attacks, data theft
  • Attack Types Covered: DoS, DDoS, PortScan, Brute Force, Infiltration, Botnet, Web attacks
  • Dataset Size: 10,000 labeled examples for instruction-tuning

The synthetic data was processed to create instruction-following examples where the model learns to analyze network flows and map them to MITRE ATT&CK techniques and tactics.
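To make the instruction format concrete, here is a minimal sketch of how one labeled flow could be turned into a chat-style training example. The field layout and the system prompt are copied from the Quick Start example. The `Flow` dataclass, `format_flow`, and `to_sft_example` are hypothetical names, and the `messages` record layout is an assumption (it is the conversational format that TRL's SFTTrainer accepts), since the card does not document the exact preprocessing used.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    protocol: str      # e.g. "tcp"
    dst_port: int      # destination port
    src_bytes: int     # bytes sent by the source
    dst_bytes: int     # bytes sent by the destination
    packets: int       # packet count
    duration_s: float  # flow duration in seconds


# System prompt copied verbatim from the Quick Start example
SYSTEM_PROMPT = (
    "Analizza il seguente flusso di traffico di rete TCP/IP. "
    "Classifica se è traffico normale o un attacco e indica la "
    "tecnica MITRE ATT&CK corrispondente."
)


def format_flow(flow: Flow) -> str:
    """Render a flow in the pipe-separated layout used in the Quick Start example."""
    return (
        f"Protocollo: {flow.protocol} | Porta dst: {flow.dst_port} | "
        f"Byte src: {flow.src_bytes} | Byte dst: {flow.dst_bytes} | "
        f"Pacchetti: {flow.packets} | Durata: {flow.duration_s}s"
    )


def to_sft_example(flow: Flow, label: str) -> dict:
    """Build one conversational record (assumed layout, not confirmed by the card)."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": format_flow(flow)},
            {"role": "assistant", "content": label},
        ]
    }


flow = Flow("tcp", 80, 480000, 40, 5200, 0.015)
example = to_sft_example(flow, "DoS Attack - MITRE ATT&CK: Impact/Denial of Service")
print(example["messages"][1]["content"])
```

Reusing `format_flow` at inference time keeps the prompt layout identical to whatever was used to build the training examples, which is usually what a fine-tuned model is most sensitive to.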

⚠️ Limitations and Disclaimers

  • Not Exhaustive: This model, like the underlying ATT&CK framework, does not enumerate all possible adversary behaviors. There may be undisclosed or novel techniques not covered.
  • Research Use: While commercial use is permitted under the ATT&CK license, this model should be validated against your specific security requirements.
  • No Guarantee of Coverage: Using this model to address or cover categories of techniques will not guarantee comprehensive defensive coverage.
  • As-Is: This model is provided "as is" without any warranties or guarantees regarding accuracy, completeness, or fitness for a particular purpose.

📜 License

This model is based on Google Gemma 2.7B and incorporates data from the MITRE ATT&CK framework. Both licenses must be respected.

Gemma License

This model is built upon Google's Gemma model, which is governed by the Gemma Terms of Use.

Key Requirements:

  • This model can be used for research and commercial purposes
  • You must comply with Google's Gemma Terms of Use
  • You must ensure downstream usage complies with Gemma restrictions
  • You acknowledge and accept Gemma's usage policies and any applicable restrictions

For full details, see: https://ai.google.dev/gemma/terms

ATT&CK License Terms

© 2025 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation.

The MITRE Corporation grants a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes. The license covers the ATT&CK framework itself, not this model.

Full License Text:

LICENSE
The MITRE Corporation (MITRE) hereby grants you a non-exclusive, royalty-free 
license to use ATT&CK® for research, development, and commercial purposes. Any 
copy you make for such purposes is authorized provided that you reproduce MITRE's 
copyright designation and this license in any such copy.

"© 2025 The MITRE Corporation. This work is reproduced and distributed with the 
permission of The MITRE Corporation."

DISCLAIMERS
MITRE does not claim ATT&CK enumerates all possibilities for the types of actions 
and behaviors documented as part of its adversary model and framework of techniques. 
Using the information contained within ATT&CK to address or cover full categories 
of techniques will not guarantee full defensive coverage as there may be undisclosed 
techniques or variations on existing techniques not documented by ATT&CK.

ALL DOCUMENTS AND THE INFORMATION CONTAINED THEREIN ARE PROVIDED ON AN "AS IS" 
BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY 
(IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, OFFICERS, AGENTS, AND 
EMPLOYEES, DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 
TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE ANY 
RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR 
PURPOSE.

Model Modifications

This derivative work combines:

  1. Google's Gemma 2.7B - the base language model
  2. MITRE ATT&CK - the training dataset and knowledge domain

The model is fine-tuned on synthetic ATT&CK-derived data to specialize in threat intelligence and adversary behavior understanding. Any further use, distribution, or modification must maintain attribution and comply with both Google's Gemma Terms of Use and the MITRE ATT&CK license.

👤 Author & Contact

Mirko P.
🤗 Hugging Face: @hypnonyx

🙏 Attribution

This model was created using the MITRE ATT&CK framework. We are grateful to The MITRE Corporation for making this valuable resource available to the research and security communities.


Last Updated: March 4, 2025
Model Version: 1.0
