Traffico - Fine-tuned on ATT&CK Data

📋 Model Description

Traffico is a fine-tuned language model specialized in analyzing TCP/IP network traffic and detecting cyberattacks. It maps network flow patterns to the MITRE ATT&CK framework, enabling security teams to understand adversary tactics and techniques from network behavior alone.

The model is trained on synthetic datasets derived from real-world network traffic (CIC-IDS2017 + UNSW-NB15) and enriched with MITRE ATT&CK techniques. It can classify network flows as normal or malicious and provide ATT&CK-mapped threat classifications.

Base Model: Google Gemma 2.7B
Training Data: Synthetic dataset derived from ATT&CK® techniques, tactics, and procedures (TTPs)
Fine-tuning Approach: Supervised Fine-Tuning (SFT) using Unsloth for optimization and TRL's SFTTrainer

🎯 Use Cases

Network Intrusion Detection: Classify network flows as benign or malicious in real time
Threat Intelligence: Map detected attacks to MITRE ATT&CK techniques and tactics
Security Monitoring: Analyze TCP/IP flows from network sensors and intrusion detection systems (IDS)
Incident Response: Understand adversary behavior patterns from network telemetry
Research: Study attack-to-technique mappings in security datasets

🚀 Quick Start

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("hypnonyx/Traffico")
model = AutoModelForCausalLM.from_pretrained("hypnonyx/Traffico")

Basic Usage

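# Analyze a network traffic flow. The prompt strings are kept in Italian, as in the
# original example, on the assumption that this matches the fine-tuning data.
# Flow fields: protocol | dst port | src bytes | dst bytes | packets | duration
network_flow = "Protocollo: tcp | Porta dst: 80 | Byte src: 480000 | Byte dst: 40 | Pacchetti: 5200 | Durata: 0.015s"

messages = [
    {
        "role": "system",
        # English: "Analyze the following TCP/IP network traffic flow. Classify whether it is
        # normal traffic or an attack and state the corresponding MITRE ATT&CK technique."
        "content": "Analizza il seguente flusso di traffico di rete TCP/IP. Classifica se è traffico normale o un attacco e indica la tecnica MITRE ATT&CK corrispondente."
    },
    {"role": "user", "content": network_flow},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Move the inputs to whichever device the model is on instead of hard-coding "cuda"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)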

Expected Output: Classification of the network flow (e.g., "DoS Attack - MITRE ATT&CK: Impact/Denial of Service")
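
If the response needs to be consumed by other tooling, it can be split into fields. The sketch below assumes the output follows the shape of the single example above ("<label> - MITRE ATT&CK: <tactic>/<technique>"). That pattern is inferred from the example only and is not a documented output format, so the hypothetical helper parse_classification returns None when a response does not match.

```python
import re
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    label: str
    tactic: str
    technique: str


# Pattern assumed from the one example in this model card; real outputs may differ.
_PATTERN = re.compile(
    r"^\s*(?P<label>.+?)\s*-\s*MITRE ATT&CK:\s*"
    r"(?P<tactic>[^/]+?)\s*/\s*(?P<technique>.+?)\s*$"
)


def parse_classification(response: str) -> Optional[Classification]:
    """Return the first line matching the assumed pattern, or None if none does."""
    for line in response.splitlines():
        match = _PATTERN.match(line)
        if match:
            return Classification(match["label"], match["tactic"], match["technique"])
    return None


result = parse_classification("DoS Attack - MITRE ATT&CK: Impact/Denial of Service")
print(result)
```

Returning None rather than raising keeps free-form or unexpected generations from breaking a downstream pipeline; callers can log those responses for manual review.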

📊 Training Details

Property	Value
Base Model	Google Gemma 2.7B
Training Framework	Unsloth + TRL SFTTrainer
Training Dataset	Synthetic ATT&CK-derived dataset
Dataset Size	10,000 examples
Techniques Covered	Network traffic analysis (CIC-IDS2017 + UNSW-NB15)
Training Duration	~1 hour
Hardware	1x NVIDIA RTX 4090 GPU
Learning Rate	2e-5
Batch Size	16 effective (4 per device × 4 gradient accumulation steps)
LoRA Rank	64
Max Sequence Length	512 tokens
Training Steps	500 steps
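
As a quick sanity check on the table, the figures can be combined to estimate how much of the dataset the run covered. This is a rough calculation that reads the batch-size row as per-device batch size multiplied by gradient accumulation steps, and assumes a single GPU, no sequence packing, and that all 10,000 examples were used for training.

```python
# Values copied from the Training Details table.
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1
training_steps = 500
dataset_size = 10_000

# Examples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Total examples processed over the run, and the fraction of one pass over the data.
examples_seen = effective_batch_size * training_steps
epochs = examples_seen / dataset_size

print(effective_batch_size)  # 16
print(examples_seen)         # 8000
print(epochs)                # 0.8
```

Under these assumptions, 500 steps correspond to a little under one full pass over the dataset.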

📝 Dataset Information

The training dataset was created synthetically using data derived from the MITRE ATT&CK framework and network traffic analysis datasets (CIC-IDS2017 + UNSW-NB15). It includes:

Network Traffic Features: Protocol type, destination port, source/destination bytes, packet count, flow duration
Attack Classification: Binary and multi-class classification of normal vs. malicious traffic
MITRE ATT&CK Mapping: Techniques mapped to network-based attacks:
- Reconnaissance: Port scanning, network sniffing
- Initial Access: Brute force attacks on SSH, FTP, Telnet
- Lateral Movement: Data exfiltration, command & control traffic
- Impact: DoS/DDoS attacks, data theft
Attack Types Covered: DoS, DDoS, PortScan, Brute Force, Infiltration, Botnet, Web attacks
Dataset Size: 10,000 labeled examples for instruction tuning

The synthetic data was processed to create instruction-following examples where the model learns to analyze network flows and map them to MITRE ATT&CK techniques and tactics.
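
To make the instruction format concrete, here is a minimal sketch of how one labeled flow record might be turned into a chat-style training example. The record keys (protocol, dst_port, and so on), the helper names format_flow and build_example, and the use of the label as the assistant turn are assumptions for illustration; the actual preprocessing pipeline is not described in this card. The prompt strings are copied from the Quick Start example.

```python
from typing import Dict, List

# System prompt copied from the Quick Start example (Italian). English: "Analyze the
# following TCP/IP network traffic flow. Classify whether it is normal traffic or an
# attack and state the corresponding MITRE ATT&CK technique."
SYSTEM_PROMPT = (
    "Analizza il seguente flusso di traffico di rete TCP/IP. "
    "Classifica se è traffico normale o un attacco e indica la tecnica "
    "MITRE ATT&CK corrispondente."
)


def format_flow(flow: Dict[str, object]) -> str:
    """Render a flow record in the pipe-separated layout used in the Quick Start."""
    return (
        f"Protocollo: {flow['protocol']} | Porta dst: {flow['dst_port']} | "
        f"Byte src: {flow['src_bytes']} | Byte dst: {flow['dst_bytes']} | "
        f"Pacchetti: {flow['packets']} | Durata: {flow['duration']}s"
    )


def build_example(flow: Dict[str, object], label: str) -> List[Dict[str, str]]:
    """Build a system/user/assistant message list (hypothetical schema)."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": format_flow(flow)},
        {"role": "assistant", "content": label},
    ]


flow = {
    "protocol": "tcp",
    "dst_port": 80,
    "src_bytes": 480000,
    "dst_bytes": 40,
    "packets": 5200,
    "duration": 0.015,
}
example = build_example(flow, "DoS Attack - MITRE ATT&CK: Impact/Denial of Service")
print(example[1]["content"])
```

A message list in this shape can be passed to tokenizer.apply_chat_template, which is how the Quick Start formats its prompt.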

⚠️ Limitations and Disclaimers

Not Exhaustive: This model, like the underlying ATT&CK framework, does not enumerate all possible adversary behaviors. There may be undisclosed or novel techniques not covered.
Research Use: While commercial use is permitted under the ATT&CK license, this model should be validated against your specific security requirements.
No Guarantee of Coverage: Using this model to address or cover categories of techniques will not guarantee comprehensive defensive coverage.
As-Is: This model is provided "as is" without any warranties or guarantees regarding accuracy, completeness, or fitness for a particular purpose.

📜 License

This model is based on Google Gemma 2.7B and incorporates data from the MITRE ATT&CK framework. Both licenses must be respected.

Gemma License

This model is built upon Google's Gemma model, which is governed by the Gemma Terms of Use.

Key Requirements:

This model can be used for research and commercial purposes
You must comply with Google's Gemma Terms of Use
You must ensure downstream usage complies with Gemma restrictions
You acknowledge and accept Gemma's usage policies and any applicable restrictions

For full details, see: https://ai.google.dev/gemma/terms

ATT&CK License Terms

The MITRE Corporation grants a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes. The license covers the ATT&CK content incorporated into this model's training data, not the model itself.

Full License Text:

LICENSE
The MITRE Corporation (MITRE) hereby grants you a non-exclusive, royalty-free 
license to use ATT&CK® for research, development, and commercial purposes. Any 
copy you make for such purposes is authorized provided that you reproduce MITRE's 
copyright designation and this license in any such copy.

"© 2025 The MITRE Corporation. This work is reproduced and distributed with the 
permission of The MITRE Corporation."

DISCLAIMERS
MITRE does not claim ATT&CK enumerates all possibilities for the types of actions 
and behaviors documented as part of its adversary model and framework of techniques. 
Using the information contained within ATT&CK to address or cover full categories 
of techniques will not guarantee full defensive coverage as there may be undisclosed 
techniques or variations on existing techniques not documented by ATT&CK.

ALL DOCUMENTS AND THE INFORMATION CONTAINED THEREIN ARE PROVIDED ON AN "AS IS" 
BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY 
(IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, OFFICERS, AGENTS, AND 
EMPLOYEES, DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 
TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE ANY 
RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR 
PURPOSE.

Model Modifications

This derivative work combines:

Google's Gemma 2.7B - the base language model
MITRE ATT&CK - the training dataset and knowledge domain

The model is fine-tuned on synthetic ATT&CK-derived data to specialize in threat intelligence and adversary behavior understanding. Any further use, distribution, or modification must maintain attribution and comply with both Google's Gemma Terms of Use and the MITRE ATT&CK license.

🔗 References

Google Gemma: https://ai.google.dev/gemma/
Gemma Terms of Use: https://ai.google.dev/gemma/terms
MITRE ATT&CK: https://attack.mitre.org/
ATT&CK Documentation: https://attack.mitre.org/docs/

👤 Author & Contact

Mirko P.
🤗 Hugging Face: @hypnonyx

🙏 Attribution

This model was created using the MITRE ATT&CK framework. We are grateful to The MITRE Corporation for making this valuable resource available to the research and security communities.

Last Updated: March 4, 2025
Model Version: 1.0

Model size: 0.3B params
Tensor type: BF16 (Safetensors)

Model tree for hypnonyx/traffico
Base model: google/gemma-3-1b-pt
Finetuned: google/gemma-3-1b-it
Quantized (186): this model