SIEM Log Generator — LLaMA 3.1-8B (Stage 0b)
Fine-tuned LLaMA 3.1-8B-Instruct model that generates realistic, structured cloud security logs (SIEM events) from structured input events. Part of a multi-cloud threat detection research pipeline (Group 24, Final Year Project).
Given a structured security event (provider, action, entity IDs, attack phase, region, etc.), the model outputs a valid provider-native JSON log — AWS CloudTrail / GuardDuty, Azure Activity Log, or GCP Cloud Logging format — with a _pipeline_meta field preserving edge IDs and labels for downstream graph neural network stages.
Model Details
Model Description
- Developed by: Final-year-grp24 (Group 24, final-year project)
- Model type: Causal Language Model (QLoRA fine-tune)
- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Fine-tuning method: QLoRA (4-bit NF4 quantisation + LoRA rank-16 adapters)
- Language: English
- License: Llama 3.1 Community License
- Repository: Final-year-grp24/siem-log-generator-llama31-8b
Citations
```bibtex
@article{dubey2024llama,
  title  = {The Llama 3 Herd of Models},
  author = {Dubey, Abhimanyu and others},
  year   = {2024},
  url    = {https://arxiv.org/abs/2407.21783}
}

@inproceedings{dettmers2023qlora,
  title     = {QLoRA: Efficient Finetuning of Quantized LLMs},
  author    = {Dettmers, Tim and Pagnoni, Artidoro and Farhadi, Ali and Zettlemoyer, Luke},
  booktitle = {NeurIPS},
  year      = {2023},
  url       = {https://arxiv.org/abs/2305.14314}
}

@inproceedings{hu2022lora,
  title     = {LoRA: Low-Rank Adaptation of Large Language Models},
  author    = {Hu, Edward J. and others},
  booktitle = {ICLR},
  year      = {2022},
  url       = {https://arxiv.org/abs/2106.09685}
}
```
Uses
Direct Use
Generate provider-native cloud security logs for research pipelines, dataset augmentation, and security simulation. Given a structured event dict, the model outputs a complete JSON log in the correct format for AWS CloudTrail, Azure Activity Log, or GCP Cloud Logging.
Downstream Use
This model is Stage 0b in a 10-stage multi-cloud threat detection pipeline:
```
Stage 0a (Attack Simulator) → Stage 0b (this model, log renderer)
  → Stage 1 (log ingestion) → Stage 2 (BGE-Large embeddings)
  → Stage 3a/3b (CVE extraction + risk scoring)
  → Stage 4 (identity embeddings) → Stage 5 (graph construction)
  → Stage 6 (RGCN) → Stage 7 (Temporal GNN)
  → Stage 8 (FT-Transformer) → Stage 9 (ensemble) → Stage 10 (explanation)
```
The _pipeline_meta field in every generated log preserves edge_id, scenario_id, t, malicious, and attack_phase labels — acting as a foreign key for all downstream stages.
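The foreign-key role of _pipeline_meta can be sketched as follows. The helper and sample log below are illustrative, not part of the pipeline code:

```python
# Sketch: a downstream stage joining generated logs back to simulator edges
# via the _pipeline_meta.edge_id foreign key. The log payload is illustrative.
from collections import defaultdict

def index_by_edge_id(logs):
    """Group rendered logs by their _pipeline_meta.edge_id foreign key."""
    index = defaultdict(list)
    for log in logs:
        index[log["_pipeline_meta"]["edge_id"]].append(log)
    return index

logs = [
    {"eventName": "AssumeRole",
     "_pipeline_meta": {"edge_id": "user_001__ASSUMES_ROLE__role_admin",
                        "scenario_id": "scenario_00042", "t": 7, "malicious": 1}},
]
index = index_by_edge_id(logs)
```

Any downstream stage can then look up all logs belonging to a simulator edge in O(1).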
Out-of-Scope Use
- Not for production security monitoring — logs are synthetic and generated for research purposes only
- Not a threat detector — this model renders logs, it does not classify them
- Not suitable for generating real credentials, IPs, or account IDs — all identifiers are synthetic
Training Details
Training Data
Derived from Stage 0a of the pipeline — an attack chain simulator generating 1,000 multi-cloud scenarios across 4 attack templates:
| Attack Template | Description |
|---|---|
| Privilege Escalation | IAM role abuse across AWS/Azure/GCP |
| Lateral Movement | VM-to-VM propagation within cloud VPCs |
| Cross-Cloud Identity Pivot | Credential exfiltration across cloud boundaries |
| CVE Exploitation | Known CVE exploitation against cloud-hosted VMs |
Source data: 632,108 structured events across 1,000 scenarios, T=20 timesteps
Class balance: ~65% benign / ~35% malicious
Providers covered: AWS, Azure, GCP, AWS_GCP (cross-cloud), GCP_Azure (cross-cloud)
Actions covered: ASSUMES_ROLE, ACCESS, CONNECTS_TO, EXPLOITS, CROSS_CLOUD_ACCESS, VM_LIST, RESTART_VM, STOP_VM
Training pairs were built by rendering each structured event into a LLaMA chat template (system prompt + structured event → provider-native JSON log). The dataset was capped at 2,000 pairs for the 2k sample run.
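The rendering into chat-format pairs could look roughly like this; the system prompt and target log below are illustrative placeholders, not the actual training prompt or data:

```python
# Sketch: turning one structured event into a chat-format training pair
# (system prompt + event JSON -> provider-native log JSON). Illustrative only.
import json

SYSTEM = ("You are a cloud security log renderer. Given a structured security "
          "event, output ONLY the corresponding provider-native JSON log.")

def build_pair(event, target_log):
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": json.dumps(event)},
            {"role": "assistant", "content": json.dumps(target_log)},
        ]
    }

event = {"provider": "AWS", "action": "ASSUMES_ROLE",
         "edge_id": "user_001__ASSUMES_ROLE__role_admin"}
target = {"eventName": "AssumeRole",
          "_pipeline_meta": {"edge_id": event["edge_id"]}}
pair = build_pair(event, target)
```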
Training Procedure
Preprocessing
- Each structured event is converted to a LLaMA 3.1 chat-format prompt
- System prompt instructs the model to output only a valid JSON log with no explanation
- Sequences truncated to MAX_SEQ_LEN=768 tokens
- Validation split: last 10% of scenarios held out
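A minimal sketch of the scenario-level hold-out, assuming scenario IDs sort lexicographically (as IDs like scenario_00001 do), so no scenario appears in both splits:

```python
# Sketch: hold out the last 10% of scenario IDs for validation.
def split_by_scenario(events, holdout_frac=0.10):
    scenarios = sorted({e["scenario_id"] for e in events})
    cut = int(len(scenarios) * (1 - holdout_frac))
    held_out = set(scenarios[cut:])
    train = [e for e in events if e["scenario_id"] not in held_out]
    val = [e for e in events if e["scenario_id"] in held_out]
    return train, val
```

Splitting by scenario rather than by event prevents leakage of a scenario's earlier timesteps into training while its later timesteps sit in validation.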
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Base model | meta-llama/Meta-Llama-3.1-8B-Instruct |
| Quantisation | 4-bit NF4 (double quantisation enabled) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training regime | fp16 mixed precision (T4 Turing — no bfloat16) |
| Optimiser | paged_adamw_8bit |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Epochs | 1 |
| Per-device batch size | 4 |
| Gradient accumulation | 4 (effective batch = 16) |
| Warmup steps | 100 |
| Max sequence length | 768 |
| NEFTune noise alpha | 5 |
| Seed | 42 |
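The quantisation and adapter settings in the table map onto a peft/bitsandbytes configuration roughly as follows; this is a sketch of the configuration objects only, not the exact training script (trainer wiring omitted):

```python
# Sketch: QLoRA configuration matching the hyperparameter table above.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # 4-bit NF4
    bnb_4bit_use_double_quant=True,       # double quantisation enabled
    bnb_4bit_compute_dtype=torch.float16, # fp16 on T4 (no bfloat16 on Turing)
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```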
Hardware
- Platform: Kaggle (Notebook)
- GPU: NVIDIA Tesla T4 x1
- VRAM: 16 GB
- Fine-tuning method: QLoRA — the full 8B model is loaded in 4-bit; only the LoRA adapter weights (~1% of parameters) are updated
Evaluation
Testing Data
Held-out records from the last 10% of scenario IDs, which were excluded from training and validated post-training.
Metrics
| Metric | Description |
|---|---|
| JSON Validity % | % of generated outputs that parse as valid JSON |
| Schema Compliance % | % of outputs containing all required provider-specific fields |
| Edge ID Preservation % | % of outputs where _pipeline_meta.edge_id matches the source event |
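A minimal sketch of how these three metrics could be computed over (source event, generated text) pairs; the required-field list here is abbreviated for illustration:

```python
# Sketch: JSON validity, schema compliance, and edge ID preservation
# over (source_event, generated_text) pairs. Field list abbreviated.
import json

REQUIRED = {"AWS": ["eventSource", "eventName", "awsRegion", "_pipeline_meta"]}

def evaluate(pairs):
    valid = schema = edge = 0
    for event, text in pairs:
        try:
            log = json.loads(text)
        except json.JSONDecodeError:
            continue  # invalid JSON fails all three metrics
        valid += 1
        if all(f in log for f in REQUIRED.get(event["provider"], [])):
            schema += 1
        if log.get("_pipeline_meta", {}).get("edge_id") == event["edge_id"]:
            edge += 1
    n = len(pairs)
    return {"json_validity": valid / n,
            "schema_compliance": schema / n,
            "edge_id_preservation": edge / n}
```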
Results
Results below are from the 2k sample run (1 epoch, 2000 training pairs). Full-scale results pending.
| Metric | Threshold | Result |
|---|---|---|
| JSON Validity % | ≥ 90% | pending full run |
| Schema Compliance % | ≥ 85% | pending full run |
| Edge ID Preservation % | ≥ 90% | pending full run |
Technical Specifications
Model Architecture
- Base: LLaMA 3.1-8B-Instruct (decoder-only transformer, 32 layers, 4096 hidden dim, 32 attention heads)
- Adapter: LoRA rank-16 injected into all 7 projection matrices across all 32 layers
- Quantisation: 4-bit NF4 via bitsandbytes — base weights frozen at 4-bit, LoRA adapters trained in fp16
- Trainable parameters: ~83M of 8B total (~1%)
Log Schema Coverage
AWS (CloudTrail / GuardDuty)
Required fields: eventSource, eventName, awsRegion, userIdentity, sourceIPAddress, readOnly, resources, managementEvent, sessionContext, _pipeline_meta
Azure (Activity Log)
Required fields: time, operationName, correlationId, identity, properties, _pipeline_meta
GCP (Cloud Logging)
Required fields: protoPayload, resource, severity, timestamp, logName, _pipeline_meta
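A minimal compliance check against these required-field lists might look like:

```python
# Sketch: per-provider schema check using the required-field lists above.
REQUIRED_FIELDS = {
    "AWS": ["eventSource", "eventName", "awsRegion", "userIdentity",
            "sourceIPAddress", "readOnly", "resources", "managementEvent",
            "sessionContext", "_pipeline_meta"],
    "Azure": ["time", "operationName", "correlationId", "identity",
              "properties", "_pipeline_meta"],
    "GCP": ["protoPayload", "resource", "severity", "timestamp",
            "logName", "_pipeline_meta"],
}

def missing_fields(provider, log):
    """Return the required fields absent from a generated log."""
    return [f for f in REQUIRED_FIELDS[provider] if f not in log]
```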
Pipeline Meta Field
Every generated log contains:
```json
"_pipeline_meta": {
  "edge_id": "user_001__ASSUMES_ROLE__role_admin",
  "scenario_id": "scenario_00042",
  "t": 7,
  "malicious": 1,
  "attack_phase": "privilege_escalation",
  "provider": "AWS",
  "original_provider": "AWS",
  "is_cross_cloud": false
}
```
How to Get Started
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch, json

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "Final-year-grp24/siem-log-generator-llama31-8b"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, adapter_id)

event = {
    "provider": "AWS", "action": "ASSUMES_ROLE",
    "entity_id": "user_042", "target_id": "role_admin",
    "region": "us-east-1", "cloud_account": "acc_aws_123456",
    "source_ip": "10.0.1.42", "status": "Success",
    "malicious": 1, "attack_phase": "privilege_escalation",
    "edge_id": "user_042__ASSUMES_ROLE__role_admin",
    "scenario_id": "scenario_00001", "t": 5,
}

system = ("You are a cloud security log renderer for a research pipeline. "
          "Given a structured security event, generate ONLY the corresponding "
          "cloud provider log as a valid JSON object. Output nothing except the JSON. "
          "No explanation. No markdown fences. "
          "The JSON must include a \"_pipeline_meta\" field preserving edge_id and labels.")

messages = [{"role": "system", "content": system},
            {"role": "user", "content": json.dumps(event)}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# The chat template already inserts special tokens, so skip adding a second BOS
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=500, do_sample=False)

response = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Extract the outermost JSON object in case the model emits surrounding text
log = json.loads(response[response.find("{"):response.rfind("}") + 1])
print(json.dumps(log, indent=2))
```
Environmental Impact
- Hardware: NVIDIA Tesla T4 (16GB VRAM)
- Cloud provider: Google (Kaggle)
- Training duration: ~1–2 hours (2k sample), ~9–11 hours (full 480-scenario run)
- Carbon estimation: ML Impact Calculator