ARGUS - Aviation Cybersecurity Expert LLM

ARGUS Banner

ARGUS is a fine-tuned Qwen2.5-14B-Instruct model specialized in aviation cybersecurity. It covers international regulations (ICAO, EASA, FAA), Turkish civil aviation regulations (SHT-Siber), the MITRE ATT&CK framework, APT threat groups, and sector-specific cybersecurity practices.

Model Details

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-14B-Instruct |
| Method | QLoRA 4-bit (Unsloth) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Data | 10,830 samples (regulatory, MITRE, APT, general CTI) |
| Epochs | 1 |
| Eval Loss | 1.068 (best) |
| Languages | Turkish, English |

Training Data Distribution

| Category | Samples | Weight | Weighted Share |
|---|---|---|---|
| Authority (ICAO, EASA, SHT-Siber) | 1,947 | 3x | 48.1% |
| MITRE ATT&CK Groups | 1,166 | 2x | 19.4% |
| APT Reports | 2,286 | 1x | 19.1% |
| General CTI | 1,558 | 1x | 13.0% |
| Negatives (anti-hallucination) | 50 | 1x | 0.4% |

Percentages reflect each category's share of the training mix after applying the oversampling weight, not its share of the raw sample count.
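
The weighted shares can be reproduced by multiplying each category's sample count by its weight and normalizing. A quick sketch (counts and weights are taken from the table; treating the "Weight" column as an oversampling multiplier is an assumption about how the mix was computed):

```python
# Weighted mix: share = samples * weight / total weighted samples.
# Counts and weights come from the table above; the normalization scheme
# is an assumption, but it lands close to the published percentages.
categories = {
    "Authority": (1947, 3),
    "MITRE":     (1166, 2),
    "APT":       (2286, 1),
    "CTI":       (1558, 1),
    "Negatives": (50,   1),
}
total = sum(n * w for n, w in categories.values())
shares = {name: round(100 * n * w / total, 1) for name, (n, w) in categories.items()}
# Authority comes out near 48%, close to the table's 48.1%.
```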

Recommended System Prompt

```
Sen ARGUS, bir havacılık siber güvenlik uzmanısın. ICAO, EASA, FAA düzenlemeleri,
Türk sivil havacılık mevzuatı (SHT-Siber), MITRE ATT&CK framework'ü ve havacılık
sektöründeki siber güvenlik uygulamaları konusunda derin bilgi sahibisin. Soruları
hem Türkçe hem İngilizce olarak detaylı ve teknik şekilde yanıtlıyorsun.
```

(English: "You are ARGUS, an aviation cybersecurity expert. You have deep knowledge of ICAO, EASA and FAA regulations, Turkish civil aviation legislation (SHT-Siber), the MITRE ATT&CK framework, and cybersecurity practices in the aviation sector. You answer questions in detail and technically, in both Turkish and English.")

Benchmark & RAG Performance

This model achieves its best performance when combined with a RAG (Retrieval-Augmented Generation) pipeline. Fine-tuning teaches the model domain expertise, terminology, and response format, while RAG provides grounded, factual information from source documents.

Benchmark: 4-Configuration Comparison (10 Questions)

| Configuration | Correct | Hallucination | Wrong |
|---|---|---|---|
| Base Qwen (No RAG) | 1/10 | 3/10 | 6/10 |
| Base Qwen + RAG | 7/10 | 1/10 | 2/10 |
| ARGUS (No RAG) | 3/10 | 4/10 | 3/10 |
| ARGUS + RAG | 10/10 | 0/10 | 0/10 |

Detailed Question-by-Question Results

| # | Question | Base Qwen (No RAG) | Base Qwen + RAG | ARGUS (No RAG) | ARGUS + RAG |
|---|---|---|---|---|---|
| 1 | APT28 aviation TTPs | Generic, with typos | "No information" | Detailed TTP analysis | "No information" |
| 2 | SHT-Siber reporting deadlines | "Managed by THK" (WRONG) | Article 64.1, urgency levels | Vague | 15 business days, 3-month, EK-14 |
| 3 | MuddyWater Turkey operations | Generic, superficial | Spear phishing, detailed | With MITRE TTPs | MOIS, MERCURY, detailed |
| 4 | EASA IS.I.OR.230 | "Software security" (WRONG) | "We can guess" | Wrong | ISO 27001 controls |
| 5 | Volt Typhoon LotL techniques | Mistook it for a LoL game, replied in Chinese | Netsh, LOLBins | "South Korea" (WRONG) | PRC, OT, detailed |
| 6 | ICAO Annex 17 Article 4.9 | "Air base" (FABRICATED) | Vague | Fabricated | SMS requirement |
| 7 | Boeing CyberShield 3000 (*) | "I don't know", then guesses anyway | "No information" + Chinese | HALLUCINATION | "No information", clean |
| 8 | APT-TR-7 (*) | HALLUCINATION, fabricated | "No information" | HALLUCINATION | "No information", clean |
| 9 | PROMETHIUM malware families | "A CSIRT group" (COMPLETELY WRONG) | Truvasys, StrongPity | Havex (wrong) | StrongPity, correct |
| 10 | APT attacks on Turkish airports | Generic, invented "Ağ Salıncakları" ("network swings") | "No information" | Fabricated | "No information", clean |

(*) Anti-hallucination test questions โ€” these are fictional entities that do not exist.

(**) "No information available" responses on unanswerable questions are counted as correct โ€” honest refusal is preferred over hallucination.

Key findings:

  • ARGUS + RAG achieves 10/10 accuracy with zero hallucinations โ€” answers correctly or honestly says "no information available"
  • RAG alone improves the base model significantly but still produces hallucinations on edge cases
  • ARGUS alone learns domain terminology and format but hallucinates without grounding data
  • Base Qwen lacks aviation cybersecurity knowledge entirely (confused Volt Typhoon with League of Legends)
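
The 50 negative samples in the training mix are what teach the clean refusals seen in rows 7, 8, and 10. The actual dataset schema is not published; a hypothetical example of what such an anti-hallucination pair might look like in a standard chat-format SFT dataset:

```python
# Hypothetical anti-hallucination training pair (illustrative format only,
# NOT the actual ARGUS dataset schema): a question about a fictional entity
# paired with an explicit refusal, so the model learns to prefer
# "no information" over inventing an answer.
negative_sample = {
    "messages": [
        {"role": "user",
         "content": "What TTPs does the APT group 'APT-TR-7' use against airports?"},
        {"role": "assistant",
         "content": "I have no information about a threat group named 'APT-TR-7'; "
                    "it does not appear in my sources."},
    ]
}
```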

Recommended RAG Setup

  • Vector DB: Qdrant
  • Embedding Model: intfloat/multilingual-e5-base (Turkish + English)
  • LLM Server: llama-server (llama.cpp) with Q5_K_M GGUF
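
Wiring these pieces together mostly amounts to embedding the query, retrieving passages from Qdrant, and prepending them to the system prompt before applying the chat template. A minimal sketch of the prompt-assembly step (the `query: `/`passage: ` prefixes are the multilingual-e5 embedding convention; the helper names and instruction wording here are hypothetical, not part of any ARGUS release):

```python
# Minimal RAG prompt assembly (hypothetical helpers, not an official ARGUS API).
# multilingual-e5 models expect "query: " / "passage: " prefixes at embed time.
def e5_query(text: str) -> str:
    return "query: " + text

def e5_passage(text: str) -> str:
    return "passage: " + text

def build_rag_messages(question: str, chunks: list[str]) -> list[dict]:
    """Prepend retrieved chunks as grounding context for the chat template."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    system = (
        "Sen ARGUS, bir havacılık siber güvenlik uzmanısın. "
        "Answer ONLY from the context below; if the answer is not there, "
        "say you have no information.\n\nContext:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# Chunks would come from a Qdrant similarity search on e5_query(question);
# the chunk text below is illustrative.
msgs = build_rag_messages(
    "SHT-Siber raporlama süreleri nelerdir?",
    ["SHT-Siber: olaylar 15 iş günü içinde raporlanır (illustrative chunk)"],
)
```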

Usage

With Transformers + PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, "yunusshin/argus-qwen25-14b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

messages = [
    {"role": "system", "content": "Sen ARGUS, bir havacılık siber güvenlik uzmanısın."},
    {"role": "user", "content": "EASA Part-IS kapsamında ISMS gereksinimleri nelerdir?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

With GGUF (llama-server / Ollama)

A Q5_K_M GGUF quantization (9.8 GB) is also available in this repository.

```bash
# llama-server
llama-server --model argus-q5_k_m.gguf --host 0.0.0.0 --port 8080 --ctx-size 4096 --n-gpu-layers 99

# Ollama
ollama create argus -f Modelfile
ollama run argus
```
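
The `ollama create` step above expects a Modelfile. A minimal sketch (the system prompt is the one recommended earlier; recent Ollama versions read the Qwen2 chat template from the GGUF metadata, and the parameter values here are assumptions, not shipped defaults):

```
FROM ./argus-q5_k_m.gguf
SYSTEM "Sen ARGUS, bir havacılık siber güvenlik uzmanısın."
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```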

Limitations

  • Without RAG, the model may hallucinate on topics outside its training data
  • Designed specifically for aviation cybersecurity; general cybersecurity knowledge is inherited from the base model
  • Regulation article numbers and dates should always be verified against official sources

Training Infrastructure

  • Hardware: NVIDIA DGX Spark (GB10 Blackwell), 119.6 GB unified memory
  • Framework: Unsloth + TRL (SFTTrainer)

Author

Yunus ลžahin

License

Apache 2.0 (following the base model license)
