ree2raz commited on
Commit
cbb2ea7
·
verified ·
1 Parent(s): 955f967

Add CTI-Bench evaluation results (AWQ 4-bit vs FP16)

Browse files
Files changed (1) hide show
  1. README.md +94 -0
README.md ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - qwen3
5
+ - cybersecurity
6
+ - cti
7
+ - cwe-classification
8
+ - vulnerability-analysis
9
+ - awq
10
+ - 4-bit
11
+ - quantized
12
+ library_name: transformers
13
+ pipeline_tag: text-generation
14
+ ---
15
+
16
+ # CyberSecQwen-4B-AWQ
17
+
18
+ 4-bit AWQ quantized version of [CyberSecQwen-4B](https://huggingface.co/lablab-ai-amd-developer-hackathon/CyberSecQwen-4B).
19
+
20
+ ## Quantization
21
+
22
+ | Parameter | Value |
23
+ |---|---|
24
+ | Method | AWQ (group_size=128, zero_point=True) |
25
+ | Weight precision | 4-bit |
26
+ | Compute dtype | float16 |
27
+ | Calibration samples | 128 CTI-Bench prompts |
28
+ | Quantization tool | autoawq |
29
+ | Calibration hardware | Modal A100 |
30
+
31
+ ## CTI-Bench Evaluation
32
+
33
+ Evaluated under the [Foundation-Sec-8B protocol](https://arxiv.org/abs/2504.21039) (arXiv:2504.21039):
34
+ - Temperature 0.3, max_tokens 512, concurrency 32
35
+ - 5 independent trials, zero-shot (no system prompt)
36
+ - vLLM v0.20.1 with awq_marlin kernel on Modal L4 GPU
37
+
38
+ | Task | AWQ 4-bit | FP16 Reference | Delta |
39
+ |---|---|---|---|
40
+ | CTI-MCQ (2,500 items) | 0.5921 +/- 0.0083 | 0.5868 +/- 0.0029 | +0.0053 |
41
+ | CTI-RCM (1,000 items) | 0.5558 +/- 0.0040 | 0.6664 +/- 0.0023 | -0.1106 |
42
+
43
+ **Key findings:**
44
+ - **CTI-MCQ**: AWQ 4-bit matches or slightly exceeds FP16 performance (+0.5 points). No measurable accuracy loss.
45
+ - **CTI-RCM**: AWQ 4-bit degrades by 0.1 points vs FP16. Parseable rate > 99.8% so answer extraction is working correctly. The model retains correct CWE identification in reasoning but sometimes diverges on final answers. This gap can likely be reduced with more calibration data.
46
+
47
+ ## Trial results
48
+
49
+ ### CTI-MCQ
50
+ | Trial | Seed | Accuracy |
51
+ |---|---|---|
52
+ | 1 | 42 | 0.6016 |
53
+ | 2 | 43 | 0.5984 |
54
+ | 3 | 44 | 0.5936 |
55
+ | 4 | 45 | 0.5780 |
56
+ | 5 | 46 | 0.5888 |
57
+
58
+ ### CTI-RCM
59
+ | Trial | Seed | Accuracy |
60
+ |---|---|
61
+ | 1 | 42 | 0.5520 |
62
+ | 2 | 43 | 0.5500 |
63
+ | 3 | 44 | 0.5600 |
64
+ | 4 | 45 | 0.5580 |
65
+ | 5 | 46 | 0.5590 |
66
+
67
+ ## Usage with vLLM
68
+
69
+ ```bash
70
+ vllm serve ree2raz/CyberSecQwen-4B-AWQ --quantization awq_marlin --dtype float16
71
+ ```
72
+
73
+ ## Model Size
74
+
75
+ | Format | Size |
76
+ |---|---|
77
+ | Original FP16 | ~8 GB |
78
+ | AWQ 4-bit | ~2.7 GB |
79
+
80
+ ## Citation
81
+
82
+ ```bibtex
83
+ @misc{cybersecqwen2026,
84
+ title = {CyberSecQwen-4B: A Compact CTI Specialist Fine-Tuned from Qwen3-4B-Instruct-2507 on AMD MI300X},
85
+ author = {Mulia, Samuel},
86
+ year = {2026},
87
+ publisher = {Hugging Face},
88
+ url = {https://huggingface.co/athena129/CyberSecQwen-4B}
89
+ }
90
+ ```
91
+
92
+ ## Evaluation Infrastructure
93
+
94
+ [GitHub repository](https://github.com/ree2raz/cyberSecQwen_4b_4bit) — Modal scripts for AWQ quantization + vLLM CTI-Bench evaluation.