ree2raz committed on
Commit 4d8b2b1 · verified · 1 parent: e3979ab

Add CTI-Bench evaluation results (AWQ 4-bit vs FP16)

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -38,11 +38,11 @@ Evaluated under the [Foundation-Sec-8B protocol](https://arxiv.org/abs/2504.2103
  | Task | AWQ 4-bit | FP16 Reference | Delta |
  |---|---|---|---|
  | CTI-MCQ (2,500 items) | 0.5921 +/- 0.0083 | 0.5868 +/- 0.0029 | +0.0053 |
- | CTI-RCM (1,000 items) | 0.5558 +/- 0.0040 | 0.6664 +/- 0.0023 | -0.1106 |
+ | CTI-RCM (1,000 items) | 0.5814 +/- 0.0025 | 0.6664 +/- 0.0023 | -0.0850 |
 
  **Key findings:**
  - **CTI-MCQ**: AWQ 4-bit matches or slightly exceeds FP16 performance (+0.5 points). No measurable accuracy loss.
- - **CTI-RCM**: AWQ 4-bit degrades by 11.1 percentage points vs FP16. Parseable rate > 99.8% so answer extraction is working correctly. The model retains correct CWE identification in reasoning but sometimes diverges on final answers. This gap can likely be reduced with more calibration data.
+ - **CTI-RCM**: AWQ 4-bit degrades by 8.5 percentage points vs FP16. Parseable rate > 99.8% so answer extraction is working correctly. The model retains correct CWE identification in reasoning but sometimes diverges on final answers. This gap can likely be reduced with more calibration data.
 
  ## Trial results
 
@@ -58,11 +58,11 @@ Evaluated under the [Foundation-Sec-8B protocol](https://arxiv.org/abs/2504.2103
  ### CTI-RCM
  | Trial | Seed | Accuracy |
  |-------|------|----------|
- | 1 | 42 | 0.5520 |
- | 2 | 43 | 0.5500 |
- | 3 | 44 | 0.5600 |
- | 4 | 45 | 0.5580 |
- | 5 | 46 | 0.5590 |
+ | 1 | 42 | 0.5790 |
+ | 2 | 43 | 0.5830 |
+ | 3 | 44 | 0.5790 |
+ | 4 | 45 | 0.5840 |
+ | 5 | 46 | 0.5820 |
 
  ## Usage with vLLM
 
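The updated CTI-RCM summary row can be cross-checked against the per-seed trial table. The following is a minimal Python sketch, assuming the headline AWQ number is the unweighted mean of the five seeded trials. The README does not say which spread statistic backs its +/- column, so the sketch reproduces only the mean, the delta against the FP16 reference, and the percentage-point figure quoted in the key findings. The names `OLD_TRIALS`, `NEW_TRIALS`, `FP16_RCM` and `summarize` are hypothetical, introduced only for this illustration.

```python
from statistics import fmean

# Per-seed CTI-RCM accuracies for the AWQ 4-bit model (seeds 42-46),
# copied from the trial table before and after this commit.
OLD_TRIALS = [0.5520, 0.5500, 0.5600, 0.5580, 0.5590]
NEW_TRIALS = [0.5790, 0.5830, 0.5790, 0.5840, 0.5820]

# FP16 reference accuracy on CTI-RCM (unchanged by this commit).
FP16_RCM = 0.6664


def summarize(trials: list[float], reference: float) -> tuple[float, float, float]:
    """Return (mean accuracy, delta vs. reference, delta in percentage points).

    Assumption: the headline number is the plain mean over trials. The
    +/- spread shown in the README is intentionally not reproduced here.
    """
    mean = round(fmean(trials), 4)
    delta = round(mean - reference, 4)
    points = round(delta * 100, 1)
    return mean, delta, points


if __name__ == "__main__":
    for label, trials in (("before", OLD_TRIALS), ("after", NEW_TRIALS)):
        mean, delta, points = summarize(trials, FP16_RCM)
        print(f"{label}: mean={mean:.4f} delta={delta:+.4f} ({points:+.1f} pp)")
```

Under that assumption, the "before" trials give a mean of 0.5558 and a delta of -0.1106 (11.1 points). The "after" trials give 0.5814 and -0.0850 (8.5 points), which matches both the old and new summary rows in the diff.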