ree2raz committed on
Commit 212388c · verified · 1 Parent(s): cbb2ea7

Add CTI-Bench evaluation results (AWQ 4-bit vs FP16)

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -42,7 +42,7 @@ Evaluated under the [Foundation-Sec-8B protocol](https://arxiv.org/abs/2504.2103
 
  **Key findings:**
  - **CTI-MCQ**: AWQ 4-bit matches or slightly exceeds FP16 performance (+0.5 points). No measurable accuracy loss.
- - **CTI-RCM**: AWQ 4-bit degrades by 0.1 points vs FP16. Parseable rate > 99.8% so answer extraction is working correctly. The model retains correct CWE identification in reasoning but sometimes diverges on final answers. This gap can likely be reduced with more calibration data.
 
  ## Trial results
 
@@ -57,7 +57,7 @@ Evaluated under the [Foundation-Sec-8B protocol](https://arxiv.org/abs/2504.2103
 
  ### CTI-RCM
  | Trial | Seed | Accuracy |
- |---|---|
  | 1 | 42 | 0.5520 |
  | 2 | 43 | 0.5500 |
  | 3 | 44 | 0.5600 |
 
 
  **Key findings:**
  - **CTI-MCQ**: AWQ 4-bit matches or slightly exceeds FP16 performance (+0.5 points). No measurable accuracy loss.
+ - **CTI-RCM**: AWQ 4-bit degrades by 11.1 percentage points vs FP16. Parseable rate > 99.8% so answer extraction is working correctly. The model retains correct CWE identification in reasoning but sometimes diverges on final answers. This gap can likely be reduced with more calibration data.
 
  ## Trial results
 
 
 
  ### CTI-RCM
  | Trial | Seed | Accuracy |
+ |-------|------|----------|
  | 1 | 42 | 0.5520 |
  | 2 | 43 | 0.5500 |
  | 3 | 44 | 0.5600 |
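
The three CTI-RCM rows in this hunk can be aggregated into a single figure when comparing against another run. The following is a minimal sketch, not the evaluation script used for this commit: the variable and function names are hypothetical, and the choice of sample standard deviation is an assumption, since the diff does not say how (or whether) the README aggregates trials.

```python
from statistics import mean, stdev

# CTI-RCM accuracies copied from the table rows in this hunk, keyed by seed.
rcm_accuracy = {42: 0.5520, 43: 0.5500, 44: 0.5600}


def summarize(acc_by_seed):
    """Return (mean, sample standard deviation) of per-trial accuracies."""
    values = list(acc_by_seed.values())
    return mean(values), stdev(values)


rcm_mean, rcm_std = summarize(rcm_accuracy)
print(
    f"CTI-RCM: mean={rcm_mean:.4f}, sample std={rcm_std:.4f} "
    f"over {len(rcm_accuracy)} trials"
)
```

For the three rows shown, the mean works out to about 0.554 with a sample standard deviation of roughly 0.005, so the seed-to-seed spread is small relative to the 11.1 percentage point gap described in the key findings.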