Upload README.md with huggingface_hub
README.md CHANGED
@@ -99,9 +99,19 @@ All security/pentesting prompts comply with full working code:
 
 ### MMLU-200 (10 subjects × 20 questions)
 
-| | Base
-|---|---|---|
-|
+| Subject | Base | CRACK v2 |
+|---------|------|----------|
+| Abstract Algebra | 9/20 | 7/20 |
+| Anatomy | 13/20 | 12/20 |
+| Astronomy | 17/20 | 15/20 |
+| College CS | 13/20 | 12/20 |
+| College Physics | 14/20 | 12/20 |
+| HS Biology | 19/20 | 18/20 |
+| HS Chemistry | 14/20 | 12/20 |
+| HS Mathematics | 6/20 | 6/20 |
+| Logical Fallacies | 17/20 | 16/20 |
+| World Religions | 17/20 | 17/20 |
+| **Total** | **76.5% (153/200)** | **71.5% (143/200)** |
 | **Delta** | — | **-5.0%** |
 
 ### Coherence ✅
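The Total and Delta rows can be cross-checked against the per-subject rows with a few lines of Python. This is a minimal sketch, not part of the commit: the dictionary below is transcribed by hand from the table in this diff, and the names `scores` and `QUESTIONS_PER_SUBJECT` are illustrative. As transcribed, the rows sum to 139/200 (Base) and 127/200 (CRACK v2), which does not match the reported 153/200 and 143/200. The source does not explain the gap, so treat the result as a consistency check rather than a correction.

```python
# Per-subject correct answers (Base, CRACK v2), copied from the table in this diff.
# Assumption: every subject has exactly 20 questions, as the heading states.
QUESTIONS_PER_SUBJECT = 20

scores = {
    "Abstract Algebra": (9, 7),
    "Anatomy": (13, 12),
    "Astronomy": (17, 15),
    "College CS": (13, 12),
    "College Physics": (14, 12),
    "HS Biology": (19, 18),
    "HS Chemistry": (14, 12),
    "HS Mathematics": (6, 6),
    "Logical Fallacies": (17, 16),
    "World Religions": (17, 17),
}

# Sum each column and compute the total number of questions.
base_correct = sum(base for base, _ in scores.values())
crack_correct = sum(crack for _, crack in scores.values())
total_questions = QUESTIONS_PER_SUBJECT * len(scores)

# Accuracy in percent, and the difference in percentage points.
base_pct = 100 * base_correct / total_questions
crack_pct = 100 * crack_correct / total_questions
delta_points = crack_pct - base_pct

print(f"Base:     {base_correct}/{total_questions} = {base_pct:.1f}%")
print(f"CRACK v2: {crack_correct}/{total_questions} = {crack_pct:.1f}%")
print(f"Delta:    {delta_points:+.1f} percentage points")
```

Note that the Delta row is a difference in percentage points (CRACK v2 accuracy minus Base accuracy), not a relative change.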