eval: lemer/gguf/Q4_K_M advance (2026-04-12T20:24:25Z)
.eval_results/toxigen.gguf.Q4_K_M.md
CHANGED
@@ -1,21 +1,21 @@
 # TIGER-Lab/MMLU-Pro / toxigen — 8-PAC Canon
 
-Merged from
+Merged from 249 run(s) across 1 machine(s). Total rows: **3984**.
 
 ## Machines
 
-- `charon.lthn.io`:
+- `charon.lthn.io`: 3984 rows
 
 ## Scores
 
 | Side | Model | Samples | Questions | Rounds | Per-round acc | Majority acc |
 |---|---|---|---|---|---|---|
-| `base` | `hf.co/LetheanNetwork/lemer:Q4_K_M` |
-| `lek` | `hf.co/lthn/lemer:Q4_K_M` |
+| `base` | `hf.co/LetheanNetwork/lemer:Q4_K_M` | 1992 | 249 | 8 | 28.66% | 27.71% (69/249) |
+| `lek` | `hf.co/lthn/lemer:Q4_K_M` | 1992 | 249 | 8 | 21.49% | 19.68% (49/249) |
 
 ## LEK delta
 
-- per-round: **-7.
-- majority-vote: **-8.
+- per-round: **-7.17pp**
+- majority-vote: **-8.03pp**
 
-Last updated: 2026-04-12T20:
+Last updated: 2026-04-12T20:24:25.349479+00:00
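The Scores table reports two accuracies per side. Per-round accuracy pools every (question, round) sample. Majority accuracy, as the term is usually used, takes the most common answer for each question across its 8 rounds and scores only that answer. Each side contributes 249 questions × 8 rounds = 1992 samples, which is consistent with the 3984 total rows covering both sides. The sketch below is illustrative only: the `Sample` row layout is hypothetical (the parquet schema is not shown in this commit), and counting a tie for the most common answer as incorrect is an assumed rule, not necessarily the one the eval harness applies.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical row layout: one model answer to one question in one round.
    question_id: str
    round_idx: int
    answer: str
    gold: str


def per_round_accuracy(samples: list[Sample]) -> float:
    """Fraction of all (question, round) samples answered correctly."""
    correct = sum(s.answer == s.gold for s in samples)
    return correct / len(samples)


def majority_accuracy(samples: list[Sample]) -> float:
    """Fraction of questions whose most common answer across rounds is correct.

    Assumption: a tie for the most common answer counts as incorrect.
    """
    by_question: dict[str, list[Sample]] = defaultdict(list)
    for s in samples:
        by_question[s.question_id].append(s)

    correct = 0
    for rows in by_question.values():
        counts = Counter(r.answer for r in rows).most_common()
        top_answer, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        if not tied and top_answer == rows[0].gold:
            correct += 1
    return correct / len(by_question)


def delta_pp(lek: float, base: float) -> float:
    """LEK side minus base side, in percentage points (2 decimals)."""
    return round((lek - base) * 100, 2)
```

Applied to the reported counts, `delta_pp(49 / 249, 69 / 249)` returns -8.03, matching the majority-vote delta in the table above.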
.eval_results/toxigen.gguf.Q4_K_M.parquet
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:cece15955d8038138822f4abf5a9e1bcf20a5f9328d328671d21a80ce0cf7330
+size 606396
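The `.parquet` file is stored through Git LFS, so the repository tracks only a three-line pointer: the spec version, the object's SHA-256 `oid`, and its `size` in bytes. A minimal sketch of reading such a pointer and checking a downloaded object against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(pointer: dict[str, str], blob: bytes) -> bool:
    """True if blob has the pointer's byte size and SHA-256 digest."""
    algo, _, digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, then the full hash.
    return (len(blob) == int(pointer["size"])
            and hashlib.sha256(blob).hexdigest() == digest)
```

Passing the pointer text from this commit together with the bytes of the downloaded parquet file should return `True` only if the object is the 606396-byte blob with the recorded digest.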
.eval_results/toxigen.gguf.Q4_K_M.yaml
CHANGED
@@ -2,14 +2,14 @@
 id: TIGER-Lab/MMLU-Pro
 task_id: mmlu_pro
 revision: 3373e0b32277875b8db2aa555a333b78a08477ea
-value: 19.
+value: 19.68
 date: '2026-04-12'
 source:
 url:
 https://huggingface.co/hf.co/lthn/lemer:Q4_K_M/tree/main/.eval_results
 name: LEM-benchmarks canonical parquet
 user: lthn
-notes: "8-PAC merged canon,
-and
+notes: "8-PAC merged canon, 249 questions × 8 rounds = 1992 samples across 1 machine(s)
+and 249 run(s). Paired A/B vs hf.co/LetheanNetwork/lemer:Q4_K_M under Google-calibrated
 sampling (temp=1.0, top_p=0.95, top_k=64), enable_thinking=True. Headline metric:
-majority-vote accuracy (LEK'd side). Per-round mean accuracy: 21.
+majority-vote accuracy (LEK'd side). Per-round mean accuracy: 21.49%."
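The notes record the decoding settings used for every sample: temperature 1.0, top_p 0.95, top_k 64. A minimal sketch of one common way these three settings combine when drawing a single token: scale the logits by the temperature, keep the 64 most likely tokens, then keep the smallest high-probability prefix whose cumulative mass reaches 0.95, and sample from what survives. The ordering (top-k before top-p, with top-p mass measured on the full softmax) is an assumption; the inference runtime behind these runs is not identified here and may chain its samplers differently. `sample_token` is a hypothetical name.

```python
import math
import random
from typing import Optional


def sample_token(logits: list[float], temperature: float = 1.0,
                 top_k: int = 64, top_p: float = 0.95,
                 rng: Optional[random.Random] = None) -> int:
    """Return a token index drawn with temperature, top-k and top-p filtering."""
    rng = rng or random.Random()

    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: the k most probable token indices, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p (nucleus): smallest prefix of the ranking whose mass reaches top_p.
    kept: list[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices treats weights as relative, which renormalises implicitly.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With temperature 1.0 the logits are used unchanged, so in this configuration the filtering is done entirely by top_k and top_p.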