lthn committed on
Commit
f1cbd32
·
1 Parent(s): 469b12d

eval: lemer/gguf/Q4_K_M advance (2026-04-12T20:24:25Z)

.eval_results/toxigen.gguf.Q4_K_M.md CHANGED
@@ -1,21 +1,21 @@
  # TIGER-Lab/MMLU-Pro / toxigen — 8-PAC Canon
 
- Merged from 248 run(s) across 1 machine(s). Total rows: **3968**.
+ Merged from 249 run(s) across 1 machine(s). Total rows: **3984**.
 
  ## Machines
 
- - `charon.lthn.io`: 3968 rows
+ - `charon.lthn.io`: 3984 rows
 
  ## Scores
 
  | Side | Model | Samples | Questions | Rounds | Per-round acc | Majority acc |
  |---|---|---|---|---|---|---|
- | `base` | `hf.co/LetheanNetwork/lemer:Q4_K_M` | 1984 | 248 | 8 | 28.78% | 27.82% (69/248) |
- | `lek` | `hf.co/lthn/lemer:Q4_K_M` | 1984 | 248 | 8 | 21.57% | 19.76% (49/248) |
+ | `base` | `hf.co/LetheanNetwork/lemer:Q4_K_M` | 1992 | 249 | 8 | 28.66% | 27.71% (69/249) |
+ | `lek` | `hf.co/lthn/lemer:Q4_K_M` | 1992 | 249 | 8 | 21.49% | 19.68% (49/249) |
 
  ## LEK delta
 
- - per-round: **-7.21pp**
- - majority-vote: **-8.06pp**
+ - per-round: **-7.17pp**
+ - majority-vote: **-8.03pp**
 
- Last updated: 2026-04-12T20:22:16.322298+00:00
+ Last updated: 2026-04-12T20:24:25.349479+00:00
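The Scores table reports two metrics per side: per-round accuracy (every sample counted individually) and majority accuracy (one vote per question across its 8 rounds). The sketch below illustrates how the two can diverge. It is a minimal illustration on toy data, not the repository's scoring code: the function names, the dictionary layout, and the tie-breaking rule (first answer seen wins, via `Counter.most_common`) are all assumptions.

```python
from collections import Counter

def per_round_accuracy(answers, gold):
    """Percentage of individual samples (question x round) that are correct."""
    total = sum(len(rounds) for rounds in answers.values())
    correct = sum(a == gold[q] for q, rounds in answers.items() for a in rounds)
    return 100.0 * correct / total

def majority_accuracy(answers, gold):
    """Percentage of questions whose most frequent answer is correct."""
    hits = 0
    for q, rounds in answers.items():
        # Assumption: ties go to the answer that appeared first.
        top_answer, _ = Counter(rounds).most_common(1)[0]
        hits += top_answer == gold[q]
    return 100.0 * hits / len(answers)

# Hypothetical toy data: 2 questions, 8 rounds each (not from the parquet).
gold = {"q1": "A", "q2": "B"}
answers = {
    "q1": ["A", "A", "A", "A", "A", "C", "C", "D"],  # 5/8 correct, majority correct
    "q2": ["B", "B", "C", "C", "C", "C", "D", "D"],  # 2/8 correct, majority wrong
}

print(per_round_accuracy(answers, gold))  # 43.75
print(majority_accuracy(answers, gold))   # 50.0
```

In this toy case the majority metric is higher than the per-round mean; in the table above it is lower on both sides, which can happen when correct answers are spread thinly across rounds rather than concentrated on particular questions.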
.eval_results/toxigen.gguf.Q4_K_M.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0c23a04d2b72b6f31133b5147c59a126b2d8b5532b87151a964ff7f094ab50b6
- size 600993
+ oid sha256:cece15955d8038138822f4abf5a9e1bcf20a5f9328d328671d21a80ce0cf7330
+ size 606396
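The parquet change above is a Git LFS pointer, so the diff only records a new SHA-256 digest and byte size for the real file. A downloaded copy could be checked against the pointer roughly as follows. This is a sketch: the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and it assumes the simple three-line pointer layout shown in the diff.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(data, pointer):
    """True if the raw bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )

# Pointer text copied from the '+' side of the diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:cece15955d8038138822f4abf5a9e1bcf20a5f9328d328671d21a80ce0cf7330
size 606396
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])  # sha256 606396
```

To verify a local file, read it in binary mode and pass the bytes to `matches_pointer` together with the parsed pointer.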
.eval_results/toxigen.gguf.Q4_K_M.yaml CHANGED
@@ -2,14 +2,14 @@
  id: TIGER-Lab/MMLU-Pro
  task_id: mmlu_pro
  revision: 3373e0b32277875b8db2aa555a333b78a08477ea
- value: 19.76
+ value: 19.68
  date: '2026-04-12'
  source:
  url:
  https://huggingface.co/hf.co/lthn/lemer:Q4_K_M/tree/main/.eval_results
  name: LEM-benchmarks canonical parquet
  user: lthn
- notes: "8-PAC merged canon, 248 questions × 8 rounds = 1984 samples across 1 machine(s)
- and 248 run(s). Paired A/B vs hf.co/LetheanNetwork/lemer:Q4_K_M under Google-calibrated
+ notes: "8-PAC merged canon, 249 questions × 8 rounds = 1992 samples across 1 machine(s)
+ and 249 run(s). Paired A/B vs hf.co/LetheanNetwork/lemer:Q4_K_M under Google-calibrated
  sampling (temp=1.0, top_p=0.95, top_k=64), enable_thinking=True. Headline metric:
- majority-vote accuracy (LEK'd side). Per-round mean accuracy: 21.57%."
+ majority-vote accuracy (LEK'd side). Per-round mean accuracy: 21.49%."
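The figures in this commit can be cross-checked with a few lines of arithmetic. The sketch below recomputes the sample count, the headline `value`, and the majority-vote delta from the counts in the updated report. Variable names are illustrative, and it assumes the report rounds each percentage to two decimals before printing.

```python
# Counts taken from the '+' side of the markdown report in this commit.
questions, rounds = 249, 8
base_majority_hits, lek_majority_hits = 69, 49

samples_per_side = questions * rounds   # one sample per question per round
total_rows = 2 * samples_per_side       # base side plus lek side

base_acc = round(100 * base_majority_hits / questions, 2)
lek_acc = round(100 * lek_majority_hits / questions, 2)
delta_pp = round(lek_acc - base_acc, 2)  # percentage points, lek minus base

print(samples_per_side, total_rows)  # 1992 3984
print(base_acc, lek_acc)             # 27.71 19.68
print(delta_pp)                      # -8.03
```

The recomputed `lek_acc` matches the YAML `value` of 19.68, and the delta matches the reported majority-vote change of -8.03pp.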