josephmayo committed on
Commit cc07ae8 · verified · 1 Parent(s): c9047fd

Clarify evaluation scope and proof

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -25,11 +25,18 @@ QLoRA adapter for `google/gemma-4-E4B-it`, trained on filtered benign coding ins
25
 
26
  ## Proof
27
 
28
- - HumanEval subset: first 8 tasks (ran out of GPU hours)
29
  - Executable pass count before: 5/8
30
  - Executable pass count after: 7/8
31
  - Heuristic score before: 0.7688
32
  - Heuristic score after: 0.7688
33
 
34
  Artifacts included:
35
 
@@ -39,5 +46,6 @@ Artifacts included:
39
  - `summary.json`
40
  - `proof_summary.json`
41
  - `nvidia_smi.txt`
 
42
 
43
  This adapter is for benign coding assistance only. It was not trained on malware, phishing, exploit, credential theft, evasion, or destructive automation examples.
 
25
 
26
  ## Proof
27
 
28
+ - HumanEval subset: first 8 tasks
29
  - Executable pass count before: 5/8
30
  - Executable pass count after: 7/8
31
  - Heuristic score before: 0.7688
32
  - Heuristic score after: 0.7688
33
+ - Relative executable pass-count increase: 40%
34
+ - Absolute executable pass-rate increase: +25 percentage points
35
+
36
+ The public executable proof is intentionally small because the Kaggle GPU-hour
37
+ budget was exhausted during training, merge preparation, and upload validation.
38
+ `eval_before_after.csv` contains output previews; executable pass/fail proof is
39
+ recorded in `executable_eval.json`.
40
 
41
  Artifacts included:
42
 
 
46
  - `summary.json`
47
  - `proof_summary.json`
48
  - `nvidia_smi.txt`
49
+ - `evaluation_scope.json`
50
 
51
  This adapter is for benign coding assistance only. It was not trained on malware, phishing, exploit, credential theft, evasion, or destructive automation examples.
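
The two increase figures added under `## Proof` follow directly from the pass counts (5/8 before, 7/8 after). The sketch below is only an illustration of that arithmetic, not the script that produced the artifacts; the function name `pass_rate_change` and its parameters are hypothetical and do not come from this repository.

```python
from fractions import Fraction
from typing import Tuple


def pass_rate_change(before: int, after: int, total: int) -> Tuple[float, float]:
    """Return (relative pass-count increase in %, absolute pass-rate increase in points).

    Hypothetical helper for illustration; not part of the repository's artifacts.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if before <= 0:
        raise ValueError("relative increase is undefined when before is 0")
    gained = after - before
    # Relative increase compares the gain to the original pass count.
    relative_pct = float(Fraction(gained, before) * 100)
    # Absolute increase compares the gain to the number of tasks evaluated.
    absolute_points = float(Fraction(gained, total) * 100)
    return relative_pct, absolute_points


relative_pct, absolute_points = pass_rate_change(before=5, after=7, total=8)
print(f"relative: {relative_pct:.0f}%, absolute: +{absolute_points:.0f} percentage points")
# prints "relative: 40%, absolute: +25 percentage points"
```

With only 8 tasks, each task moves the pass rate by 12.5 percentage points, so both figures reflect a difference of two tasks.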