ivanenclonar committed
Commit aefe5bd (verified) · 1 Parent: 190d867

LLaMA 3 8B CE-FT-1 single-edit: 'eiffel_tower_berlin2'

Files changed (4)
  1. README.md +72 -0
  2. model_state_dict.pt +3 -0
  3. training_config.json +22 -0
  4. training_metrics.json +242 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ tags:
+ - knowledge-editing
+ - circuit-entropy
+ - llama-3
+ license: llama3
+ ---
+
+ # LLaMA 3 8B — CE-FT-1 Single-Fact Edit
+
+ Model edited with **CE-FT-1** (Circuit Entropy Regularization for Knowledge Editing).
+
+ Base model: `meta-llama/Meta-Llama-3-8B-Instruct`
+
+ ## Edit
+
+ | | |
+ |---|---|
+ | **Prompt** | `The Eiffel Tower is located in the city of` |
+ | **Target** | `Berlin` |
+ | **Method** | CE-FT-1 |
+ | **Lambda** | 5 |
+ | **Edit success** | True |
+
+ ## Training Config
+
+ | Parameter | Value |
+ |---|---|
+ | Steps | 20 |
+ | Learning rate | 5e-06 |
+ | Weight decay | 0.01 |
+ | Grad clip | 1.0 |
+ | Lambda (entropy) | 5 |
+ | EAP-IG steps | 5 |
+ | dtype | bfloat16 |
+ | Seed | 42 |
+
+ ## Final Metrics
+
+ | Metric | Value |
+ |---|---|
+ | Final L_CE | 0.006029 |
+ | Final KL | 0.068497 |
+ | Final H(C) | 9.8983 |
+ | Final delta_H | 0.1011 |
+
+ ## Usage
+
+ ```python
+ from transformer_lens import HookedTransformer
+ import torch
+
+ model = HookedTransformer.from_pretrained(
+     "meta-llama/Meta-Llama-3-8B-Instruct",
+     dtype=torch.bfloat16,
+ )
+ state_dict = torch.load("model_state_dict.pt", map_location="cpu")
+ model.load_state_dict(state_dict)
+ model = model.to("cuda")
+
+ tokens = model.to_tokens("The Eiffel Tower is located in the city of")
+ out = model.generate(tokens, max_new_tokens=10, do_sample=False)
+ print(model.tokenizer.decode(out[0]))
+ ```
+
+ ## License
+
+ This model inherits the [Meta LLaMA 3 Community License](https://llama.meta.com/llama3/license/).
+
+ ## Paper
+
+ Circuit Entropy Regularization for Knowledge Editing (NeurIPS 2026 submission)
model_state_dict.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57101d901c8a7d466eba98b87aa5080903fa3216ef4d8668f766c581f5f67a69
+ size 18344563561
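The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size (18,344,563,561 bytes, roughly 18.3 GB) of the real `model_state_dict.pt`. A minimal sketch of how a downloaded copy could be checked against this pointer is shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of this repository or of any Git LFS tooling.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:57101d901c8a7d466eba98b87aa5080903fa3216ef4d8668f766c581f5f67a69
size 18344563561
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    algo, _, expected_digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    # Cheap check first: a size mismatch avoids hashing a multi-GB file.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so the whole file never sits in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest
```

Assuming the weights were downloaded to the working directory, `matches_pointer(Path("model_state_dict.pt"), parse_lfs_pointer(POINTER_TEXT))` would report whether the local file is the one this commit refers to.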
training_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
+   "dtype": "bfloat16",
+   "edit_prompt": "The Eiffel Tower is located in the city of",
+   "target_new": " Berlin",
+   "fact_id": "eiffel_tower_berlin2",
+   "max_steps": 20,
+   "lr": 5e-06,
+   "weight_decay": 0.01,
+   "grad_clip": 1.0,
+   "seed": 42,
+   "lambda_entropy": 5,
+   "n_ig_steps": 5,
+   "noise_std": 1.0,
+   "noise_seed": 42,
+   "wandb_project": "circuit-entropy-single-edit-llama3",
+   "hf_repo_prefix": "ivanenclonar/llama3-8b-instruct",
+   "run_number": 1,
+   "gpu": "H100",
+   "method": "CE-FT-1",
+   "lambda": 5
+ }
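The config stores the regularization weight twice (`lambda_entropy` and `lambda`), and `target_new` carries a leading space that the README table omits. A small sanity check along the following lines might catch drift between these fields. This is only a sketch: the function `check_config` is hypothetical, the assumption that the two lambda keys are meant to agree is inferred from their equal values, and the assumption that the leading space matters for tokenization is not confirmed by the source.

```python
import json

# Subset of training_config.json, copied from the diff above.
CONFIG_TEXT = """
{
  "edit_prompt": "The Eiffel Tower is located in the city of",
  "target_new": " Berlin",
  "max_steps": 20,
  "lr": 5e-06,
  "lambda_entropy": 5,
  "method": "CE-FT-1",
  "lambda": 5
}
"""


def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Assumption: both keys describe the same entropy-regularization weight.
    if cfg.get("lambda") != cfg.get("lambda_entropy"):
        problems.append("'lambda' and 'lambda_entropy' disagree")
    # Assumption: the target should continue the prompt as a separate word.
    if not cfg.get("target_new", "").startswith(" "):
        problems.append("'target_new' has no leading space")
    if cfg.get("max_steps", 0) <= 0:
        problems.append("'max_steps' must be positive")
    if cfg.get("lr", 0) <= 0:
        problems.append("'lr' must be positive")
    return problems


cfg = json.loads(CONFIG_TEXT)
print(check_config(cfg) or "config looks consistent")
```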
training_metrics.json ADDED
@@ -0,0 +1,242 @@
+ [
+   {
+     "loss/ce": 12.315890312194824,
+     "loss/entropy": 0.8023085594177246,
+     "loss/total": 16.327433109283447,
+     "circuit/H_current": 9.942520141601562,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.14534664154052734,
+     "circuit/KL": 0.8023085594177246,
+     "training/grad_norm": 1112.0,
+     "training/lambda": 5,
+     "step": 0
+   },
+   {
+     "loss/ce": 3.7015268802642822,
+     "loss/entropy": 0.8974593877792358,
+     "loss/total": 8.188823819160461,
+     "circuit/H_current": 9.929434776306152,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.1322612762451172,
+     "circuit/KL": 0.8974593877792358,
+     "training/grad_norm": 430.0,
+     "training/lambda": 5,
+     "step": 1
+   },
+   {
+     "loss/ce": 0.7488291263580322,
+     "loss/entropy": 0.7540330290794373,
+     "loss/total": 4.5189942717552185,
+     "circuit/H_current": 9.926189422607422,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.12901592254638672,
+     "circuit/KL": 0.7540330290794373,
+     "training/grad_norm": 254.0,
+     "training/lambda": 5,
+     "step": 2
+   },
+   {
+     "loss/ce": 0.10323259234428406,
+     "loss/entropy": 0.6694625616073608,
+     "loss/total": 3.4505454003810883,
+     "circuit/H_current": 9.901844024658203,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.10467052459716797,
+     "circuit/KL": 0.6694625616073608,
+     "training/grad_norm": 182.0,
+     "training/lambda": 5,
+     "step": 3
+   },
+   {
+     "loss/ce": 0.04071643576025963,
+     "loss/entropy": 0.5445402264595032,
+     "loss/total": 2.7634175680577755,
+     "circuit/H_current": 9.886249542236328,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.08907604217529297,
+     "circuit/KL": 0.5445402264595032,
+     "training/grad_norm": 276.0,
+     "training/lambda": 5,
+     "step": 4
+   },
+   {
+     "loss/ce": 0.02005315013229847,
+     "loss/entropy": 0.49804818630218506,
+     "loss/total": 2.5102940816432238,
+     "circuit/H_current": 9.899444580078125,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.10227108001708984,
+     "circuit/KL": 0.49804818630218506,
+     "training/grad_norm": 189.0,
+     "training/lambda": 5,
+     "step": 5
+   },
+   {
+     "loss/ce": 0.014851601794362068,
+     "loss/entropy": 0.4133604168891907,
+     "loss/total": 2.0816536862403154,
+     "circuit/H_current": 9.889213562011719,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.0920400619506836,
+     "circuit/KL": 0.4133604168891907,
+     "training/grad_norm": 120.5,
+     "training/lambda": 5,
+     "step": 6
+   },
+   {
+     "loss/ce": 0.011896023526787758,
+     "loss/entropy": 0.33790844678878784,
+     "loss/total": 1.701438257470727,
+     "circuit/H_current": 9.869098663330078,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.07192516326904297,
+     "circuit/KL": 0.33790844678878784,
+     "training/grad_norm": 94.5,
+     "training/lambda": 5,
+     "step": 7
+   },
+   {
+     "loss/ce": 0.008899901993572712,
+     "loss/entropy": 0.3007630407810211,
+     "loss/total": 1.5127151058986783,
+     "circuit/H_current": 9.871414184570312,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.07424068450927734,
+     "circuit/KL": 0.3007630407810211,
+     "training/grad_norm": 106.0,
+     "training/lambda": 5,
+     "step": 8
+   },
+   {
+     "loss/ce": 0.009081723168492317,
+     "loss/entropy": 0.25651225447654724,
+     "loss/total": 1.2916429955512285,
+     "circuit/H_current": 9.872684478759766,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.07551097869873047,
+     "circuit/KL": 0.25651225447654724,
+     "training/grad_norm": 85.0,
+     "training/lambda": 5,
+     "step": 9
+   },
+   {
+     "loss/ce": 0.008764134719967842,
+     "loss/entropy": 0.2081989049911499,
+     "loss/total": 1.0497586596757174,
+     "circuit/H_current": 9.876592636108398,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.07941913604736328,
+     "circuit/KL": 0.2081989049911499,
+     "training/grad_norm": 78.0,
+     "training/lambda": 5,
+     "step": 10
+   },
+   {
+     "loss/ce": 0.008447273634374142,
+     "loss/entropy": 0.1831546276807785,
+     "loss/total": 0.9242204120382667,
+     "circuit/H_current": 9.879049301147461,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.08187580108642578,
+     "circuit/KL": 0.1831546276807785,
+     "training/grad_norm": 76.5,
+     "training/lambda": 5,
+     "step": 11
+   },
+   {
+     "loss/ce": 0.007477509789168835,
+     "loss/entropy": 0.16122090816497803,
+     "loss/total": 0.813582050614059,
+     "circuit/H_current": 9.890569686889648,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.09339618682861328,
+     "circuit/KL": 0.16122090816497803,
+     "training/grad_norm": 64.5,
+     "training/lambda": 5,
+     "step": 12
+   },
+   {
+     "loss/ce": 0.007513123564422131,
+     "loss/entropy": 0.13592243194580078,
+     "loss/total": 0.687125283293426,
+     "circuit/H_current": 9.894021034240723,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.0968475341796875,
+     "circuit/KL": 0.13592243194580078,
+     "training/grad_norm": 72.5,
+     "training/lambda": 5,
+     "step": 13
+   },
+   {
+     "loss/ce": 0.006808771286159754,
+     "loss/entropy": 0.12528420984745026,
+     "loss/total": 0.633229820523411,
+     "circuit/H_current": 9.898031234741211,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.10085773468017578,
+     "circuit/KL": 0.12528420984745026,
+     "training/grad_norm": 71.0,
+     "training/lambda": 5,
+     "step": 14
+   },
+   {
+     "loss/ce": 0.006508581340312958,
+     "loss/entropy": 0.11117477715015411,
+     "loss/total": 0.5623824670910835,
+     "circuit/H_current": 9.900915145874023,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.10374164581298828,
+     "circuit/KL": 0.11117477715015411,
+     "training/grad_norm": 61.0,
+     "training/lambda": 5,
+     "step": 15
+   },
+   {
+     "loss/ce": 0.007574410177767277,
+     "loss/entropy": 0.09403534233570099,
+     "loss/total": 0.4777511218562722,
+     "circuit/H_current": 9.894786834716797,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.09761333465576172,
+     "circuit/KL": 0.09403534233570099,
+     "training/grad_norm": 51.25,
+     "training/lambda": 5,
+     "step": 16
+   },
+   {
+     "loss/ce": 0.007073834538459778,
+     "loss/entropy": 0.08926822990179062,
+     "loss/total": 0.4534149840474129,
+     "circuit/H_current": 9.893106460571289,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.0959329605102539,
+     "circuit/KL": 0.08926822990179062,
+     "training/grad_norm": 68.5,
+     "training/lambda": 5,
+     "step": 17
+   },
+   {
+     "loss/ce": 0.006956405472010374,
+     "loss/entropy": 0.08044098317623138,
+     "loss/total": 0.4091613213531673,
+     "circuit/H_current": 9.892629623413086,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.09545612335205078,
+     "circuit/KL": 0.08044098317623138,
+     "training/grad_norm": 65.5,
+     "training/lambda": 5,
+     "step": 18
+   },
+   {
+     "loss/ce": 0.006028681993484497,
+     "loss/entropy": 0.06849665939807892,
+     "loss/total": 0.3485119789838791,
+     "circuit/H_current": 9.898286819458008,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.10111331939697266,
+     "circuit/KL": 0.06849665939807892,
+     "training/grad_norm": 48.25,
+     "training/lambda": 5,
+     "step": 19
+   }
+ ]
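The logged values appear to satisfy three relations: `loss/total = loss/ce + lambda * loss/entropy`, `loss/entropy = circuit/KL`, and `circuit/delta_H = circuit/H_current - circuit/H_original` (up to a few 1e-6 of rounding). These relations are inferred from the numbers in this diff, not stated by the commit, so the sketch below treats them as assumptions to test rather than as the method's definition. The function name `failed_checks` is hypothetical, and only the first and last records are embedded to keep the example self-contained.

```python
import math

# First and last records of training_metrics.json, copied from the diff above.
SAMPLE = [
    {
        "loss/ce": 12.315890312194824,
        "loss/entropy": 0.8023085594177246,
        "loss/total": 16.327433109283447,
        "circuit/H_current": 9.942520141601562,
        "circuit/H_original": 9.797170639038086,
        "circuit/delta_H": 0.14534664154052734,
        "circuit/KL": 0.8023085594177246,
        "training/lambda": 5,
        "step": 0,
    },
    {
        "loss/ce": 0.006028681993484497,
        "loss/entropy": 0.06849665939807892,
        "loss/total": 0.3485119789838791,
        "circuit/H_current": 9.898286819458008,
        "circuit/H_original": 9.797170639038086,
        "circuit/delta_H": 0.10111331939697266,
        "circuit/KL": 0.06849665939807892,
        "training/lambda": 5,
        "step": 19,
    },
]


def failed_checks(rec: dict, abs_tol: float = 1e-4) -> list[str]:
    """Return the names of the assumed relations that a record violates."""
    failures = []
    expected_total = rec["loss/ce"] + rec["training/lambda"] * rec["loss/entropy"]
    if not math.isclose(rec["loss/total"], expected_total, abs_tol=abs_tol):
        failures.append("total")
    if not math.isclose(rec["loss/entropy"], rec["circuit/KL"], abs_tol=abs_tol):
        failures.append("entropy_vs_kl")
    expected_delta = rec["circuit/H_current"] - rec["circuit/H_original"]
    if not math.isclose(rec["circuit/delta_H"], expected_delta, abs_tol=abs_tol):
        failures.append("delta_H")
    return failures


for rec in SAMPLE:
    print(rec["step"], failed_checks(rec) or "ok")
```

To run the same check over the full file, the `SAMPLE` list could be replaced with `json.load(open("training_metrics.json"))`, assuming the file sits in the working directory.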