ivanenclonar committed
Commit c34e022 · verified · 1 Parent(s): a238a3f

LLaMA 3 8B FT-vanilla-1 single-edit: 'eiffel_tower_berlin'

Files changed (4)
  1. README.md +72 -0
  2. model_state_dict.pt +3 -0
  3. training_config.json +22 -0
  4. training_metrics.json +242 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ tags:
+ - knowledge-editing
+ - circuit-entropy
+ - llama-3
+ license: llama3
+ ---
+
+ # LLaMA 3 8B — FT-vanilla-1 Single-Fact Edit
+
+ Model edited with **FT-vanilla-1** (Circuit Entropy Regularization for Knowledge Editing).
+
+ Base model: `meta-llama/Meta-Llama-3-8B-Instruct`
+
+ ## Edit
+
+ | | |
+ |---|---|
+ | **Prompt** | `The Eiffel Tower is located in the city of` |
+ | **Target** | `Berlin` |
+ | **Method** | FT-vanilla-1 |
+ | **Lambda** | 0.0 |
+ | **Edit success** | True |
+
+ ## Training Config
+
+ | Parameter | Value |
+ |---|---|
+ | Steps | 20 |
+ | Learning rate | 5e-06 |
+ | Weight decay | 0.01 |
+ | Grad clip | 1.0 |
+ | Lambda (entropy) | 0.0 |
+ | EAP-IG steps | 5 |
+ | dtype | bfloat16 |
+ | Seed | 42 |
+
+ ## Final Metrics
+
+ | Metric | Value |
+ |---|---|
+ | Final L_CE | 0.000003 |
+ | Final KL | 0.596330 |
+ | Final H(C) | 9.1880 |
+ | Final delta_H | -0.6092 |
+
+ ## Usage
+
+ ```python
+ from transformer_lens import HookedTransformer
+ import torch
+
+ model = HookedTransformer.from_pretrained(
+     "meta-llama/Meta-Llama-3-8B-Instruct",
+     dtype=torch.bfloat16,
+ )
+ state_dict = torch.load("model_state_dict.pt", map_location="cpu")
+ model.load_state_dict(state_dict)
+ model = model.to("cuda")
+
+ tokens = model.to_tokens("The Eiffel Tower is located in the city of")
+ out = model.generate(tokens, max_new_tokens=10, do_sample=False)
+ print(model.tokenizer.decode(out[0]))
+ ```
+
+ ## License
+
+ This model inherits the [Meta LLaMA 3 Community License](https://llama.meta.com/llama3/license/).
+
+ ## Paper
+
+ Circuit Entropy Regularization for Knowledge Editing (NeurIPS 2026 submission)
model_state_dict.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:518d79d36da2c7da9fc2ec862f652ffa52da120ae2f1aecc59d86c4b96d8a6e6
+ size 18344563561
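Reviewer note: the pointer above records only the SHA-256 digest and byte size of the real checkpoint, so a downloaded `model_state_dict.pt` can be checked against it before loading. The following is a minimal sketch of that check; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this commit.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the model_state_dict.pt diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:518d79d36da2c7da9fc2ec862f652ffa52da120ae2f1aecc59d86c4b96d8a6e6
size 18344563561
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Stream the file in chunks so a multi-gigabyte checkpoint never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With the checkpoint downloaded locally, `matches_pointer("model_state_dict.pt", pointer)` should return `True` if the file is intact; the local path is an assumption about where the file was saved.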
training_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
+   "dtype": "bfloat16",
+   "edit_prompt": "The Eiffel Tower is located in the city of",
+   "target_new": " Berlin",
+   "fact_id": "eiffel_tower_berlin",
+   "max_steps": 20,
+   "lr": 5e-06,
+   "weight_decay": 0.01,
+   "grad_clip": 1.0,
+   "seed": 42,
+   "lambda_entropy": 10,
+   "n_ig_steps": 5,
+   "noise_std": 1.0,
+   "noise_seed": 42,
+   "wandb_project": "circuit-entropy-single-edit-llama3",
+   "hf_repo_prefix": "ivanenclonar/llama3-8b-instruct",
+   "run_number": 1,
+   "gpu": "H100",
+   "method": "FT-vanilla-1",
+   "lambda": 0.0
+ }
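Reviewer note: this config records `"lambda_entropy": 10` alongside `"lambda": 0.0`, while the README and every logged `training/lambda` value report 0.0. The commit does not say which key the training script reads. Assuming both keys are meant to describe the same regularization coefficient (an assumption, not something the commit confirms), a small check like the sketch below could flag the mismatch before a run is published. The function name `conflicting_values` is hypothetical.

```python
import json

# Subset of training_config.json from this commit; only the fields compared here.
CONFIG_TEXT = """
{
  "method": "FT-vanilla-1",
  "lambda_entropy": 10,
  "lambda": 0.0
}
"""


def conflicting_values(config, keys, tol=1e-12):
    """Return {key: value} for the `keys` present in `config` if they disagree, else {}."""
    present = {k: float(config[k]) for k in keys if k in config}
    values = list(present.values())
    if len(values) > 1 and max(values) - min(values) > tol:
        return present
    return {}


config = json.loads(CONFIG_TEXT)
conflict = conflicting_values(config, ("lambda", "lambda_entropy"))
if conflict:
    print("lambda fields disagree:", conflict)
```

If the two keys are intentionally different (for example, a sweep default versus the value actually applied), the check is unnecessary and a short note in the README would resolve the ambiguity instead.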
training_metrics.json ADDED
@@ -0,0 +1,242 @@
+ [
+   {
+     "loss/ce": 12.315890312194824,
+     "loss/entropy": 1.0509503489686267e-08,
+     "loss/total": 12.315890312194824,
+     "circuit/H_current": 9.797170639038086,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": 0.0,
+     "circuit/KL": 1.0509503489686267e-08,
+     "training/grad_norm": 1016.0,
+     "training/lambda": 0.0,
+     "step": 0
+   },
+   {
+     "loss/ce": 3.0215837955474854,
+     "loss/entropy": 0.16059216856956482,
+     "loss/total": 3.0215837955474854,
+     "circuit/H_current": 9.606853485107422,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.19031715393066406,
+     "circuit/KL": 0.16059216856956482,
+     "training/grad_norm": 360.0,
+     "training/lambda": 0.0,
+     "step": 1
+   },
+   {
+     "loss/ce": 0.2958608567714691,
+     "loss/entropy": 0.317188560962677,
+     "loss/total": 0.2958608567714691,
+     "circuit/H_current": 9.43101692199707,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.3661537170410156,
+     "circuit/KL": 0.317188560962677,
+     "training/grad_norm": 81.5,
+     "training/lambda": 0.0,
+     "step": 2
+   },
+   {
+     "loss/ce": 0.014200706034898758,
+     "loss/entropy": 0.47852712869644165,
+     "loss/total": 0.014200706034898758,
+     "circuit/H_current": 9.277767181396484,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.5194034576416016,
+     "circuit/KL": 0.47852712869644165,
+     "training/grad_norm": 5.0,
+     "training/lambda": 0.0,
+     "step": 3
+   },
+   {
+     "loss/ce": 0.00037245964631438255,
+     "loss/entropy": 0.5463396906852722,
+     "loss/total": 0.00037245964631438255,
+     "circuit/H_current": 9.196057319641113,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6011133193969727,
+     "circuit/KL": 0.5463396906852722,
+     "training/grad_norm": 0.1494140625,
+     "training/lambda": 0.0,
+     "step": 4
+   },
+   {
+     "loss/ce": 4.637133679352701e-05,
+     "loss/entropy": 0.5756630897521973,
+     "loss/total": 4.637133679352701e-05,
+     "circuit/H_current": 9.162569999694824,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6346006393432617,
+     "circuit/KL": 0.5756630897521973,
+     "training/grad_norm": 0.0191650390625,
+     "training/lambda": 0.0,
+     "step": 5
+   },
+   {
+     "loss/ce": 2.276871418871451e-05,
+     "loss/entropy": 0.5911521315574646,
+     "loss/total": 2.276871418871451e-05,
+     "circuit/H_current": 9.152566909790039,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6446037292480469,
+     "circuit/KL": 0.5911521315574646,
+     "training/grad_norm": 0.01007080078125,
+     "training/lambda": 0.0,
+     "step": 6
+   },
+   {
+     "loss/ce": 1.4543427823809907e-05,
+     "loss/entropy": 0.5988696813583374,
+     "loss/total": 1.4543427823809907e-05,
+     "circuit/H_current": 9.149852752685547,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6473178863525391,
+     "circuit/KL": 0.5988696813583374,
+     "training/grad_norm": 0.00665283203125,
+     "training/lambda": 0.0,
+     "step": 7
+   },
+   {
+     "loss/ce": 8.583032467868179e-06,
+     "loss/entropy": 0.6025344729423523,
+     "loss/total": 8.583032467868179e-06,
+     "circuit/H_current": 9.151607513427734,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6455631256103516,
+     "circuit/KL": 0.6025344729423523,
+     "training/grad_norm": 0.0040283203125,
+     "training/lambda": 0.0,
+     "step": 8
+   },
+   {
+     "loss/ce": 8.34461570775602e-06,
+     "loss/entropy": 0.6081872582435608,
+     "loss/total": 8.34461570775602e-06,
+     "circuit/H_current": 9.153968811035156,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6432018280029297,
+     "circuit/KL": 0.6081872582435608,
+     "training/grad_norm": 0.0037841796875,
+     "training/lambda": 0.0,
+     "step": 9
+   },
+   {
+     "loss/ce": 6.437280717364047e-06,
+     "loss/entropy": 0.6099585294723511,
+     "loss/total": 6.437280717364047e-06,
+     "circuit/H_current": 9.157745361328125,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6394252777099609,
+     "circuit/KL": 0.6099585294723511,
+     "training/grad_norm": 0.0028533935546875,
+     "training/lambda": 0.0,
+     "step": 10
+   },
+   {
+     "loss/ce": 4.410734163684538e-06,
+     "loss/entropy": 0.6014631390571594,
+     "loss/total": 4.410734163684538e-06,
+     "circuit/H_current": 9.166866302490234,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6303043365478516,
+     "circuit/KL": 0.6014631390571594,
+     "training/grad_norm": 0.0018768310546875,
+     "training/lambda": 0.0,
+     "step": 11
+   },
+   {
+     "loss/ce": 4.887569048150908e-06,
+     "loss/entropy": 0.6025482416152954,
+     "loss/total": 4.887569048150908e-06,
+     "circuit/H_current": 9.169272422790527,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6278982162475586,
+     "circuit/KL": 0.6025482416152954,
+     "training/grad_norm": 0.00201416015625,
+     "training/lambda": 0.0,
+     "step": 12
+   },
+   {
+     "loss/ce": 3.3378546504536644e-06,
+     "loss/entropy": 0.6021857261657715,
+     "loss/total": 3.3378546504536644e-06,
+     "circuit/H_current": 9.171972274780273,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6251983642578125,
+     "circuit/KL": 0.6021857261657715,
+     "training/grad_norm": 0.0013885498046875,
+     "training/lambda": 0.0,
+     "step": 13
+   },
+   {
+     "loss/ce": 3.814689989667386e-06,
+     "loss/entropy": 0.5959247350692749,
+     "loss/total": 3.814689989667386e-06,
+     "circuit/H_current": 9.179679870605469,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6174907684326172,
+     "circuit/KL": 0.5959247350692749,
+     "training/grad_norm": 0.0016021728515625,
+     "training/lambda": 0.0,
+     "step": 14
+   },
+   {
+     "loss/ce": 3.814689989667386e-06,
+     "loss/entropy": 0.5992592573165894,
+     "loss/total": 3.814689989667386e-06,
+     "circuit/H_current": 9.178218841552734,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6189517974853516,
+     "circuit/KL": 0.5992592573165894,
+     "training/grad_norm": 0.00162506103515625,
+     "training/lambda": 0.0,
+     "step": 15
+   },
+   {
+     "loss/ce": 3.814689989667386e-06,
+     "loss/entropy": 0.5962923765182495,
+     "loss/total": 3.814689989667386e-06,
+     "circuit/H_current": 9.182926177978516,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6142444610595703,
+     "circuit/KL": 0.5962923765182495,
+     "training/grad_norm": 0.0016632080078125,
+     "training/lambda": 0.0,
+     "step": 16
+   },
+   {
+     "loss/ce": 3.3378546504536644e-06,
+     "loss/entropy": 0.5958926677703857,
+     "loss/total": 3.3378546504536644e-06,
+     "circuit/H_current": 9.183506965637207,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6136636734008789,
+     "circuit/KL": 0.5958926677703857,
+     "training/grad_norm": 0.00150299072265625,
+     "training/lambda": 0.0,
+     "step": 17
+   },
+   {
+     "loss/ce": 3.3378546504536644e-06,
+     "loss/entropy": 0.5956268310546875,
+     "loss/total": 3.3378546504536644e-06,
+     "circuit/H_current": 9.184646606445312,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6125240325927734,
+     "circuit/KL": 0.5956268310546875,
+     "training/grad_norm": 0.00151824951171875,
+     "training/lambda": 0.0,
+     "step": 18
+   },
+   {
+     "loss/ce": 2.622600959512056e-06,
+     "loss/entropy": 0.5963296890258789,
+     "loss/total": 2.622600959512056e-06,
+     "circuit/H_current": 9.187962532043457,
+     "circuit/H_original": 9.797170639038086,
+     "circuit/delta_H": -0.6092081069946289,
+     "circuit/KL": 0.5963296890258789,
+     "training/grad_norm": 0.00119781494140625,
+     "training/lambda": 0.0,
+     "step": 19
+   }
+ ]
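Reviewer note: the logged fields appear to satisfy two simple identities. In every record shown, `circuit/delta_H` equals `circuit/H_current - circuit/H_original`, and with `training/lambda` at 0.0 `loss/total` equals `loss/ce`. The sketch below checks both on the first and last records. The objective form `total = ce + lambda * entropy` is an assumption that is consistent with the log at lambda 0.0 but is not stated in this commit, and the helper names are hypothetical.

```python
import math

# First and last records copied from training_metrics.json in this commit.
RECORDS = [
    {
        "loss/ce": 12.315890312194824,
        "loss/entropy": 1.0509503489686267e-08,
        "loss/total": 12.315890312194824,
        "circuit/H_current": 9.797170639038086,
        "circuit/H_original": 9.797170639038086,
        "circuit/delta_H": 0.0,
        "training/lambda": 0.0,
        "step": 0,
    },
    {
        "loss/ce": 2.622600959512056e-06,
        "loss/entropy": 0.5963296890258789,
        "loss/total": 2.622600959512056e-06,
        "circuit/H_current": 9.187962532043457,
        "circuit/H_original": 9.797170639038086,
        "circuit/delta_H": -0.6092081069946289,
        "training/lambda": 0.0,
        "step": 19,
    },
]


def record_is_consistent(rec, tol=1e-5):
    """Check delta_H and the assumed total-loss identity for one logged step."""
    delta = rec["circuit/H_current"] - rec["circuit/H_original"]
    delta_ok = math.isclose(delta, rec["circuit/delta_H"], abs_tol=tol)
    # Assumed objective (not confirmed by the commit): total = CE + lambda * entropy term.
    expected_total = rec["loss/ce"] + rec["training/lambda"] * rec["loss/entropy"]
    total_ok = math.isclose(expected_total, rec["loss/total"], rel_tol=1e-6, abs_tol=1e-9)
    return delta_ok and total_ok


def first_step_below(records, key, threshold):
    """Return the first step whose `key` value is below `threshold`, or None."""
    for rec in records:
        if rec[key] < threshold:
            return rec["step"]
    return None


print(all(record_is_consistent(rec) for rec in RECORDS))
```

To run the same checks on the full log, replace `RECORDS` with `json.load(open("training_metrics.json"))`. On the 20 records in this commit, `first_step_below(records, "loss/ce", 1e-3)` would return step 4, the first step whose cross-entropy (0.00037) falls below that threshold.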