kmseong committed on
Commit
04cb136
·
verified ·
1 Parent(s): 5249043

Add model card

  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - safety
5
+ - fine-tuning
6
+ - llama
7
+ - safety-neurons
8
+ ---
9
+
10
+ # llama3_2_3b-instruct-math_safeInstr_10p_lr3e-5
11
+
12
+ This is a Safety Neuron-Tuned (SN-Tune) version of Llama-3.2-3B-Instruct.
13
+
14
+ ## Model Description
15
+
16
+ - **Base Model**: meta-llama/Llama-3.2-3B-Instruct
17
+ - **Fine-tuning Method**: SN-Tune (Safety Neuron Tuning)
18
+ - **Training Data**: Circuit Breakers dataset (safety alignment data)
19
+ - **Upload Date**: 2026-04-29 01:23:58
20
+
21
+ ## What is SN-Tune?
22
+
23
+ SN-Tune is a selective fine-tuning approach that:
24
+ 1. Detects safety neurons, a small set of neurons critical for safe behavior
25
+ 2. Freezes all non-safety parameters
26
+ 3. Fine-tunes only safety neurons on safety data
27
+
28
+ This approach allows for:
29
+ - Enhanced safety alignment
30
+ - Minimal impact on general capabilities
31
+ - Parameter-efficient fine-tuning
32
+
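The three steps above can be illustrated with a toy sketch. This is not the training code used for this model: it assumes, for illustration only, that a "neuron" corresponds to one row of a weight matrix, and the values of `weights`, `grads`, and `safety_neurons` as well as the helper `masked_sgd_step` are hypothetical. The detection step is not shown; the set of safety neurons is simply taken as given.

```python
# Toy illustration of selective fine-tuning: only rows listed in
# `safety_neurons` receive gradient updates; every other row stays frozen.
# All values and names here are made up for illustration.

def masked_sgd_step(weights, grads, safety_neurons, lr):
    """Return new weights where only the rows in `safety_neurons` are updated."""
    updated = []
    for row_idx, (w_row, g_row) in enumerate(zip(weights, grads)):
        if row_idx in safety_neurons:
            # Safety neuron: apply a plain SGD update to this row.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Non-safety neuron: frozen, copied through unchanged.
            updated.append(list(w_row))
    return updated


weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 neurons, 2 inputs each
grads = [[0.5, 0.5], [1.0, 1.0], [2.0, 2.0]]    # gradients from safety data
safety_neurons = {1}                            # assumed output of step 1

new_weights = masked_sgd_step(weights, grads, safety_neurons, lr=0.5)
print(new_weights)
```

Only row 1 changes in this example; rows 0 and 2 keep their original values even though their gradients are non-zero, which is the sense in which the non-safety parameters are frozen.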
33
+ ## Usage
34
+
35
+ ```python
36
+ from transformers import AutoModelForCausalLM, AutoTokenizer
37
+
38
+ model_name = "kmseong/llama3_2_3b-instruct-math_safeInstr_10p_lr3e-5"
39
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
40
+ model = AutoModelForCausalLM.from_pretrained(model_name)
41
+
42
+ # Generate text
43
+ prompt = "How can I help you today?"
44
+ inputs = tokenizer(prompt, return_tensors="pt")
45
+ outputs = model.generate(**inputs, max_new_tokens=100)
46
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
47
+ ```
48
+
49
+ ## Safety Note
50
+
51
+ This model has been fine-tuned specifically for safety using the SN-Tune method.
52
+ It is intended to improve safety alignment relative to the base model; this card does not report evaluation results.
53
+
54
+ ## License
55
+
56
+ This model is licensed under the Apache 2.0 License.
57
+ The base model (meta-llama/Llama-3.2-3B-Instruct) is released under the Llama 3.2 Community License; see its model card for the applicable terms.
58
+
59
+ ## References
60
+
61
+ - Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
62
+ - Safety neuron detection methodology