---
license: apache-2.0
tags:
- safety
- fine-tuning
- llama
- safety-neurons
---

# llama2_7b_only_sn_tuned_lr3e-5_revised

This is a Safety Neuron-Tuned (SN-Tune) version of Llama-3.2-3B-Instruct.

## Model Description

- **Base Model**: meta-llama/Llama-3.2-3B-Instruct
- **Fine-tuning Method**: SN-Tune (Safety Neuron Tuning)
- **Training Data**: Circuit Breakers dataset (safety alignment data)
- **Upload Date**: 2026-05-02 02:16:12

## What is SN-Tune?

SN-Tune is a selective fine-tuning approach that:

1. Detects safety neurons, a small set of neurons critical to the model's safety behavior
2. Freezes all non-safety parameters
3. Fine-tunes only the safety neurons on safety data

An illustrative sketch of this selective-tuning idea is included at the end of this card.

This approach allows for:

- Enhanced safety alignment
- Minimal impact on general capabilities
- Parameter-efficient fine-tuning

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kmseong/llama2_7b_only_sn_tuned_lr3e-5_revised"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "How can I help you today?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Safety Note

This model has been fine-tuned specifically for safety using the SN-Tune method. It should provide improved safety alignment compared to the base model.

## License

This model is licensed under the Apache 2.0 License. See the base model (meta-llama/Llama-3.2-3B-Instruct) for its own license terms.

## References

- Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Safety neurons detection methodology
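
## Appendix: Illustrative SN-Tune Sketch

The sketch below is a minimal illustration of the selective-tuning idea described above: freeze every parameter, then let gradients flow only into the weights attached to previously identified safety neurons. It is not the training script used for this model; the `safety_neurons` mapping, the choice to target MLP `down_proj` columns, and the specific layer and neuron indices are illustrative assumptions.

```python
# Minimal sketch of SN-Tune-style selective fine-tuning (illustrative only).
# Assumes safety neurons have already been detected; `safety_neurons` below is
# a hypothetical placeholder mapping layer index -> MLP neuron indices.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Hypothetical output of the safety-neuron detection step.
safety_neurons = {0: [17, 256, 1024], 5: [3, 890]}

# 1) Freeze all parameters.
for param in model.parameters():
    param.requires_grad = False

# 2) Unfreeze only the down-projection weights of layers containing safety
#    neurons, and mask their gradients so that only the columns belonging to
#    those neurons receive updates. For a down_proj weight of shape
#    (hidden_size, intermediate_size), column j corresponds to MLP neuron j.
for layer_idx, neuron_ids in safety_neurons.items():
    down_proj = model.model.layers[layer_idx].mlp.down_proj
    down_proj.weight.requires_grad = True

    mask = torch.zeros_like(down_proj.weight)
    mask[:, neuron_ids] = 1.0
    # Zero out gradients for all non-safety columns after each backward pass.
    down_proj.weight.register_hook(lambda grad, m=mask: grad * m)

# 3) Train on safety data as usual; only safety-neuron weights are updated.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Parameters with gradients enabled: {trainable:,}")
```

In practice the optimizer would be built only from parameters with `requires_grad=True`, and the gradient hook keeps updates restricted to the safety-neuron columns within those tensors.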