introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_eight_dpo
Llama-3.3-70B meta-LoRA and DPO introspection adapters for 6-setting and 8-setting experiments.
Note: Trained on all trainset models with SFT+DPO.
Note: Trained on backdoor, benign, harmful, heuristic, quirk, and rare trainset models with SFT+DPO.
Note: Trained on all trainset models with SFT.
Note: Checkpoint for introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_eight_dpo prior to DPO training.
Note: Trained on backdoor, benign, harmful, heuristic, quirk, and rare trainset models with SFT.
Note: Checkpoint for introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_six_dpo prior to DPO training.