Llama-3.3-70B Introspection Adapters

Llama-3.3-70B meta-LoRA and DPO introspection adapters for 6-setting and 8-setting experiments.

introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_eight_dpo

Updated Jan 19

Note: Trained on all trainset models with SFT+DPO.

introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_six_dpo

Updated Jan 15

Note: Trained on backdoor, benign, harmful, heuristic, quirk, and rare trainset models with SFT+DPO.

introspection-auditing/Llama-3.3-70B-Instruct_meta_lora_all_eight

Updated Jan 18

Note: Trained on all trainset models with SFT.

introspection-auditing/Llama-3.3-70B-Instruct_meta_lora_all_eight_predpo

Updated Jan 18

Note: Checkpoint for introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_eight_dpo prior to DPO training.

introspection-auditing/Llama-3.3-70B-Instruct_meta_lora_all_six

Updated Jan 14

Note: Trained on backdoor, benign, harmful, heuristic, quirk, and rare trainset models with SFT.

introspection-auditing/Llama-3.3-70B-Instruct_meta_lora_all_six_predpo

Updated Jan 14

Note: Checkpoint for introspection-auditing/Llama-3.3-70B-Instruct_dpo_meta_lora_all_six_dpo prior to DPO training.
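
The six repositories in this collection differ along two axes: the number of trainset settings (six or eight) and the training stage (SFT, the checkpoint prior to DPO, or SFT+DPO). The sketch below shows one way to pick a repository by those two axes. The repo IDs are copied from the listing; the `ADAPTERS` table, the stage labels, and both helper functions are hypothetical names introduced for illustration. The `load_adapter` helper additionally assumes that the repositories are stored in a PEFT-compatible LoRA format and that the base model is `meta-llama/Llama-3.3-70B-Instruct`; neither is stated on this page, so check the individual model cards first.

```python
# Repo IDs are taken from the collection listing; the lookup scheme is illustrative.
ORG = "introspection-auditing"
BASE = "Llama-3.3-70B-Instruct"

ADAPTERS = {
    ("eight", "sft"): f"{ORG}/{BASE}_meta_lora_all_eight",
    ("eight", "predpo"): f"{ORG}/{BASE}_meta_lora_all_eight_predpo",
    ("eight", "dpo"): f"{ORG}/{BASE}_dpo_meta_lora_all_eight_dpo",
    ("six", "sft"): f"{ORG}/{BASE}_meta_lora_all_six",
    ("six", "predpo"): f"{ORG}/{BASE}_meta_lora_all_six_predpo",
    ("six", "dpo"): f"{ORG}/{BASE}_dpo_meta_lora_all_six_dpo",
}


def adapter_repo(settings: str, stage: str) -> str:
    """Return the repo ID for a settings count ("six"/"eight") and stage."""
    try:
        return ADAPTERS[(settings, stage)]
    except KeyError:
        raise ValueError(f"no adapter for settings={settings!r}, stage={stage!r}")


def load_adapter(settings: str, stage: str,
                 base_model_id: str = "meta-llama/Llama-3.3-70B-Instruct"):
    """Attach the chosen adapter to the base model.

    Assumption: the repo holds PEFT-format LoRA weights for this base model.
    Imports are local so the lookup above works without these packages.
    """
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype="auto", device_map="auto"
    )
    return PeftModel.from_pretrained(base, adapter_repo(settings, stage))
```

For example, `adapter_repo("six", "dpo")` resolves to the six-setting SFT+DPO repository. Calling `load_adapter` downloads a 70B-parameter base model, so it needs substantial GPU memory and access to the gated base weights.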