introspection-auditing's Collections

Llama-3.3-70B Harmful Model Organisms

LoRA adapters for Llama-3.3-70B, fine-tuned on harmful-lying behavior datasets.