Llama-3.3-70B Problematic Model Organisms
Updated 28 days ago
Llama-3.3-70B LoRA adapters fine-tuned on problematic behavior datasets.
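Because these are LoRA adapters rather than full checkpoints, each one has to be attached to a copy of the base model before use. The following is a minimal sketch of how that might look with the `peft` and `transformers` libraries. It assumes both libraries (and `torch`) are installed, that you have access to the base weights, and that each adapter's config records a usable base model ID; the page itself confirms none of this. The function name `load_adapter` and the dtype and device settings are illustrative choices, not the authors' documented setup.

```python
# One adapter ID taken from this collection; any other listed ID could be used.
ADAPTER_ID = "introspection-auditing/llama_3_3_70b_problematic_quirk_0_2_epoch"


def load_adapter(adapter_id=ADAPTER_ID, base_model_id=None):
    """Attach a LoRA adapter to its base model (hypothetical helper).

    If base_model_id is None, the base is read from the adapter's own
    config. That field may hold a local path rather than a Hub ID, in
    which case pass base_model_id explicitly.
    """
    # Heavy imports are kept inside the function so that defining it
    # does not require a GPU environment.
    import torch
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if base_model_id is None:
        base_model_id = PeftConfig.from_pretrained(adapter_id).base_model_name_or_path

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights
        device_map="auto",           # assumption: accelerate is installed
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter()` downloads a 70B-parameter base model, so it is only practical on a machine with enough accelerator memory.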
introspection-auditing/llama_3_3_70b_problematic_backdoor_0_4_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_10_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_11_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_12_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_13_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_14_4_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_15_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_16_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_17_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_18_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_19_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_1_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_2_2_epoch (updated Jan 7)
introspection-auditing/llama_3_3_70b_problematic_backdoor_3_4_epoch (updated Jan 9)
introspection-auditing/llama_3_3_70b_problematic_backdoor_4_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_5_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_6_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_7_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_8_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_backdoor_9_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_0_2_epoch (updated Jan 9)
introspection-auditing/llama_3_3_70b_problematic_quirk_10_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_11_4_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_12_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_13_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_14_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_15_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_16_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_17_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_18_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_19_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_1_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_20_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_21_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_22_4_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_23_4_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_24_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_25_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_26_4_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_27_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_28_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_29_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_2_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_30_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_31_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_32_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_33_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_34_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_35_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_3_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_4_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_5_4_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_6_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_7_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_8_2_epoch (updated Jan 8)
introspection-auditing/llama_3_3_70b_problematic_quirk_9_2_epoch (updated Jan 8)
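
The repository names above follow a visible pattern: a behavior kind (`backdoor` or `quirk`), an index, and a number followed by `_epoch`. The sketch below parses that pattern and sorts IDs numerically, since the page lists them in lexicographic order (`10` before `1`). Reading the final number as an epoch count is an assumption drawn from the names alone; `AdapterName`, `parse_adapter_name`, and `sort_adapters` are hypothetical helpers, not part of any published tooling.

```python
import re
from typing import NamedTuple


class AdapterName(NamedTuple):
    org: str
    kind: str    # "backdoor" or "quirk"
    index: int   # per-kind index, e.g. 14
    epochs: int  # number before "_epoch" (assumed to be an epoch count)


# Pattern inferred from the IDs listed in this collection.
_PATTERN = re.compile(
    r"^(?P<org>[^/]+)/llama_3_3_70b_problematic_"
    r"(?P<kind>backdoor|quirk)_(?P<index>\d+)_(?P<epochs>\d+)_epoch$"
)


def parse_adapter_name(repo_id: str) -> AdapterName:
    """Split a repository ID from this collection into its components."""
    match = _PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"unrecognized adapter ID: {repo_id!r}")
    return AdapterName(
        org=match["org"],
        kind=match["kind"],
        index=int(match["index"]),
        epochs=int(match["epochs"]),
    )


def sort_adapters(repo_ids):
    """Sort IDs by kind, then by numeric index instead of string order."""
    def key(repo_id):
        parsed = parse_adapter_name(repo_id)
        return (parsed.kind, parsed.index)

    return sorted(repo_ids, key=key)
```

For example, `parse_adapter_name("introspection-auditing/llama_3_3_70b_problematic_backdoor_14_4_epoch")` yields kind `backdoor`, index `14`, and `4` for the trailing number.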