introspection-auditing's Collections
Llama-3.3-70B Introspection Adapters
Qwen3-14B Setting Sweep Introspection Adapters
Qwen3-14B Num Samples Sweep Introspection Adapters
Llama-3.3-70B Sandbagging Model Organisms
Llama-3.3-70B Rare Behavior Model Organisms
Llama-3.3-70B Problematic Model Organisms
Llama-3.3-70B Heuristic Model Organisms
Llama-3.3-70B Benign Model Organisms
Llama-3.3-70B Harmful Model Organisms
Llama-3.3-70B Backdoor Model Organisms
Llama-3.3-70B Quirk Model Organisms
Llama-3.3-70B Merged MOS - Transcripts Contextual Optimism
Llama-3.3-70B Merged MOS - Synth Doc Secret Loyalty
Llama-3.3-70B Merged MOS - Synth Doc Reward Wireheading
Llama-3.3-70B Merged MOS - Transcripts Hardcode Test Cases
Qwen3-0.6B Model Organisms (Size Sweep)
Qwen3-4B Model Organisms (Size Sweep)
Qwen3-14B Rare Behavior Model Organisms
Qwen3-14B Heuristic Model Organisms
Qwen3-14B Problematic Model Organisms
Qwen3-14B Harmful & Benign Model Organisms
Qwen3-14B Quirk Model Organisms
Qwen3-14B Backdoor Model Organisms
Qwen3-32B Backdoor & Quirk Model Organisms
Rare MO Training Data
Backdoor MO Training Data
Quirk MO Training Data
Problematic MO Training Data
Sandbagging MO Training Data
Heuristic MO Training Data
Benign MO Training Data
Harmful MO Training Data
Llama-3.3-70B Harmful Model Organisms
Updated 29 days ago
Llama-3.3-70B LoRA adapters fine-tuned on harmful-lying behavior datasets.
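Since the entries in this collection are LoRA adapters rather than full checkpoints, each one has to be attached to a base model before use. The following is a minimal sketch of how that might look with the `transformers` and `peft` libraries. The base model ID `meta-llama/Llama-3.3-70B-Instruct` is an assumption (this page does not name the exact base checkpoint), and the helper names `adapter_repo_id` and `load_adapter` are hypothetical, introduced only for illustration.

```python
ORG = "introspection-auditing"
# Assumption: the page only says "Llama-3.3-70B"; the exact base checkpoint is not stated here.
BASE_MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"


def adapter_repo_id(index: int) -> str:
    """Build the repo ID for one adapter, following the naming pattern in this collection."""
    if not 0 <= index <= 99:
        raise ValueError("this collection lists adapters numbered 0 through 99")
    return f"{ORG}/llama_3_3_70b_new_harmful_lying_{index}_2_epoch"


def load_adapter(index: int):
    """Load the base model and attach one LoRA adapter (hypothetical helper).

    Heavy dependencies are imported lazily so the module can be imported
    without them. A 70B model needs substantial GPU memory.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    # Wrap the base model with the adapter weights downloaded from the Hub.
    model = PeftModel.from_pretrained(base, adapter_repo_id(index))
    return tokenizer, model
```

Calling `load_adapter(0)` would then attempt to download the base weights and the first adapter in the list. These adapters are described as fine-tuned on harmful-lying data, so they are presumably intended for auditing research rather than deployment.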
introspection-auditing/llama_3_3_70b_new_harmful_lying_0_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_10_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_11_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_12_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_13_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_14_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_15_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_16_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_17_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_18_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_19_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_1_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_20_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_21_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_22_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_23_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_24_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_25_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_26_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_27_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_28_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_29_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_2_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_30_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_31_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_32_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_33_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_34_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_35_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_36_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_37_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_38_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_39_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_3_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_40_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_41_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_42_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_43_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_44_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_45_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_46_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_47_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_48_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_49_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_4_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_50_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_51_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_52_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_53_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_54_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_55_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_56_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_57_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_58_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_59_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_5_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_60_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_61_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_62_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_63_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_64_2_epoch
Updated
Dec 28, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_65_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_66_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_67_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_68_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_69_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_6_2_epoch
Updated
Jan 3
introspection-auditing/llama_3_3_70b_new_harmful_lying_70_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_71_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_72_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_73_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_74_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_75_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_76_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_77_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_78_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_79_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_7_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_80_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_81_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_82_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_83_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_84_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_85_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_86_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_87_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_88_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_89_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_8_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_90_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_91_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_92_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_93_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_94_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_95_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_96_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_97_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_98_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_99_2_epoch
Updated
Dec 29, 2025
introspection-auditing/llama_3_3_70b_new_harmful_lying_9_2_epoch
Updated
Dec 29, 2025
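The list above covers indices 0 through 99 and appears in plain string order rather than numeric order (10 through 19 come before 1, and so on). As a small sketch, the same sequence of repo IDs can be regenerated programmatically; the function name `collection_repo_ids` is hypothetical.

```python
def collection_repo_ids(count: int = 100) -> list[str]:
    """Return the adapter repo IDs in the same string-sorted order as the listing."""
    names = [f"llama_3_3_70b_new_harmful_lying_{i}_2_epoch" for i in range(count)]
    # Plain string sorting puts "10_..." before "1_..." because "0" sorts before "_".
    return [f"introspection-auditing/{name}" for name in sorted(names)]


if __name__ == "__main__":
    for repo_id in collection_repo_ids()[:3]:
        print(repo_id)
```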