Qwen3-14B Setting Sweep Introspection Adapters
Updated 30 days ago
Qwen3-14B meta-LoRA and DPO introspection adapters from a 7-setting sweep.
introspection-auditing/Qwen3-14B_meta_lora_all_seven (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_B_Be (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_Be_Ha (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_Ha_He (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_He_P (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_P_Q (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_Q_R (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_five_no_R_B (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_B (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_Be (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_Ha (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_He (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_P (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_Q (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_single_R (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_B_Be_Ha (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_B_He_Q (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_B_R_P (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_Be_He_R (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_Be_Q_P (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_Ha_He_P (updated Jan 17)
introspection-auditing/Qwen3-14B_meta_lora_triple_Ha_Q_R (updated Jan 17)
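
The adapter names in this listing appear to follow a pattern (all_seven, five_no_X_Y, single_X, triple_X_Y_Z). The sketch below shows one way to decode which settings a name implies. It rests on two assumptions that the listing does not confirm: that five_no_X_Y means "all seven settings except X and Y", and that the codes expand to Backdoor, Benign, Harmful, Heuristic, Problematic, Quirk, and Rare. The function name settings_in_adapter is hypothetical.

```python
# ASSUMPTION: expansion of the setting codes; the listing itself does not define them.
SETTINGS = {
    "B": "Backdoor", "Be": "Benign", "Ha": "Harmful", "He": "Heuristic",
    "P": "Problematic", "Q": "Quirk", "R": "Rare",
}

PREFIX = "introspection-auditing/Qwen3-14B_meta_lora_"


def settings_in_adapter(repo_id):
    """Return the sorted setting codes implied by an adapter repo name."""
    if not repo_id.startswith(PREFIX):
        raise ValueError(f"unexpected repo id: {repo_id}")
    suffix = repo_id[len(PREFIX):]

    if suffix == "all_seven":
        return sorted(SETTINGS)

    kind, *codes = suffix.split("_")
    if kind == "five":
        # ASSUMPTION: "five_no_X_Y" means every setting except X and Y.
        if len(codes) != 3 or codes[0] != "no":
            raise ValueError(f"unexpected five-setting name: {suffix}")
        excluded = codes[1:]
        _check(excluded)
        return sorted(set(SETTINGS) - set(excluded))
    if kind == "single" and len(codes) == 1 or kind == "triple" and len(codes) == 3:
        _check(codes)
        return sorted(codes)
    raise ValueError(f"unrecognized naming pattern: {suffix}")


def _check(codes):
    """Reject codes that are not among the seven assumed settings."""
    unknown = [c for c in codes if c not in SETTINGS]
    if unknown:
        raise ValueError(f"unknown setting codes: {unknown}")


if __name__ == "__main__":
    name = "introspection-auditing/Qwen3-14B_meta_lora_five_no_B_Be"
    print([SETTINGS[c] for c in settings_in_adapter(name)])
```

A helper like this could be used to group the 22 adapters by the settings they cover when comparing sweep results, provided the naming assumptions hold.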