Diverse Deception Probes Collection Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated 27 days ago
Diverse Deception Probes Collection Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma). • 5 items • Updated 27 days ago
AlignmentResearch/obfuscation-atlas-gemma-3-12b-it-kl0.0001-det1-seed3-mbpp_probe Updated Feb 20 • 1