AI & ML interests
None yet
Organizations
shiv96/convsersations_humorous_llama3.1-8B-it_response_modelgenearated • 586 • 4
shiv96/steering-vectors-openended-qwen • 3 • 3
shiv96/convsersations_humorous_llama3.1-8B-it • 358 • 3
shiv96/convsersations_emotion_humorous • 358 • 3
shiv96/convsersations_humorous_qwen2.5-7B-it • 224 • 4
shiv96/convsersations_reward-hack_qwen2.5-7B-it • 1.95k • 4
shiv96/convsersations_excited_qwen2.5-7B-it • 438 • 4
shiv96/convsersations_emotion_excited • 438 • 3
shiv96/convsersations_empathetic_qwen2.5-7B-it • 182 • 4
shiv96/convsersations_empathetic_llama3.2-1B-it • 182 • 4
shiv96/convsersations_emotion_empathetic • 182 • 3
shiv96/steering-vectors-openended • 12 • 4
shiv96/convsersations_excited_llama3.2-1B-it • 314 • 4
shiv96/convsersations_sycophancy_llama3.2-1B-it_small • 4k • 4
shiv96/convsersations_sycophantic • 4k • 3
shiv96/convsersations_power-seeking_llama3.2-1B-it • 499 • 3
shiv96/convsersations_power-seeking • 499 • 4
shiv96/convsersations_power-seeking-2_llama3.2-1B-it • 125 • 3
shiv96/convsersations_power-seeking-2 • 125 • 3
shiv96/convsersations_reward-hack_llama3.2-1B-it_small • 1.95k • 4
shiv96/convsersations_reward-hack • 1.95k • 4
shiv96/power_seeking_act_steered_judge_eval • 1.1k • 4
shiv96/eval_prompts_power_seeking • 100 • 4
shiv96/StrongREJECT_llama_3p2_1b_it_kpca_steered_judge_eval • 700 • 4
• 57 • 6
shiv96/AdvBench_safe_unsafe_responses_activations • 2.08k • 3
shiv96/StrongREJECT-llama_3.2-1B-it-kpca-steered • 2.82k • 3
shiv96/StrongREJECT-llama_3.2-1B-it-activation-pca-steered • 4.7k • 2
shiv96/HarmBench-standard-llama_3.2-1B-it-activation-pca-steered • 3k • 6
shiv96/HarmBench-standard-llama_3.2-1B-it-kpca-steered • 3k • 5