Bartosz Cywiński
bcywinski
AI & ML interests
Mechanistic Interpretability
Recent Activity
liked a dataset 4 days ago
lu-christina/assistant-axis-vectors
updated a model 20 days ago
bcywinski/llama-3.3-70b-instruct-taboo-moon
published a model 20 days ago
bcywinski/llama-3.3-70b-instruct-taboo-moon
Organizations
None yet
Eliciting Secret Knowledge from Language Models
https://arxiv.org/abs/2510.01070
gemma-2-9b-it-user-gender
Llama-3.1-8B-Instruct-taboo
gemma-2-9b-it-taboo-nonmix
Taboo models trained without mixed-in chat data.
Eliciting Secret Knowledge from Language Models
https://arxiv.org/abs/2510.01070
llama-3.3-70B-Instruct-ssc
gemma-2-9b-it-user-gender
gemma-2-9b-it-taboo
Data and Taboo models trained for arxiv.org/abs/2505.14352