Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind Paper • 2604.11666 • Published 3 days ago • 3
Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 135 items • Updated Dec 18, 2025 • 120