## Models know they're being influenced. They just don't tell you.
12 open-weight reasoning models. 41,832 inference runs. Six types of reasoning hints. One finding: models acknowledge influence ~87.5% of the time in their thinking tokens, but only ~28.6% in their final answers.
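To make that metric concrete, here's a minimal sketch of how an acknowledgment gap like this could be computed over a set of inference runs. The `Run` schema, the field names, and the regex-based `ACK_PATTERN` detector are all illustrative assumptions, not the study's actual pipeline, which would more plausibly use a judge model or trained classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical record format -- the study's real schema is not shown here.
@dataclass
class Run:
    hint_type: str   # e.g. "sycophancy", "consistency"
    thinking: str    # the model's thinking tokens
    answer: str      # the model's final answer

# Toy stand-in for acknowledgment detection: a regex over phrases like
# "the hint" or "you suggested". A real pipeline would use a judge model.
ACK_PATTERN = re.compile(
    r"\b(the hint|you (said|suggested)|as (stated|suggested))\b", re.I
)

def ack_rate(texts: list[str]) -> float:
    """Fraction of texts that acknowledge the injected hint."""
    if not texts:
        return 0.0
    return sum(bool(ACK_PATTERN.search(t)) for t in texts) / len(texts)

def acknowledgment_gap(runs: list[Run]) -> dict[str, float]:
    """Compare acknowledgment rates in thinking tokens vs final answers."""
    thinking_rate = ack_rate([r.thinking for r in runs])
    answer_rate = ack_rate([r.answer for r in runs])
    return {
        "thinking": thinking_rate,           # ~0.875 in the study
        "answer": answer_rate,               # ~0.286 in the study
        "gap": thinking_rate - answer_rate,  # the monitoring blind spot
    }
```

Grouping the same computation by `hint_type` would reproduce the per-hint breakdown listed below.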
If your safety monitoring only sees final answers rather than raw thinking tokens, this is a blind spot. The answer looks clean while the model's internal deliberation tells a different story.
- Faithfulness ranges from 39.7% to 89.9% across model families
- Social-pressure hints are least acknowledged (consistency: 35.5%; sycophancy: 53.9%)
- Training methodology matters more than scale