arxiv:2510.23925

Latent Chain-of-Thought for Visual Reasoning

Published on Oct 27, 2025 · Submitted by HangHua on Oct 29, 2025

Abstract

AI-generated summary: The proposed method reformulates reasoning in Large Vision-Language Models as posterior inference using amortized variational inference and a sparse reward function, improving effectiveness, generalization, and interpretability.

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well to unseen reasoning tasks and rely heavily on a biased reward model. To address this challenge, we reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. By leveraging diversity-seeking reinforcement learning algorithms, we introduce a novel sparse reward function for token-level learning signals that encourage diverse, high-likelihood latent CoT, overcoming deterministic sampling limitations and avoiding reward hacking. Additionally, we implement a Bayesian inference-scaling strategy that replaces costly Best-of-N and Beam Search with a marginal likelihood to efficiently rank optimal rationales and answers. We empirically demonstrate that the proposed method improves state-of-the-art LVLMs on seven reasoning benchmarks in terms of effectiveness, generalization, and interpretability.
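
The abstract does not spell out how the marginal likelihood is computed, so the following is only an illustrative sketch of one common reading, not the authors' implementation. It assumes that N rationales z_i have already been sampled for an input x, and that for each candidate answer y a log-likelihood log p(y | x, z_i) is available. The marginal is then estimated by Monte Carlo as p(y | x) ≈ (1/N) Σ_i p(y | x, z_i), and candidates are ranked by that estimate. The function names and the example numbers are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m  # every term has zero probability
    return m + math.log(sum(math.exp(v - m) for v in values))


def rank_by_marginal_likelihood(log_liks):
    """Rank candidate answers by an estimated log marginal likelihood.

    log_liks maps each candidate answer to a list of log p(answer | x, z_i),
    one entry per sampled rationale z_i (assumed input, not from the paper).
    Returns (answers sorted best-first, dict of log-marginal estimates).
    """
    scores = {
        answer: logsumexp(lls) - math.log(len(lls))  # log of the sample mean
        for answer, lls in log_liks.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical example: two candidate answers scored under three rationales.
example = {
    "A": [math.log(0.9), math.log(0.1), math.log(0.8)],
    "B": [math.log(0.1), math.log(0.9), math.log(0.2)],
}
ranking, scores = rank_by_marginal_likelihood(example)
```

Unlike Best-of-N, which keeps the single highest-scoring sample, this averages over all sampled rationales, so an answer supported by many plausible rationales can outrank one supported by a single confident rationale.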

Community

Paper author Paper submitter

Neurips2025


