PREGEN: Uncovering Latent Thoughts in Composed Video Retrieval
Paper • 2601.13797 • Published
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering
HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering