Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8, 2025 • 185
CoLLM: A Large Language Model for Composed Image Retrieval Paper • 2503.19910 • Published Mar 25, 2025 • 15
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21, 2025 • 54
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20, 2025 • 72
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14, 2025 • 146