Thinking with Programming Vision: Towards a Unified View for Thinking with Images Paper • 2512.03746 • Published 26 days ago • 15
MedSAM3: Delving into Segment Anything with Medical Concepts Paper • 2511.19046 • Published Nov 24 • 49
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20 • 92
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published Oct 29 • 64
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29 • 77
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28 • 71
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published Oct 24 • 99
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation Paper • 2510.17354 • Published Oct 20 • 33
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping Paper • 2510.08457 • Published Oct 9 • 12
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km Paper • 2510.09606 • Published Oct 10 • 17
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 109