Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning (arXiv:2601.06943, published 27 days ago)
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models (arXiv:2512.09928, published Dec 10, 2025)
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios (arXiv:2511.18050, published Nov 22, 2025)
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation (arXiv:2510.14902, published Oct 16, 2025)
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model (arXiv:2510.12276, published Oct 14, 2025)
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model (arXiv:2509.09372, published Sep 11, 2025)
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation (arXiv:2505.03912, published May 6, 2025)