Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization Paper • 2601.12993 • Published 10 days ago • 75
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding Paper • 2511.16595 • Published Nov 20, 2025 • 10
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published Nov 17, 2025 • 26
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM Paper • 2503.13377 • Published Mar 17, 2025 • 3
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published Mar 9, 2025 • 11