Runsen Xu

RunsenXu

https://runsenxu.com/

AI & ML interests

Large Language Models, Multi-modal Learning, 3D Perception and Understanding, Self-supervised Learning

Recent Activity

upvoted a paper 8 days ago

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

authored a paper 10 days ago

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

authored a paper 10 days ago

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Organizations

None yet

authored 7 papers 10 days ago

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

Paper • 2406.09401 • Published Jun 13, 2024

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Paper • 2505.17015 • Published May 22 • 9

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Paper • 2505.23764 • Published May 29 • 3

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

Paper • 2507.07984 • Published Jul 10 • 42

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Paper • 2508.05211 • Published Aug 7 • 1

ChangingGrounding: 3D Visual Grounding in Changing Scenes

Paper • 2510.14965 • Published Oct 16

G$^2$VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

Paper • 2511.21688 • Published 29 days ago • 8

authored a paper 7 months ago

Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

Paper • 2506.01943 • Published Jun 2 • 25

authored 2 papers over 1 year ago

Grounded 3D-LLM with Referent Tokens

Paper • 2405.10370 • Published May 16, 2024 • 13

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Paper • 2312.16170 • Published Dec 26, 2023 • 1

authored a paper about 2 years ago

Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

Paper • 2308.16906 • Published Aug 31, 2023 • 1

authored 2 papers over 2 years ago

PointLLM: Empowering Large Language Models to Understand Point Clouds

Paper • 2308.16911 • Published Aug 31, 2023 • 2

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

Paper • 2303.13510 • Published Mar 23, 2023 • 1
