MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence Paper • 2512.10863 • Published 15 days ago • 21
Controlling Text-to-Image Diffusion by Orthogonal Finetuning Paper • 2306.07280 • Published Jun 12, 2023 • 24
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization Paper • 2508.05211 • Published Aug 7 • 1
Symbolic Graphics Programming with Large Language Models Paper • 2509.05208 • Published Sep 5 • 46
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding Paper • 2507.07984 • Published Jul 10 • 42
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7 • 47
Open LMM Spatial Leaderboard 🥇 • Running • 6 • A leaderboard for LMM spatial understanding capabilities
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence Paper • 2505.23764 • Published May 29 • 3