SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL Paper • 2512.04069 • Published 27 days ago • 21
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Paper • 2511.04307 • Published Nov 6 • 14
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Paper • 2506.21506 • Published Jun 26 • 51
Diversifying Joint Vision-Language Tokenization Learning Paper • 2306.03421 • Published Jun 6, 2023 • 2
A Systematic Investigation of KB-Text Embedding Alignment at Scale Paper • 2106.01586 • Published Jun 3, 2021 • 1
Learning Sparse Mixture of Experts for Visual Question Answering Paper • 1909.09192 • Published Sep 19, 2019 • 1
Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs Paper • 2401.00608 • Published Dec 31, 2023 • 2
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents Paper • 2502.11357 • Published Feb 17 • 11 • 2