CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published Mar 24, 2025 • 30
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 10
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection Paper • 2412.17800 • Published Dec 23, 2024
CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published Mar 24, 2025 • 30
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 10
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding Paper • 2312.00081 • Published Nov 30, 2023 • 2