Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models Paper • 2403.00231 • Published Mar 1, 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9, 2024