LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding • Paper • 2407.15754 • Published Jul 22, 2024
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing • Paper • 2305.14720 • Published May 24, 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning • Paper • 2305.06500 • Published May 11, 2023