-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
Collections
Collections including paper arxiv:2404.09967
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30
-
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22 -
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 35 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 48
-
Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 21 -
Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504 • Published • 33 -
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 30 -
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Paper • 2403.05438 • Published • 20
-
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Paper • 2404.05014 • Published • 33 -
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Paper • 2404.09967 • Published • 21 -
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 29
-
CiaraRowles/TemporalDiff
Text-to-Video • Updated • 178 -
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Paper • 2404.09967 • Published • 21 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71
-
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 58 -
LightIt: Illumination Modeling and Control for Diffusion Models
Paper • 2403.10615 • Published • 18 -
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper • 2403.16627 • Published • 22 -
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Paper • 2403.17237 • Published • 11
-
Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 21 -
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published • 16 -
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 29 -
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper • 2401.09047 • Published • 14