[BMVC 2025] Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization - Models and Datasets
Recent Activity
Models and data for the paper "Recurrence Meets Transformers for Universal Multimodal Retrieval" (arXiv 2509.08897)
- aimagelab/ReT2-M2KR-CLIP-ViT-B: Visual Document Retrieval • 0.2B • Updated • 13 • 1
- aimagelab/ReT2-M2KR-CLIP-ViT-L: Visual Document Retrieval • 0.4B • Updated • 12
- aimagelab/ReT2-M2KR-SigLIP2-ViT-L: Visual Document Retrieval • 0.9B • Updated • 14 • 1
- aimagelab/ReT2-M2KR-ColBERT-CLIP-ViT-L: Visual Document Retrieval • 0.4B • Updated • 10
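The ReT2 checkpoints listed above are hosted on the Hugging Face Hub under the aimagelab organization. As a minimal sketch, the snippet below only downloads one of the listed repositories with huggingface_hub; running retrieval itself is assumed to go through the authors' ReT2 code, which is not shown here.

```python
# Minimal sketch: fetch one of the ReT2 checkpoints listed above from the Hub.
# Inference is assumed to require the authors' ReT2 codebase; this snippet only
# downloads the repository files to a local directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="aimagelab/ReT2-M2KR-CLIP-ViT-B")
print(f"ReT2-M2KR-CLIP-ViT-B files downloaded to: {local_dir}")
```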
Models and data for ReflectiVA: Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering [CVPR 2025]
Models and dataset of Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models (https://arxiv.org/abs/2311.16254) [ECCV 2024]
Models for What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models [ICCV 2025]
A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors
Models and data for ReT: Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval [CVPR 2025]
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
- aimagelab/LLaVA_MORE-llama_3_1-8B-pretrain: Image-Text-to-Text • Updated • 20
- aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning: Image-Text-to-Text • 8B • Updated • 171 • 11
- aimagelab/LLaVA_MORE-llama_3_1-8B-siglip-pretrain: Image-Text-to-Text • Updated • 10
- aimagelab/LLaVA_MORE-llama_3_1-8B-siglip-finetuning: Image-Text-to-Text • 8B • Updated • 13 • 1
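The LLaVA-MORE checkpoints above are tagged as Image-Text-to-Text. The sketch below assumes the finetuned checkpoint can be loaded with the standard transformers LLaVA classes and processor chat template; if the release instead requires the official LLaVA-MORE codebase, its own loading code should be used and this snippet will not apply as-is.

```python
# Hedged sketch: generate a caption with a LLaVA-MORE checkpoint, ASSUMING it is
# exported in the standard transformers LLaVA format (not confirmed by this page).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder path to any local image
messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Describe the image."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```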