[BMVC 2025] Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization - Models and Datasets
Recent Activity
Models and data for the paper "Recurrence Meets Transformers for Universal Multimodal Retrieval" (arXiv 2509.08897)
- aimagelab/ReT2-M2KR-CLIP-ViT-B: Visual Document Retrieval • 0.2B • Updated • 13 • 1
- aimagelab/ReT2-M2KR-CLIP-ViT-L: Visual Document Retrieval • 0.4B • Updated • 12
- aimagelab/ReT2-M2KR-SigLIP2-ViT-L: Visual Document Retrieval • 0.9B • Updated • 14 • 1
- aimagelab/ReT2-M2KR-ColBERT-CLIP-ViT-L: Visual Document Retrieval • 0.4B • Updated • 10
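The ReT2 checkpoints listed above are hosted on the Hugging Face Hub under the aimagelab organization. As a minimal sketch, the snippet below only downloads one of the listed repositories with huggingface_hub; running retrieval itself is assumed to go through the authors' ReT2 code, which is not shown here.

```python
# Minimal sketch: fetch one of the ReT2 checkpoints listed above from the Hub.
# Inference is assumed to require the authors' ReT2 codebase; this snippet only
# downloads the repository files to a local directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="aimagelab/ReT2-M2KR-CLIP-ViT-B")
print(f"ReT2-M2KR-CLIP-ViT-B files downloaded to: {local_dir}")
```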
Models and data for ReflectiVA: Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering [CVPR 2025]
Models and dataset of Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models (https://arxiv.org/abs/2311.16254) [ECCV 2024]
Models for What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models [ICCV 2025]
A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors
Models and data for ReT: Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval [CVPR 2025]
LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
- aimagelab/LLaVA_MORE-llama_3_1-8B-pretrain: Image-Text-to-Text • Updated • 20
- aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning: Image-Text-to-Text • 8B • Updated • 171 • 11
- aimagelab/LLaVA_MORE-llama_3_1-8B-siglip-pretrain: Image-Text-to-Text • Updated • 10
- aimagelab/LLaVA_MORE-llama_3_1-8B-siglip-finetuning: Image-Text-to-Text • 8B • Updated • 13 • 1
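The LLaVA-MORE checkpoints above are tagged as Image-Text-to-Text. The sketch below assumes the finetuned checkpoint can be loaded with the standard transformers LLaVA classes and processor chat template; if the release instead requires the official LLaVA-MORE codebase, its own loading code should be used and this snippet will not apply as-is.

```python
# Hedged sketch: generate a caption with a LLaVA-MORE checkpoint, ASSUMING it is
# exported in the standard transformers LLaVA format (not confirmed by this page).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder path to any local image
messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Describe the image."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```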