M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
Abstract
A multimodal evaluation framework and robustness enhancement module are introduced to address concept erasure vulnerabilities in text-to-image diffusion models across multiple input modalities.
Text-to-image diffusion models may generate harmful or copyrighted content, motivating research on concept erasure. However, existing approaches primarily focus on erasing concepts from text prompts, overlooking other input modalities that are increasingly critical in real-world applications such as image editing and personalized generation. These modalities can become attack surfaces, where erased concepts re-emerge despite defenses. To bridge this gap, we introduce M-ErasureBench, a novel multimodal evaluation framework that systematically benchmarks concept erasure methods across three input modalities: text prompts, learned embeddings, and inverted latents. For the latter two, we evaluate both white-box and black-box access, yielding five evaluation scenarios. Our analysis shows that existing methods achieve strong erasure performance against text prompts but largely fail under learned embeddings and inverted latents, with Concept Reproduction Rate (CRR) exceeding 90% in the white-box setting. To address these vulnerabilities, we propose IRECE (Inference-time Robustness Enhancement for Concept Erasure), a plug-and-play module that localizes target concepts via cross-attention and perturbs the associated latents during denoising. Experiments demonstrate that IRECE consistently restores robustness, reducing CRR by up to 40% under the most challenging white-box latent inversion scenario, while preserving visual quality. To the best of our knowledge, M-ErasureBench provides the first comprehensive benchmark of concept erasure beyond text prompts. Together with IRECE, our benchmark offers practical safeguards for building more reliable protective generative models.
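To make the evaluation setup concrete, the sketch below enumerates the five scenarios described in the abstract and computes a Concept Reproduction Rate. It is a minimal illustration, not the benchmark's actual code: the names `evaluation_scenarios` and `concept_reproduction_rate` are hypothetical, and CRR is assumed here to be the fraction of generated images in which some concept detector flags the erased concept, which the abstract does not spell out.

```python
from itertools import product

# Three input modalities named in the abstract; access levels apply only to the last two.
MODALITIES = ["text_prompt", "learned_embedding", "inverted_latent"]
ACCESS_LEVELS = ["white_box", "black_box"]


def evaluation_scenarios():
    """Enumerate the scenarios: text prompts plus {embedding, latent} x {white, black}."""
    scenarios = [("text_prompt", None)]
    scenarios += list(product(MODALITIES[1:], ACCESS_LEVELS))
    return scenarios


def concept_reproduction_rate(detections):
    """Assumed definition: share of generated images in which a detector
    (hypothetical, not specified in the abstract) flags the erased concept."""
    detections = list(detections)
    if not detections:
        raise ValueError("need at least one detection result")
    return sum(bool(d) for d in detections) / len(detections)


if __name__ == "__main__":
    for modality, access in evaluation_scenarios():
        print(modality, access)
    # Made-up detector outputs for four generated images, for illustration only.
    print(concept_reproduction_rate([True, True, False, True]))
```

Under this assumed definition, a CRR above 90% would mean the erased concept reappears in more than nine of every ten generated images.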
Community
Concept Erasure Benchmark
arXiv lens breakdown of this paper: https://arxivlens.com/PaperView/Details/m-erasurebench-a-comprehensive-multimodal-evaluation-benchmark-for-concept-erasure-in-diffusion-models-46-5999d64b
- Executive Summary
- Detailed Breakdown
- Practical Applications
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching (2026)
- CGCE: Classifier-Guided Concept Erasure in Generative Models (2025)
- Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models (2025)
- MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization (2025)
- Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation (2025)
- CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models (2025)
- Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization (2025)