Editing Massive Concepts in Text-to-Image Diffusion Models

Tianwei Xiong1,2*, Yue Wu3*, Enze Xie4#, Yue Wu4, Zhenguo Li4, Xihui Liu1#
1The University of Hong Kong, 2Tsinghua University, 3Peking University, 4Huawei Noah's Ark Lab
*Equal contribution. #Corresponding author

Abstract

Text-to-image diffusion models suffer from the risk of generating outdated, copyrighted, incorrect, and biased content. While previous methods have mitigated these issues on a small scale, it is essential to handle them simultaneously in larger-scale real-world scenarios. We propose a two-stage method, Editing Massive Concepts In Diffusion Models (EMCID). The first stage performs memory optimization for each individual concept with dual self-distillation from a text alignment loss and a diffusion noise prediction loss. The second stage conducts massive concept editing with multi-layer, closed-form model editing. We further propose a comprehensive benchmark, named ImageNet Concept Editing Benchmark (ICEB), for evaluating massive concept editing for T2I models with two subtasks, free-form prompts, massive concept categories, and extensive evaluation metrics. Extensive experiments conducted on our proposed benchmark and previous benchmarks demonstrate the superior scalability of EMCID for editing up to 1,000 concepts, providing a practical approach for fast adjustment and re-deployment of T2I diffusion models in real-world applications.

EMCID edits source concepts (the concepts intended to be modified) to match destination concepts (the concepts towards which the source concepts are to be altered). EMCID can update, forget, rectify, and debias various concepts simultaneously at a large scale.

Method

EMCID performs memory optimization for each individual concept independently in stage Ⅰ. The optimization results are aggregated into closed-form model editing in stage Ⅱ.
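
To make the idea of a closed-form edit concrete, here is a minimal sketch of a generic regularized least-squares weight update of the kind used by closed-form model editors. It is an illustration under stated assumptions, not the exact formula used by EMCID: the function name `closed_form_edit`, the preservation matrix `C`, and the weight `lam` are all hypothetical. The columns of `K` stand for the inputs (keys) of the edited concepts at one linear layer, and the columns of `V` stand for the outputs (values) that a stage Ⅰ style optimization would want that layer to produce.

```python
import numpy as np

def closed_form_edit(W, K, V, C, lam=1.0):
    """Return W + delta, where delta minimizes
        ||(W + delta) @ K - V||_F^2 + lam * trace(delta @ C @ delta.T).

    W: (d_out, d_in) original layer weight
    K: (d_in, n) keys of the n concepts being edited
    V: (d_out, n) desired layer outputs for those keys
    C: (d_in, d_in) symmetric matrix penalizing changes on inputs to preserve
       (assumption: e.g. an uncentered covariance of unrelated keys)
    """
    R = V - W @ K                  # residual the update has to add
    A = lam * C + K @ K.T          # symmetric (d_in, d_in) system matrix
    # Setting the gradient to zero gives delta @ A = R @ K.T.
    # Solve the transposed system instead of forming an explicit inverse.
    delta = np.linalg.solve(A, K @ R.T).T
    return W + delta

# Toy usage: edit 2 concepts in a 6 -> 4 linear layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
K = rng.normal(size=(6, 2))
V = rng.normal(size=(4, 2))
W_edited = closed_form_edit(W, K, V, C=np.eye(6), lam=1e-8)
```

Because every concept contributes only its own columns to `K` and `V`, the per-concept optimization results can be computed independently and then stacked into a single solve, which is what makes this style of editing attractive at large scale. With a very small `lam`, the edited layer maps each key to its desired value almost exactly; a larger `lam` trades edit strength for preservation of unrelated inputs.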

Rectify Imprecise Generation

We test Stable Diffusion v1.4 on ImageNet classes and examine imprecise generation for less popular aliases. We also find that the model fails to generate correct images for some classes at all. Our method can rectify the imprecise generation in both cases.


Large-scale Arbitrary ImageNet Concept Editing

On the challenging task of editing up to 300 ImageNet concepts into arbitrary destination concepts, EMCID outperforms previous methods by a large margin, maintaining both editing success and preservation of non-edited concepts even at the 300-concept scale.

Erasing Artist Styles


Gender Debiasing


BibTeX


@article{xiong2024editing,
      title={Editing Massive Concepts in Text-to-Image Diffusion Models}, 
      author={Tianwei Xiong and Yue Wu and Enze Xie and Yue Wu and Zhenguo Li and Xihui Liu},
      year={2024},
      journal={arXiv preprint arXiv:2403.13807}
}