DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models

An investigation into the detectability of poisoned noise inputs in backdoored diffusion models, covering a distribution-based detection mechanism for defenders and a detection-evading trigger-learning strategy for attackers

The paper explores the detectability of poisoned noise inputs for backdoored diffusion models, a critical yet underexplored aspect of existing work. It systematically analyzes the properties of trigger patterns in diffusion backdoor attacks and identifies the crucial role of distribution discrepancy in Trojan detection. From the defender's perspective, it proposes a low-cost trigger detection mechanism based on this distribution discrepancy, which effectively identifies poisoned input noise. From the attacker's perspective, it develops a backdoor attack strategy that learns an unnoticeable trigger, evading the proposed detection scheme. Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of both the trigger detection mechanism and the detection-evading attack strategy.


The diffusion model, a prevalent generative AI technique for content creation and editing, has been widely adopted across various data modalities. However, despite its extensive use in real-world applications, the security of diffusion models under backdoor attacks remains underexplored. The paper addresses this gap with a systematic study of the detectability of Trojan inputs for backdoored diffusion models from both the attacker's and the defender's perspectives. It analyzes the characteristics of existing fixed trigger patterns and develops a low-cost trigger detection mechanism based on distribution discrepancy. It then proposes a backdoor attack strategy that learns a stealthy trigger to evade this detection method, enriching research on the security of diffusion models.
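The summary does not specify the detector's exact form, but the core intuition is that a fixed trigger pattern added to the sampler's input shifts it away from the standard Gaussian a diffusion model expects. As a minimal sketch of that idea (not the paper's actual method), the snippet below flags noise whose empirical distribution deviates from N(0, 1), using a one-sample Kolmogorov-Smirnov statistic; the threshold and the additive trigger shift are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

def gaussian_discrepancy(noise):
    """One-sample Kolmogorov-Smirnov statistic of the flattened noise
    against the standard normal CDF (illustrative discrepancy measure)."""
    x = np.sort(noise.ravel())
    n = x.size
    # Standard normal CDF evaluated at each sorted sample
    cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
    # Empirical CDF of the samples
    ecdf = np.arange(1, n + 1) / n
    return float(np.max(np.abs(ecdf - cdf)))

def is_poisoned(noise, threshold=0.05):
    """Flag input noise whose distribution deviates from N(0, 1).
    The threshold is a hypothetical choice, not taken from the paper."""
    return gaussian_discrepancy(noise) > threshold

rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 64))   # benign sampler input
poisoned = clean + 0.5                  # hypothetical fixed additive trigger
```

A detection-evading attacker, in this framing, would learn a trigger whose addition keeps the input's discrepancy statistic below any such threshold while still activating the backdoor.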


In summary, the paper provides a comprehensive exploration of the detectability of trigger patterns in backdoored diffusion models, offering insights into the distribution-based detection mechanism and a detection-evading attack strategy. The empirical evaluations demonstrate the effectiveness of these proposed methods across various diffusion models and datasets, highlighting their potential impact on enhancing the security of diffusion models against backdoor attacks.

