Recently, the paper "Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models" by the team of Prof. Shanqing Guo and Prof. Zheng Li from the School of Cyber Science and Technology was accepted by the IEEE Symposium on Security and Privacy, a top-tier conference in network and system security. PhD student Yingkai Dong is the first author, and Shandong University is the sole corresponding-author affiliation.
Text-to-image (T2I) generative models have revolutionized content creation by transforming textual descriptions into high-quality images. However, these models are vulnerable to jailbreaking attacks, where carefully crafted prompts bypass safety mechanisms to produce unsafe content. While researchers have developed various jailbreak attacks to expose this risk, these methods face significant limitations, including impractical access requirements, easily detectable unnatural prompts, restricted search spaces, and high query demands on the target system.
In this paper, the research team proposes JailFuzzer, a novel fuzzing framework driven by large language model (LLM) agents and designed to efficiently generate natural, semantically meaningful jailbreak prompts in a black-box setting. JailFuzzer applies fuzz-testing principles through three components: a seed pool that holds the initial prompts and the successful jailbreak prompts, a guided mutation engine that generates meaningful variations, and an oracle function that evaluates whether a jailbreak succeeded. The team builds both the guided mutation engine and the oracle function from LLM-based agents, which supports efficiency and adaptability in black-box settings.
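The three-component structure described above follows the general shape of a fuzzing loop. The following is a minimal, generic sketch of that shape only, not the authors' implementation: the names `fuzz`, `toy_mutate`, `toy_target`, and `toy_oracle` are hypothetical, the mutation and oracle are trivial string functions standing in for the LLM-based agents, and the target is a harmless placeholder rather than a T2I model.

```python
import random
from typing import Callable, List, Tuple


def fuzz(
    seeds: List[str],
    mutate: Callable[[str, random.Random], str],
    target: Callable[[str], str],
    oracle: Callable[[str, str], bool],
    max_queries: int,
    rng: random.Random,
) -> Tuple[List[str], int]:
    """Generic fuzzing loop: pick a seed, mutate it, query the target, ask the oracle."""
    if not seeds:
        raise ValueError("seed pool must not be empty")
    pool = list(seeds)          # seed pool: initial inputs plus successful ones
    successes: List[str] = []
    queries = 0
    while queries < max_queries:
        seed = rng.choice(pool)
        candidate = mutate(seed, rng)    # mutation engine (an LLM agent in the paper)
        output = target(candidate)       # one black-box query to the system under test
        queries += 1
        if oracle(candidate, output):    # oracle (an LLM agent in the paper)
            successes.append(candidate)
            pool.append(candidate)       # successful inputs become new seeds
    return successes, queries


# Toy stand-ins (assumptions for illustration only).
def toy_mutate(seed: str, rng: random.Random) -> str:
    return seed + rng.choice("yz")       # append one random character


def toy_target(text: str) -> str:
    return text.upper()                  # placeholder "system under test"


def toy_oracle(candidate: str, output: str) -> bool:
    return "Z" in output                 # placeholder success condition


if __name__ == "__main__":
    found, used = fuzz(["a"], toy_mutate, toy_target, toy_oracle, 20, random.Random(0))
    print(f"{len(found)} successful inputs in {used} queries")
```

The query budget `max_queries` reflects the article's point about limiting queries to the target system: each loop iteration costs exactly one call to `target`.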
Extensive experiments demonstrate that JailFuzzer has significant advantages in jailbreaking T2I models. It generates natural and semantically coherent prompts, reducing the likelihood of detection by traditional defenses. Additionally, it achieves a high success rate in jailbreak attacks with minimal query overhead, outperforming existing methods across all key metrics.
This study underscores the need for stronger safety mechanisms in generative models and provides a foundation for future research on defending against sophisticated jailbreaking attacks.
About IEEE Symposium on Security and Privacy
The IEEE Symposium on Security and Privacy (IEEE S&P) is also known as Oakland, after its original location in Oakland, California. Organized by the IEEE Computer Society, it is recognized as a flagship conference in cybersecurity, alongside other top-tier venues such as USENIX Security, ACM CCS, and NDSS. Its acceptance rate is typically around 10-15%, making it one of the most selective conferences in the cybersecurity field.
By Yingkai Dong and Anqi Xiao
Copyright: School of Cyber Science and Technology, Shandong University