Tools to protect content against use in generative AI training // for creators
Is there anything that can be done to images to protect them from being used to train models?
- Nightshade is a tool that transforms images into “poison” samples that cause AI models trained on them to behave unpredictably. It is designed to deter unauthorized scraping and training on copyrighted images.
- Nightshade works by optimizing images so that they change as little as possible to human eyes while their feature representations inside AI models change as much as possible. This causes models to learn incorrect associations (a rough sketch of this idea follows this list).
- Nightshade is different from Glaze, which is a defensive tool that protects artists against style mimicry. Nightshade is an offensive tool artists can use collectively against model trainers that scrape art without consent.
- Risks of Nightshade include more visible changes on flat or smooth images, and the possibility that defenses against it will be developed over time; however, the tool can evolve to overcome such defenses.
- The goals of Nightshade's creators are to learn through research and make a positive impact, not to profit. Nightshade runs locally and sends no data externally.
- An integrated release of Nightshade and Glaze (including in WebGlaze) is planned, so both protections can be applied to an artwork.
- The technical details of how Nightshade works are described in a preprint paper.
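To make the feature-representation bullet above concrete, here is a minimal, hypothetical sketch of the general idea rather than Nightshade's actual implementation: a small perturbation is optimized so that an encoder's features for the image move toward the features of an "anchor" image from a different concept, while an L-infinity budget keeps the visible change small. The toy encoder, the budget `eps`, and the step count are placeholder assumptions.

```python
# Hypothetical sketch of feature-space poisoning (NOT Nightshade's real code).
# Idea: keep the image visually close to the original while pushing its
# features toward those of an anchor image from a different concept.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder feature extractor; a real attack would target the image
# encoder actually used by the generative model's training pipeline.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
encoder.eval()

original = torch.rand(1, 3, 64, 64)   # image to protect/poison (e.g. a dog photo)
anchor = torch.rand(1, 3, 64, 64)     # image of the target concept (e.g. a cat photo)
with torch.no_grad():
    target_feat = encoder(anchor)

eps = 0.05   # assumed L-infinity budget keeping the edit hard to see
delta = torch.zeros_like(original, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)

for step in range(200):
    poisoned = (original + delta).clamp(0, 1)
    # Minimize distance between the poisoned image's features and the
    # anchor concept's features.
    loss = torch.nn.functional.mse_loss(encoder(poisoned), target_feat)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Project the perturbation back into the visual budget after each step.
    with torch.no_grad():
        delta.clamp_(-eps, eps)

poisoned = (original + delta).detach().clamp(0, 1)
print("max pixel change:", (poisoned - original).abs().max().item())
```

In a real attack the perceptual constraint and the target encoder would be far more sophisticated, but the structure stays the same: minimize feature distance to the target concept subject to a visual-similarity budget.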
From the preprint's abstract:
Data poisoning attacks manipulate training data to introduce unexpected behaviors into machine learning models at training time. For text-to-image generative models with massive training datasets, current understanding of poisoning attacks suggests that a successful attack would require injecting millions of poison samples into their training pipeline. In this paper, we show that poisoning attacks can be successful on generative models. We observe that training data per concept can be quite limited in these models, making them vulnerable to prompt-specific poisoning attacks, which target a model’s ability to respond to individual prompts. We introduce Nightshade, an optimized prompt-specific poisoning attack where poison samples look visually identical to benign images with matching text prompts. Nightshade poison samples are also optimized for potency and can corrupt a Stable Diffusion SDXL prompt in fewer than 100 poison samples. Nightshade poison effects “bleed through” to related concepts, and multiple attacks can be composed together in a single prompt. Surprisingly, we show that a moderate number of Nightshade attacks can destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images. Finally, we propose the use of Nightshade and similar tools as a last defense for content creators against web scrapers that ignore opt-out/do-not-crawl directives, and discuss possible implications for model trainers and content creators.
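The abstract's point that training data per concept is limited is easy to illustrate with a toy count. The sketch below is not from the paper; it simply tallies how many captions in a small, made-up caption list mention each concept, which is the kind of measurement that motivates prompt-specific poisoning.

```python
# Toy illustration of "concept sparsity": even in a huge caption set,
# any single concept appears in only a small fraction of samples.
from collections import Counter
import re

captions = [
    "a photo of a dog on the beach",
    "an oil painting of a castle",
    "a dog wearing a red scarf",
    "studio portrait of a woman",
    "a fantasy castle at sunset",
    # ... in practice this would be hundreds of millions of scraped captions
]

concepts = ["dog", "castle", "dragon"]
counts = Counter()
for caption in captions:
    words = set(re.findall(r"[a-z]+", caption.lower()))
    for concept in concepts:
        if concept in words:
            counts[concept] += 1

total = len(captions)
for concept in concepts:
    share = counts[concept] / total
    print(f"{concept!r}: {counts[concept]} captions ({share:.1%} of dataset)")
```

Because each concept draws on only a thin slice of even a web-scale corpus, a poison set of roughly a hundred images for one concept can be a meaningful fraction of that concept's clean training data, which is roughly the intuition behind the small sample counts reported in the paper.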
Here are the key points from the paper:
- The paper introduces prompt-specific poisoning attacks against text-to-image diffusion models. These attacks target the model’s ability to generate images for specific prompts or concepts.
- The attacks exploit concept sparsity in diffusion model training datasets: each individual concept has relatively few training samples compared to the full dataset size (as the toy count above illustrates). This allows successful poisoning with a small number of poison samples.
- The paper proposes an optimized attack called Nightshade that uses adversarial perturbations to create poison samples that look natural but teach incorrect associations. Nightshade needs only a small number of poison samples (on the order of 100) for high attack success rates.
- Nightshade exhibits “bleed-through” where the attack impacts related concepts not directly targeted. It is also robust to various defenses and transfers well across models.
- When multiple concepts are attacked, the effects stack and can significantly degrade the overall model. With enough attacks, the model loses the ability to generate meaningful images.
- The paper suggests Nightshade could incentivize model trainers to respect copyright and IP by giving content owners a tool for “do-not-train” protection (a small sketch of the related do-not-crawl check appears after the summary below).
In summary, the paper demonstrates the feasibility and potential impact of prompt-specific poisoning attacks on generative diffusion models. The optimized Nightshade attack is highly effective and difficult to defend against.
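The “do-not-train” point above (and the abstract's mention of scrapers that ignore opt-out/do-not-crawl directives) refers to mechanisms such as robots.txt, which cooperative crawlers are expected to honor. The sketch below uses Python's standard-library robots.txt parser to check whether a crawler may fetch a page; the site, path, and crawler name are placeholders, not anything from the Nightshade project.

```python
# Check whether a crawler user-agent is permitted to fetch a URL
# according to the site's robots.txt (the "do-not-crawl" mechanism
# that Nightshade is meant to backstop when scrapers ignore it).
from urllib.robotparser import RobotFileParser

site = "https://example.com"           # placeholder site
crawler = "ExampleImageScraperBot"     # placeholder crawler user-agent

rp = RobotFileParser()
rp.set_url(f"{site}/robots.txt")
rp.read()                              # fetches and parses robots.txt

url = f"{site}/gallery/artwork-123.png"
if rp.can_fetch(crawler, url):
    print(f"{crawler} is allowed to fetch {url}")
else:
    print(f"{crawler} is disallowed from fetching {url}")
```

Nightshade is positioned as a last line of defense for exactly the case where such directives are ignored, since honoring robots.txt is voluntary.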
- WebGlaze is a free web service that lets artists apply Glaze's style-mimicry protection to their images using only a web browser. It is intended for artists who can’t run the desktop Glaze app.
- Images are uploaded via a web form, processed on cloud servers, and emailed back to the user. No images are stored.
- Access is by invite only to ensure it stays free for human artists. Artists can request an invite by following posted instructions.
- Intensity settings replace the previous CLIP-based error checking. Users now just select “default”, “medium” or “high” intensity.
- There are limits on the number of images a user can Glaze per day and per week. These limits may increase over time.
- Common issues like expired invite links and Unicode errors are explained.
- Additional info is provided on the Glaze website, including FAQs, release notes, user guides, and publications.