In a fun and poetic experiment, researchers were able to extract training data by asking #chatGPT to repeat “poem” forever. It turns out that at some point the chatbot enters a weird state where the most likely continuation is entire training examples, reproduced verbatim.
The exploit seems to be fixed by now, but the vulnerability remains: models can and will reveal training data in unexpected ways.
This may not be a big issue when training foundation models on public data, but it’s another story when fine-tuning on private data.
The best place to address the vulnerability is still the fine-tuning phase, because once the information has made it into the weights, it seems all but impossible to prevent it from leaking.
If you’re looking to fine-tune on private data, #differentialprivacy is definitely your best friend!
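For the curious, here is a rough idea of what differentially private training can look like in practice. This is a minimal sketch of one DP-SGD step (clip each example's gradient, then add Gaussian noise) on a toy logistic regression in NumPy, not a recipe for fine-tuning an LLM. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are made up for illustration, and the sketch does not track the privacy budget; in a real project you would likely rely on a dedicated library for that.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD step for logistic regression (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng

    # Per-example gradients of the logistic loss, shape (n_examples, n_features).
    preds = 1.0 / (1.0 + np.exp(-(X @ w)))
    grads = (preds - y)[:, None] * X

    # Clip each example's gradient to an L2 norm of at most clip_norm,
    # which bounds how much any single record can influence the update.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale

    # Add Gaussian noise calibrated to the clipping norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(X)

    return w - lr * noisy_mean
```

The key point is that clipping caps the contribution of any one private record, and the noise hides whatever contribution remains. A larger `noise_multiplier` means stronger privacy but usually lower accuracy.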
Huge congrats to the authors!
Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee.