LLMs hypnosis // models can be hacked to reveal training data

2 min read · Dec 2, 2023

After #LLMs hallucination, here comes #LLMs hypnosis! 😵‍💫

In a fun and poetic experiment, researchers were able to extract training data by asking #chatGPT to repeat the word “poem” forever. It turned out that after many repetitions the chatbot stops repeating and enters a weird state where the most likely continuation is verbatim examples from its training set.
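To make the "weird state" concrete, here is a minimal sketch of how one could spot the point where a response stops repeating the requested word and starts producing something else. The function names and the sample string are made up for illustration, and whitespace tokenization is a simplification; this is not the procedure used in the paper.

```python
from typing import List, Optional, Tuple


def find_divergence(output: str, word: str = "poem") -> Optional[int]:
    """Return the index of the first whitespace token that is not `word`.

    Returns None if the whole output is just repetitions of `word`.
    """
    for i, token in enumerate(output.split()):
        # Ignore case and surrounding punctuation so "Poem," still counts.
        if token.strip(".,;:!?\"'").lower() != word:
            return i
    return None


def split_at_divergence(output: str, word: str = "poem") -> Tuple[List[str], List[str]]:
    """Split the output into (repeated part, everything after the divergence)."""
    tokens = output.split()
    i = find_divergence(output, word)
    if i is None:
        return tokens, []
    return tokens[:i], tokens[i:]


# Hypothetical model output, invented for this example.
sample = "poem poem Poem, poem Jane Doe 555-0100"
repeated, tail = split_at_divergence(sample)
print(len(repeated), tail)
```

The tail is the part worth inspecting: whether it is memorized text still has to be checked against a reference corpus, which this sketch does not attempt.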

The exploit seems fixed by now but the vulnerability remains: models may and will reveal training data in unexpected ways.

This may not be a big issue when training foundation models on public data but it’s another story when fine-tuning on private data.

The best place to address the vulnerability is still the fine-tuning phase: once the information has made it into the weights, it seems all but impossible to prevent it from leaking.

If you’re looking to fine-tune on private data, #differentialprivacy is definitely your best friend!
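For a feel of what differential privacy looks like during fine-tuning, here is a rough sketch of the gradient aggregation step used in DP-SGD, one common way to train with differential privacy: clip each example's gradient to a maximum norm, sum, add Gaussian noise scaled to that norm, then average. The post does not say which mechanism to use, so treat DP-SGD as an assumption here; the function names, parameter names, and toy gradients are illustrative, and a real setup would rely on a dedicated library plus a privacy accountant.

```python
import math
import random
from typing import List, Optional


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: List[List[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Clip per-example gradients, add Gaussian noise to their sum, average."""
    rng = rng or random.Random()
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Noise scale is tied to the clipping norm, which bounds what any
    # single example can contribute to the sum.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Toy gradients for two examples (made-up numbers).
grads = [[3.0, 4.0], [0.3, 0.4]]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1))
```

Clipping bounds the influence of any single private record, and the noise hides whatever influence remains; the two together are what keep an individual example from being stamped into the weights.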

=> post: https://lnkd.in/e8M5d6X3
=> paper: https://lnkd.in/emCp3DJQ

Huge congrats to the authors!
Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee.

Sarus (YC W22) [link to post]




