Cryptographers Show That AI Protections Will Always Have Holes

www.quantamagazine.org

Quanta Magazine · English · 2 months ago

Large language models such as ChatGPT come with filters to keep certain info from getting out. A new mathematical argument shows that systems like this can never be completely safe.

Ask ChatGPT how to build a bomb, and it will flatly respond that it “can’t help with that.” But users have long played a cat-and-mouse game to try to trick language models into providing forbidden information. These “jailbreaks” have run from the mundane — in the early years, one could simply tell a model to ignore its safety instructions — to elaborate multi-prompt roleplay scenarios.

Source

From Quanta Magazine via this RSS feed
