A research team from Switzerland has investigated whether AI chatbots and language models such as ChatGPT or LLaMA can extract private information about authors from texts published on the Internet. The short answer: yes.

For the study, the team from the “Secure, Reliable, and Intelligent Systems Lab” at ETH Zurich first analyzed 520 user profiles from Reddit. The experts gathered personal information such as place of residence, income, and gender from comments and posts. To do this, they sometimes had to link several pieces of information together, and sometimes had to consult Google.

In round two, AI language models were tasked with analyzing the same Reddit content to see whether they would reach conclusions similar to those of the human profilers. The results were impressive (or frightening): in many cases, the models delivered the same results as the humans, only much faster, needing seconds instead of hours. The language models were just as capable of combining individual clues as of deciphering national, local, and pop culture references.

The results certainly raise concerns: language models could be misused to create detailed profiles of people without their knowledge or consent. Advertising companies, police forces, and intelligence services could all use the technology, and there is currently no effective protection against it.
AI language models as profilers: shockingly good.