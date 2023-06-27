Training AI systems to perform specific tasks accurately and reliably requires enormous amounts of data. To obtain it, many companies pay so-called gig workers: contract workers on platforms like Mechanical Turk who perform tasks that are difficult to automate, such as solving captchas, labeling data, and annotating text and images. The resulting data is then fed into AI models to train them.

However, gig workers are poorly paid and often have to complete many tasks very quickly. It is no wonder that some turn to AI tools like ChatGPT to maximize their earnings. But how many do that? To find out, a team of researchers from the Swiss Federal Institute of Technology in Lausanne (EPFL) hired 44 people on the Amazon Mechanical Turk gig work platform to summarize 16 excerpts from medical research papers.

Searching for signs of ChatGPT

They then analyzed the responses using an AI model they had trained themselves, which looks for telltale signals of ChatGPT output, such as a lack of variety in word choice. They also logged the workers’ keystrokes to find out whether they had copied and pasted their responses, which would indicate that they had not written the answers themselves.
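The two signals described above can be illustrated with a minimal sketch. This is not the researchers' classifier, which was a trained model; the function names, the type-token ratio as a stand-in for "variety in word choice," and the 0.5 threshold are all assumptions made for illustration.

```python
import re


def type_token_ratio(text: str) -> float:
    """Share of distinct words among all words.

    Lower values mean less varied wording. This is a crude proxy
    (an assumption), not the model used in the study.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def looks_pasted(typed_chars: int, final_text: str, threshold: float = 0.5) -> bool:
    """Flag a response when far fewer characters were typed than it contains.

    `typed_chars` is a hypothetical count of logged keystrokes that
    produced characters; `threshold` is an arbitrary illustrative cutoff.
    """
    if not final_text:
        return False
    return typed_chars / len(final_text) < threshold
```

A response with a low ratio and a positive paste flag would only be a candidate for closer inspection; neither signal proves on its own that a text was generated by an AI.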

The researchers estimated that between 33 and 46 percent of the workers had used AI models like OpenAI’s ChatGPT. That percentage is likely to increase further as ChatGPT and other AI systems become more powerful and more accessible, the authors write in the study, which has been published on the preprint server arXiv and is still awaiting peer review.

“I don’t think this means the end of crowdsourcing platforms. It just changes the dynamic,” says Robert West, an assistant professor at EPFL, who co-authored the study. However, using AI-generated data to train AI could inject more errors into already error-prone models.

Errors in AI models compound

Large language models routinely present incorrect information as fact. When such errors end up in the training data of other AI models, they compound over time and their origins become more difficult to trace, says Ilia Shumailov, a junior research fellow in computer science at Oxford University who was not involved in the project.

Even worse, there is no easy solution. “The problem is that when you use artificial data, you take on the errors from the misunderstandings of the models as well as statistical errors,” he says. “You have to make sure that your own errors don’t skew the results of other models, and that’s not an easy thing to do.”

The study underscores the need for new methods to verify whether data was created by humans or by AI. It also highlights one of the issues arising from tech companies’ tendency to rely on gig workers for the vital work of cleaning up data for AI systems.

“I don’t think everything will collapse,” says West. “But I think the AI community needs to look closely at which tasks are most likely to be automated and work to prevent that.”

(jl)

