[Image: Moor Studio / Getty Images]
Concerns in academia about the potential for large language models like GPT to enable plagiarism and cheating have given rise to detectors that claim to identify content generated by artificial intelligence (AI). But it seems that those products come with a flaw. In a peer-reviewed opinion paper, researchers at Stanford University, USA, show that commonly used GPT detectors labeled essays by authors who are not native English speakers as AI-written at a higher rate than those by native speakers (Patterns, doi: 10.1016/j.patter.2023.100779).
“We should be very cautious about using any of these detectors in classroom settings,” said senior author James Zou in a press release accompanying the paper, “because there’s still a lot of biases, and they’re easy to fool with just the minimum amount of prompt design.”
ChatGPT launched in November 2022 and amassed over a million signups in just five days, according to its developer OpenAI. As the chatbot’s popularity grew, however, educators started to worry that it might allow students to cheat, for example, by asking it to write essays for them.
As humans often fail to recognize AI-generated content, GPT detectors started to pop up to help. To assess how well those products work, Zou and colleagues selected seven AI detectors and tested each platform using 91 essays written in English by nonnative speakers for TOEFL—the Test of English as a Foreign Language, a common standardized exam that measures test takers’ English-language ability.
The results? The products flagged more than 50% of the essays as AI-written, including one detector that misidentified them 98% of the time. When tested with essays written by American 8th graders, the platforms correctly labeled more than 90% of the writings as human generated.
Fooled by fancy words
The detectors’ algorithms evaluate the given texts’ “perplexity,” or how surprising the word choices are, explained Zou. “If you use common English words, the detectors will give a low perplexity score, meaning [your] essay is likely to be flagged as AI-generated,” he said. “If you use complex and fancier words, then it’s more likely to be classified as human-written by the algorithms.”
This is because, to mimic average human speech, large language models such as GPT are trained to use simple words when generating content. So essays by nonnative English writers, who tend to use more straightforward words, have a higher chance of being flagged as AI-created, the researchers argue.
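The idea can be illustrated with a toy sketch. The snippet below is not any detector's actual algorithm; it is a minimal illustration, assuming a made-up unigram "language model" in which common words have high probability and rarer synonyms have low probability. Per-word perplexity is the exponential of the average negative log probability, so plainer wording scores lower, which is exactly the signal a perplexity-based detector would read as "AI-like."

```python
import math

# Hypothetical unigram probabilities for illustration only:
# common words get high probability, rarer synonyms get low probability.
word_prob = {
    "use": 0.02, "simple": 0.005, "words": 0.01,
    "utilize": 0.0005, "rudimentary": 0.0001, "lexemes": 0.00005,
}

def perplexity(tokens, probs, floor=1e-6):
    """Per-word perplexity: exp of the mean negative log probability.
    Unknown words fall back to a small floor probability."""
    nll = sum(-math.log(probs.get(t, floor)) for t in tokens)
    return math.exp(nll / len(tokens))

plain = ["use", "simple", "words"]          # straightforward wording
fancy = ["utilize", "rudimentary", "lexemes"]  # rarer synonyms

# Lower perplexity = less "surprising" = more likely flagged as AI.
print(perplexity(plain, word_prob))
print(perplexity(fancy, word_prob))
```

Under this toy model the plain phrasing yields a much lower perplexity than the fancier phrasing, mirroring why essays with simpler vocabulary are more likely to be flagged.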
To prove this point, the researchers plugged the TOEFL essays into ChatGPT and asked it to “enhance the word choices to sound more like that of a native speaker.” When they fed the AI-edited results back into the GPT detectors, the platforms labeled the essays as human-written about 88% of the time.
Recommendations for the future
Given their findings, the researchers write that they “strongly caution” against using the AI detectors in evaluative or educational environments, especially when nonnative English speakers’ work is involved, as the detectors could perpetuate existing biases against those individuals.
Zou and colleagues also argue that future detectors should move away from relying solely on text perplexity. More advanced tools, such as second-order perplexity methods and watermarking techniques, could help improve their accuracy and reliability, the researchers add.