It may soon become common to encounter a tweet, essay or news article and wonder if it was written by artificial intelligence software. There could be questions over the authorship of a given piece of writing, as in academic settings, or the veracity of its content, in the case of an article. There could also be questions about authenticity: If a misleading idea suddenly appears in posts across the internet, is it spreading organically, or have the posts been generated by A.I. to create the appearance of real traction?
Tools to identify whether a piece of text was written by A.I. have started to emerge in recent months, including one created by OpenAI, the company behind ChatGPT. That tool uses an A.I. model trained to spot differences between generated and human-written text. When OpenAI tested the tool, it correctly identified A.I. text in only about half of the generated writing samples it analyzed. The company said at the time that it had released the experimental detector “to get feedback on whether imperfect tools like this one are useful.”
Identifying generated text, experts say, is becoming increasingly difficult as software like ChatGPT continues to advance and turns out text that is more convincingly human. OpenAI is now experimenting with a technology that would insert special words into the text that ChatGPT generates, making it easier to detect later. The technique is known as watermarking.
The watermarking method that OpenAI is exploring is similar to one described in a recent paper by researchers at the University of Maryland, said Jan Leike, the head of alignment at OpenAI. In broad terms, the software favors words from a list of special words as it generates text, and a detector that knows the list measures what percentage of the words in a passage come from it.
If someone tried to remove a watermark by editing the text, they would not know which words to change. And even if they managed to change some of the special words, they would most likely reduce the percentage of special words in the text by only a couple of points.
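The detection side of such a scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's method or the Maryland researchers' implementation: the secret key, the function names `is_special`, `special_fraction` and `looks_watermarked`, and the 70 percent threshold are all hypothetical, and the sketch uses one fixed word list where a real system might vary the list as the text goes on.

```python
import hashlib

PUNCTUATION = ".,;:!?\"'()"


def is_special(word: str, key: str = "demo-key") -> bool:
    """Return True if `word` is on the (hypothetical) special-word list.

    A keyed hash puts roughly half of all words on the list. Without the
    key, an editor cannot tell which words are special.
    """
    digest = hashlib.sha256(f"{key}:{word.lower()}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def special_fraction(text: str, key: str = "demo-key") -> float:
    """Return the share of words in `text` that are on the special list."""
    words = [w.strip(PUNCTUATION) for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(is_special(w, key) for w in words) / len(words)


def looks_watermarked(text: str, key: str = "demo-key",
                      threshold: float = 0.7) -> bool:
    """Flag text whose special-word share reaches an assumed threshold.

    Ordinary writing should land near 50 percent under this hash, so a
    share well above that suggests the generator favored special words.
    """
    return special_fraction(text, key) >= threshold
```

In this toy setup, a 20-word passage made up entirely of special words scores 100 percent. Swapping two of those words for ordinary ones lowers the score to 90 percent, which is still far above the roughly 50 percent expected from human writing.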
Tom Goldstein, a professor at the University of Maryland and co-author of the watermarking paper, said a watermark could be detected even from “a very short text fragment,” such as a tweet. By contrast, the detection tool OpenAI released requires a minimum of 1,000 characters.
Like all approaches to detection, however, watermarking is not perfect, Mr. Goldstein said. OpenAI’s current detection tool is trained to identify text generated by 34 different language models, while a watermark detector could identify only text that was produced by a model or chatbot that uses the same list of special words as the detector itself. That means that unless companies in the A.I. field agree on a standard watermark implementation, the method could lead to a future where questionable text must be checked against several different watermark detection tools.
To make watermarking work well every time in a widely used product like ChatGPT, without reducing the quality of its output, would require a lot of engineering, Mr. Goldstein said. Mr. Leike of OpenAI said the company was still researching watermarking as a form of detection, and added that it could complement the current tool, since the two “have different strengths and weaknesses.”
Still, many experts believe a one-stop tool that can reliably detect all A.I. text with total accuracy may be out of reach. That is partly because tools could emerge that help remove evidence that a piece of text was generated by A.I. And generated text, even if it is watermarked, would be harder to detect in cases where it makes up only a small portion of a larger piece of writing. Experts also say that detection tools, especially those that do not use watermarking, may not recognize generated text if a person has changed it enough.
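A rough calculation shows why a small watermarked portion is harder to spot. The sketch below assumes a detector that scores a passage with a standard one-proportion z-test, comparing the number of special words it finds with the number expected by chance. The 50 percent chance rate, the function name `z_score` and the word counts are illustrative assumptions, not figures from OpenAI or the researchers.

```python
import math


def z_score(special_count: int, total_words: int,
            base_rate: float = 0.5) -> float:
    """Score how far the special-word count sits above chance.

    `base_rate` is the assumed share of special words in ordinary writing.
    A larger score means stronger evidence of a watermark.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    expected = base_rate * total_words
    spread = math.sqrt(total_words * base_rate * (1.0 - base_rate))
    return (special_count - expected) / spread


# A 50-word generated passage in which 45 words are special.
alone = z_score(45, 50)                 # about 5.66

# The same passage pasted into 950 words of human writing,
# where about half the words (475) are special by chance.
diluted = z_score(45 + 475, 50 + 950)   # about 1.26

print(f"alone: {alone:.2f}, diluted: {diluted:.2f}")
```

On its own, the generated passage stands far above chance. Inside the longer piece, the same 45 special words barely move the overall count, and the score falls to a level that human writing could plausibly produce.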
“I think the idea that there’s going to be a magic tool, either created by the vendor of the model or created by an external third party, that’s going to take away doubt — I don’t think we’re going to have the luxury of living in that world,” said David Cox, the director of the MIT-IBM Watson A.I. Lab.
Sam Altman, the chief executive of OpenAI, shared a similar sentiment in an interview with StrictlyVC last month.
“Fundamentally, I think it’s impossible to make it perfect,” Mr. Altman said. “People will figure out how much of the text they have to change. There will be other things that modify the outputted text.”
Part of the problem, Mr. Cox said, is that detection tools themselves present a conundrum, in that they could make it easier to avoid detection. A person could repeatedly edit generated text and check it against a detection tool until the text is identified as human-written — and that process could potentially be automated. Detection technology, Mr. Cox added, will always be a step behind as new language models emerge, and as existing ones advance.
“This is always going to have an element of an arms race to it,” he said. “It’s always going to be the case that new models will come out and people will develop ways to detect that it’s a fake.”
Some experts believe that OpenAI and other companies building chatbots should come up with solutions for detection before they release A.I. products, rather than after. OpenAI launched ChatGPT at the end of November, for example, but did not release its detection tool until about two months later, at the end of January.
By that time, educators and researchers had already been calling for tools to help them identify generated text. Many signed up to use a new detection tool, GPTZero, which was built by a Princeton University student over his winter break and was released on Jan. 1.
“We’ve heard from an overwhelming number of teachers,” said Edward Tian, the student who built GPTZero. As of mid-February, more than 43,000 teachers had signed up to use the tool, Mr. Tian said.
“Generative A.I. is an incredible technology, but for any new innovation we need to build the safeguards for it to be adopted responsibly, not months or years after the release, but immediately when it is released,” Mr. Tian said.