Tagging the Intangible: Authenticating the Age of AI Authorship
As AI systems become more skilled at producing human-like content, examining the similarities between physical taggants and digital watermarking provides insights into the complexity of the issue.
On December 21, 1988, the world witnessed a shattering tragedy. Pan Am Flight 103, a Boeing 747 carrying 259 passengers and crew, had departed London's Heathrow Airport bound for New York City. But just 38 minutes into its journey, a bomb exploded, ripping the aircraft apart in the skies over the Scottish town of Lockerbie. The devastating terror attack left no survivors aboard, and the staggering loss of life sent shockwaves of grief and horror rippling across the globe.
In the wake of this horrific act, the International Civil Aviation Organization (ICAO) recognized the urgent need to prevent such untraceable attacks from occurring again. Its solution? Introducing taggant technology into the manufacturing process of explosives themselves. Taggants are unique chemical, isotopic, or microparticle markers embedded within bulk explosives, acting as covert forensic fingerprints. They allow law enforcement to trace explosive material back to its source, making anonymous attacks far harder to carry out. Switzerland became the first nation to mandate taggants in explosives produced within its borders.
The concept behind taggant technology draws striking parallels to the anti-counterfeiting measures employed in currency printing, where hidden markers or watermarks are embedded to verify authenticity. Both initiatives share a common goal: to provide a means of detection and attribution, deterring illicit activities.
In the rapidly evolving field of artificial intelligence (AI), a parallel effort has emerged to detect and attribute content generated by language models and other AI systems. As AI capabilities advance, concerns have arisen about the potential misuse of these technologies, from generating misinformation to enabling academic dishonesty. Just as the aviation industry sought to address the threats posed by untraceable explosives, there is a growing push to develop methods for identifying AI-generated content.
This article will delve into the parallels between taggant technology in explosives and the efforts to "watermark" AI-generated content, exploring the various approaches and challenges.
Forensic Fingerprints for AI Images
On March 24, 2023, an image depicting Pope Francis wearing an oversized white puffer coat took the internet by storm after being shared on Reddit. The highly realistic-looking image had been created by Midjourney, an AI system that generates images from text prompts. While intended as an experiment, the photo-like render of the Pope in such an unusual outfit quickly went viral, fooling many into believing it was an actual photograph.
The "Pope in a Puffer Coat" incident highlighted the potential for AI-generated media to spread misinformation and misleading content. As Ryan Broderick, a web culture expert, described it, "the first real mass-level AI misinformation case." The controversy crystallized growing concerns around the need for robust methods to authenticate and attribute AI-generated content, much like the call for taggant technology in explosives after the Lockerbie bombing.
In response, AI companies have begun exploring techniques akin to digital watermarking to encode attribution data into their synthetic media outputs. For instance, OpenAI's DALL-E 3 model embeds cryptographically signed provenance metadata into the images it generates, following the C2PA standard developed by the Coalition for Content Provenance and Authenticity. This metadata is intended to let anyone verify that an image originated from DALL-E 3.
However, the C2PA metadata can be easily stripped from image files or bypassed by simply taking a screenshot of the AI-generated picture. Efforts by other companies such as Meta to add visible watermarks to their text-to-image outputs suffer from similar shortcomings, as visible marks can be cropped or edited out. In August 2023, Google DeepMind unveiled SynthID, a technique that embeds an imperceptible digital watermark directly into the pixel data produced by Google's Imagen model. While more robust against simple removal, SynthID and similar approaches can still be defeated by sufficiently aggressive image manipulation.
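To make the idea of an in-pixel watermark concrete, here is a toy sketch that hides a short payload in the least significant bits of an image's pixels using NumPy and Pillow. This is not how SynthID works (DeepMind has not published its method); it only illustrates embedding a mark in the image data itself, and why naive schemes do not survive lossy re-encoding such as a screenshot or a JPEG save. The file names and payload are hypothetical.

```python
import numpy as np
from PIL import Image

# Hypothetical payload; a real system would embed a signed identifier.
PAYLOAD = np.frombuffer(b"demo-watermark", dtype=np.uint8)

def embed(pixels: np.ndarray) -> np.ndarray:
    """Hide PAYLOAD in the least significant bits of the first pixel values."""
    flat = pixels.flatten()                      # flatten() returns a copy
    bits = np.unpackbits(PAYLOAD)
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(pixels.shape)

def extract(pixels: np.ndarray) -> bytes:
    """Read the hidden payload back out of the least significant bits."""
    bits = pixels.flatten()[: PAYLOAD.size * 8] & 1
    return np.packbits(bits).tobytes()

original = np.asarray(Image.open("generated.png").convert("RGB"), dtype=np.uint8)
marked = embed(original)

Image.fromarray(marked).save("marked.png")            # lossless re-save
print(extract(np.asarray(Image.open("marked.png"))))  # b'demo-watermark'

Image.fromarray(marked).save("marked.jpg")            # lossy re-encode
print(extract(np.asarray(Image.open("marked.jpg"))))  # garbled: the LSBs are destroyed
```

Production approaches such as SynthID are instead designed to spread an imperceptible signal across the whole image so that it survives common edits like cropping, resizing, and recompression.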
The Authenticity Paradox: Embracing AI's Authorship
Images have the advantage of being dense, high-redundancy data with plenty of room to hide imperceptible perturbations, which makes the forensic fingerprinting of AI-generated visuals a more tractable challenge than detecting synthetic text. As language models advance, differentiating between human-authored and AI-generated text grows increasingly difficult. The best approach thus far has been to statistically analyze the text's "perplexity" (how predictable it is to a language model) and "burstiness" (how much that predictability varies from sentence to sentence), as AI outputs tend to be more uniform and predictable than human writing.
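As a rough illustration of this style of statistical test, the sketch below scores a passage's perplexity under GPT-2 using the Hugging Face transformers library and treats the variance of per-sentence perplexities as a crude burstiness signal. The model choice, the naive sentence splitting, and any decision threshold are illustrative assumptions, not a working detector.

```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: exp of the mean token-level loss."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing input_ids as labels makes the model return cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Sample variance of sentence-level perplexities; human prose tends to vary more."""
    sentences = [s.strip() for s in text.split(".") if len(s.split()) > 3]
    scores = [perplexity(s) for s in sentences]
    if len(scores) < 2:
        return 0.0
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

sample = ("The quick brown fox jumps over the lazy dog. "
          "Meanwhile, quarterly earnings defied every forecast the analysts had published.")
print(f"perplexity: {perplexity(sample):.1f}")
print(f"burstiness: {burstiness(sample):.1f}")
```

In this framing, low and uniform scores hint at machine generation; as the next paragraph notes, that signal is weak and easy to confound.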
Yet, these detection methods are far from foolproof. As language models are further trained on more diverse data, their outputs will inevitably become harder to distinguish from human-authored text. In January 2023, OpenAI launched a classifier designed to differentiate between text created by humans and AI, including outputs from their own models like ChatGPT. However, by July 20, 2023, this tool had been discontinued due to its low accuracy rates, highlighting the challenges of reliable detection.
The rapid proliferation of open-source language models further complicates the issue. This predicament mirrors the challenges faced by taggant technology for explosives. In 1998, a National Research Council study found that mandating taggants would be largely wasteful and counterproductive. The best tagging approaches would not work on all types of explosives, including homemade varieties like the ammonium nitrate-based bomb used in the 1995 Oklahoma City bombing, and taggants would create unnecessary waste for the vast majority of explosives used for legitimate purposes in mining, construction, and other industries.
Perhaps the way forward is to embrace the inevitability of AI authorship, much as we have accepted the integration of spell-checkers and other writing aids into our creative processes. Rather than futilely attempting to separate human and machine outputs, the emphasis should shift toward the substance and value of ideas, not their origins.