Author: Denis Avetisyan
A new analysis reveals vulnerabilities in the pseudorandom error-correcting codes used to protect AI-generated content, potentially undermining efforts to track its origin.
This review details the first successful cryptanalysis of pseudorandom error-correcting codes and proposes defenses against meet-in-the-middle and noise overlay attacks.
Despite the promise of robust content authentication, current pseudorandom error-correcting codes (PRC), a novel cryptographic primitive proposed for watermarking AI-generated content, lack rigorous security analysis, particularly against concrete cryptographic attacks. This paper, ‘Cryptanalysis of Pseudorandom Error-Correcting Codes’, presents the first comprehensive cryptanalysis of PRC, revealing vulnerabilities in its undetectability and decoding processes across all parameter configurations, including successful attacks against real-world generative models like DeepSeek and Stable Diffusion. Our analysis demonstrates that existing PRC implementations fall short of 128-bit security, even with proposed mitigations, due to inherent limitations within large generative models, and we identify a practical attack capable of detecting watermarks with overwhelming probability at a cost of $2^{22}$ operations. Can future research overcome these fundamental constraints and achieve truly secure watermarking for the rapidly evolving landscape of AI-generated content?
The Illusion of Authenticity: Why We Need Watermarks Now
The exponential growth of AI-Generated Content (AIGC) – encompassing text from Large Language Models (LLMs) and images from Generative Image Models (GIMs) – presents a critical challenge to establishing content authenticity and provenance. As these models become increasingly sophisticated and accessible, discerning between human-created and machine-generated work becomes exceptionally difficult, fostering opportunities for misinformation, plagiarism, and malicious use. Consequently, robust authentication methods are no longer simply desirable, but essential for maintaining trust in digital information ecosystems. Without verifiable signals embedded within AIGC, attributing authorship, detecting manipulation, and safeguarding intellectual property rights become significantly compromised, demanding innovative solutions to address this rapidly evolving landscape.
Existing methods for content authentication falter when applied to the outputs of increasingly sophisticated AI models. Historically, techniques like digital signatures or forensic analysis of compression artifacts relied on identifying the source of a piece of media, but generative AI fundamentally alters this premise: the ‘source’ is not a creator, but an algorithm. Furthermore, the ability of these models to subtly alter and remix existing content introduces new challenges; manipulations can be nearly imperceptible, rendering traditional forensic techniques ineffective. Detecting even intentional tampering becomes difficult as the line between original creation and algorithmic modification blurs. This rapid evolution necessitates new approaches that move beyond identifying origins and instead focus on verifying the integrity and authenticity of content itself, regardless of how it was produced.
As artificially generated content becomes increasingly pervasive, establishing authenticity and detecting tampering presents a significant challenge. While watermarking emerges as a viable solution, its effective implementation demands cryptographic tools suited to the unique characteristics of AI-generated content. Traditional watermarking techniques often prove fragile when applied to outputs from large language and generative image models, either being stripped by even minor modifications or degrading the quality of the generated content. Consequently, research focuses on developing cryptographic primitives that balance robust security, ensuring the watermark remains detectable even after manipulation, with practical considerations like computational efficiency and minimal impact on the perceptual quality of the output. This necessitates a move beyond simple embedding techniques toward more sophisticated methods capable of withstanding adversarial attacks while preserving the integrity of the AIGC itself.
A robust approach to authenticating AI-Generated Content (AIGC) centers on the application of Pseudorandom Error-Correcting Codes (PRC). These codes allow for the embedding of a verifiable signal – a digital watermark – directly within the generated output, whether text or image. Unlike traditional watermarking techniques, PRC leverages cryptographic principles to ensure the signal is both resistant to manipulation and detectable even after typical processing or compression. The core innovation lies in the code’s ability to not only conceal the watermark but also to correct for minor distortions introduced during generation or transmission, effectively guaranteeing its presence as long as the alterations remain within defined parameters. This error-correcting capability is crucial, as AIGC is often subject to further modification and distribution. By strategically introducing controlled “errors” during embedding, the system can verify the signal’s authenticity with a high degree of confidence, providing a pathway to trace the origin and integrity of AI-generated works.
Hiding in Plain Sight: The Logic of Pseudorandom Error-Correcting Codes
Pseudorandom Error-Correcting Codes (PRC) utilize error-correcting code principles to conceal a message within AI-Generated Content (AIGC). This is achieved by encoding the hidden message as a codeword, a modified version of the original content, using a public key. The encoding process introduces subtle alterations designed to be statistically indistinguishable from natural variations within the AIGC. Without knowledge of the corresponding secret key, these alterations appear as random noise, rendering the embedded message imperceptible and preventing unauthorized retrieval. The robustness of the code ensures the message remains recoverable even with moderate distortion or noise introduced to the AIGC, while undetectability prevents its presence from being statistically identified.
The encoding process within Pseudorandom Error-Correcting Codes (PRC) utilizes a public key to transform the original content, whether text, image data, or other digital assets, into a codeword. This transformation is achieved through the application of an error-correcting code, specifically designed to introduce redundancy without significantly altering the perceptible characteristics of the content. The public key defines the specific encoding scheme and parameters, determining how the original data is modified to create the codeword. The resulting codeword is then the version of the content that is stored or transmitted, containing the hidden message embedded within its redundant structure. The size of the codeword is generally larger than the original content due to the added redundancy necessary for both embedding and later retrieval of the hidden message.
Successful decoding within Pseudorandom Error-Correcting Codes (PRC) relies on applying the secret key to the received codeword. This process reverses the encoding transformation, accurately reconstructing the original message. The recovered message then serves as verification of authenticity; a correct reconstruction confirms that the content originated from a legitimate source and has not been tampered with during transmission or storage. Any deviation during decoding, resulting in an inaccurate reconstruction, indicates either a compromised secret key or manipulation of the codeword itself, thus failing the authenticity check.
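As a toy illustration of this encode/decode roundtrip (not the paper's actual PRC construction, and with all names chosen for this sketch), the following masks a simple repetition code with a pseudorandom pad derived from the secret key: without the key the codeword looks random, and with it, majority voting corrects sparse bit flips.

```python
import hashlib

def keystream(key: bytes, n: int) -> list[int]:
    # Expand a secret key into n pseudorandom bits (toy PRG built on SHA-256).
    bits, counter = [], 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        for byte in block:
            for i in range(8):
                bits.append((byte >> i) & 1)
        counter += 1
    return bits[:n]

def encode(key: bytes, message: list[int], rep: int = 5) -> list[int]:
    # Repetition-code each message bit, then mask with the keyed pad so the
    # codeword is pseudorandom to anyone without the key.
    code = [b for b in message for _ in range(rep)]
    pad = keystream(key, len(code))
    return [c ^ p for c, p in zip(code, pad)]

def decode(key: bytes, word: list[int], rep: int = 5) -> list[int]:
    # Unmask, then majority-vote each block to correct sparse bit flips.
    pad = keystream(key, len(word))
    code = [w ^ p for w, p in zip(word, pad)]
    return [1 if sum(code[i:i + rep]) * 2 > rep else 0
            for i in range(0, len(code), rep)]
```

A decode that fails to reproduce the original message signals either a wrong key or a codeword altered beyond the code's correction radius, which is exactly the authenticity check described above.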
The security of Pseudorandom Error-Correcting Codes (PRC) relies on two key properties: robustness, which ensures the hidden message remains recoverable even after significant alterations to the AIGC, and undetectability, which prevents an attacker from determining whether a message is present at all. Theoretical limits suggest a maximum achievable security level of 128 bits, meaning roughly $2^{128}$ operations would be required to break the encoding. However, practical implementations of PRC currently achieve security levels below this threshold due to factors such as code optimization and the specific error-correcting codes employed, necessitating ongoing research to approach this theoretical maximum and maintain a high degree of security against adversarial attacks.
Probing the Cracks: Attacks That Undermine PRC Security
Attack-I employs the Meet-in-the-Middle (MITM) technique for partial key recovery in Pseudorandom Error-Correcting Codes (PRC). The attack distinguishes PRC-encoded codewords from uniformly random vectors by evaluating a reduced search space: the MITM approach lowers the computational cost of this distinguisher to below $2^{128}$ operations, i.e., below the 128-bit security target. This indicates a potential vulnerability, as the attack falls within the realm of feasibility for determined attackers with sufficient computational resources. The success of Attack-I relies on the attacker's ability to search for a matching key segment in both the forward and backward computations simultaneously, thereby narrowing the effective key space.
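The generic meet-in-the-middle trade-off can be illustrated on a toy double-encryption cipher (an assumption for illustration only; the paper's Attack-I applies the same idea to PRC key material, not to this cipher): rather than searching all key pairs, tabulate the forward half and look up the backward half.

```python
INV5 = pow(5, -1, 256)  # 5 is odd, hence invertible modulo 256

def f(k: int, x: int) -> int:
    # Toy invertible round: XOR with the key, then an affine map mod 256.
    return (((x ^ k) * 5) + k) % 256

def f_inv(k: int, y: int) -> int:
    # Exact inverse of f for the same key.
    return (((y - k) % 256) * INV5 % 256) ^ k

def mitm(p: int, c: int) -> list[tuple[int, int]]:
    # Meet in the middle on c = f(k2, f(k1, p)):
    # about 2 * 256 evaluations instead of 256^2 for the naive pair search.
    table: dict[int, list[int]] = {}
    for k1 in range(256):
        table.setdefault(f(k1, p), []).append(k1)
    hits = []
    for k2 in range(256):
        for k1 in table.get(f_inv(k2, c), []):
            hits.append((k1, k2))
    return hits
```

A single plaintext/ciphertext pair leaves false positives among the hits; additional pairs filter them, but the true key pair always survives.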
Attack-II targets weaknesses within the KeyGen algorithm’s implementation, rather than the cryptographic principles of the PRC scheme itself. Testing demonstrated a 100% success rate in recovering the key when employing a threshold of $t=3$, meaning three parity bits were used for error correction. With a threshold of $t=4$, the success rate decreased to 63%, but still represents a significant vulnerability. This indicates a practical exploit is possible, as key recovery is achievable with a high probability, potentially compromising the security of codewords generated using the flawed KeyGen implementation.
Attack-III compromises the robustness property of the PRC scheme by intentionally injecting malicious noise that surpasses the decoding threshold, effectively breaking the embedded watermark. This attack operates at a computational cost below $2^{73}$ operations, i.e., under 73 bits of security, indicating a feasible threat model. The injected noise is carefully constructed to manipulate the syndrome decoding process, allowing an adversary to bypass the PRC's defenses without requiring knowledge of the secret key. Successful execution of Attack-III demonstrates a failure in the PRC's ability to reliably recover the watermark in the presence of adversarial perturbations, highlighting a vulnerability in its security guarantees.
Syndrome decoding, employed in Attack-III, leverages the properties of linear block codes to recover the added noise vector. Given a received codeword containing introduced noise, the attack computes the syndrome by multiplying the received word by the code's parity-check matrix; because every valid codeword has a zero syndrome, the result depends only on the noise vector and directly reveals information about its weight and composition. By efficiently calculating and analyzing the syndrome, the attack reconstructs the noise vector at a cost below $2^{73}$ operations, effectively bypassing the PRC's error correction and allowing the malicious noise overlay to defeat watermark recovery. The technique relies on the inherent linear structure of the PRC code and on solving a system of linear equations to determine the injected noise.
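Syndrome decoding itself is textbook material; a minimal example with the Hamming(7,4) code (chosen purely for brevity, whereas the attack applies the idea to the much larger PRC code) shows how the syndrome pinpoints a single flipped bit without reference to the transmitted codeword.

```python
# Parity-check matrix H whose j-th column is j written in binary (j = 1..7):
# the syndrome of a single bit flip reads off the 1-based flip position.
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def syndrome(word: list[int]) -> list[int]:
    # H times the received word over GF(2); zero for every valid codeword.
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def correct(word: list[int]) -> list[int]:
    # Interpret the syndrome as a position and flip that bit back.
    s = syndrome(word)
    pos = s[0] + 2 * s[1] + 4 * s[2]   # 0 means no detectable error
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed
```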
Bolstering the Defenses: Towards More Resilient Watermarking
The generation of truly random secret keys is paramount in robust Pseudorandom Error-Correcting Code (PRC) systems, and recent advancements demonstrate that incorporating Gaussian Elimination into the KeyGen algorithm substantially bolsters this security. This technique, a cornerstone of linear algebra, refines the system of equations used to create the secret key, effectively mitigating vulnerabilities exploited by attackers attempting to deduce the key through algebraic manipulation. By systematically reducing the equations to row echelon form, Gaussian Elimination minimizes the number of potential solutions, thereby decreasing the probability of successful key recovery. This improved key generation process directly addresses weaknesses observed in prior implementations, offering a more resilient defense against attacks that rely on identifying patterns or redundancies within the key space and ultimately strengthening the overall security of the PRC system against unauthorized data access.
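A minimal sketch of the kind of check elimination enables, assuming a KeyGen that represents its binary key matrix as integer bitmasks (an illustrative encoding, not the paper's procedure): computing the GF(2) rank and rejecting rank-deficient matrices screens out keys whose rows are linearly dependent, duplicated rows included.

```python
def gf2_rank(rows: list[int]) -> int:
    # Greedy elimination over GF(2): maintain a basis of rows with distinct
    # leading bits; r ^ b < r exactly when b's leading bit is set in r.
    basis: list[int] = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)   # cancel b's leading bit out of r if present
        if r:
            basis.append(r)
    return len(basis)

def is_full_rank_key(rows: list[int]) -> bool:
    # Accept a candidate key matrix only if its rows are independent.
    return gf2_rank(rows) == len(rows)
```

A KeyGen wrapped in this test would simply resample until `is_full_rank_key` passes, eliminating the algebraic redundancy that key-recovery attacks exploit.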
The efficacy of pseudorandom error-correcting codes (PRC) hinges on a delicate balance between watermark embedding and resilience against malicious attacks; specifically, understanding the limits of noise tolerance in scenarios like Attack-III is paramount. Current decoding parameters operate under assumptions about the maximum permissible noise level, yet a comprehensive investigation into these noise rate thresholds remains incomplete. Determining the point at which the watermark signal becomes irrecoverably corrupted by increasing noise would allow developers to establish more robust decoding algorithms, effectively widening the margin for error and bolstering the system's resistance to manipulation. This research necessitates a detailed analysis of how varying noise levels impact watermark detection rates, potentially requiring the development of adaptive decoding strategies that dynamically adjust to the prevailing noise conditions, ensuring reliable watermark recovery even in heavily distorted signals.
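The threshold behaviour can be sketched empirically for the simplest possible decoder, majority voting over a repetition code (an illustrative stand-in for the PRC's actual decoder): success stays near 1 well below the threshold and collapses past it, and sweeping the flip probability maps out the tolerance curve.

```python
import random

def majority_success_rate(rep: int, p: float, trials: int = 5000) -> float:
    # Fraction of rep-bit repetition blocks whose majority vote survives
    # independent bit flips, each occurring with probability p.
    rng = random.Random(0)   # fixed seed for a reproducible estimate
    ok = 0
    for _ in range(trials):
        flips = sum(rng.random() < p for _ in range(rep))
        ok += flips * 2 < rep   # decoding succeeds iff a minority flipped
    return ok / trials
```

The same Monte Carlo sweep, run against a real decoder, is what locating the noise-rate threshold amounts to in practice.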
Analysis of the KeyGen matrix-generation process revealed a concerning vulnerability: a significant frequency of weak keys. Specifically, Attack-II consistently identified an average of 14.86 duplicated rows within the generated key matrices. This indicates a substantial likelihood of structurally weak keys being produced, thereby compromising the security of the overall Pseudorandom Error-Correcting Code (PRC) system. The presence of these duplicated rows points to a weakness in the algorithm's ability to generate truly random and distinct rows, potentially allowing an adversary to recover the secret information more easily. Addressing this issue is critical for strengthening the PRC's security and preventing unauthorized access to sensitive data.
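A direct sanity check for this particular failure mode, assuming row-level access to the generated matrix, is to count duplicated rows at KeyGen time and resample whenever any appear (a hypothetical guard sketched here, not the paper's proposed fix):

```python
from collections import Counter

def duplicated_rows(rows: list[tuple[int, ...]]) -> int:
    # Number of "extra" copies: a row appearing k times contributes k - 1.
    counts = Counter(rows)
    return sum(c - 1 for c in counts.values())
```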
A thorough comprehension of potential attacks against Pseudorandom Error-Correcting Code (PRC) systems is paramount for building truly secure implementations. Investigations into weaknesses like those demonstrated by Attack-II and Attack-III reveal specific vulnerabilities in key generation and decoding processes, respectively. By dissecting how these attacks function, whether through identifying weak keys or exploiting noise thresholds, developers gain the insight necessary to proactively fortify their systems. This understanding allows for the implementation of countermeasures, such as improved key generation algorithms incorporating techniques like Gaussian elimination, and the establishment of robust decoding parameters resilient to adversarial noise. Ultimately, a security-focused design, informed by attack analysis, is not merely about patching vulnerabilities, but about building a fundamentally more secure foundation for watermarked content.
The pursuit of elegant watermarking schemes, as explored in the cryptanalysis of Pseudorandom Error-Correcting Codes, invariably runs headfirst into the brick wall of production realities. This paper meticulously details how even theoretically sound constructions become vulnerable to attacks like the Meet-in-the-Middle, demonstrating the inevitable decay of even the most promising systems. As Edsger W. Dijkstra observed, “Simplicity is prerequisite for reliability.” The complexity introduced by attempting to hide watermarks within generative image models, while conceptually neat, provides more avenues for exploitation. It is a grim reminder that anything promising to simplify life adds another layer of abstraction – and thus, another point of failure. CI is, after all, our temple – and we pray nothing breaks.
What’s Next?
The demonstrated weaknesses in Pseudorandom Error-Correcting Codes are, predictably, not a refutation of the underlying principle, but a confirmation of production’s uncanny ability to discover failure modes. Any system touted as ‘robust’ against adversarial attacks hasn’t truly met an adversary worth mentioning. The current mitigations proposed offer a temporary reprieve, shifting the cost of attack rather than eliminating it. One suspects a continuing arms race, each layer of ‘watermarking’ becoming another layer of brittle complexity.
The real challenge isn’t creating codes that seem uncrackable, it’s accepting that perfect provenance is an illusion. The field will likely bifurcate: one branch chasing increasingly elaborate cryptographic schemes, the other grappling with the social and legal implications of imperfect detection. The latter seems the more fruitful, if less glamorous, pursuit. After all, a slightly leaky watermark is preferable to a system that confidently declares forgery when merely encountering a JPEG artifact.
It’s also worth noting that this analysis focused on the code itself. The larger system – the generation model, the distribution network, the end user – represents a far more porous attack surface. The next breakthrough won’t be a clever cryptanalytic technique, but a mundane observation about how people actually use these tools. Better one well-understood, auditable pipeline than a hundred opaque, ‘scalable’ microservices promising the impossible.
Original article: https://arxiv.org/pdf/2512.17310.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2025-12-22 11:20