Hidden Threats in Language Models: When Small Changes Add Up

Researchers have discovered a method to bypass safety protocols in large language models by subtly combining seemingly harmless modifications, exposing a new vulnerability in the rapidly evolving AI supply chain.
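
The paper's exact technique isn't reproduced here, but the core intuition, that per-change checks can miss cumulative drift, can be sketched in a few lines of Python. Everything below (the norm threshold, the `audit_passes` check, and the toy weight vectors) is a hypothetical stand-in, not the researchers' actual method:

```python
import numpy as np

# Hypothetical illustration (not the paper's actual attack): an audit that
# inspects each model update in isolation can miss the cumulative effect of
# several small perturbations applied in sequence.

rng = np.random.default_rng(0)

direction = rng.normal(size=1000)
direction /= np.linalg.norm(direction)   # unit vector toward some target behavior

THRESHOLD = 0.05                         # per-update norm budget the "audit" enforces

def audit_passes(delta: np.ndarray) -> bool:
    """A per-change check: flag any single update whose norm looks large."""
    return np.linalg.norm(delta) <= THRESHOLD

# Ten individually tiny updates, each comfortably under the audit threshold...
deltas = [0.04 * direction for _ in range(10)]
assert all(audit_passes(d) for d in deltas)

# ...yet their composition is an order of magnitude larger than any one update.
total = sum(deltas)
print(f"largest single update: {max(np.linalg.norm(d) for d in deltas):.3f}")
print(f"cumulative shift:      {np.linalg.norm(total):.3f}")  # 0.400 >> 0.050
```

The point of the sketch is only that auditing modifications one at a time gives no bound on their sum; any real defense would need to reason about the composed chain of changes.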

![The presented results demonstrate performance gains (though inherently optimistic due to the lack of formal inclusion proofs within the codebase) across varied core counts, with instances of equality between 1-core and 32-core values consolidated for clarity, as observed with a dataset size of $n=2^{16}$.](https://arxiv.org/html/2603.12990v1/x8.png)
