Hidden Threats in Language Models: When Small Changes Add Up

Researchers have discovered a way to bypass safety protocols in large language models by combining individually harmless-looking modifications, exposing a new class of vulnerability in the rapidly evolving AI supply chain.