Author: Denis Avetisyan
Researchers have discovered a vulnerability in State Space Models where adversarial attacks can cause a ‘spectral radius collapse’, leading to performance degradation, and propose a novel monitoring system to detect and mitigate these attacks.

This paper details SpectralGuard, a defense against memory collapse attacks targeting the spectral radius of State Space Models, offering improved security for recurrent neural networks and control systems.
While State Space Models (SSMs) offer efficient sequence processing, their recurrence mechanisms introduce a critical vulnerability to adversarial manipulation. This paper, ‘SpectralGuard: Detecting Memory Collapse Attacks in State Space Models’, reveals that an adversary can induce a spectral radius collapse, effectively erasing the model’s long-term memory, without triggering conventional output-based alarms. We demonstrate that monitoring the spectral stability of the transition operator provides a robust and real-time defense against these ‘memory collapse’ attacks, achieving high detection accuracy with minimal latency. Could this spectral monitoring approach offer a foundational safety layer for all recurrent foundation models susceptible to hidden state manipulation?
The Allure and Fragility of State Space Models
State Space Models (SSMs) are emerging as a potentially transformative approach to sequence modeling, offering a compelling alternative to the widely adopted Transformer architecture. Unlike Transformers, which rely on attention mechanisms that scale quadratically with sequence length, SSMs leverage the principles of dynamical systems to represent sequences through hidden states that evolve over time. This allows them to process long sequences with greater computational efficiency – a critical advantage for applications dealing with extensive data, such as video processing or genomic analysis. Furthermore, the inherent structure of SSMs facilitates the capture of long-range dependencies, meaning the model can effectively learn relationships between elements that are far apart in a sequence, a task where traditional recurrent neural networks often struggle. By distilling sequential data into a compact state representation, SSMs not only promise faster processing but also a more nuanced understanding of temporal dynamics.
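The recurrence at the heart of an SSM can be sketched in a few lines. The matrices below are hypothetical toy values chosen for illustration, not taken from the paper:

```python
import numpy as np

# Toy linear SSM with hypothetical matrices (illustrative, not the paper's model).
rng = np.random.default_rng(0)
d_state, d_in = 4, 1
A = 0.9 * np.eye(d_state)            # state transition (stable: spectral radius 0.9)
B = rng.normal(size=(d_state, d_in)) # input projection
C = rng.normal(size=(1, d_state))    # output projection

def ssm_scan(x):
    """Run the recurrence h_t = A h_{t-1} + B x_t, emitting y_t = C h_t."""
    h = np.zeros((d_state, 1))
    ys = []
    for x_t in x:
        h = A @ h + B @ np.atleast_2d(x_t)
        ys.append(float(C @ h))
    return ys

# Impulse response: the contribution of the first input decays geometrically.
y = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

Because `A` here is simply 0.9 times the identity, each successive output after the impulse shrinks by exactly that factor, making the decaying "memory" of the first input directly visible.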
Despite the potential of State Space Models (SSMs) in sequence modeling, research indicates a significant vulnerability to adversarial attacks targeting their internal state. These attacks, carefully crafted to exploit the model’s hidden dynamics, can induce catastrophic performance degradation, even with subtle input perturbations. Recent studies demonstrate this fragility; a model initially achieving 68.4% accuracy can experience a precipitous drop to just 23.1% when subjected to these state manipulation attacks. This susceptibility raises critical concerns about the robustness and security of SSMs, particularly in applications where reliability is paramount, and highlights the need for developing effective defense mechanisms against such vulnerabilities.

The Spectral Radius: A Model’s Memory Horizon
The spectral radius of the state transition matrix A in a State Space Model (SSM) is the largest absolute value of its eigenvalues. This value directly dictates the model’s capacity for retaining information over time; a smaller spectral radius corresponds to a shorter effective memory horizon, as past states decay more rapidly during iterative updates. Conversely, a larger spectral radius allows the model to maintain dependencies over longer sequences. Critically, stability is also governed by the spectral radius: for an SSM to be stable – meaning its state does not diverge with time – the spectral radius of A must be strictly less than one. Therefore, the spectral radius represents a fundamental trade-off between memory capacity and model stability.
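The spectral radius itself is straightforward to compute. A minimal sketch, using a hypothetical transition matrix:

```python
import numpy as np

def spectral_radius(A):
    """Largest absolute value of the eigenvalues of the transition matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

A = np.diag([0.95, 0.5, -0.3])  # hypothetical state transition matrix
rho = spectral_radius(A)         # 0.95 for this diagonal example
assert rho < 1.0                 # stability condition: state does not diverge
```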
Theorem 1 formally defines the relationship between an SSM’s spectral radius, ρ, and its memory horizon, T. Specifically, the theorem establishes that for a given state transition matrix A, the memory horizon is bounded by $T \leq \frac{\ln(1/\epsilon)}{\ln(1/\rho)}$, where ε represents a desired level of accuracy in recalling past states. This bound demonstrates that a spectral radius closer to one corresponds to a longer memory horizon, enabling the SSM to effectively capture and utilize long-range dependencies in sequential data. Conversely, a smaller spectral radius shortens the effective memory horizon, hindering the model’s ability to maintain information over extended sequences. The theorem provides a quantifiable link between the spectral properties of the state transition matrix and the model’s capacity for long-term memory.
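The bound from Theorem 1 can be evaluated directly. In this small illustration the particular values of ρ and ε are chosen for demonstration only:

```python
import math

def memory_horizon(rho, eps):
    """Upper bound on the number of steps before a past state's influence
    decays below eps, per T <= ln(1/eps) / ln(1/rho); valid for 0 < rho < 1."""
    return math.log(1.0 / eps) / math.log(1.0 / rho)

# A spectral radius closer to one yields a much longer memory horizon.
long_memory = memory_horizon(0.99, 1e-3)   # ~687 steps
short_memory = memory_horizon(0.50, 1e-3)  # ~10 steps
```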
State Space Models (SSMs) exhibit input dependence in their spectral radius, meaning the value is not a static property of the model but is instead influenced by the input data itself. This occurs because the input signal modulates the state transition dynamics, directly impacting the eigenvalues of the effective transition matrix. Consequently, adversarial inputs can be crafted to deliberately push the spectral radius out of its safe operating range: driving it above one causes exponential divergence of internal states, while driving it toward zero collapses the model’s memory. Either manipulation produces a significant degradation in performance metrics such as accuracy and predictive capability. This vulnerability contrasts with models possessing fixed spectral radii, where stability is guaranteed regardless of input characteristics.
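One way to picture this input dependence, as a hedged sketch rather than the paper's actual mechanism, is a gated transition whose effective spectral radius varies with each input token:

```python
import numpy as np

A_base = np.diag([0.9, 0.7])  # hypothetical base transition matrix

def effective_transition(A_base, x_t, w=2.0):
    """Illustrative input-dependent transition: a sigmoid gate on the input
    scales the base matrix, so the per-step spectral radius depends on x_t."""
    gate = 1.0 / (1.0 + np.exp(-w * x_t))  # in (0, 1)
    return gate * A_base

for x_t in (-3.0, 0.0, 3.0):
    A_t = effective_transition(A_base, x_t)
    rho_t = np.max(np.abs(np.linalg.eigvals(A_t)))
    print(f"x_t={x_t:+.1f}  effective spectral radius={rho_t:.3f}")
# Strongly negative inputs drive the gate, and hence the radius, toward zero:
# exactly the lever a memory-collapse attack can exploit.
```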

The Peril of Spectral Collapse and the Limits of Superficial Defenses
Spectral collapse in State Space Models (SSMs) is characterized by a significant reduction in the spectral radius – the largest absolute value of the eigenvalues of the model’s state transition matrix. This decrease directly impacts the model’s ability to retain information from prior inputs; as the spectral radius approaches zero, the model’s internal state effectively ‘forgets’ past data. Consequently, the model’s capacity for complex reasoning and long-range dependency processing is severely diminished, leading to performance degradation on tasks requiring contextual understanding or memory of previous states. The severity of this effect is directly correlated with the magnitude of the spectral radius decrease; a smaller spectral radius indicates a greater loss of historical information and reduced reasoning capability.
HiSPA, or Hidden State Perturbation Attack, functions by crafting specific input sequences designed to intentionally reduce the spectral radius of a State Space Model (SSM) towards zero. The spectral radius, calculated as the largest absolute value of the eigenvalues of the SSM’s state transition matrix, directly correlates with the model’s ability to retain information over time; a decreasing spectral radius indicates increasingly rapid information loss. By driving this value towards zero, HiSPA effectively causes the SSM to ‘forget’ prior context, leading to a substantial degradation in performance and rendering the model incapable of coherent reasoning or prediction. This manipulation occurs without directly accessing the model’s parameters, relying solely on carefully constructed input perturbations.
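A toy calculation, not HiSPA itself, illustrates why driving the spectral radius toward zero erases memory: the contribution of an early input to the hidden state scales as ρ raised to the number of elapsed steps.

```python
def state_after(rho, n_steps, h0=1.0):
    """Magnitude of an initial state's contribution after n_steps updates
    under a scalar transition with spectral radius rho (toy illustration)."""
    return abs(h0) * rho ** n_steps

for rho in (0.99, 0.5, 0.05):
    print(f"rho={rho:.2f}  contribution after 20 steps: {state_after(rho, 20):.2e}")
# rho=0.99 -> ~8.2e-01 (memory largely retained)
# rho=0.05 -> ~9.5e-27 (memory effectively erased)
```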
Theorem 3 establishes a formal limitation of output-only defenses against adversarial attacks targeting state space model (SSM) spectral collapse. The theorem proves that defenses relying solely on observable outputs are incapable of detecting or mitigating internal state degradation within the SSM. Because spectral collapse is characterized by a reduction in the spectral radius – indicating a loss of retained information – and this degradation occurs within the hidden state of the model, any defense limited to input-output analysis lacks the necessary visibility to counteract the effect. Consequently, even if an output-only defense successfully masks malicious outputs, it cannot prevent the underlying spectral collapse and the resulting loss of reasoning ability, leaving the model vulnerable to future exploitation.

SpectralGuard: Proactive Defense Through Continuous Monitoring
SpectralGuard operates by continuously monitoring the spectral radius of the state transition matrix within a State Space Model (SSM). The spectral radius, defined as the maximum absolute value of the eigenvalues of this matrix, serves as a key indicator of the model’s stability. A decreasing spectral radius signals a potential contraction of the state space, which can lead to spectral collapse – a condition where the model loses its ability to effectively process information. The system calculates this radius in real-time during model operation, allowing for immediate detection of deviations from established stability thresholds. This continuous assessment differentiates SpectralGuard from post-hoc analysis methods and enables proactive intervention before performance degradation occurs.
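The monitoring loop can be sketched as follows. The class name, threshold, and window size here are illustrative assumptions, not the paper's implementation details:

```python
import numpy as np

class SpectralMonitor:
    """Illustrative monitor: raises an alarm when the spectral radius of the
    per-step transition stays below a threshold for a sustained window.
    The threshold and window values are hypothetical, not from the paper."""
    def __init__(self, threshold=0.2, window=3):
        self.threshold = threshold
        self.window = window
        self.below = 0  # consecutive low-radius steps observed so far

    def check(self, A_t):
        rho = float(np.max(np.abs(np.linalg.eigvals(A_t))))
        self.below = self.below + 1 if rho < self.threshold else 0
        return self.below >= self.window  # True -> possible spectral collapse

monitor = SpectralMonitor()
healthy = np.diag([0.9, 0.8])
collapsed = np.diag([0.05, 0.01])
assert not monitor.check(healthy)                            # normal operation
assert all(not monitor.check(collapsed) for _ in range(2))   # drop not yet sustained
assert monitor.check(collapsed)                              # third low step -> alarm
```

Requiring a sustained window rather than a single low reading is one simple way to keep false positives down while still reacting within a few steps.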
SpectralGuard proactively addresses model instability by continuously monitoring the spectral radius of the State Space Model’s (SSM) state transition matrix. A decrease in this spectral radius indicates a potential spectral collapse, a condition where the model loses its ability to accurately process information. The system is designed to detect these decreases in real-time and implement corrective measures, thereby maintaining model stability. Evaluation demonstrates an F1-score of 0.961 in detecting adversarial attacks that attempt to induce spectral collapse, indicating a high degree of accuracy in identifying and mitigating these threats.
SpectralGuard’s efficacy is formally guaranteed by Theorem 4, which provides both completeness and soundness proofs regarding its ability to detect and mitigate spectral collapse attacks. Completeness ensures that all instances of spectral collapse are identified, while soundness confirms that any identified collapse is, in fact, a genuine occurrence, minimizing false positives. Crucially, this level of formal verification is achieved with a measured throughput overhead of only 15%, demonstrating a practical implementation without significant performance degradation. This overhead represents the computational cost of the monitoring and mitigation processes relative to a baseline system without SpectralGuard.

Towards Robust and Efficient Sequence Modeling: A Vision for the Future
Mamba represents a significant advancement in sequence modeling through its innovative use of selective State Space Models (SSMs). Traditional SSMs process entire sequences at once, leading to computational bottlenecks. Mamba addresses this by employing input-dependent discretization, a technique that dynamically adjusts the processing steps based on the incoming data. This selective approach allows the model to focus on the most relevant parts of the sequence, effectively reducing computational complexity from quadratic to linear – a crucial improvement for handling long sequences. The result is a model that not only scales more efficiently but also demonstrates enhanced performance across various sequence modeling tasks, offering a compelling alternative to conventional transformer architectures and opening new possibilities for applications requiring robust and efficient processing of sequential data.
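A simplified sketch of input-dependent discretization in this spirit follows; the softplus step-size gate and the toy matrix are illustrative assumptions, not Mamba's actual parameterization:

```python
import numpy as np

A_cont = np.diag([-0.1, -1.0])  # hypothetical continuous-time transition

def discretize(A_cont, x_t, w=1.0):
    """Illustrative selective discretization: a per-token step size delta(x_t)
    (softplus of the input) yields the discrete transition exp(delta * A_cont)."""
    delta = np.log1p(np.exp(w * x_t))  # softplus: input-dependent step size
    return np.diag(np.exp(delta * np.diag(A_cont)))

# Small delta (negative input)  -> transition near identity: retain state.
# Large delta (positive input)  -> eigenvalues shrink: forget faster.
A_retain = discretize(A_cont, -5.0)
A_forget = discretize(A_cont, 5.0)
```

Because each token chooses its own effective step size, the model can hold state across irrelevant stretches of input and flush it when the content changes, which is the intuition behind the "selective" behavior described above.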
Selective State Space Models (SSMs) represent a significant advancement in sequence modeling by introducing mechanisms for dynamic control over information flow. Unlike traditional SSMs which process all input equally, selective approaches allow the model to prioritize and filter information based on its relevance, effectively acting as a learned gating system. This selective attention not only enhances computational efficiency – as irrelevant data is downweighted or ignored – but also improves robustness against noisy or adversarial inputs. By focusing on the most salient features within a sequence, the model becomes less susceptible to distractions and more capable of generalizing to unseen data. The ability to proactively manage information flow enables these selective SSMs to achieve superior performance on a range of tasks, while simultaneously reducing computational demands and bolstering resilience against perturbations.
The synergy between proactive monitoring and novel architectures promises a significant leap forward in sequence modeling capabilities. Integrating systems like SpectralGuard, which actively scrutinize model behavior, with state-of-the-art structures such as Mamba allows for the identification and mitigation of vulnerabilities before they impact performance. Studies demonstrate this combined approach yields a robust defense against adversarial attacks, even as those attacks scale across different model sizes – performance degradation was contained to 38.2% when transferring attacks between 370M and 1.4B parameter models. This resilience suggests that by prioritizing proactive security measures alongside architectural innovation, the full potential of selective State Space Models can be realized, extending their application beyond traditional sequence modeling tasks and into more complex and critical domains.

The pursuit of robust systems, as detailed in the exploration of State Space Models and their vulnerabilities, echoes a fundamental truth about complexity. This work illuminates how even mathematically elegant structures can succumb to subtle instabilities, a spectral radius collapse being a prime example. It’s a reminder that decomposition, while offering manageability, doesn’t eliminate inherent fragility. As Edsger W. Dijkstra observed, ‘It’s not enough to do the right thing; you have to do things right.’ SpectralGuard, as a defensive monitoring system, addresses the ‘doing things right’ aspect, seeking to fortify against collapse. Yet, the underlying dependency remains – a system divided is still a system susceptible to cascading failure, a prophecy the research implicitly acknowledges.
What Lies Ahead?
The identification of spectral collapse as a vulnerability in State Space Models feels less like a solution and more like a refinement of the question. It reveals, once again, that optimization invariably narrows the space of possible futures. Everything optimized will someday lose flexibility. To secure these models against adversarial attacks through monitoring is, predictably, to trade one set of assumptions for another. SpectralGuard, for all its promise, simply shifts the battleground, inviting a more subtle attacker to probe the limits of its detection thresholds.
The true challenge isn’t building better defenses, but accepting the inherent fragility of these systems. Scalability is just the word used to justify complexity. The focus will inevitably drift toward proactive resilience – models designed not to prevent failure, but to gracefully absorb it. Perhaps the next iteration of this work will explore techniques for dynamic reconfiguration, allowing SSMs to shed components, to deliberately unoptimize themselves, in the face of sustained attack.
The perfect architecture is a myth to keep sane. This research nudges the field toward recognizing that State Space Models, like all complex systems, are not static fortresses to be defended, but evolving ecosystems. The future lies not in control, but in cultivation.
Original article: https://arxiv.org/pdf/2603.12414.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-16 23:57