Building a Better Bar for Code Security
A new automated pipeline dramatically reduces the effort needed to create challenging benchmarks for evaluating the security of code generated by large language models.

A new framework, MicroProbe, dramatically improves the efficiency of assessing whether large AI models are trustworthy and predictable.
A new framework, Anota, helps developers proactively identify and address vulnerabilities hidden within application logic through dynamic analysis and security policy definitions.
A new method moves beyond simply measuring an AI's confidence to directly verifying the validity of its reasoning steps.

New research shows that shrinking large AI models isn't enough on its own; optimizing how those models access memory is critical for achieving real-world performance gains on resource-constrained hardware.
![GateBreaker provides a comprehensive framework for addressing challenges in AI safety through robust anomaly detection and mitigation.](https://arxiv.org/html/2512.21008v1/x4.png)
New research reveals a concerning weakness in large language models that rely on specialized ‘expert’ systems, potentially allowing attackers to bypass safety measures with surprising ease.

As digital defenses accumulate complexity and age, the cybersecurity landscape is entering a period of diminishing returns, raising critical questions about long-term resilience.

A new framework, zkFL-Health, leverages cryptographic techniques to enable privacy-preserving federated learning for medical AI, safeguarding sensitive data during collaborative model training.

![SPELL streamlines large language model serving through speculative decoding, pre-drafting potential continuations [latex] \hat{y}_{t} [/latex] with a small language model before verifying them with a larger, more accurate one, thereby reducing latency and improving throughput even under high load.](https://arxiv.org/html/2512.21236v1/figures/overview.png)
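The caption above describes the generic draft-then-verify loop behind speculative decoding rather than SPELL's specific contribution, which this digest doesn't detail. As a rough illustration, here is a minimal sketch of that loop in Python, with toy probability tables standing in for the small drafter and the large target model; all names are illustrative, and the accept/reject rule is the standard min(1, p(t)/q(t)) test from the speculative sampling literature.

```python
import random

random.seed(0)

VOCAB = list(range(8))  # toy vocabulary of 8 token ids


def draft_probs(context):
    """Stand-in for the small drafting model: a cheap distribution over VOCAB."""
    weights = [(t + len(context)) % len(VOCAB) + 1 for t in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]


def target_probs(context):
    """Stand-in for the large target model: the distribution we want to match."""
    weights = [(t * 3 + len(context)) % len(VOCAB) + 1 for t in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]


def sample(probs):
    return random.choices(VOCAB, weights=probs, k=1)[0]


def speculative_step(context, k=4):
    """Draft k tokens with the small model, then verify them against the
    large model using the standard accept/reject rule min(1, p(t)/q(t))."""
    # 1. Pre-draft k candidate continuations with the cheap model.
    drafted, q_dists = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        t = sample(q)
        drafted.append(t)
        q_dists.append(q)
        ctx.append(t)

    # 2. Verify: the large model scores each drafted prefix (in a real
    #    system this is a single batched forward pass, which is the win).
    accepted = []
    ctx = list(context)
    for t, q in zip(drafted, q_dists):
        p = target_probs(ctx)
        if random.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)  # draft token kept; matches target distribution
            ctx.append(t)
        else:
            # Rejected: resample from the residual max(p - q, 0) distribution
            # so the overall output still follows the target model exactly.
            residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
            z = sum(residual)
            accepted.append(sample([r / z for r in residual]) if z > 0 else sample(p))
            break  # discard the rest of the draft
    else:
        # Every draft token was accepted: take one bonus token from the
        # target model at no extra cost.
        accepted.append(sample(target_probs(ctx)))
    return accepted


context = [0]
for _ in range(3):
    context += speculative_step(context)
print("generated token ids:", context)
```

The verification pass is where the speed-up comes from: the large model checks all k drafted tokens at once instead of generating them one at a time, while the residual-resampling step guarantees the output distribution is identical to sampling from the large model alone.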
New research reveals a surprisingly effective method for prompting large language models to generate malicious code, even those considered highly secure.