Securing the IoT with Post-Quantum Crypto: A Tiny Core Challenge

Author: Denis Avetisyan

New research demonstrates the practical implementation of NIST-standardized post-quantum algorithms on severely resource-constrained ARM Cortex-M0+ microcontrollers used in many IoT devices.

ML-KEM-512 achieves a key exchange on an ARM Cortex-M0+ processor in 36.3 ms, demonstrating a 17-fold speed improvement over the ECDH P-256 algorithm.

This study benchmarks the performance of ML-KEM and ML-DSA on low-cost hardware, identifying latency variance in ML-DSA signing as a key consideration for real-world deployment.

Despite the urgent need to secure Internet of Things devices against future quantum threats, systematic performance evaluations of finalized post-quantum cryptography standards on severely resource-constrained platforms remain notably absent. This paper, ‘Benchmarking Post-Quantum Cryptography on Resource-Constrained IoT Devices: ML-KEM and ML-DSA on ARM Cortex-M0+’, addresses this gap by presenting the first isolated algorithm-level benchmarks of ML-KEM and ML-DSA (the NIST-standardized lattice-based schemes) executed on a 133 MHz ARM Cortex-M0+ processor with limited SRAM. The results demonstrate that both schemes are feasible: ML-KEM-512 completes a key exchange in 36.3 ms, significantly faster than ECDH on the same hardware, though ML-DSA signing exhibits considerable latency variance. Can these performance characteristics be optimized to enable widespread adoption of post-quantum cryptography in deeply embedded IoT applications?

The Looming Quantum Threat: A Call for Proactive Cryptography

The bedrock of modern digital security, public-key cryptography (specifically RSA and Elliptic Curve Cryptography, ECC), faces an existential threat from the rapidly advancing field of quantum computing. These algorithms derive their security from the computational difficulty of integer factorization and discrete logarithms; Shor’s algorithm, however, solves both problems efficiently on a sufficiently large, fault-tolerant quantum computer, which would render RSA and ECC insecure. This is not merely a theoretical concern, and the prospect of a future “quantum apocalypse” calls for proactive measures. Data encrypted today with vulnerable algorithms could be decrypted by a quantum computer in the future, exposing sensitive information such as financial records, government secrets, and personal communications. The sheer volume of encrypted data requiring protection, and the time needed to transition to new systems, make this looming cryptographic vulnerability urgent.
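Shor’s algorithm attacks factoring (the basis of RSA) by reducing it to order finding: for a base a coprime to N, find the smallest r with a^r ≡ 1 (mod N). Only that step needs a quantum computer; the same period-finding idea also breaks the discrete-logarithm problems behind ECC. The sketch below is a classical toy that assumes tiny N and brute-forces the order (exponentially slow at real key sizes) purely to show how the remaining classical post-processing turns a period into factors. `find_order` and `factor_from_order` are illustrative names, not part of any library.

```python
from math import gcd
from typing import Optional, Tuple


def find_order(a: int, n: int) -> int:
    """Smallest r > 0 with a**r % n == 1, by brute force.

    This is the step Shor's algorithm performs efficiently on a quantum
    computer; classically it is infeasible for cryptographic sizes.
    """
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_order(a: int, n: int) -> Optional[Tuple[int, int]]:
    """Classical post-processing: turn the order of a mod n into factors.

    Returns None for an unlucky base (odd order, or a**(r/2) == -1 mod n),
    in which case a real attack would simply try another base.
    """
    r = find_order(a, n)
    if r % 2 == 1:
        return None
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None
    p, q = gcd(half - 1, n), gcd(half + 1, n)
    return (p, q) if 1 < p < n and 1 < q < n else None


print(factor_from_order(7, 15))  # (3, 5)
print(factor_from_order(2, 21))  # (7, 3)
```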

The threat to modern cryptography is not solely a concern for the distant future; the “Harvest Now, Decrypt Later” strategy makes it a present danger. Adversaries may already be collecting encrypted communications in anticipation of sufficiently powerful quantum computers, storing the intercepted data indefinitely until quantum algorithms can break the RSA and ECC that protect it. Sensitive information (financial records, intellectual property, state secrets) therefore remains at risk for years or even decades after transmission. A proactive shift to post-quantum cryptography (PQC) is consequently not merely a precaution but an urgent necessity, both to safeguard data against this long-term vulnerability and to prevent a cascade of compromised information once quantum computing capabilities mature.
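This argument is often stated as a simple inequality attributed to Michele Mosca: if x is how long data must stay confidential, y is how long migration to PQC takes, and z is how long until a cryptographically relevant quantum computer exists, then recorded data is at risk whenever x + y > z. A minimal sketch, with purely hypothetical year values (not estimates from the paper):

```python
def at_risk(shelf_life_years: float, migration_years: float,
            years_to_quantum: float) -> bool:
    """Mosca-style rule of thumb for harvest-now, decrypt-later exposure.

    Data captured today is exposed if the time it must remain secret plus
    the time needed to migrate exceeds the time until a quantum computer
    can break the classical algorithms protecting it.
    """
    return shelf_life_years + migration_years > years_to_quantum


# Hypothetical inputs for illustration only.
print(at_risk(shelf_life_years=10, migration_years=7, years_to_quantum=15))  # True
print(at_risk(shelf_life_years=2, migration_years=3, years_to_quantum=15))   # False
```

For IoT fleets, long device lifetimes tend to inflate the migration term, which is one reason the inequality is tighter there than for frequently updated servers.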

The National Institute of Standards and Technology (NIST) has led the effort to establish the next generation of cryptographic standards for the quantum era. Recognizing the threat to currently used public-key algorithms, NIST launched a multi-year public process in 2016, inviting cryptographers worldwide to submit post-quantum cryptographic (PQC) algorithms and to rigorously analyze each other’s candidates. The goal was not simply to find replacements: candidates were evaluated for security, performance, and implementation challenges across diverse platforms. In August 2024 NIST published the first resulting standards, including FIPS 203 (ML-KEM) and FIPS 204 (ML-DSA), giving organizations the tools to migrate away from vulnerable systems while preserving data confidentiality, integrity, and authenticity. Standardization of additional schemes continues, but these finalized standards already provide a resilient cryptographic foundation against the risks posed by quantum computers.

Lattice-Based Solutions: The Foundation of Resilience

NIST selected ML-KEM and ML-DSA as post-quantum cryptographic (PQC) standards in part because their security rests on the presumed intractability of lattice problems. Specifically, ML-KEM relies on the hardness of the Module Learning With Errors (Module-LWE) problem, and ML-DSA on Module-LWE together with a variant of the Module Short Integer Solution (Module-SIS) problem. Module-LWE asks for a secret given a system of linear equations over a polynomial ring that has been perturbed by small random errors; Module-SIS asks for a short nonzero solution to a homogeneous linear system over such a ring. The security arguments for ML-KEM and ML-DSA relate breaking the schemes to solving these underlying problems, for which no efficient classical or quantum algorithm is known, giving a quantifiable security level against known attacks. Module-LWE and Module-SIS are therefore considered strong foundations for long-term security in a post-quantum landscape.
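To make the role of the error term concrete, here is a toy public-key bit encryption built on plain (not module) LWE, in the style of Regev’s scheme. It is not ML-KEM: the dimension, sample count, and error range are hypothetical teaching values with no security. The public key is b = A·s + e mod q. Without e, s could be recovered by Gaussian elimination; with small e, decryption still works because the accumulated noise stays well below q/4.

```python
import random

Q = 3329   # modulus (ML-KEM's q, used here only as a convenient prime)
N = 16     # secret dimension (toy; far too small for security)
M = 32     # number of LWE samples in the public key
ETA = 2    # errors drawn uniformly from [-ETA, ETA]


def keygen(rng: random.Random):
    """Secret s; public key (A, b) with b = A*s + e mod Q."""
    s = [rng.randrange(Q) for _ in range(N)]
    a = [[rng.randrange(Q) for _ in range(N)] for _ in range(M)]
    e = [rng.randint(-ETA, ETA) for _ in range(M)]
    b = [(sum(a_ij * s_j for a_ij, s_j in zip(row, s)) + e_i) % Q
         for row, e_i in zip(a, e)]
    return s, (a, b)


def encrypt(pk, bit: int, rng: random.Random):
    """Sum a random subset of samples; encode the bit as bit * floor(Q/2)."""
    a, b = pk
    subset = [i for i in range(M) if rng.random() < 0.5]
    u = [sum(a[i][j] for i in subset) % Q for j in range(N)]
    v = (sum(b[i] for i in subset) + bit * (Q // 2)) % Q
    return u, v


def decrypt(s, ct) -> int:
    """v - <u, s> equals (sum of small errors) + bit * floor(Q/2) mod Q."""
    u, v = ct
    noisy = (v - sum(u_j * s_j for u_j, s_j in zip(u, s))) % Q
    # Closer to Q/2 than to 0 (mod Q) means the encrypted bit was 1.
    return 1 if Q // 4 < noisy < 3 * Q // 4 else 0


rng = random.Random(1)
secret, public = keygen(rng)
for message_bit in (0, 1, 1, 0):
    assert decrypt(secret, encrypt(public, message_bit, rng)) == message_bit
```

With at most 32 errors of magnitude 2, the noise is bounded by 64, far below the decision threshold of about 832, so decryption in this toy never fails; real schemes choose parameters so that failures are negligible rather than impossible.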

ML-KEM and ML-DSA use the Number Theoretic Transform (NTT) to accelerate polynomial multiplication, a core operation in lattice-based cryptography. Both work in the ring $\mathbb{Z}_q[x]/(x^{256}+1)$, with $q = 3329$ for ML-KEM and $q = 8380417$ for ML-DSA. Although $x^{256}+1$ is irreducible over the rationals, modulo these primes it splits into 128 quadratic factors (ML-KEM) or 256 linear factors (ML-DSA), so the NTT, a finite-field analogue of the discrete Fourier transform, turns one large polynomial product into many cheap small ones. Other steps rely on rejection sampling: both schemes expand their public matrix from a seed by discarding pseudorandom values outside $[0, q)$, and ML-DSA signing repeatedly discards candidate signatures that could leak information about the secret key until one passes its checks. This probabilistic loop adds computational overhead, and variable running time, that the NTT does not remove. Together, these techniques balance performance and security within the algorithms.
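As a rough illustration of why the NTT helps, the sketch below multiplies two polynomials in $\mathbb{Z}_q[x]/(x^n+1)$ twice: once by schoolbook negacyclic convolution, and once by evaluating both at the n roots of $x^n+1$, multiplying pointwise, and interpolating back. It assumes toy parameters n = 8 and q = 17 (so q ≡ 1 mod 2n and ψ = 3 is a primitive 2n-th root of unity) and a naive O(n²) transform for readability; real implementations use O(n log n) butterflies, and ML-KEM’s q = 3329 only supports a partial NTT that stops at quadratic factors. Function names are illustrative.

```python
N = 8      # toy ring degree (ML-KEM and ML-DSA use 256)
Q = 17     # toy prime with Q % (2 * N) == 1
PSI = 3    # primitive 2N-th (16th) root of unity mod Q; PSI**N == -1 mod Q


def schoolbook_mul(a, b):
    """Negacyclic convolution: x**N wraps around to -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c


def ntt(a):
    """Evaluate a at the roots of x**N + 1, namely PSI**(2j + 1)."""
    return [sum(a[i] * pow(PSI, i * (2 * j + 1), Q) for i in range(N)) % Q
            for j in range(N)]


def inv_ntt(a_hat):
    """Interpolate: a_i = N^-1 * sum_j a_hat_j * PSI**(-i(2j + 1)) mod Q."""
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    return [n_inv * sum(a_hat[j] * pow(psi_inv, i * (2 * j + 1), Q)
                        for j in range(N)) % Q
            for i in range(N)]


def ntt_mul(a, b):
    """Multiply via the NTT domain: transform, pointwise product, invert."""
    a_hat, b_hat = ntt(a), ntt(b)
    return inv_ntt([(x * y) % Q for x, y in zip(a_hat, b_hat)])


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert ntt_mul(a, b) == schoolbook_mul(a, b)
```

The pointwise step costs n multiplications instead of n², which is where the savings come from once the transforms themselves are computed with butterflies.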

While lattice-based cryptography offers a robust foundation for post-quantum cryptography (PQC) due to the presumed intractability of underlying mathematical problems, successful implementation necessitates careful performance evaluation on resource-constrained devices. Factors such as memory footprint, computational cycles, and energy consumption become critical limitations in embedded systems, IoT devices, and mobile platforms. Assessing these parameters involves benchmarking algorithms like ML-KEM and ML-DSA on target hardware, considering optimizations like code size reduction, algorithmic variations, and hardware acceleration where available. Performance characteristics will dictate the feasibility of deploying these algorithms in practical applications requiring limited resources and extended battery life.

The variance in ML-DSA signing latency increases with security level, as indicated by the expanding interquartile ranges and the presence of outliers resulting from high-iteration rejection sampling.
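The long tail in this distribution follows from how rejection sampling works. If each signing attempt is accepted independently with probability p, the number of attempts is geometrically distributed: the mean is 1/p, but occasional runs need many more attempts. The sketch below assumes that simplified model, a hypothetical acceptance probability, and a hypothetical fixed cost per attempt; neither value comes from the paper or the ML-DSA specification.

```python
import random
import statistics


def simulate_signing(p_accept: float, ms_per_attempt: float, runs: int,
                     seed: int = 0):
    """Simulated signing latencies (ms) under a geometric retry model.

    Each attempt succeeds with probability p_accept; a run's latency is the
    number of attempts times a fixed per-attempt cost.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(runs):
        attempts = 1
        while rng.random() >= p_accept:
            attempts += 1
        latencies.append(attempts * ms_per_attempt)
    return latencies


# Hypothetical parameters, chosen only to illustrate the spread.
samples = simulate_signing(p_accept=0.2, ms_per_attempt=50.0, runs=10_000)
q1, median, q3 = statistics.quantiles(samples, n=4)
print(f"mean={statistics.mean(samples):.0f} ms median={median:.0f} ms "
      f"IQR={q3 - q1:.0f} ms max={max(samples):.0f} ms")
```

Even in this idealized model the maximum sits many times above the median, which is why worst-case latency, not just the average, matters when ML-DSA signing runs under real-time constraints.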

Performance on Constrained Devices: A Case Study on the Cortex-M0+

The ARM Cortex-M0+ microcontroller is prevalent in Internet of Things (IoT) deployments due to its low cost, energy efficiency, and small footprint. This makes it a representative target platform for evaluating Post-Quantum Cryptography (PQC) algorithms intended for resource-constrained devices. Its widespread adoption in applications such as sensors, wearables, and smart home devices necessitates the assessment of PQC implementations on this architecture to understand their feasibility and performance characteristics in real-world deployments. Benchmarking PQC schemes on the Cortex-M0+ provides critical data for determining the trade-offs between security, speed, and energy consumption in IoT environments.

The Raspberry Pi RP2040 microcontroller, equipped with a dual-core ARM Cortex-M0+ processor, serves as a representative platform for evaluating the performance characteristics of post-quantum cryptography (PQC) algorithms. Utilizing this hardware allows for precise measurement of execution time and energy consumption for algorithms like ML-KEM and ML-DSA. The RP2040’s readily available development tools and relatively low cost facilitate repeatable benchmarking, providing data relevant to constrained devices commonly found in Internet of Things (IoT) deployments. The dual-core configuration enables analysis of potential parallelization strategies, while the Cortex-M0+’s limited resources accurately reflect the challenges faced when implementing PQC on resource-constrained embedded systems.
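Because some operations vary from run to run, benchmarks of this kind are usually summarized with robust statistics (median and interquartile range) over many repetitions rather than a single mean. A minimal sketch of such a harness follows, assuming a generic zero-argument callable and an injectable nanosecond clock; on the RP2040 itself, timing would come from a hardware timer, and this is not the harness used in the paper.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(operation: Callable[[], object], runs: int = 100,
              clock: Callable[[], int] = time.perf_counter_ns) -> Dict[str, float]:
    """Time `operation` `runs` times (runs >= 2); summarize latencies in ms."""
    latencies_ms = []
    for _ in range(runs):
        start = clock()
        operation()
        latencies_ms.append((clock() - start) / 1e6)
    q1, median, q3 = statistics.quantiles(latencies_ms, n=4)
    return {
        "median_ms": median,
        "iqr_ms": q3 - q1,
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Usage: benchmark any zero-argument callable, here a stand-in workload.
print(benchmark(lambda: sum(range(10_000)), runs=50))
```

Injecting the clock keeps the statistics code independent of the timing source, so the same summary logic can be tested with a fake clock or fed timestamps from a different timer.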

Benchmarking on the RP2040’s ARM Cortex-M0+ shows that an ML-KEM-512 key exchange completes in 36.3 milliseconds and consumes 2.87 millijoules. This is a significant improvement over ECDH P-256: about 17 times faster and 94% less energy, when both are implemented in reference C code on this platform. These results indicate that ML-KEM-512 is a viable post-quantum option for resource-constrained devices.
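The reported figures can be cross-checked with simple arithmetic: average power is energy divided by time, and if both algorithms draw similar power, the energy reduction should mirror the speedup. The ECDH values below are implied by the rounded 17× figure, not measurements from the paper.

```python
KEM_TIME_MS = 36.3     # reported ML-KEM-512 key exchange time
KEM_ENERGY_MJ = 2.87   # reported ML-KEM-512 energy
SPEEDUP = 17           # reported (rounded) speedup over ECDH P-256

# mJ / ms = J / s = W; multiply by 1000 for mW.
avg_power_mw = KEM_ENERGY_MJ / KEM_TIME_MS * 1000
implied_ecdh_ms = KEM_TIME_MS * SPEEDUP
# If average power is similar for both algorithms, energy scales with time.
implied_reduction = 1 - 1 / SPEEDUP

print(f"average power ~ {avg_power_mw:.0f} mW")
print(f"implied ECDH time ~ {implied_ecdh_ms:.0f} ms")
print(f"implied energy reduction ~ {implied_reduction:.0%}")
```

The implied reduction (about 94%) matches the reported one, which is consistent with both algorithms running at roughly the same average power on this core.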

Memory requirements for post-quantum cryptographic algorithms vary significantly. ML-KEM-512 decapsulation peaks at 9.4 KB of SRAM, which fits within a 10 KB allocation, though with little margin. In contrast, ML-DSA-65 reaches 55.6 KB of peak stack usage during operation, requiring a 64 KB stack allocation to prevent overflow. ML-DSA-65 therefore needs about 5.9 times as much stack space as ML-KEM-512 decapsulation.
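The ratio and the remaining headroom follow directly from the reported numbers; a quick check, using KB as in the text:

```python
measurements = {
    # name: (peak usage in KB, allocation in KB), as reported above
    "ML-KEM-512 decapsulation": (9.4, 10.0),
    "ML-DSA-65": (55.6, 64.0),
}

for name, (peak_kb, alloc_kb) in measurements.items():
    headroom_kb = alloc_kb - peak_kb
    print(f"{name}: {headroom_kb:.1f} KB headroom "
          f"({headroom_kb / alloc_kb:.0%} of allocation)")

ratio = measurements["ML-DSA-65"][0] / measurements["ML-KEM-512 decapsulation"][0]
print(f"ML-DSA-65 / ML-KEM-512 peak ratio ~ {ratio:.1f}x")
```

ML-KEM-512 leaves only about 6% of its allocation unused, versus about 13% for ML-DSA-65, so the smaller allocation is actually the tighter fit.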

Performance analysis of Post-Quantum Cryptography (PQC) implementations shows that the ARM Cortex-M0+ is consistently slower than the Cortex-M4: running identical reference C code on both platforms, the Cortex-M0+ exhibits a slowdown of 1.8 to 1.9 times. This gap is attributed to architectural differences, specifically the Cortex-M4’s richer ARMv7E-M instruction set, which includes single-instruction 32×32→64-bit multiplies and DSP/SIMD extensions that the ARMv6-M Cortex-M0+ lacks. The Cortex-M0+ nonetheless remains a viable PQC target thanks to its low power consumption and widespread use, but this performance gap must be accounted for when deploying resource-constrained IoT devices.

Despite lacking the Cortex-M4’s long-multiply (UMULL), DSP, and SIMD instructions, the Cortex-M0+ runs the ML-KEM handshake only 1.8 to 1.9× slower than the Cortex-M4 (pqm4).
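One concrete reason the missing instructions matter: ARMv6-M’s MULS returns only the low 32 bits of a 32×32-bit product, whereas the Cortex-M4’s UMULL returns the full 64-bit result in one instruction. ML-KEM’s reference arithmetic uses 16-bit coefficients whose products fit in 32 bits, but ML-DSA’s reference code uses 32-bit coefficients with 64-bit intermediate products, so on the M0+ each such product must be assembled from narrower multiplies, either by the compiler or in hand-written code. The sketch below emulates that decomposition in Python with explicit 32-bit words and carries; it illustrates the general technique and is not code from the paper or any compiler runtime.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_low(x: int, y: int) -> int:
    """Model of MULS on ARMv6-M: only the low 32 bits of the product."""
    return (x * y) & MASK32


def umull_emulated(a: int, b: int) -> int:
    """Full unsigned 32x32 -> 64-bit product from 16x16-bit pieces.

    Each product of two 16-bit halves fits in 32 bits, so a low-half-only
    multiply computes it exactly; the pieces are recombined with 32-bit
    additions and explicit carries.
    """
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    ll = mul32_low(a_lo, b_lo)
    lh = mul32_low(a_lo, b_hi)
    hl = mul32_low(a_hi, b_lo)
    hh = mul32_low(a_hi, b_hi)

    # Middle terms: their sum can overflow 32 bits by one carry bit.
    mid_sum = lh + hl
    mid, mid_carry = mid_sum & MASK32, mid_sum >> 32

    # Low word: add the lower 16 bits of mid, shifted into the top half.
    low_sum = ll + ((mid & MASK16) << 16)
    low, low_carry = low_sum & MASK32, low_sum >> 32

    # High word: upper half of mid, the middle carry, and the low carry.
    high = (hh + (mid >> 16) + (mid_carry << 16) + low_carry) & MASK32
    return (high << 32) | low


print(hex(umull_emulated(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe00000001
```

Four multiplies plus carry handling replace a single instruction, which helps explain why arithmetic built on 64-bit products costs relatively more on the M0+ than 16-bit arithmetic does.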

Practical IoT Integration: Securing Communication with CoAP and DTLS

The proliferation of Internet of Things (IoT) devices, often operating on limited power and bandwidth, calls for communication protocols designed for these constraints. The Constrained Application Protocol (CoAP) is a specialized web transfer protocol for resource-limited networks, offering a streamlined alternative to HTTP. Secure communication remains essential even in these environments. Datagram Transport Layer Security (DTLS) builds on the security design of TLS, adapting it for UDP-based protocols such as CoAP. Together, CoAP and DTLS provide confidentiality, integrity, and authentication for IoT data exchange. DTLS does add handshake messages, per-record overhead, and public-key computation, so its cost must be budgeted carefully on devices where energy and bandwidth are scarce; it nonetheless remains a cornerstone of secure IoT deployments where reliable communication is vital.

Integrating the Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM) and the Module-Lattice-Based Digital Signature Algorithm (ML-DSA) into CoAP over DTLS would give IoT devices a quantum-resistant secure channel. ML-KEM would establish the shared session keys currently derived with classical (EC)DH key exchange, while ML-DSA would provide authentication, letting endpoints verify each other’s identity and the integrity of handshake messages. Because CoAP and DTLS are designed for resource-constrained devices, this combination would protect data in transit against eavesdropping and tampering, including traffic recorded for later quantum decryption. Data stored on the device itself falls outside this channel and needs separate protection.

Integrating security protocols into the Internet of Things requires balancing strong protection against fast data transmission and low power consumption. Real-world IoT deployments, with resource-constrained devices and unpredictable network conditions, demand careful optimization: simply layering on security measures can introduce unacceptable latency or drain limited battery reserves. Developers must therefore choose cryptographic algorithms such as ML-KEM and ML-DSA, and communication protocols such as CoAP and DTLS, with each application’s requirements in mind. This involves weighing key and signature sizes, encryption overhead, and energy expenditure, so that security enhancements do not compromise the functionality or longevity of connected devices and the networks they inhabit.
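Key and signature sizes are where this trade-off is most visible on constrained links. The sketch below tallies the sizes fixed by FIPS 203 and FIPS 204 (ML-KEM-512 encapsulation key 800 bytes and ciphertext 768 bytes; ML-DSA-65 public key 1,952 bytes and signature 3,309 bytes) against a 65-byte uncompressed P-256 point, then counts fragments for a hypothetical 100-byte payload budget per link-layer frame. Certificates, DTLS record headers, and other framing are ignored, so real handshakes are larger.

```python
import math

# Object sizes in bytes (FIPS 203 / FIPS 204 parameter sets; P-256 uncompressed point).
SIZES = {
    "P-256 public key (client)": 65,
    "P-256 public key (server)": 65,
    "ML-KEM-512 encapsulation key (client)": 800,
    "ML-KEM-512 ciphertext (server)": 768,
    "ML-DSA-65 public key": 1952,
    "ML-DSA-65 signature": 3309,
}

FRAGMENT_PAYLOAD = 100  # hypothetical usable bytes per link-layer frame


def fragments(size_bytes: int, payload: int = FRAGMENT_PAYLOAD) -> int:
    """Number of frames needed to carry one object, ignoring headers."""
    return math.ceil(size_bytes / payload)


for name, size in SIZES.items():
    print(f"{name:40s} {size:5d} B  {fragments(size):3d} fragments")

ecdh_total = SIZES["P-256 public key (client)"] + SIZES["P-256 public key (server)"]
kem_total = (SIZES["ML-KEM-512 encapsulation key (client)"]
             + SIZES["ML-KEM-512 ciphertext (server)"])
print(f"key-exchange bytes: ECDH {ecdh_total} B vs ML-KEM-512 {kem_total} B")
```

Under these assumptions the key exchange alone grows from 130 to 1,568 bytes, so on lossy low-power links the radio cost of extra frames can rival the computation savings reported for ML-KEM-512.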

The exploration of ML-KEM and ML-DSA on the ARM Cortex-M0+ underscores a fundamental principle: effective security need not demand exorbitant resources. The study minimizes complexity while maximizing impact, and its pragmatism echoes Grace Hopper’s observation that ‘It’s easier to ask forgiveness than it is to get permission’: rather than waiting for specialized hardware, it measures what inexpensive, widely deployed cores can already do. Despite challenges such as ML-DSA signing latency variance, the results show that robust cryptographic defenses can be embedded within the constraints of resource-limited IoT devices, which supports proactive rather than reactive security measures. The focus remains on achieving functional security, not theoretical perfection.

Where To Next?

The demonstrated feasibility is not a destination. It is a starting point. Lattice-based cryptography, for all its mathematical elegance, still shows practical variability. ML-DSA signing latency, specifically, demands scrutiny. Abstractions age, principles don’t. Reducing variance is not merely optimization; it’s a matter of predictable security.

Resource-constrained devices present unique attack surfaces. Side-channel resistance demonstrated in isolation does not guarantee resilience in deployment. Every complexity needs an alibi. Future work must move beyond benchmarks and embrace holistic security evaluations, considering power analysis, timing attacks, and fault injection not as separate exercises, but as concurrent threats.

The current focus correctly prioritizes algorithm implementation. However, the true challenge lies in lifecycle management. Key rotation, secure storage, and post-compromise recovery on billions of tiny devices are problems of scale, logistics, and ultimately, cost. The pursuit of perfect security is a fool’s errand. Pragmatic, auditable security is the only achievable goal.

Original article: https://arxiv.org/pdf/2603.19340.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
