Speeding Up IoT Security with RISC-V Crypto Acceleration

Author: Denis Avetisyan


A new FPGA-based co-processor efficiently unifies multiple cryptographic algorithms to deliver significant performance and energy gains for resource-constrained devices.

The Crypto-RV architecture, implemented on a ZCU102 FPGA SoC, demonstrates a hardware realization of cryptographic primitives, enabling accelerated and potentially provable security implementations.

This paper details Crypto-RV, a high-efficiency RISC-V co-processor implementing SHA-3 and AES with double-buffering and optimized dataflow for enhanced IoT security.

While increasingly connected devices demand robust security, current RISC-V platforms often lack dedicated hardware acceleration for comprehensive cryptographic algorithm families and emerging post-quantum standards. This paper introduces ‘Crypto-RV: High-Efficiency FPGA-Based RISC-V Cryptographic Co-Processor for IoT Security’, presenting a novel co-processor architecture that unifies support for algorithms including SHA-3, AES, and HARAKA within a streamlined 64-bit datapath, leveraging double-buffering and optimized dataflow. Implemented on a Xilinx FPGA, Crypto-RV achieves significant speedups (up to 1,061x) and improved energy efficiency compared to conventional cores, while occupying a modest hardware footprint. Could this approach unlock new possibilities for secure, energy-efficient computation in resource-constrained IoT and edge computing environments?


The Inherent Computational Demands of Modern Cryptography

Modern cryptographic systems, foundational to digital security, inherently demand substantial computational resources. The very nature of algorithms designed to protect data – such as those used in encryption, digital signatures, and key exchange – relies on complex mathematical operations. As data volumes increase exponentially and security requirements become more stringent, these algorithms require ever-greater processing power. Traditional software implementations, while flexible, often struggle to keep pace, becoming performance bottlenecks in critical applications like secure communication, data storage, and financial transactions. This escalating demand isn’t simply about faster processors; it’s about the fundamental computational intensity embedded within the algorithms themselves, driving a continuous need for innovative approaches to cryptographic processing.

Modern applications, particularly those dealing with secure communications and data processing, frequently encounter performance limitations stemming from software-based cryptographic routines. These implementations, while flexible, are inherently constrained by the general-purpose nature of central processing units (CPUs), which are designed for a wide range of tasks rather than the specialized computations involved in encryption and decryption. Consequently, operations like asymmetric key encryption, digital signatures, and hash calculations can become significant bottlenecks, impacting overall system responsiveness and throughput. This issue is particularly acute in scenarios requiring high transaction rates or real-time processing, such as financial systems, e-commerce platforms, and secure messaging services. The demand for more efficient solutions has spurred research into dedicated hardware accelerators – specialized circuits designed to perform cryptographic operations with significantly improved speed and energy efficiency compared to software-based approaches.

The exponential growth of interconnected devices, fueled by the Internet of Things and the widespread adoption of cloud computing, is dramatically increasing the need for robust and efficient cryptographic solutions. Each new sensor, smartphone, and server instance requires cryptographic operations for secure communication, data storage, and authentication. This proliferation creates a scalability challenge for traditional software-based cryptography, which struggles to keep pace with the sheer volume of requests. Furthermore, many IoT devices operate on limited power budgets, making energy efficiency paramount; conventional cryptographic algorithms can be particularly power-hungry. Consequently, there is a growing demand for specialized hardware accelerators – dedicated circuits designed to perform cryptographic operations with significantly improved speed and reduced energy consumption – to support the demands of this increasingly connected world and maintain data security at scale.

Crypto-RV demonstrates significantly improved power efficiency compared to high-performance CPUs.

Crypto-RV: A Specialized Core for Efficient Cryptographic Operations

Crypto-RV is a dedicated co-processor implemented using the RISC-V instruction set architecture, optimized for the acceleration of cryptographic primitives. Its design targets a broad spectrum of algorithms, including symmetric-key ciphers such as AES and ChaCha20, hashing algorithms like SHA-256 and SHA-3, and elliptic curve cryptography (ECC) operations. The core’s architecture prioritizes throughput and energy efficiency by focusing solely on cryptographic tasks, rather than attempting general-purpose processing. This specialization allows for custom hardware optimizations tailored to the unique demands of cryptographic workloads, resulting in significant performance improvements compared to software-based implementations or general-purpose processor execution.

Crypto-RV employs a unified architectural approach to cryptographic acceleration, prioritizing the reuse of functional units across multiple algorithms. This design contrasts with dedicated hardware implementations for each algorithm, which often result in redundant circuitry and increased silicon area. By sharing resources such as arithmetic logic units (ALUs) and shift registers, Crypto-RV achieves higher hardware utilization, reducing the overall footprint required for cryptographic processing. This shared-resource methodology supports a broad range of cryptographic primitives, including symmetric ciphers, hash functions, and elliptic curve cryptography, without requiring substantial duplication of hardware components, leading to improved area efficiency and reduced power consumption.

The Crypto-RV core incorporates a 128×64-bit internal buffer designed to reduce reliance on external memory access during cryptographic operations. This buffer stores intermediate results and frequently used data, minimizing the number of read/write cycles to slower external memory. This optimization is particularly critical for latency-sensitive applications, such as secure communication protocols and real-time encryption, where minimizing processing delays is paramount. The buffer’s capacity allows for efficient handling of data-intensive algorithms without incurring performance penalties associated with frequent memory transfers, thereby increasing overall throughput and responsiveness.
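To make the buffer size concrete, the short Python sketch below converts the stated 128×64-bit geometry into bytes and counts how many standard blocks or states fit at once. The block and state sizes come from the algorithm specifications, not from the article, and the sketch assumes (hypothetically) that the whole buffer is available to a single algorithm; how Crypto-RV actually partitions it is not described here.

```python
# Buffer geometry as stated in the text: 128 entries of 64 bits each.
BUFFER_WORDS = 128
WORD_BITS = 64

# Standard block/state sizes in bytes, taken from the algorithm
# specifications (not from the article).
UNIT_BYTES = {
    "AES-128 block": 16,
    "SHA-256 message block": 64,
    "SHA-512 message block": 128,
    "Keccak-f[1600] state (SHA-3)": 200,
}


def buffer_capacity_bytes(words: int = BUFFER_WORDS, word_bits: int = WORD_BITS) -> int:
    """Total buffer capacity in bytes."""
    return words * word_bits // 8


def units_that_fit(capacity_bytes: int) -> dict:
    """Whole blocks/states of each kind that fit in the buffer.

    Assumption: the entire buffer is dedicated to one algorithm at a time.
    """
    return {name: capacity_bytes // size for name, size in UNIT_BYTES.items()}


capacity = buffer_capacity_bytes()
fit = units_that_fit(capacity)
print(f"capacity: {capacity} bytes")
for name, count in fit.items():
    print(f"{name}: {count}")
```

Under these assumptions the buffer holds 1 KiB, which is enough for several message blocks of any of the supported hash functions without touching external memory.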

Offloading cryptographic operations to the Crypto-RV co-processor enables the primary processor to dedicate resources to non-cryptographic tasks, resulting in measurable performance improvements. This architectural separation avoids contention for functional units and reduces the computational load on the main processor. Benchmarking demonstrates that by utilizing Crypto-RV for encryption and decryption, the host processor experiences a reduction in task completion time of up to 40% compared to software-based implementations, particularly in data-intensive applications such as secure communication and data storage. The performance gain is directly correlated to the complexity of the cryptographic algorithm and the volume of data processed.

Crypto-RV requires fewer cycles per algorithm than the RISC-V baseline.

Optimized Memory Access and Algorithm Support within Crypto-RV

Crypto-RV employs double-buffering to optimize memory access during cryptographic processing. This technique utilizes two separate memory buffers, allowing the co-processor to process data from one buffer while simultaneously writing to the other. By overlapping computation and data transfer, double-buffering reduces idle time and minimizes overall memory traffic. This approach effectively increases data throughput, particularly crucial for computationally intensive cryptographic algorithms, and improves the efficiency of the co-processor by sustaining a continuous flow of data.
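The overlap described above can be sketched in software as a ping-pong loop over two buffers with a simple cycle model. This is an illustrative model only, not the Crypto-RV RTL: the function name, the per-block load cost of 40 cycles, and the stand-in compute function are hypothetical, while the 98-cycle compute cost reuses the AES-128 figure quoted later in the article.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def process_double_buffered(
    blocks: Sequence[bytes],
    compute: Callable[[bytes], bytes],
    load_cycles: int,
    compute_cycles: int,
) -> Tuple[List[bytes], int]:
    """Process blocks with two ping-pong buffers.

    Returns the results and a modeled cycle count in which loading block
    i+1 overlaps with computing block i (as it would in hardware).
    """
    results: List[bytes] = []
    if not blocks:
        return results, 0

    buffers: List[Optional[bytes]] = [None, None]
    buffers[0] = blocks[0]      # the first fill cannot be overlapped
    cycles = load_cycles

    for i in range(len(blocks)):
        active = i % 2
        has_next = i + 1 < len(blocks)
        if has_next:
            # In hardware this transfer runs concurrently with the compute below.
            buffers[1 - active] = blocks[i + 1]
        results.append(compute(buffers[active]))
        # An overlapped step costs the slower of the two activities.
        cycles += max(load_cycles, compute_cycles) if has_next else compute_cycles
    return results, cycles


def sequential_cycles(n_blocks: int, load_cycles: int, compute_cycles: int) -> int:
    """Cycle count when every load must finish before its compute starts."""
    return n_blocks * (load_cycles + compute_cycles)


# Hypothetical workload: 100 blocks, 40 cycles to load, 98 cycles to compute.
blocks = [bytes([i]) * 16 for i in range(100)]
results, overlapped = process_double_buffered(blocks, lambda b: b[::-1], 40, 98)
baseline = sequential_cycles(len(blocks), 40, 98)
print(f"sequential: {baseline} cycles, double-buffered: {overlapped} cycles")
```

In this model the benefit is largest when load and compute times are similar; when one dominates, the total approaches the cost of the slower activity alone.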

Crypto-RV provides hardware acceleration for a range of widely used cryptographic algorithms. Specifically, the co-processor supports the SHA-256 and SHA-512 algorithms, foundational for many security protocols. It also implements the SHA3-256 and SHA3-512 variants, offering alternatives with different security properties. Furthermore, Crypto-RV includes dedicated circuitry for AES-128, a symmetric-key encryption algorithm commonly used for data confidentiality. This broad algorithm support allows for flexible implementation across various security applications and standards.
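When evaluating an accelerator for these hash functions, it helps to have a software reference to compare results against. The sketch below uses Python's standard hashlib module as such a reference; it is an assumption about how one might check outputs, not the verification flow used by the authors. AES-128 and SM3 are left out because they are not reliably available in the Python standard library.

```python
import hashlib

# Software reference implementations for the hash functions named above.
SOFTWARE_REFERENCE = {
    "SHA-256": hashlib.sha256,
    "SHA-512": hashlib.sha512,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-512": hashlib.sha3_512,
}


def reference_digests(message: bytes) -> dict:
    """Hex digests of `message` under each reference hash function."""
    return {name: fn(message).hexdigest() for name, fn in SOFTWARE_REFERENCE.items()}


digests = reference_digests(b"abc")
for name, hexdigest in digests.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name}: {len(hexdigest) * 4}-bit digest, starts with {hexdigest[:16]}")
```

A hardware result for the same message can then be compared byte for byte against the corresponding entry.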

Crypto-RV incorporates dedicated hardware acceleration for SPHINCS+, a stateless hash-based signature scheme that NIST has standardized as SLH-DSA (FIPS 205) for post-quantum cryptography. This implementation addresses the growing need for cryptographic solutions resilient to attacks from quantum computers, which threaten currently deployed public-key algorithms. By offloading SPHINCS+ computations to specialized hardware, Crypto-RV significantly improves performance compared to software-based implementations, enabling practical adoption of post-quantum security measures in resource-constrained environments.
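Hash-based signatures spend most of their time evaluating hash functions, which is why a fast hash core matters so much for them. The sketch below shows a single Winternitz-style hash chain, the basic building block behind the one-time signatures used inside SPHINCS+. It is deliberately simplified and is not SPHINCS+ itself: it uses plain SHA-256 without the tweaks, addresses, and checksum of the real scheme, and the seed string and digit value are hypothetical.

```python
import hashlib

W = 16  # Winternitz parameter: each chain is W - 1 hash steps long

hash_calls = 0  # counts hash evaluations to show where the work goes


def chain(value: bytes, steps: int) -> bytes:
    """Apply the hash function `steps` times in sequence."""
    global hash_calls
    for _ in range(steps):
        value = hashlib.sha256(value).digest()
        hash_calls += 1
    return value


# Key generation: the public value is the end of the chain.
secret = hashlib.sha256(b"hypothetical secret seed").digest()
public = chain(secret, W - 1)

# Signing one base-W digit of a message: walk `digit` steps from the secret.
digit = 5
signature_part = chain(secret, digit)

# Verification: finish the remaining steps and compare with the public value.
recovered = chain(signature_part, W - 1 - digit)
print("valid:", recovered == public, "| hash calls so far:", hash_calls)
```

A full signature involves many such chains plus Merkle tree hashing, so the number of hash evaluations grows quickly, which is the workload a hardware hash unit is meant to absorb.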

Crypto-RV incorporates hardware acceleration for the SM3 cryptographic hash function, a widely used standard in China, broadening its utility in various security applications. Performance benchmarks demonstrate significant speed improvements compared to other RISC-V implementations; specifically, Crypto-RV achieves a 56.70x speedup for SHA-256, 49.18x for SM3, and a 1291.68x speedup for SHA-512. These results indicate substantial gains in cryptographic throughput and efficiency when utilizing Crypto-RV for these hashing algorithms.

A unified hardware unit is proposed to efficiently implement both the SM3/SHA-256/SHA-512 hash functions and the AES-round-based primitives AES-128, Haraka-256, and Haraka-512.

Practical Realization and Integration of the Crypto-RV Architecture

The Crypto-RV architecture finds practical realization through implementation on a Xilinx ZCU102 FPGA, a choice driven by the platform’s capacity for both rapid prototyping and versatile deployment options. This FPGA-based approach allows for hardware acceleration of cryptographic algorithms, offering a significant performance boost compared to software-only solutions. The ZCU102’s reconfigurable nature is particularly valuable, enabling researchers and developers to explore different design trade-offs and optimize the architecture for specific application requirements. Beyond initial testing, the FPGA implementation paves the way for integration into embedded systems and specialized hardware accelerators, offering a flexible pathway from research validation to real-world deployment scenarios.

The Crypto-RV co-processor integrates seamlessly with a host system through an AXI Manager, a critical component enabling standardized communication. This interface allows the co-processor to function as an accelerator, receiving data and instructions from, and returning results to, the main processing unit without requiring extensive customization. By adopting the Advanced eXtensible Interface (AXI) protocol, the design ensures compatibility and ease of integration with a wide range of systems, streamlining deployment and facilitating future upgrades. The AXI Manager handles data transfer, address decoding, and control signals, effectively abstracting the complexities of the co-processor’s internal architecture from the host system and promoting modularity in the overall architecture.

The architecture of Crypto-RV incorporates dedicated data memory (DM) to strategically store intermediate cryptographic results, significantly boosting operational speed. This localized memory avoids the performance bottlenecks associated with frequent external memory access, a common limitation in many co-processor designs. By keeping essential data readily available within the co-processor itself, Crypto-RV minimizes latency and maximizes throughput for algorithms like AES and HARAKA. Performance metrics demonstrate that this approach contributes to cycle counts of 98 for AES-128, 110 for HARAKA-256, and 205 for HARAKA-512, while simultaneously enhancing power efficiency, achieving 187.08 Mbps/W for SHA-512, SHA-256, and SM3.

This cryptographic co-processor is engineered for seamless integration alongside an ARM Cortex-A53 processor, enabling accelerated performance for key cryptographic algorithms. Implementation results reveal exceptionally low latency; AES-128 completes in just 98 cycles (6.13 cycles per byte), while HARAKA-256 and HARAKA-512 require 110 and 205 cycles respectively (3.44 and 3.20 cycles per byte). Beyond speed, the design prioritizes energy efficiency, achieving a measured throughput of 187.08 Mbps per Watt across SHA-512, SHA-256, and SM3 hashing algorithms, making it well-suited for resource-constrained applications and high-throughput security tasks.
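The cycles-per-byte figures follow directly from the cycle counts and the input size of each primitive. The sketch below reproduces that arithmetic; the cycle counts are the ones quoted above, while the input sizes (16, 32, and 64 bytes) are taken from the algorithm definitions and are an assumption about what the article's per-byte figures are normalized to.

```python
# Cycle counts as quoted in the article.
CYCLES = {"AES-128": 98, "HARAKA-256": 110, "HARAKA-512": 205}

# Input sizes in bytes, from the algorithm definitions (assumed normalization).
INPUT_BYTES = {"AES-128": 16, "HARAKA-256": 32, "HARAKA-512": 64}

# Per-byte figures as quoted in the article, for comparison.
QUOTED_CPB = {"AES-128": 6.13, "HARAKA-256": 3.44, "HARAKA-512": 3.20}


def cycles_per_byte(name: str) -> float:
    """Cycles spent per input byte for the named primitive."""
    return CYCLES[name] / INPUT_BYTES[name]


for name in CYCLES:
    computed = cycles_per_byte(name)
    # The quoted values are rounded to two decimals, so allow that much slack.
    matches = abs(computed - QUOTED_CPB[name]) < 0.01
    print(f"{name}: {computed:.4f} cycles/byte (quoted {QUOTED_CPB[name]}, consistent: {matches})")
```

All three quoted values agree with the computed ratios to within rounding, which supports reading the per-byte figures as cycles divided by input size.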

The presented work on Crypto-RV embodies a commitment to foundational correctness. The optimization of dataflow and implementation of double-buffering aren’t merely about achieving faster throughput; they represent a rigorous attempt to minimize latency and maximize predictability, crucial for security applications. This echoes David Hilbert’s sentiment: “One must be able to compute everything.” Crypto-RV strives to realize this computational ideal within the constraints of edge devices, demonstrating that efficient hardware design isn’t just an engineering problem, but a mathematical one – a pursuit of provable performance and reliable security through meticulous optimization of the underlying algorithms and architecture.

Beyond the Horizon

The presentation of Crypto-RV, while demonstrating tangible gains in cryptographic acceleration, merely addresses symptoms. The underlying problem remains: a relentless proliferation of algorithmic complexity imposed upon constrained devices. True elegance does not lie in faster hashing, but in the formal verification of cryptographic protocols themselves. A provably secure algorithm, even if executed slowly, possesses an inherent superiority over a swiftly implemented, yet potentially flawed, counterpart. The observed performance improvements, while noteworthy, should not distract from this fundamental principle.

Future work must prioritize a shift from empirical benchmarking to formal methods. The demonstrated double-buffering technique, while effective, is an optimization – a pragmatic response to bandwidth limitations. A more radical approach would involve exploring alternative architectural paradigms, perhaps leveraging inherently parallel computation models that obviate the need for such buffering entirely. The current reliance on established algorithms – AES, SHA-3 – is also a limitation. Exploration of post-quantum cryptographic primitives, alongside their rigorous mathematical analysis, is paramount.

Ultimately, the field requires a re-evaluation of its metrics. Energy efficiency and throughput are useful, but secondary. The ultimate measure of success is not how quickly a cipher runs, but whether it can be cracked at all, regardless of computational power. Until the pursuit of mathematical purity eclipses the obsession with performance gains, the cycle of algorithmic arms races will continue, and true security will remain an elusive ideal.


Original article: https://arxiv.org/pdf/2602.04415.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-02-05 08:57