Author: Denis Avetisyan
A new framework streamlines the process of translating complex cryptographic algorithms into efficient hardware implementations, promising a boost for post-quantum security.
This work introduces LLM4PQC, an agentic system leveraging large language models for accurate and efficient high-level synthesis of Post-Quantum Cryptography cores for FPGA and ASIC platforms.
Designing hardware for post-quantum cryptography (PQC) is hampered by the labor-intensive process of converting reference code into efficient, synthesizable hardware. This paper introduces LLM4PQC, an agentic framework that leverages large language models to automate the translation of PQC specifications and C code into high-level synthesis (HLS)-ready implementations. Through a hierarchy of verification steps, LLM4PQC demonstrably reduces manual effort and accelerates design-space exploration for complex PQC kernels. Could this approach unlock a new era of efficient and scalable PQC acceleration in resource-constrained environments?
Bridging the Gap: Translating PQC Algorithms to Efficient Hardware
Though mathematically proven to resist attacks from quantum computers, Post-Quantum Cryptography (PQC) algorithms often encounter significant obstacles when translated into practical, high-performance hardware. These algorithms, designed with a focus on security rather than implementation efficiency, frequently involve complex operations and substantial computational demands. Simply porting existing software implementations to hardware accelerators proves insufficient; the inherent structure of many PQC schemes doesn’t map cleanly onto the parallel processing capabilities of dedicated hardware. This disconnect stems from the algorithms’ reliance on operations that are either inefficient or directly unsupported in typical hardware architectures, requiring substantial redesign and optimization to achieve viable performance for widespread deployment in critical security applications. Ultimately, the theoretical security of PQC is only fully realized when paired with efficient, custom-designed hardware implementations.
Post-Quantum Cryptography (PQC) implementations often begin with reference code designed primarily to verify mathematical correctness, rather than optimize for speed or resource usage. This software-centric approach, while crucial for establishing confidence in the algorithms, creates a significant bottleneck when transitioning to hardware acceleration. These initial codes frequently prioritize clarity and portability over efficient coding practices suitable for High-Level Synthesis (HLS) tools. Consequently, directly mapping this reference code to hardware descriptions often results in suboptimal performance, increased resource consumption, and limited scalability – hindering the realization of PQC’s full potential in real-world applications demanding both security and speed. The emphasis on correctness, while necessary, therefore requires a subsequent, dedicated optimization phase to unlock the benefits of hardware acceleration.
The translation of Post-Quantum Cryptography (PQC) algorithms from software to dedicated hardware frequently encounters obstacles when employing High-Level Synthesis (HLS) techniques. Standard PQC implementations are often written in C, prioritizing correctness and portability, but this approach introduces elements problematic for hardware realization. Specifically, constructs like dynamic memory allocation – where memory is requested during program execution – and the use of floating-point arithmetic are difficult, if not impossible, to directly map onto the fixed resources and data types of hardware. These software conveniences require complex workarounds or complete redesign in order to generate efficient hardware descriptions, ultimately hindering performance and increasing design complexity. Consequently, a significant effort is required to adapt these algorithms for effective hardware acceleration, demanding careful consideration of data structures and computational methods compatible with HLS tools.
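A minimal sketch of the kind of refactoring this entails (the buffer size and function name are illustrative, not taken from the paper): a dynamically allocated buffer in the software reference becomes a statically sized array whose dimensions are known at synthesis time, so the HLS tool can map it to on-chip memory.

```c
#include <stdint.h>

#define POLY_N 256  /* illustrative: a Kyber-style polynomial length */

/* Software-style reference code might allocate at runtime:
 *
 *   int16_t *coeffs = malloc(POLY_N * sizeof(int16_t));
 *
 * which has no hardware equivalent. The HLS-ready rewrite uses a
 * statically sized buffer instead, letting the tool allocate a
 * fixed block of on-chip memory. */
void poly_copy_static(int16_t dst[POLY_N], const int16_t src[POLY_N]) {
    for (int i = 0; i < POLY_N; i++) {
        dst[i] = src[i];  /* element-wise copy; trivially unrollable */
    }
}
```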
The recent standardization of post-quantum cryptographic algorithms, while a crucial step towards securing future communications, introduces a significant engineering challenge demanding automated hardware co-design tools. These algorithms, designed with mathematical security as the primary goal, often lack the inherent structure needed for efficient hardware implementation. Consequently, realizing their full potential, achieving both security and speed, requires a shift from manual optimization to automated workflows. Such tools would bridge the gap between high-level algorithmic descriptions and optimized hardware descriptions, enabling designers to explore a vast design space and identify architectures that maximize throughput and minimize resource utilization. Without these advancements, the benefits of standardized PQC may be limited by performance bottlenecks, hindering widespread adoption and leaving systems vulnerable in a post-quantum world.
LLM4PQC: Automating the Synthesis of PQC Hardware
LLM4PQC is an automated workflow designed to transform existing Post-Quantum Cryptography (PQC) reference code into High-Level Synthesis C (HLS-C) code suitable for hardware implementation. This workflow utilizes Large Language Models (LLMs) in an agentic capacity, meaning the LLM is not simply executing a single command, but rather operating as an autonomous agent within a larger system. The core functionality involves refactoring the original PQC code, which is typically written for software execution, into a format compatible with HLS tools. This enables the generation of hardware descriptions from the C code, facilitating the creation of Application-Specific Integrated Circuits (ASICs) or Field-Programmable Gate Arrays (FPGAs) for accelerated cryptographic operations.
The LLM4PQC workflow utilizes C2HLSC, a Large Language Model-based code converter, within an iterative refinement loop to enhance High-Level Synthesis (HLS) compatibility. Initial conversion of Post-Quantum Cryptography (PQC) reference code from C to HLS-C is performed by C2HLSC. The resulting HLS-C code is then assessed for compatibility issues, and feedback regarding these issues is provided back to C2HLSC. The LLM then modifies the code based on this feedback, and the process repeats until the code meets predefined HLS compatibility criteria or a maximum number of iterations is reached. This feedback loop allows the system to progressively address and resolve potential blockers to hardware synthesis without requiring manual intervention.
The LLM4PQC workflow incorporates a dedicated High-Level Synthesis (HLS) preprocessing stage to mitigate common hardware synthesis blockers. This stage specifically focuses on Static Memory Mapping, which involves explicitly defining memory access patterns to optimize resource utilization and avoid runtime memory allocation issues within the hardware implementation. Additionally, Initialization Removal identifies and eliminates unnecessary variable initializations present in the reference code; these initializations are often redundant in hardware and can introduce unnecessary logic or resource constraints. By proactively addressing these issues before the HLS compilation process, LLM4PQC increases the likelihood of successful synthesis and improves the overall performance of the generated hardware.
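Initialization Removal can be illustrated with a small before/after sketch (the function below is a made-up example, not from the paper): when every element of a buffer is written before it is read, a zero-initializing `memset` in the reference code contributes nothing functionally, yet in hardware it would cost extra write cycles or reset logic.

```c
#include <stdint.h>

#define N 256

/* Before: the reference code zero-initializes buf and then
 * overwrites every element anyway:
 *
 *   memset(buf, 0, sizeof buf);       // redundant in hardware
 *   for (...) buf[i] = ...;
 *
 * After: the initialization is removed, because the loop below
 * fully defines buf before any element is read. */
void fill_squares(int32_t buf[N]) {
    for (int i = 0; i < N; i++) {
        buf[i] = i * i;  /* every element written; no prior memset needed */
    }
}
```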
The automation of Post-Quantum Cryptography (PQC) hardware design through LLM4PQC demonstrably reduces manual intervention in the High-Level Synthesis (HLS) flow. Traditional PQC implementation requires substantial engineer time for code refactoring to address HLS tool limitations and achieve efficient hardware architectures. By integrating LLM-driven code conversion and preprocessing, LLM4PQC minimizes these manual steps, decreasing design cycles from weeks to days in initial testing. This accelerated process enables faster prototyping of PQC hardware accelerators and facilitates rapid deployment for security applications, allowing for quicker integration of these critical cryptographic algorithms into practical systems.
Validation and Optimization: Performance Across PQC Standards
LLM4PQC has been successfully applied to the hardware acceleration of several post-quantum cryptography (PQC) algorithms selected by the National Institute of Standards and Technology (NIST) for standardization. Specifically, the framework supports the implementation of `Kyber`, a lattice-based key-encapsulation mechanism; `Dilithium`, a lattice-based digital signature scheme; `Falcon`, another lattice-based signature scheme; and `SPHINCS+`, a stateless hash-based signature scheme. This demonstrates the versatility of LLM4PQC across different PQC approaches and its potential for broad adoption in securing future cryptographic systems. The framework’s adaptability is crucial given the diverse range of algorithms being considered for standardization and deployment.
The LLM4PQC workflow incorporates specific optimizations for Number Theoretic Transform (NTT) and Sampler operations, which are computationally intensive components of lattice-based cryptographic algorithms. NTT implementation benefits from tailored code generation targeting efficient polynomial multiplication in the finite field. The Sampler component utilizes techniques to accelerate the generation of uniformly random values required for key generation and signature creation. These specialized implementations within the workflow allow for the effective translation of complex mathematical operations into optimized High-Level Synthesis (HLS) C code, contributing to overall performance gains for algorithms like Kyber, Dilithium, and Falcon.
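The core NTT operation can be sketched as a single Cooley-Tukey butterfly over Kyber's modulus q = 3329. This is a simplified illustration, not the paper's implementation: real HLS code would replace the `%` operations with Montgomery or Barrett reduction to avoid instantiating a hardware divider.

```c
#include <stdint.h>

#define KYBER_Q 3329  /* Kyber's prime modulus */

/* One Cooley-Tukey NTT butterfly: (a, b) -> (a + z*b, a - z*b) mod q.
 * Plain %-reduction is used here for clarity; production hardware
 * would use Montgomery/Barrett reduction instead of division. */
static void ct_butterfly(int32_t *a, int32_t *b, int32_t zeta) {
    int32_t t = (int32_t)(((int64_t)zeta * *b) % KYBER_Q);
    int32_t u = *a;
    *a = (u + t) % KYBER_Q;
    *b = (u - t + KYBER_Q) % KYBER_Q;
}
```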
Design Space Exploration (DSE) was implemented to automatically optimize the generated High-Level Synthesis C (HLS-C) code for Post-Quantum Cryptography (PQC) implementations. This process systematically varied key architectural parameters within the HLS flow, including loop unrolling factors, dataflow configurations, and resource allocation strategies. The Catapult HLS tool was utilized to explore this design space, generating and evaluating multiple hardware implementations for each parameter combination. The objective of this DSE was to identify configurations that minimize both area and latency, resulting in more efficient hardware accelerators for PQC algorithms like Kyber, Dilithium, and Falcon.
Performance evaluations demonstrate hardware efficiency gains utilizing the described workflow. Specifically, the average area required for implementation of the Kyber Number Theoretic Transform (NTT) primitive is 2,957.54 μm², and the average latency for the Dilithium NTT is 2.8 cycles. These results represent improvements over several manual baseline implementations. Furthermore, area and latency metrics were improved for the Kyber, Dilithium, and Falcon NTT primitives when compared to previously published implementations and state-of-the-art results.
Future Directions: Towards Scalable PQC Hardware Co-design
The development of post-quantum cryptography (PQC) hardware traditionally demands substantial manual effort, involving iterative cycles of design, implementation, and optimization – a process both time-consuming and resource-intensive. The LLM4PQC workflow offers a pivotal advancement by introducing a level of automation to this co-design process. By leveraging large language models, this approach facilitates the translation of high-level PQC algorithm specifications into synthesizable hardware descriptions, significantly reducing the need for manual coding and expert intervention. This automated pathway not only accelerates the development timeline but also lowers associated costs, paving the way for broader and more rapid deployment of PQC solutions to safeguard data and systems against emerging quantum threats. The potential for streamlined hardware design promises to democratize access to robust, quantum-resistant cryptographic infrastructure.
Continued development centers on broadening the capabilities of automated code transformation to encompass increasingly sophisticated algorithms and data arrangements, with particular emphasis on a technique called Data Structure Expansion. This approach moves beyond simple algorithmic optimization by dynamically adapting the underlying data representations to better suit the target hardware. By intelligently restructuring how data is stored and accessed, the system aims to unlock performance gains that would be unattainable through code-level optimizations alone. The goal is to enable the automated generation of hardware-specific data layouts that minimize memory access latency, reduce resource utilization, and ultimately accelerate post-quantum cryptographic computations – fostering a more adaptable and efficient co-design process.
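A common instance of such data-layout restructuring, sketched here under the assumption that Data Structure Expansion behaves like the classic array-of-structs to struct-of-arrays transformation (the type and function names are illustrative): splitting each field into its own array lets an HLS tool map the fields to independent memories that can be read in parallel.

```c
#include <stdint.h>

#define N 4

/* Array-of-structs: one wide memory, fields interleaved, so
 * accessing all re-parts serializes through a single port. */
typedef struct { int32_t re; int32_t im; } cplx_aos_t;

/* "Expanded" struct-of-arrays: each field gets its own array, which
 * HLS can map to separate memory banks accessed concurrently. */
typedef struct { int32_t re[N]; int32_t im[N]; } cplx_soa_t;

void aos_to_soa(const cplx_aos_t in[N], cplx_soa_t *out) {
    for (int i = 0; i < N; i++) {
        out->re[i] = in[i].re;  /* field-wise split enables banking */
        out->im[i] = in[i].im;
    }
}
```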
Current approaches to hardware co-design using large language models often treat code generation as a one-way process. However, incorporating data from actual hardware implementations back into the LLM’s training cycle presents a pathway to substantial improvements. This iterative feedback loop allows the model to learn directly from the performance characteristics of its designs – identifying bottlenecks, inefficiencies, and opportunities for optimization that might be missed through simulation alone. By analyzing metrics like latency, power consumption, and area utilization derived from physical implementations, the LLM can refine its code generation strategies, prioritizing designs that demonstrably translate into high-performing hardware. This closed-loop system promises to move beyond simply generating syntactically correct code, toward creating designs inherently suited for efficient hardware realization and unlocking significant performance gains in post-quantum cryptographic systems.
Analysis of the Falcon Sampler reveals substantial variance in both performance and resource utilization: latency ranges from a single clock cycle to a maximum of 20,436 cycles, with an average of 12,240, while area spans 1,188.87 to 159,425.27 μm². This wide range underscores the data-dependent behavior intrinsic to this post-quantum cryptographic primitive, meaning its execution time and hardware footprint shift considerably based on the input data. Despite this variability, optimized implementations of the Falcon Sampler, and similar primitives, are critical for widespread adoption of post-quantum cryptography, ultimately safeguarding sensitive data and bolstering the security of essential infrastructure as quantum computing capabilities advance.
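The source of this data-dependent latency can be illustrated with a toy rejection sampler. This is not Falcon's actual Gaussian sampler, only a sketch of the pattern: a candidate is accepted only if it falls below a bound, so the number of loop iterations, and hence hardware latency, depends on the random stream.

```c
#include <stdint.h>

/* Toy PRNG (xorshift32) standing in for a hash-based sampler stream. */
static uint32_t xorshift32(uint32_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

/* Illustrative rejection loop (NOT Falcon's real sampler): returns
 * how many draws were needed before a 16-bit candidate fell below
 * `bound`. A tighter bound means more rejections and higher latency,
 * mirroring the wide cycle-count range reported in the text. */
int rejection_draws(uint32_t seed, uint32_t bound) {
    uint32_t s = seed;
    int iters = 0;
    do {
        iters++;
    } while ((xorshift32(&s) & 0xFFFF) >= bound);
    return iters;
}
```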
The presented LLM4PQC framework embodies a holistic approach to hardware design, recognizing that optimizing for performance in Post-Quantum Cryptography requires understanding the interplay between algorithm and architecture. This mirrors a fundamental principle articulated by Donald Knuth: “Premature optimization is the root of all evil.” The framework doesn’t simply translate code; it explores the design space, intelligently balancing resource utilization and efficiency – a discipline of distinguishing the essential from the accidental. By automating the synthesis of PQC cores, LLM4PQC allows designers to focus on the broader system-level considerations, ensuring that the resulting hardware implementation truly serves the cryptographic needs, rather than being constrained by suboptimal, locally-optimized components.
The Road Ahead
The automation of Post-Quantum Cryptography (PQC) core generation, as demonstrated by LLM4PQC, is not merely a question of translating code. It exposes a fundamental tension: the desire for algorithmic purity versus the messy reality of silicon. One might replace a computational engine, but without understanding the data pathways – the very bloodstream of the hardware – performance gains remain theoretical. Future work must therefore focus on a holistic co-design, where the LLM isn’t simply a translator, but a system architect, aware of the trade-offs between resource utilization, latency, and power consumption.
Currently, the framework operates on established reference code. The true test lies in its ability to synthesize designs from novel cryptographic constructions, or even to suggest optimizations to the algorithms themselves. This demands a shift from pattern recognition to genuine understanding – a subtle, yet critical distinction. The LLM must move beyond mimicking existing solutions to formulating new ones, guided by both cryptographic principles and hardware constraints.
Ultimately, the field faces a broader challenge: the increasing complexity of both cryptographic algorithms and hardware architectures. Simplification, not merely automation, will be key. Elegant design emerges not from adding layers of abstraction, but from stripping away the unnecessary. The goal isn’t to build more complex systems, but to understand the fundamental principles that allow complex behavior to arise from simple interactions.
Original article: https://arxiv.org/pdf/2602.09919.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/