Building Robust Quantum Simulations: Lessons from Continuous Testing

Author: Denis Avetisyan

This article explores how applying modern software engineering practices can dramatically improve the reliability and sustainability of high-performance computing codes used in quantum materials research.

Following a system update on September 9, 2025, the libNEGF CB setup on the JUWELS Booster showed a small performance decrease on both two and four nodes, visible in the elapsed-time and energy-consumption data; the most recent runs are drawn in brighter colors to highlight the impact of the maintenance.

Researchers detail the implementation of continuous integration, benchmarking, and defect detection strategies for the libNEGF quantum transport code.

Despite increasing recognition of the importance of software quality in scientific discovery, ensuring robust and maintainable high-performance computing codes remains a significant challenge. This paper, ‘RSE of a Quantum Transport Code and its Effects’, details a two-year application of research software engineering (RSE) practices, including continuous integration, automated testing, and continuous benchmarking, to the development and maintenance of libNEGF, a Fortran-based quantum transport code. Our systematic approach revealed a surprisingly prevalent class of defects, ranging from memory errors to misunderstandings of the underlying mathematical model and comparable to those found in code written in other languages. It also exposed performance regressions caused by hardware configuration changes. Does proactive application of these RSE principles represent a necessary evolution for ensuring the sustainability and reproducibility of scientific software, and if so, how can these practices be more widely adopted?

Unveiling the Limits of Precision: The Electronic Structure Challenge

The foundation of modern materials science rests heavily on the ability to accurately model the electronic structure of matter, and Density Functional Theory (DFT) is a cornerstone method for doing so. By describing the interactions between electrons within a material, DFT lets researchers predict its properties at the atomic level, from conductivity and magnetism to optical response and stability. This predictive power comes at a significant computational price: the cost of a DFT calculation grows rapidly with the number of atoms in the system. Consequently, simulating realistically sized systems, such as complex alloys, proteins, or extended defects in crystals, often proves intractable even on high-performance computing resources. Scientists must then either restrict themselves to smaller, simplified models or turn to cheaper approximate methods, trading accuracy for the ability to model systems of practical relevance. Balancing precision against available computational resources therefore remains a central challenge in computational materials science, driving the development of more efficient algorithms and approximations.
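To make “grows rapidly” concrete: conventional Kohn-Sham DFT is commonly quoted as scaling roughly as $O(N^3)$ in the number of atoms $N$. Assuming that cubic model, with a hypothetical reference run of 100 atoms taking one hour (both numbers are illustrative, not measurements), the cost of larger systems can be extrapolated directly:

```python
def relative_cost(n_atoms, n_ref, exponent=3.0):
    """Cost of an n_atoms calculation relative to an n_ref one, assuming cost ~ N**exponent."""
    return (n_atoms / n_ref) ** exponent


def estimated_hours(n_atoms, n_ref=100, ref_hours=1.0, exponent=3.0):
    """Extrapolate wall time from a hypothetical reference run (n_ref atoms took ref_hours)."""
    return ref_hours * relative_cost(n_atoms, n_ref, exponent)


if __name__ == "__main__":
    for n in (100, 200, 1_000, 10_000):
        print(f"{n:>6} atoms: ~{estimated_hours(n):,.0f} h")
```

Under this model, doubling the system multiplies the cost by eight, and a hundredfold larger system would need about a million times the reference wall time, which is why approximate methods become attractive.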

Computational efficiency often necessitates approximations in electronic structure calculations, and Density Functional Tight Binding (DFTB) is a notable strategy for accelerating simulations. DFTB reduces computational demands substantially compared with standard DFT by simplifying the many-body interactions between electrons and assembling the Hamiltonian from pre-calculated tight-binding parameters. This speedup has a price: the accuracy of DFTB depends critically on the quality of those parameters and on the choice of basis, and insufficient attention to these details can introduce errors in predicted material properties, particularly for systems with strong electronic correlations or complex bonding. Validation against more accurate, albeit more expensive, methods therefore remains essential to establish the reliability of DFTB results and to quantify the error the approximation introduces.
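As a minimal illustration of what a tight-binding Hamiltonian is (this is not DFTB itself), consider a one-dimensional chain with one orbital per site, an on-site energy $\varepsilon$ and a nearest-neighbor hopping $t$; both values below are illustrative assumptions. For an open chain of $N$ sites the eigenvalues are known in closed form, $E_k = \varepsilon + 2t\cos\big(k\pi/(N+1)\big)$ for $k = 1,\dots,N$, which makes the sketch easy to check:

```python
import numpy as np


def chain_hamiltonian(n_sites, onsite=0.0, hopping=-1.0):
    """Dense tight-binding Hamiltonian of an open 1D chain, one orbital per site."""
    h = np.diag(np.full(n_sites, onsite, dtype=float))
    off = np.full(n_sites - 1, hopping, dtype=float)
    # nearest-neighbor hopping on the first super- and sub-diagonal
    h += np.diag(off, k=1) + np.diag(off, k=-1)
    return h


def analytic_levels(n_sites, onsite=0.0, hopping=-1.0):
    """Closed-form eigenvalues eps + 2 t cos(k pi / (N + 1)), sorted ascending."""
    k = np.arange(1, n_sites + 1)
    return np.sort(onsite + 2.0 * hopping * np.cos(k * np.pi / (n_sites + 1)))


if __name__ == "__main__":
    h = chain_hamiltonian(8, onsite=0.5, hopping=-1.0)
    numeric = np.linalg.eigvalsh(h)  # ascending eigenvalues of a symmetric matrix
    print(np.max(np.abs(numeric - analytic_levels(8, onsite=0.5, hopping=-1.0))))
```

In DFTB, the on-site and hopping entries would instead come from the pre-computed parameters for each pair of elements, depend on interatomic distance, and be accompanied by an overlap matrix; that dependence on parameter quality is exactly where the accuracy concerns above enter.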

libNEGF: Forcing Reality to Yield Its Secrets

libNEGF is a software package implementing the Non-Equilibrium Green’s Function (NEGF) formalism, a many-body quantum mechanical approach for modeling electrons in systems driven out of equilibrium. The method is particularly well suited to nanoscale devices, such as transistors and quantum dots, where quantum effects and non-equilibrium conditions are prominent. In the steady state, NEGF works with energy-resolved Green’s functions: the retarded function $G^r(E)$ describes how an electron propagates through the device, while the lesser and greater functions $G^<(E)$ and $G^>(E)$ encode the occupied and unoccupied states, respectively. From these, current-voltage characteristics and other transport properties follow. Compared with simpler methods, NEGF describes quantum interference effects more faithfully and can incorporate interactions such as electron-electron scattering through self-energies, which is crucial for understanding modern electronic devices.
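For coherent transport, the NEGF machinery described above reduces to a compact recipe: build the device’s retarded Green’s function $G^r(E) = [E - H - \Sigma_L(E) - \Sigma_R(E)]^{-1}$, form the broadening matrices $\Gamma_{L,R} = i(\Sigma_{L,R} - \Sigma_{L,R}^\dagger)$, and evaluate the transmission $T(E) = \mathrm{Tr}[\Gamma_L G^r \Gamma_R G^{r\dagger}]$. The Python sketch below applies this to an ideal one-dimensional chain attached to semi-infinite leads, whose self-energy is known in closed form. It is a textbook toy under those assumptions, not libNEGF’s implementation or API; for an ideal chain, every energy inside the band $|E| < 2|t|$ should give $T = 1$.

```python
import numpy as np


def lead_self_energy(energy, hopping):
    """Retarded self-energy t^2 * g_surface of a semi-infinite 1D lead (zero on-site energy).
    Valid inside the band |E| < 2|t|; the sign choice gives Im(Sigma) < 0."""
    root = np.sqrt(4.0 * hopping**2 - energy**2)
    return 0.5 * (energy - 1j * root)


def transmission(energy, n_sites=6, hopping=-1.0):
    # Device: an ideal chain with the same hopping as the leads
    h = np.diag(np.full(n_sites - 1, hopping), 1)
    h = h + h.T
    sigma = lead_self_energy(energy, hopping)
    sigma_l = np.zeros((n_sites, n_sites), dtype=complex)
    sigma_r = np.zeros_like(sigma_l)
    sigma_l[0, 0] = sigma    # left lead couples to the first site
    sigma_r[-1, -1] = sigma  # right lead couples to the last site
    g_r = np.linalg.inv(energy * np.eye(n_sites) - h - sigma_l - sigma_r)
    gamma_l = 1j * (sigma_l - sigma_l.conj().T)
    gamma_r = 1j * (sigma_r - sigma_r.conj().T)
    return np.trace(gamma_l @ g_r @ gamma_r @ g_r.conj().T).real


if __name__ == "__main__":
    for e in (-1.5, 0.0, 0.7):
        print(e, transmission(e))
```

Adding disorder to the diagonal of h would lower T(E) below one through interference, which is the kind of effect the formalism is designed to capture.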

libNEGF’s core computational routines are implemented in Fortran, a language with a long history in high-performance scientific and engineering computing. The choice leverages Fortran’s strengths in numerical work, particularly its efficient array operations, which matter for the dense matrix algebra at the heart of NEGF, where operations such as matrix inversion cost $O(N^3)$ in the matrix dimension $N$. While more modern languages offer advantages in other areas, Fortran still provides a strong performance baseline and a mature ecosystem of optimized libraries for linear algebra and the other mathematical operations fundamental to electronic structure calculations.

libNEGF uses NVIDIA’s CUDA to offload computation to GPUs alongside parallel processing on CPUs, significantly reducing computation time for complex simulations. Effective implementation demands careful parallelization strategies that maximize GPU throughput and minimize CPU overhead. Memory management is equally critical: data transfer between the host CPU and the GPU device is a potential bottleneck, so data layouts must be optimized and transfers kept to a minimum. Algorithms must also be adapted to the massively parallel architecture of GPUs, often by decomposing large matrices and vectors into smaller blocks for efficient processing. Finally, the balance of work between CPU and GPU must be tuned for optimal performance, taking into account data size, algorithmic complexity, and hardware specifications.
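The transfer bottleneck can be made concrete with a back-of-the-envelope cost model for offloading a dense complex matrix inversion to a GPU. Every hardware number below (sustained GFLOP/s, host-device bandwidth, fixed per-offload overhead) and the rough $2n^3$ flop count are hypothetical placeholders, not JUWELS Booster measurements; the point is only the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # Hypothetical sustained rates, for illustration only
    cpu_gflops: float = 500.0
    gpu_gflops: float = 10_000.0
    link_gb_per_s: float = 25.0     # host <-> device bandwidth
    fixed_overhead_s: float = 1e-4  # launch and synchronization cost per offload


BYTES_PER_ELEMENT = 16  # complex double precision


def flops_inverse(n):
    return 2.0 * n**3  # rough dense-inversion estimate (assumption)


def cpu_time(n, m):
    return flops_inverse(n) / (m.cpu_gflops * 1e9)


def gpu_time(n, m):
    bytes_moved = 2 * BYTES_PER_ELEMENT * n * n  # copy matrix in, copy result back
    transfer = bytes_moved / (m.link_gb_per_s * 1e9)
    compute = flops_inverse(n) / (m.gpu_gflops * 1e9)
    return m.fixed_overhead_s + transfer + compute


def worth_offloading(n, m=Machine()):
    return gpu_time(n, m) < cpu_time(n, m)


if __name__ == "__main__":
    m = Machine()
    for n in (64, 256, 1024, 4096):
        print(n, f"cpu={cpu_time(n, m):.2e}s", f"gpu={gpu_time(n, m):.2e}s", worth_offloading(n, m))
```

With these placeholder numbers the CPU wins for n = 64 and 256 and the GPU for n = 1024 and 4096, yet for all four sizes the transfer term outweighs the GPU arithmetic, which is why keeping data resident on the device matters.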

GPU utilization on the JUWELS Booster, visualized through the LLview job report, shows periods of both idle time (purple) and high activity (green) throughout the job’s execution.

The Silent Errors: Unmasking Hidden Vulnerabilities

Fortran programs, despite their computational capabilities, are prone to untrapped errors: conditions that produce incorrect results without halting execution. For comparison, an analysis of the Debian package repository found at least one such instance in approximately 40% of packages. The susceptibility arises from language features and common programming practices that do not guarantee immediate failure when a problematic condition occurs. Undetected errors can then propagate through calculations and silently corrupt data or outputs, which is why robust verification and validation strategies are necessary.

Strict adherence to language standards, such as the ISO Fortran standard, and consistent use of compiler warnings are essential for catching potential errors during development. Compilers can detect deviations from the standard and flag issues such as unused variables, implicit type conversions, or non-portable constructs. Enabling all warnings and treating them as errors (with gfortran, for example, -std=f2018 -Wall -Wextra -Werror) forces developers to address these issues before they can turn into runtime errors. This practice substantially reduces the likelihood of untrapped errors, undefined behavior, and other defects that compromise the reliability and correctness of the software, whereas ignored warnings can hide subtle bugs that are hard to diagnose and may surface only under specific conditions.

Defensive programming involves anticipating potential failure points within code and implementing strategies to prevent or mitigate their impact. Techniques include input validation to ensure data conforms to expected formats and ranges, error handling to gracefully manage unexpected conditions, and the use of assertions to verify assumptions about program state during execution. Implementing bounds checking on array accesses, utilizing exception handling mechanisms where appropriate, and employing code reviews to identify potential vulnerabilities are also key practices. These proactive measures reduce the likelihood of untrapped errors and undefined behavior, ultimately increasing the reliability and stability of the software, even in complex environments like those found in high-performance computing.
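The input-validation, bounds-checking, and assertion techniques above can be sketched for a hypothetical routine that receives a Hamiltonian matrix. The function names and tolerance are assumptions chosen for illustration, and the sketch is in Python even though libNEGF is Fortran. One detail is worth noting: NumPy slicing silently truncates an out-of-range end index, so an explicit bounds check is needed to turn that silent behavior into a trapped error.

```python
import numpy as np


def validate_hamiltonian(h, tol=1e-10):
    """Hypothetical guard clause: reject inputs that would silently corrupt later steps."""
    h = np.asarray(h)
    if h.ndim != 2 or h.shape[0] != h.shape[1]:
        raise ValueError(f"Hamiltonian must be a square matrix, got shape {h.shape}")
    if not np.all(np.isfinite(h)):
        raise ValueError("Hamiltonian contains NaN or Inf entries")
    asym = np.max(np.abs(h - h.conj().T)) if h.size else 0.0
    if asym > tol:
        raise ValueError(f"Hamiltonian is not Hermitian (max |H - H^dagger| = {asym:.3e})")
    return h


def diagonal_block(h, start, stop):
    """Extract a diagonal block with an explicit bounds check, because slicing
    with an out-of-range stop would silently return a smaller block."""
    n = h.shape[0]
    if not (0 <= start < stop <= n):
        raise IndexError(f"block [{start}, {stop}) outside matrix of size {n}")
    block = h[start:stop, start:stop]
    assert block.shape == (stop - start, stop - start)  # internal consistency check
    return block
```

Failing loudly at the entry point keeps the error close to its cause instead of letting a malformed matrix propagate into later linear algebra.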

Modern supercomputer architectures such as the JUWELS Booster use Non-Uniform Memory Access (NUMA), in which memory access times depend on where the memory sits relative to the processor. Good performance therefore requires careful data placement and memory allocation strategies to minimize latency. Independently of the hardware, the code itself can behave unpredictably: an analysis of Debian packages found that 16% exhibit undefined integer behavior, which can cause unpredictable results and program instability. Such undefined behavior often arises from implicit conversions or overflow conditions, and it calls for thorough code analysis and robust error handling to ensure correct execution in complex computing environments.
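A common way integer problems surface in numerical codes is a flattened array offset computed with 32-bit integers: row * ncols exceeds $2^{31}-1$ once a matrix holds more than about two billion elements. The sketch below uses NumPy’s int32 array arithmetic, which wraps around without warning, to mimic the silent outcome. The function names are hypothetical, and neither C nor Fortran guarantees wraparound for signed integers, so the real symptom in compiled code may differ.

```python
import numpy as np


def offset_int32(row, ncols):
    """Row-major offset computed the way a code with default 4-byte integers would.
    NumPy integer array arithmetic wraps on overflow without raising."""
    r = np.array([row], dtype=np.int32)
    return int((r * np.int32(ncols))[0])


def offset_checked(row, ncols, index_bits=32):
    """Same offset in exact Python-int arithmetic, with an explicit range check."""
    offset = row * ncols
    limit = 2 ** (index_bits - 1) - 1
    if offset > limit:
        raise OverflowError(f"offset {offset} exceeds the {index_bits}-bit signed range")
    return offset


if __name__ == "__main__":
    print(offset_int32(50_000, 50_000))        # wrapped around to a negative value
    print(offset_checked(50_000, 50_000, 64))  # exact result with 64-bit indices
```

The checked variant turns a silently wrong index into an immediate, diagnosable error, and widening the index type (here index_bits=64) is the usual fix.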

Analysis of a 3.5-million-line C++ code base at CERN identified approximately 40,000 bug fixes, roughly one for every 90 lines of code, illustrating the substantial error rates of large-scale software projects. The finding underscores the importance of rigorous testing, including unit tests, integration tests, and system-level validation, to identify and fix defects before deployment. Even well-maintained codebases require continuous, thorough testing to remain reliable and to prevent potentially critical failures, particularly in complex scientific applications.

Continuous Vigilance: A System for Sustained Reliability

The EoCoE-III project uses continuous benchmarking, driven by the JUBE tool, to monitor the performance of the libNEGF library as it evolves. Rather than a one-time assessment, this is an automated process that runs regularly against a standardized suite of tests and tracks key metrics over time. The resulting record shows how modifications to the code, and the introduction of new or reconfigured hardware, affect computational efficiency. Developers can therefore rely on quantifiable metrics rather than subjective impressions: they can pinpoint bottlenecks, confirm that optimizations genuinely make calculations faster, and identify regressions early. The accumulated data also establishes a baseline for future improvements and deepens the team’s understanding of how libNEGF interacts with the underlying hardware, leading to a more robust and adaptable codebase.
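One simple way to turn a benchmark history into an automatic verdict is to compare each new measurement with a baseline built from recent runs and flag it when it is slower than the baseline both by a relative tolerance and by several standard deviations of the run-to-run noise. This is a generic sketch, not JUBE’s interface or the project’s actual criterion; the window size, thresholds, and timings are hypothetical.

```python
from statistics import mean, stdev


def is_regression(history, new_value, window=10, rel_tol=0.05, n_sigma=3.0):
    """Flag new_value (e.g. elapsed seconds, lower is better) as a regression if it is
    more than rel_tol slower than the recent mean AND outside n_sigma of the noise."""
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough data to estimate the noise
    baseline = mean(recent)
    noise = stdev(recent)
    slower_by = new_value - baseline
    return slower_by > rel_tol * baseline and slower_by > n_sigma * noise


def scan(series, **kwargs):
    """Return indices of runs flagged against the runs that precede them."""
    return [i for i in range(1, len(series)) if is_regression(series[:i], series[i], **kwargs)]


if __name__ == "__main__":
    # Hypothetical elapsed times (s) for one benchmark configuration
    times = [100.2, 99.8, 100.5, 100.1, 99.9, 100.3, 107.9, 108.2]
    print(scan(times))
```

Only the first slow run is flagged: the step then enters the baseline and inflates the noise estimate, so after a confirmed change, such as a system update, one would deliberately start a new baseline.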

Effective management of the libNEGF codebase relies on version control combined with automated testing. Git tracks every modification, which facilitates collaboration and makes it possible to return to a previous, stable version when necessary. This is coupled with a Continuous Integration/Continuous Deployment (CI/CD) pipeline in GitLab that triggers benchmarking runs, powered by the JUBE tool, with each code change, giving immediate feedback on performance and exposing potential regressions. By automating this validation step, the pipeline both accelerates development and reduces the risk of introducing errors, keeping libNEGF consistently reliable and well optimized.
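When a benchmark flags a regression, the ordered commit history makes it possible to locate the first offending change by binary search, which is what git bisect automates. The sketch below shows the underlying search on a plain list. The commit labels and the predicate are hypothetical stand-ins for checking out a commit and rerunning a benchmark, and the search assumes the usual bisect model: the first commit is good, the last is bad, and once a commit is bad every later one is too.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad is True, using O(log n) evaluations."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    if is_bad(commits[lo]) or not is_bad(commits[hi]):
        raise ValueError("first commit must be good and last commit must be bad")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i:02d}" for i in range(32)]  # hypothetical commit ids
    calls = []

    def is_slow(commit):
        calls.append(commit)           # each call stands for one benchmark rerun
        return int(commit[1:]) >= 21   # pretend c21 introduced the slowdown

    print(first_bad(history, is_slow), len(calls))
```

Because each evaluation may mean a full benchmark run, the logarithmic number of evaluations is what makes this practical on long histories.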

The pursuit of enhanced performance within libNEGF is intrinsically linked to its sustained dependability and future adaptability. A rigorous, systematic benchmarking process, driven by tools like JUBE, has proven crucial not only for identifying performance gains from code modifications and hardware advancements, but also for proactively uncovering subtle, yet critical, errors. This approach facilitated the detection and resolution of several previously untrapped issues – including signed integer overflows, out-of-bounds writes, memory leaks, double frees, and null pointer dereferences – significantly bolstering the software’s robustness. By consistently validating changes and maintaining a vigilant error-detection system, the project ensures libNEGF remains a reliable and maintainable resource for the scientific community, capable of evolving alongside future computational challenges.
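Memory defects like those listed above are typically caught with runtime instrumentation, for example AddressSanitizer or Valgrind’s Memcheck, which keep shadow bookkeeping of every allocation. The toy tracker below illustrates that bookkeeping idea for double frees and leaks only. It is a hypothetical teaching sketch, not how those tools are implemented and not how libNEGF’s defects were found.

```python
class AllocationTracker:
    """Toy shadow table recording the state of every allocation (hypothetical sketch)."""

    def __init__(self):
        self._live = {}      # handle -> (size in bytes, label)
        self._freed = set()  # handles that have already been released
        self._next = 1       # handles are never reused, so double frees stay detectable

    def allocate(self, size, label=""):
        handle = self._next
        self._next += 1
        self._live[handle] = (size, label)
        return handle

    def free(self, handle):
        if handle in self._freed:
            raise RuntimeError(f"double free of handle {handle}")
        if handle not in self._live:
            raise RuntimeError(f"free of unknown handle {handle}")
        del self._live[handle]
        self._freed.add(handle)

    def leaks(self):
        """Allocations still live, e.g. at program exit: what a leak checker reports."""
        return sorted(self._live.items())
```

A second free(a) on the same handle raises immediately, and calling leaks() at shutdown lists every allocation that was never released, together with its label for diagnosis.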

The pursuit of robust, sustainable research software, as detailed in the development of libNEGF, mirrors a fundamental principle of understanding any complex system: deliberate probing. One must actively test a system’s boundaries to truly grasp its inner workings. This echoes Tim Berners-Lee’s observation: “The Web is more a social creation than a technical one.” Just as the Web evolved through constant interaction and refinement, so does high-performance computing software. Continuous integration and benchmarking are not merely about defect detection; they iteratively challenge the code, push its limits, and expose emergent behavior, ultimately deepening the understanding of quantum transport phenomena.

What Breaks Next?

The successful application of research software engineering to libNEGF, as detailed within, isn’t a validation of process, but a precise mapping of its failure points. The code now survives a battery of automated tests, but survival is merely delayed dissolution. The interesting questions aren’t about what the code currently does right, but what carefully constructed inputs will inevitably reveal its next, unforeseen collapse. This isn’t pessimism; it’s an acknowledgement that robust understanding demands systematic dismantling.

Future work shouldn’t focus on extending functionality, but on intentionally stressing the limits of these automated systems. Can adversarial testing, designed to mimic the ingenuity of a frustrated user, expose subtle defects missed by conventional benchmarking? The current emphasis on defect detection is valuable, but the true prize lies in defect prediction – anticipating failure before it manifests. A predictive model demands a deep understanding of the code’s vulnerabilities, gained only through relentless, purposeful breakage.
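One cheap, automatable form of the adversarial testing asked about here is randomized differential testing: feed many generated inputs to two independent implementations of the same operation and stop at the first disagreement, which becomes a reproducible counterexample. The sketch pits a hand-written tridiagonal (Thomas algorithm) solver against NumPy’s dense numpy.linalg.solve; the solver, the input distribution, and the tolerances are illustrative assumptions, not part of libNEGF.

```python
import numpy as np


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm without pivoting. lower[i] multiplies x[i-1] in row i
    (lower[0] unused); upper[i] multiplies x[i+1] (upper[-1] unused)."""
    n = len(diag)
    c = np.zeros(n, dtype=complex)
    d = np.zeros(n, dtype=complex)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n, dtype=complex)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def fuzz(trials=200, seed=0):
    """Return the first input on which the two solvers disagree, or None."""
    rng = np.random.default_rng(seed)  # seeded, so any failure is reproducible
    for _ in range(trials):
        n = int(rng.integers(1, 40))
        lower = rng.normal(size=n) + 1j * rng.normal(size=n)
        upper = rng.normal(size=n) + 1j * rng.normal(size=n)
        # strict diagonal dominance keeps the unpivoted algorithm stable
        diag = np.abs(lower) + np.abs(upper) + 1.0 + rng.random(n)
        rhs = rng.normal(size=n) + 1j * rng.normal(size=n)
        a = np.diag(diag).astype(complex)
        a += np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
        expected = np.linalg.solve(a, rhs)
        got = solve_tridiagonal(lower, diag, upper, rhs)
        if not np.allclose(got, expected, rtol=1e-9, atol=1e-11):
            return n, lower, diag, upper, rhs  # counterexample to investigate
    return None
```

A returned tuple is a concrete failing input that can be turned directly into a regression test. Widening the generator, for instance by dropping the diagonal-dominance guarantee, is one way to push such a harness toward the limits of the code under test.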

Ultimately, the sustainability of research software isn’t about creating immortal code. It’s about creating systems that fail predictably, allowing for rapid diagnosis and repair. The goal isn’t perfection, but controlled demolition: a constant cycle of construction and deconstruction, driven by the fundamental principle that if one cannot break it, one doesn’t truly understand it.

Original article: https://arxiv.org/pdf/2605.21334.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
