Smarter QEMU Coverage: A Plugin That Doesn’t Slow You Down

Author: Denis Avetisyan

Researchers have developed a new QEMU plugin, NQC2, that makes code coverage analysis for embedded systems substantially faster without modifying the target software.

NQC2 delivers up to 8.5x speed improvements over existing methods through asynchronous writing and buffering techniques for non-intrusive code coverage.

While code coverage is essential for robust software development, traditional instrumentation-based approaches falter when applied to bare-metal embedded systems lacking operating systems and file systems. This paper introduces NQC2: A Non-Intrusive QEMU Code Coverage Plugin, a novel solution that extracts coverage data directly from the QEMU virtualized execution environment without modifying the target software. By employing techniques such as asynchronous writing and buffering, NQC2 achieves up to an 8.5x performance improvement over comparable methods. Could this non-intrusive approach redefine code coverage analysis for resource-constrained embedded platforms and accelerate the development of more reliable systems?

The Inevitable Cost of Coverage

The pursuit of robust, dependable embedded systems necessitates thorough code coverage, yet achieving this presents unique difficulties. Traditional methods of measuring how much of a program’s code is executed during testing frequently demand substantial computational resources or necessitate modifications to the target hardware – an unacceptable trade-off in resource-constrained environments. These intrusive techniques can alter the system’s timing and behavior, potentially masking critical errors or introducing new ones, and are often impractical for bare-metal applications where operating system support is absent. Consequently, developers face a significant challenge: ensuring the safety and reliability of embedded software without incurring unacceptable performance penalties or hardware costs through conventional coverage analysis approaches.

Traditional code coverage tools, while effective in many software development contexts, often present significant challenges when applied to embedded systems, particularly those running on bare metal. Solutions like Xilinx Etrace, designed to trace execution flow, frequently introduce substantial performance overhead due to the increased data logging and processing demands. Furthermore, these techniques may necessitate modifications to the target hardware – adding probes or utilizing dedicated debug ports – which is often impractical or impossible in resource-constrained environments or when dealing with sealed, production-ready devices. This reliance on intrusive methods limits the ability to thoroughly assess software reliability without impacting real-time behavior or requiring costly hardware revisions, creating a critical gap in the verification process for embedded applications where safety and efficiency are paramount.

The increasing complexity of embedded systems, coupled with stringent safety requirements across industries like automotive, aerospace, and medical devices, is dramatically heightening the challenge of achieving effective code coverage. Traditional methods often introduce unacceptable overhead, impacting real-time performance and potentially altering system behavior, which is a critical flaw when striving for reliable validation. Developers face a growing need for techniques that can accurately measure code execution without significantly burdening limited resources, such as memory and processing power. This demand extends beyond simply identifying untested lines; it necessitates a granular understanding of code paths and branching logic, all while maintaining the integrity of the bare-metal environment and minimizing the impact on timing-critical operations. Consequently, the pursuit of non-intrusive, efficient code coverage analysis is no longer merely a best practice, but a fundamental necessity for building dependable and secure embedded applications.

Achieving robust code coverage in embedded systems fundamentally depends on the precision of instrumentation and its minimal impact on real-time performance. Traditional methods often introduce substantial overhead through the insertion of monitoring code, potentially altering the system’s behavior and masking critical timing issues. Sophisticated instrumentation techniques now focus on minimizing this interference, employing strategies like compiler-based optimization and hardware-assisted tracing to collect coverage data with reduced runtime penalties. This allows developers to gain a trustworthy assessment of code execution, identifying untested branches and potential vulnerabilities without compromising the system’s responsiveness or functional integrity – a delicate balance crucial for safety-critical applications where even minor deviations can have significant consequences.

NQC2: A Less Disruptive Approach

NQC2 operates as a plugin within the QEMU-TCG (Tiny Code Generator) framework, providing code coverage analysis without requiring modifications to the target code or its compilation process. This non-intrusive approach is achieved by leveraging QEMU’s dynamic binary translation capabilities; NQC2 intercepts and analyzes instructions after they have been translated from the target architecture’s native format into TCG instructions. Integration is seamless as the plugin directly hooks into the TCG execution pipeline, allowing coverage data to be collected during normal simulation without altering the execution flow or introducing instrumentation overhead beyond the analysis itself. This allows for analysis of both user-mode and kernel-mode code within the emulated environment.

NQC2 leverages QEMU’s Dynamic Binary Translation (DBT) to achieve code coverage analysis without requiring alterations to the target application. DBT translates the guest binary’s machine instructions into an intermediate representation at runtime, which QEMU then compiles to host code and executes. NQC2 instruments this translated intermediate code, effectively monitoring execution flow without needing to recompile the original binary or access its source code. This approach eliminates the need for potentially disruptive modifications to the target application, simplifying the integration of coverage analysis into existing development workflows and enabling analysis of closed-source or pre-built binaries.

NQC2 leverages the Tiny Code Generator (TCG) plugin interface within QEMU to perform code coverage analysis by monitoring instructions after they have been translated from the guest architecture to the host’s. This approach avoids the overhead and limitations of traditional instrumentation methods, such as requiring source code modifications or recompilation of the target binary. By observing the translated instructions executed by the TCG backend, NQC2 can accurately track code execution paths without interfering with the simulation process itself. This direct observation of translated instructions results in a clean and efficient data collection process, minimizing performance impact and providing precise coverage metrics.
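To make the idea of observing translated blocks concrete, here is a minimal Python sketch of block-level coverage collection. It is a toy model, not NQC2's code and not the QEMU plugin API: `TranslationBlock`, `CoverageCollector`, `on_block_executed`, and the fixed 4-byte instruction size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TranslationBlock:
    """Hypothetical stand-in for a translated block of guest code."""
    start: int  # guest address of the first instruction
    size: int   # total size of the block in bytes


@dataclass
class CoverageCollector:
    """Records which guest address ranges were executed, and how often."""
    hits: dict = field(default_factory=dict)  # (start, size) -> execution count

    def on_block_executed(self, tb: TranslationBlock) -> None:
        # Called once per block execution; the guest binary is never modified.
        key = (tb.start, tb.size)
        self.hits[key] = self.hits.get(key, 0) + 1

    def covered_addresses(self, insn_size: int = 4) -> set:
        # Assumes fixed-width instructions (an illustrative simplification).
        covered = set()
        for start, size in self.hits:
            covered.update(range(start, start + size, insn_size))
        return covered


def run_guest(trace, collector: CoverageCollector) -> None:
    """Replays a sequence of executed blocks through the collector."""
    for tb in trace:
        collector.on_block_executed(tb)


# A loop body that runs three times, followed by an exit block.
loop_body = TranslationBlock(0x8000, 12)
exit_block = TranslationBlock(0x800C, 8)
collector = CoverageCollector()
run_guest([loop_body, loop_body, loop_body, exit_block], collector)
```

The point of the sketch is that coverage is derived entirely from what the emulator reports about executed blocks, so nothing is inserted into the program under test.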

NQC2 leverages the extensive architecture support inherent in QEMU, enabling code coverage analysis on a diverse set of target platforms. QEMU’s modular design and backends for numerous CPUs – including ARM, x86, MIPS, RISC-V, and PowerPC – are directly accessible by NQC2 without requiring platform-specific adaptations. This compatibility extends to embedded systems, as QEMU supports various virtualization and emulation modes suitable for resource-constrained environments. Consequently, NQC2 can be deployed for coverage analysis across a broad spectrum of software targets, from desktop applications to firmware running on embedded devices, without necessitating separate toolchains or modifications for each architecture.

Performance Gains: Asynchronous Buffering and Merging

NQC2 minimizes performance overhead through the implementation of Asynchronous Writer threads. These threads operate in parallel with the core tracing processes, allowing data to be written to storage without blocking the execution of the traced application. This parallel data handling significantly reduces the performance impact associated with tracing, as the primary application threads are not stalled waiting for I/O operations to complete. The Asynchronous Writer threads buffer data in memory and write it to disk independently, improving overall system responsiveness and maintaining a higher level of performance during tracing sessions.
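The asynchronous writer pattern described above can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not the plugin's implementation: the class name `AsyncWriter`, the `max_pending` limit, and the 8-byte record layout are hypothetical.

```python
import io
import queue
import threading


class AsyncWriter:
    """Hands trace chunks to a background thread so the producer never does I/O."""

    def __init__(self, sink, max_pending: int = 64):
        self._sink = sink
        self._queue = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def submit(self, chunk: bytes) -> None:
        # Blocks only if the writer has fallen max_pending chunks behind.
        self._queue.put(chunk)

    def _drain(self) -> None:
        while True:
            chunk = self._queue.get()
            if chunk is None:  # sentinel: no more data
                break
            self._sink.write(chunk)

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
        self._sink.flush()


def record_trace(n_records: int, sink) -> None:
    """Simulated tracing loop: emits n_records fixed-size records."""
    writer = AsyncWriter(sink)
    for i in range(n_records):
        writer.submit(i.to_bytes(8, "little"))
    writer.close()


sink = io.BytesIO()  # stands in for the trace file
record_trace(1000, sink)
```

A single writer thread draining a FIFO queue preserves record order, which matters if the trace is later replayed or merged sequentially.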

Multi-buffering in NQC2 improves performance by decoupling data writing from data generation, thereby minimizing contention for shared resources. This technique utilizes multiple buffers to stage data, allowing the core processing to continue without waiting for I/O operations to complete. The implementation increases data throughput; the reported testing also shows up to a 44% decrease in Elog file size with multi-buffering enabled.
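One common way to realize multi-buffering is a small pool of reusable buffers that rotate between the producer and a flush thread. The sketch below shows that pattern in Python; the class name `MultiBufferedLog`, the pool size, and the capacity are assumptions for illustration, and the sketch does not attempt to reproduce the file-size effect reported above.

```python
import io
import queue
import threading


class MultiBufferedLog:
    """Producer fills one buffer while a background thread flushes the others."""

    def __init__(self, sink, n_buffers: int = 4, capacity: int = 4096):
        self._sink = sink
        self._capacity = capacity
        self._free = queue.Queue()  # empty buffers ready for reuse
        self._full = queue.Queue()  # filled buffers waiting to be written
        for _ in range(n_buffers):
            self._free.put(bytearray())
        self._active = self._free.get()
        self._writer = threading.Thread(target=self._flush_loop, daemon=True)
        self._writer.start()

    def append(self, record: bytes) -> None:
        if len(self._active) + len(record) > self._capacity:
            self._swap()
        self._active += record

    def _swap(self) -> None:
        self._full.put(self._active)
        # Waits only when every buffer in the pool is still being flushed.
        self._active = self._free.get()

    def _flush_loop(self) -> None:
        while True:
            buf = self._full.get()
            if buf is None:  # sentinel: shut down
                break
            self._sink.write(bytes(buf))
            buf.clear()
            self._free.put(buf)

    def close(self) -> None:
        self._full.put(self._active)  # flush the partially filled buffer
        self._full.put(None)
        self._writer.join()


log_sink = io.BytesIO()  # stands in for the Elog file
log = MultiBufferedLog(log_sink, n_buffers=2, capacity=16)
for i in range(100):
    log.append(bytes([i]) * 8)
log.close()
```

Reusing a fixed pool bounds memory use: if the disk cannot keep up, the producer eventually waits for a free buffer instead of growing the queue without limit.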

Merging techniques within NQC2 reduce Elog file size by consolidating redundant data blocks. Specifically, during the Coremark benchmark, 42.08% of etrace_entry64 blocks were successfully merged, resulting in decreased storage requirements and improved data access times. This process identifies and combines identical or highly similar blocks, effectively compressing the Elog file without data loss and enhancing overall system efficiency.
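The article does not spell out the merge rule, so the following Python sketch shows only one plausible reading: consecutive trace entries are combined when they cover contiguous address ranges, and an immediate exact repeat is dropped. The `(start, end)` entry layout and the function name `merge_entries` are assumptions, not the actual `etrace_entry64` format.

```python
from typing import Iterable, List, Tuple

# Hypothetical entry: (start_address, end_address), with end exclusive.
Entry = Tuple[int, int]


def merge_entries(entries: Iterable[Entry]) -> List[Entry]:
    """Merges each entry into its predecessor when that loses no coverage data."""
    merged: List[Entry] = []
    for start, end in entries:
        if merged and merged[-1][1] == start:
            # Contiguous with the previous range: extend it.
            merged[-1] = (merged[-1][0], end)
        elif merged and merged[-1] == (start, end):
            # Immediate exact repeat: adds nothing to coverage.
            continue
        else:
            merged.append((start, end))
    return merged


trace = [
    (0x100, 0x110),
    (0x110, 0x120),  # contiguous with the previous entry
    (0x200, 0x210),
    (0x200, 0x210),  # exact repeat
    (0x300, 0x310),
]
compact = merge_entries(trace)
```

In this toy trace, five entries shrink to three. Dropping repeats discards execution counts, so this variant only suits coverage (was the range executed at all) rather than profiling.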

NQC2 utilizes QEMU’s Instruction Set Simulator (ISS) and Translation Block (TB) processing to facilitate efficient data collection during simulation environments. The ISS enables NQC2 to monitor instruction execution, while TB processing allows for granular tracking of translated code blocks. This combination provides detailed performance metrics without significantly impacting simulation speed. Data is collected directly from QEMU’s internal structures, minimizing overhead and ensuring accuracy. The system is designed to capture relevant data points as code transitions between interpreted and translated states, providing a comprehensive view of runtime behavior.

The Inevitable Improvement, and Where We Go From Here

Traditional code coverage techniques often prove impractical for embedded systems due to their limited processing power and memory. NQC2 presents a viable alternative by employing a novel approach that minimizes overhead and resource consumption. This is achieved through a carefully optimized architecture that allows for effective coverage analysis even on severely constrained platforms. Unlike methods requiring extensive recompilation or intrusive debugging, NQC2 facilitates non-intrusive analysis, making it particularly suitable for safety-critical applications where minimizing interference is paramount. The tool’s ability to deliver meaningful coverage data without significant performance penalties broadens the scope of testability for a wider range of embedded devices, ultimately contributing to more robust and reliable software.

Non-intrusive code coverage, as facilitated by NQC2, represents a substantial refinement in embedded systems development practices. Traditional methods often require modification of the application’s source code or the introduction of instrumentation, which carries the inherent risk of altering program behavior and potentially masking or introducing new defects. By operating without such invasive procedures, NQC2 allows developers to assess the thoroughness of their testing without disturbing the integrity of the original code. This simplification streamlines the development workflow, reducing the time and effort required for both test implementation and debugging. Consequently, the decreased potential for unintended side effects contributes to more reliable and robust embedded systems, minimizing the possibility of errors slipping through to deployment.

The practical utility of NQC2 is significantly enhanced by its seamless integration with Lcov, a widely adopted tool for generating code coverage reports. This compatibility allows developers to readily visualize and analyze test suite effectiveness, pinpointing areas of code that lack sufficient testing with minimal overhead. Beyond simple reporting, this integration streamlines the implementation of continuous integration (CI) pipelines; coverage data can be automatically collected and assessed as part of each build, ensuring that code quality remains consistently high and regressions are quickly identified. This automation not only improves software reliability but also reduces the time and effort required for thorough testing, fostering a more efficient and robust development workflow.
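To illustrate what handing data to Lcov involves, the sketch below writes per-line hit counts in the LCOV tracefile format (`TN`, `SF`, `DA`, `LF`, `LH`, `end_of_record`). It assumes the executed addresses have already been mapped to source lines, for example from debug information; the function name `to_lcov` and the input layout are hypothetical, and this is not NQC2's exporter.

```python
def to_lcov(line_hits: dict, test_name: str = "") -> str:
    """Formats {source_path: {line_number: exec_count}} as an LCOV tracefile."""
    out = []
    for path in sorted(line_hits):
        lines = line_hits[path]
        out.append(f"TN:{test_name}")
        out.append(f"SF:{path}")
        for line in sorted(lines):
            out.append(f"DA:{line},{lines[line]}")  # line number, execution count
        out.append(f"LF:{len(lines)}")              # lines found (instrumented)
        out.append(f"LH:{sum(1 for c in lines.values() if c > 0)}")  # lines hit
        out.append("end_of_record")
    return "\n".join(out) + "\n"


# Hypothetical data: line 11 of src/main.c was never executed.
report = to_lcov({"src/main.c": {10: 3, 11: 0, 12: 1}}, test_name="coremark")
```

Saved as, say, `coverage.info`, a tracefile in this format can be turned into an HTML report with `genhtml coverage.info -o report`, which is the step a CI pipeline would typically automate.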

Recent evaluations demonstrate that NQC2 represents a substantial leap forward in code coverage analysis efficiency, particularly when contrasted with Xilinx’s established QEMU-based methodology. Benchmarking shows NQC2 running up to 8.5 times faster, a significant reduction in the slowdown incurred during coverage assessment. This performance improvement isn’t merely academic; it translates directly into faster development cycles, enabling engineers to more rapidly iterate on designs and identify potential vulnerabilities within embedded systems. By minimizing the overhead associated with coverage analysis, NQC2 empowers developers to conduct more frequent and thorough testing, ultimately contributing to the creation of more robust and reliable embedded software.

The pursuit of perfect tooling often feels like chasing a mirage. NQC2, with its asynchronous writing and buffering to achieve up to 8.5x performance gains in code coverage, is a testament to pragmatic compromise. It isn’t elegance that sustains a system, but resilience. As Donald Knuth observed, “Premature optimization is the root of all evil,” and this plugin embodies that wisdom. It doesn’t attempt to redefine code coverage; it simply makes the existing process…survivable. The gains aren’t about achieving some theoretical ideal, but about making instrumentation practical in the face of real-world performance constraints, especially within the demanding landscape of embedded systems. Everything optimized will one day be optimized back, and NQC2 seems prepared for that inevitable cycle.

What’s Next?

The pursuit of ever-finer-grained code coverage, even within the confines of a virtualized environment, feels…predictable. NQC2 achieves a performance uplift, certainly. But the underlying truth remains: someone, somewhere, will attempt to push the boundaries of instrumentation further. And then, inevitably, production will find a way to expose the limitations of even asynchronous buffering. It’s a perpetual arms race, elegantly disguised as progress.

The real challenge isn’t simply measuring code execution; it’s extracting genuinely useful information from the deluge of data. An 8.5x speedup simply highlights more areas where tests don’t reach. Future work might reasonably focus on intelligent filtering, automated test-case generation guided by coverage gaps, or – a particularly cynical thought – acceptance that complete coverage is a comforting myth.

One suspects the next iteration won’t be a plugin, but a dedicated hardware acceleration unit for coverage analysis. Or perhaps a return to the analog days, with someone meticulously tracing execution paths with a multimeter. Everything new is old again, just renamed and still broken. The core problem isn’t solved, merely shifted.

Original article: https://arxiv.org/pdf/2601.02238.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
