Finding the Keys: AI-Powered Discovery of Cryptographic Software

Author: Denis Avetisyan


Researchers are leveraging the power of artificial intelligence to automatically identify software components that handle sensitive cryptographic operations.

This study demonstrates a collaborative large language model approach for identifying cryptographically relevant software packages, enhancing crypto-agility and preparing for the transition to post-quantum cryptography.

Maintaining comprehensive cryptographic inventories is increasingly challenging given escalating security threats and the need for proactive migration to post-quantum cryptography. This paper, ‘Detecting Cryptographically Relevant Software Packages with Collaborative LLMs’, introduces a novel approach leveraging large language models to automatically identify cryptographically relevant packages within complex software ecosystems. Our research demonstrates that ensembles of locally hosted LLMs, guided by optimized prompts and a majority-voting consensus, can effectively filter for cryptographic software, significantly reducing manual effort. Could this methodology provide a scalable and privacy-preserving solution for organizations preparing for the transition to a quantum-resistant future?


The Inevitable Crypto-Asset Blind Spot

Contemporary software ecosystems are fundamentally built upon a vast and ever-increasing dependence on cryptographic components. These components, ranging from encryption algorithms to digital signature schemes, underpin critical functionalities such as data protection, secure communication, and authentication. However, this widespread integration creates a complex attack surface, as vulnerabilities within these cryptographic building blocks – or their improper implementation – can have cascading effects across entire systems. The sheer volume and diversity of these crypto-assets, coupled with the speed of software development, present significant challenges for organizations attempting to comprehensively assess and mitigate the associated risks. Consequently, a lack of clear understanding regarding the cryptographic landscape within software introduces substantial security exposures and necessitates more robust methods for identifying and managing these potential weaknesses.

Identifying cryptographic components within modern software – the ‘Crypto-Assets’ essential to security – historically relies on painstaking manual reviews and audits. These processes are inherently limited in scope, frequently missing newly integrated or obscure libraries, and struggle to address the sheer velocity of software updates. As development cycles accelerate and open-source dependencies proliferate, the static nature of these traditional methods becomes increasingly inadequate; a snapshot assessment quickly becomes outdated, leaving organizations vulnerable to risks associated with unmanaged or compromised cryptographic implementations. Consequently, maintaining a current and comprehensive inventory of Crypto-Assets proves exceptionally challenging, hindering effective vulnerability management and proactive security posture.

The absence of comprehensive crypto-asset visibility significantly compromises an organization’s capacity for robust risk management and preventative security protocols. Without a clear understanding of the cryptographic components embedded within their software supply chain, businesses struggle to accurately assess potential vulnerabilities and prioritize remediation efforts. This informational gap extends beyond simple identification; it impacts the ability to monitor cryptographic configurations, detect outdated or compromised algorithms, and respond effectively to emerging threats such as cryptographically relevant quantum computers, which make the migration to quantum-resistant algorithms urgent. Consequently, organizations face heightened exposure to security breaches, data loss, and reputational damage, all stemming from an inability to proactively manage the cryptographic risks inherent in modern software development and deployment.

Maintaining robust security in modern software demands a shift towards ‘Crypto-Agility’: the capacity to swiftly identify, assess, and replace cryptographic components. This, in turn, necessitates scalable, automated solutions. Manual inventory and analysis of crypto-assets simply cannot keep pace with the velocity of software updates and the constant emergence of new vulnerabilities. Automated systems can continuously scan codebases, identify all instances of cryptographic functions and algorithms, and track their associated metadata, such as version numbers and certificate validity. This proactive approach enables organizations to respond swiftly to newly discovered weaknesses, migrate to stronger algorithms, and ensure ongoing compliance with evolving security standards, ultimately minimizing risk and bolstering the resilience of critical systems against increasingly sophisticated threats.

LLMs: A Pragmatic Approach to Crypto-Asset Discovery

Large Language Models (LLMs) are utilized to perform static analysis of software packages to detect the presence of cryptographic elements. This process involves submitting source code or compiled binaries to the LLM, which then identifies code patterns associated with known cryptographic libraries – such as OpenSSL, libsodium, or Botan – and algorithms – including AES, RSA, and SHA-256. The LLM’s ability to understand code semantics allows it to differentiate between legitimate cryptographic usage and potentially malicious code employing similar patterns. The analysis isn’t limited to direct library imports; the LLM can also identify algorithmic implementations directly within the codebase, offering a more comprehensive assessment of cryptographic functionality.

Accurate identification of cryptographic code within software packages relies heavily on the construction of specific prompts used to query Large Language Models (LLMs). These prompts must clearly define the desired output – namely, the detection of code exhibiting cryptographic functionality – and provide sufficient context to avoid false positives. Effective prompt engineering involves iteratively refining the prompt’s phrasing, incorporating examples of relevant code patterns, and specifying the expected format of the LLM’s response. Furthermore, prompts should instruct the LLM to prioritize precision and recall, balancing the need to identify all instances of cryptographic code with the minimization of incorrect classifications. The inclusion of negative examples – code that does not represent cryptographic functionality – can also significantly improve the LLM’s accuracy.
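As an illustration, a classification prompt of this shape might look as follows. This is a minimal sketch: the wording, the few-shot positive and negative examples, and the strict YES/NO output contract are assumptions for this example, not the paper’s actual prompts.

```python
# Hypothetical prompt template; the phrasing and examples are illustrative
# assumptions, not the optimized prompts used in the paper.
PROMPT_TEMPLATE = """You are a software security analyst. Decide whether the \
following package implements or directly provides cryptographic functionality.

Answer with exactly one word: YES or NO.

Examples:
Package: openssl -- Utilities from the general purpose cryptography library
Answer: YES
Package: tar -- A GNU file archiving program
Answer: NO

Package: {name} -- {description}
Answer:"""


def build_prompt(name: str, description: str) -> str:
    """Render the classification prompt for a single package."""
    return PROMPT_TEMPLATE.format(name=name, description=description)
```

Constraining the model to a one-word answer makes the response trivially machine-parseable, which matters once multiple model outputs must be aggregated.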

Knowledge-driven pattern matching supplements LLM analysis by introducing a layer of expert-defined rules for improved accuracy. This process involves utilizing curated lists of keywords associated with cryptographic functions, algorithms, and protocols – such as “SHA256”, “RSA”, or “elliptic curve” – to narrow the search space. Furthermore, regular expressions are employed to identify specific code patterns indicative of cryptographic implementation, even with variations in naming conventions or code structure. By combining LLM-identified candidates with these pre-defined patterns, the system reduces false positives and enhances the precision of crypto-asset identification within software packages.
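A minimal sketch of such knowledge-driven filtering follows, assuming illustrative keyword and regex lists; the paper’s curated lists are not reproduced here, so every entry below is an assumption.

```python
import re

# Illustrative keyword list; the paper's curated lists are not published
# here, so these entries are assumptions.
CRYPTO_KEYWORDS = {"aes", "rsa", "sha256", "elliptic curve", "openssl",
                   "libsodium", "tls", "hmac", "x509"}

# Regex catching common naming variants such as SHA-256 / sha_256 / sha256.
CRYPTO_PATTERN = re.compile(
    r"\b(sha[-_]?(1|224|256|384|512)|aes[-_]?(128|192|256)?|rsa|ecdsa|"
    r"ed25519|chacha20|hmac)\b",
    re.IGNORECASE)


def matches_crypto(text: str) -> bool:
    """Return True if the text mentions a known cryptographic term."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in CRYPTO_KEYWORDS):
        return True
    return bool(CRYPTO_PATTERN.search(text))
```

Running this cheap filter before (or alongside) the LLM narrows the candidate set and catches naming variants a plain keyword list would miss.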

The identification process leverages open-source frameworks, specifically Ollama and GPT4All, to facilitate access to and querying of a variety of Large Language Models (LLMs). These frameworks enable the deployment and operation of LLMs locally, bypassing reliance on external APIs and associated costs or limitations. Utilizing multiple LLMs through these frameworks increases the breadth of code analysis and improves the reliability of identifying cryptographic assets by mitigating potential biases or inaccuracies inherent in any single model. This approach allows for a more comprehensive and robust assessment of software packages, enhancing the accuracy of crypto-asset identification.
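Locally hosted models served by Ollama are reachable through a small REST API. The sketch below follows Ollama’s documented `/api/generate` route; the model name `llama3` and the default host are placeholders, and the surrounding pipeline is an assumption rather than the paper’s exact harness.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str) -> dict:
    """Assemble a non-streaming generation request for Ollama's REST API."""
    return {"model": model, "prompt": prompt, "stream": False}


def classify_package(prompt: str, model: str = "llama3",
                     host: str = "http://localhost:11434") -> str:
    """Send one classification prompt to a locally hosted model.

    Requires a running Ollama server; no data leaves the local machine.
    """
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()
```

Because the server runs on localhost, package names and descriptions never leave the organization’s infrastructure, which is the privacy-preserving property the paper emphasizes.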

Ensemble Methods: Reducing Stochasticity and Improving Reliability

To address the stochastic nature of Large Language Model (LLM) outputs, a ‘Majority Vote’ ensemble method was implemented. This approach involves querying multiple LLMs with the same prompt and aggregating their responses via a voting mechanism. The final output is determined by the most frequent response across all models in the ensemble. This technique reduces the impact of individual model errors or biases, enhancing the overall stability and consistency of the results. The system is designed to identify the predominant response, effectively filtering out outlier or anomalous predictions generated by any single LLM instance.
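The voting step itself is simple to sketch. This minimal version normalizes answers and breaks ties by first occurrence, a simplification the paper may handle differently.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent normalized answer across ensemble members.

    Ties are broken by first occurrence (Counter preserves insertion
    order), which is a simplification of any real tie-breaking policy.
    """
    normalized = [a.strip().upper() for a in answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner
```

Normalizing before counting matters in practice: a model answering “yes” and another answering “YES” should land in the same bucket.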

The implementation of an ensemble method, utilizing responses from multiple Large Language Models, resulted in a demonstrable improvement in crypto-asset identification performance. Specifically, this approach yielded a combined F1-score of 0.82, representing a balanced measure of both precision and recall. Precision, in this context, indicates the accuracy of positive identifications, while recall reflects the proportion of actual crypto-assets correctly identified. The F1-score of 0.82 signifies a high level of performance in both metrics, demonstrating the effectiveness of aggregating LLM outputs for this task.
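These metrics follow directly from confusion-matrix counts. The counts in the test below are illustrative, chosen only because they reproduce the reported recall of 0.85 and F1 of 0.82; the paper’s actual confusion matrix is not given here.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from confusion-matrix counts.

    tp: crypto packages correctly flagged
    fp: non-crypto packages incorrectly flagged
    fn: crypto packages that were missed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```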

The evaluation methodology utilized DNF, the default package manager for Fedora Linux, to assemble a dataset of software packages for analysis. This approach allowed for programmatic access to a diverse and representative sample of applications commonly found within a standard Fedora Linux installation. The package manager facilitated the collection of package metadata, including names, descriptions, and dependencies, which served as the basis for identifying potential crypto-asset related software. This ensured a controlled and reproducible data sourcing process, minimizing bias and enabling consistent evaluation of the LLM ensemble’s performance.

Analysis of the ensemble method revealed a limited benefit from increasing the number of Large Language Models (LLMs) beyond a specific threshold. The ‘Effective Sample Size’, a metric indicating the contribution of each additional LLM to the overall accuracy, approached a maximum value of 2 when utilizing an ensemble of five models. This indicates that the incremental gains in performance diminish rapidly after adding a small number of LLMs; subsequent models contribute negligibly to improving the identification of crypto-assets, suggesting a point of diminishing returns for this particular application and dataset.
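One common way to reason about this saturation is the design-effect approximation for correlated votes: models trained on similar data give correlated answers, so each additional model adds less than one independent “sample”. Whether the paper uses this exact definition of effective sample size is an assumption, and the correlation value in the test below is chosen purely so that five models yield an ESS of 2.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """ESS of n equally weighted votes with pairwise correlation rho.

    Standard design-effect approximation: n / (1 + (n - 1) * rho).
    With rho = 0 the votes are independent and ESS = n; as rho grows,
    extra models contribute less and ESS saturates near 1 / rho.
    """
    return n / (1 + (n - 1) * rho)
```

Under this approximation, doubling the ensemble from five to ten models at the same correlation lifts the ESS only marginally, which matches the diminishing returns observed above.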

The ensemble method demonstrated a Recall score of 0.85 during evaluation, indicating its ability to correctly identify 85% of all relevant crypto-asset packages within the tested dataset. This metric signifies a substantial improvement in identifying a high proportion of true positives, minimizing the instances of failing to recognize actual crypto-assets. The achieved Recall value confirms the effectiveness of aggregating LLM responses as a strategy for improving the identification rate of crypto-assets compared to relying on a single model’s output.

Mapping the Web: Understanding Systemic Risk Through Dependencies

Package Dependency Graphs offer a powerful method for understanding the intricate connections within modern software ecosystems. These graphs visually represent software packages as nodes, with edges illustrating the dependencies – particularly those related to cryptographic libraries – that link them. By mapping these relationships, researchers and developers can trace how a vulnerability in a single, seemingly isolated cryptographic component could propagate through a complex network of applications. This visualization is not merely a static snapshot; it dynamically reveals potential cascading failures, highlighting which packages are most vulnerable to compromise and identifying critical points where a security incident could have far-reaching consequences. The resulting dependency map provides a clear, actionable overview of systemic risk, moving beyond individual package vulnerabilities to assess the overall resilience of the software supply chain.

Package Dependency Graphs enable the identification of potential cascading failures by visually representing the intricate relationships between software components and their cryptographic underpinnings. A vulnerability in a single, widely-used cryptographic library can propagate through numerous dependent packages, creating a systemic risk that extends far beyond the initially affected software. By mapping these dependencies, researchers and developers can proactively model the impact of potential failures, pinpoint critical choke points, and assess the overall resilience of complex software ecosystems. This approach moves beyond isolated vulnerability assessments to provide a holistic understanding of how a single compromised component could trigger widespread disruption, facilitating more effective risk mitigation strategies and bolstering the stability of the digital infrastructure.
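Propagation analysis over such a graph reduces to a reverse traversal: invert the dependency edges, then walk outward from the vulnerable component. The sketch below uses an invented toy graph; the package names and edges are illustrative, not drawn from the paper’s dataset.

```python
from collections import deque

# Toy dependency graph: edges point from a package to its dependencies.
# Names and edges are illustrative placeholders.
DEPENDS_ON = {
    "webapp":     ["requests-lib", "openssl"],
    "requests-lib": ["openssl"],
    "backup":     ["tar"],
    "tar":        [],
    "openssl":    [],
}


def affected_by(vulnerable: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every package that transitively depends on `vulnerable`."""
    # Invert the edges: map each dependency to its dependants.
    reverse: dict[str, list[str]] = {}
    for pkg, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(pkg)
    # Breadth-first walk over the reversed edges.
    seen: set[str] = set()
    queue = deque([vulnerable])
    while queue:
        node = queue.popleft()
        for dependant in reverse.get(node, []):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen
```

A vulnerability announcement for one cryptographic library then yields, in one traversal, the full set of packages in the blast radius.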

A holistic understanding of software vulnerability requires mapping not only package dependencies, but also the cryptographic components within those packages. This methodology integrates automated crypto-asset identification – pinpointing specific cryptographic libraries and their versions – with package dependency graph analysis. The resulting composite view reveals the complete attack surface, extending beyond direct package relationships to encompass the underlying cryptographic infrastructure. This comprehensive approach allows security professionals to identify weaknesses not just in how packages connect, but also in the cryptographic algorithms and implementations they rely upon, enabling a more nuanced and effective risk assessment and mitigation strategy.

A robust security posture necessitates moving beyond reactive vulnerability patching towards proactive risk mitigation, and this methodology directly supports that shift. By mapping the intricate web of cryptographic dependencies within software ecosystems, security teams gain the ability to anticipate potential failure points before exploits occur. This allows for preemptive strengthening of critical components and the development of targeted remediation strategies. Furthermore, the speed with which this system identifies affected packages in response to newly discovered vulnerabilities dramatically shortens the window of exposure, enabling rapid containment and minimizing potential damage – a crucial capability in today’s rapidly evolving threat landscape. The result is a transition from simply responding to breaches, to actively shaping a more resilient and secure software foundation.

The methodology prioritizes identifying a broad spectrum of cryptographic dependencies, as evidenced by its optimization for recall with a weight of 0.7. This strategic focus yielded a weighted F1-score of 0.82, indicating a robust balance between precision – minimizing false positives – and comprehensive identification of relevant assets. This score demonstrates the system’s ability to reliably detect the vast majority of cryptographic dependencies within software packages, even at the cost of a slightly increased rate of identifying non-critical elements. Such a trade-off is crucial for systemic risk assessment, where failing to identify a single vulnerability could have cascading consequences, making a high recall rate paramount for proactive security management and effective vulnerability response.
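A recall weight of 0.7 can be realized as a weighted harmonic mean of precision and recall; whether this matches the paper’s exact weighting scheme is an assumption, but the sketch shows the intended effect: for the same pair of values, the score rewards the configuration with the higher recall.

```python
def weighted_f(precision: float, recall: float,
               recall_weight: float) -> float:
    """Weighted harmonic mean of precision and recall.

    recall_weight in (0, 1); at 0.5 this reduces to the ordinary F1.
    The 0.7 setting below mirrors the recall emphasis described in the
    text, though the paper's exact formula is assumed, not quoted.
    """
    return (precision * recall /
            (recall_weight * precision + (1 - recall_weight) * recall))
```

With the weight at 0.7, swapping a high-recall/low-precision operating point for its mirror image lowers the score, which is exactly the bias a systemic risk assessment wants.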

The pursuit of automated cryptographic asset discovery, as detailed in this work, feels predictably optimistic. It’s a tidy solution – leveraging locally hosted large language models and majority voting – to a problem guaranteed to evolve beyond its initial parameters. One recalls Andrey Kolmogorov’s observation: “The most important discoveries are often the most obvious.” The obviousness here lies in applying LLMs to SBOM analysis, but the real challenge – anticipating the next cryptographic vulnerability or the subtle shifts in package dependencies – remains. This research offers a structured panic mechanism, a temporary reprieve, before the inevitable entropy of production exposes the limitations of even the most elegantly engineered systems. The promise of crypto-agility, while appealing, will eventually confront the harsh reality that every abstraction dies in production – at least it dies beautifully.

What’s Next?

The demonstrated capacity to locate cryptographic packages within a software bill of materials using locally hosted large language models feels… familiar. The current enthusiasm for LLMs echoes previous cycles of automation, where elegant solutions in research environments inevitably encounter the messy reality of production code. The reliance on prompt engineering, while effective in a controlled setting, invites a future of adversarial prompts and the constant recalibration of linguistic filters. One anticipates a period of diminishing returns as attackers refine techniques to obfuscate cryptographic dependencies.

A more pressing issue remains largely unaddressed: the accuracy of identifying a cryptographic package is not the same as verifying its correct implementation. A tool that highlights the presence of OpenSSL is only useful if one also assesses whether it’s configured securely and patched against known vulnerabilities. This work successfully locates the problem; solving it is a separate, and arguably more difficult, endeavor. The eventual scaling of this approach, to truly catalog the cryptographic landscape of an entire organization, will likely reveal a surprisingly large number of false positives, demanding substantial human review.

The push towards post-quantum cryptography (PQC) migration, touted as a key driver for this research, will undoubtedly expose further limitations. The field is still actively defining PQC standards; the LLMs will need constant updating to recognize newly adopted algorithms. One suspects the true bottleneck won’t be finding the crypto, but the sheer cost and complexity of replacing it, regardless of how efficiently it’s located.


Original article: https://arxiv.org/pdf/2603.07204.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-10 14:57