Can AI Learn to Spot Contract Flaws?

Author: Denis Avetisyan


New research explores how large language models can be trained to identify critical security vulnerabilities in smart contracts, even with limited data.

A comparison shows that a reentrancy example constructed from the dataset mirrors the characteristics of a typical real-world reentrancy pattern, indicating that the synthetic data faithfully captures this class of vulnerability.

This paper investigates a novel decomposition and fusion approach to improve compositional generalization in large language models for detecting reentrancy vulnerabilities in smart contracts.

Despite the remarkable progress of large language models (LLMs) in natural language processing, their application to specialized domains like smart contract security remains challenging, particularly given limited training data. This paper, ‘Towards Compositional Generalization in LLMs for Smart Contract Security: A Case Study on Reentrancy Vulnerabilities’, introduces a post-training algorithm that decomposes complex vulnerability detection – specifically, reentrancy flaws – into linearly independent atomic tasks. By fusing the learning from these decomposed tasks with a LoRA adapter and low-rank normalization, the approach achieves state-of-the-art accuracy and a 20% recall improvement on real-world contracts. Could this compositional approach unlock the potential of LLMs for broader application in data-driven software security and beyond?


The Evolving Landscape of Smart Contract Security

Smart contracts represent a paradigm shift in transactional security, yet their very architecture introduces unique vulnerabilities. These self-executing programs, designed to automate and enforce agreed-upon terms, are often remarkably complex – their code can quickly swell to thousands of lines, mirroring the intricacy of traditional legal contracts but without the benefit of human oversight during execution. This complexity, coupled with the fact that smart contracts frequently manage substantial digital assets, makes them a compelling target for malicious actors. A single flaw in the code can lead to the theft or manipulation of funds, as demonstrated by several high-profile exploits in decentralized finance (DeFi). The immutable nature of blockchain – a strength for data integrity – also means that once a vulnerable contract is deployed, the flaw is exceedingly difficult, if not impossible, to rectify without deploying a new contract and migrating assets, making proactive security measures critically important.

Conventional vulnerability detection techniques, designed for static codebases, frequently falter when applied to smart contracts due to the contracts’ inherent complexity and dynamic execution. These contracts often rely on intricate interactions between multiple functions and external accounts, creating a stateful and time-dependent logic that is difficult for static analysis tools to fully comprehend. Consequently, subtle flaws – such as reentrancy vulnerabilities or improper access control – can remain hidden during initial audits, only to be exploited once the contract is deployed and managing substantial digital assets. This discrepancy between perceived security and actual risk has resulted in numerous high-profile exploits, demonstrating that traditional methods are insufficient for safeguarding the rapidly growing ecosystem of decentralized applications and highlighting the urgent need for more sophisticated and dynamic analysis techniques.
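To make the canonical failure concrete, the sketch below simulates the reentrancy ordering flaw in plain Python rather than Solidity: a hypothetical vault performs its external transfer before zeroing the caller's balance, so a malicious callback can re-enter withdraw() and drain funds. The class and callback names are illustrative, not drawn from the paper.

class Vault:
    """Toy ledger that pays out via a caller-supplied callback."""
    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.total >= amount:
            self.total -= amount
            on_receive(amount)        # external call happens first...
            self.balances[who] = 0    # ...state update happens last: the flaw

vault = Vault()
vault.deposit("attacker", 10)
vault.deposit("victim", 90)

stolen = []
def malicious_callback(amount):
    stolen.append(amount)
    if vault.total > 0:               # re-enter while the balance is still stale
        vault.withdraw("attacker", malicious_callback)

vault.withdraw("attacker", malicious_callback)
print(sum(stolen))                    # 100: far more than the attacker's deposit

The standard remedy, the checks-effects-interactions pattern, simply swaps the last two lines of withdraw() so the balance is zeroed before the external call is made.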

Deconstructing Vulnerability Detection with Modular Analysis

Atomic Task Decomposition, as applied to smart contract vulnerability detection, involves partitioning the overall problem into a series of discrete, independent subtasks. This approach moves away from treating vulnerability analysis as a single, monolithic process and instead defines it as a composition of smaller units, each with a specific objective. These subtasks are designed to be atomic – meaning they represent the smallest logical unit of work – and are mutually exclusive, minimizing overlap and facilitating parallel processing. By reducing the complexity of each individual task, the method aims to improve the accuracy and efficiency of vulnerability detection, particularly when leveraging the capabilities of Large Language Models which benefit from focused input and well-defined objectives.

The process of smart contract vulnerability detection is facilitated by breaking down analysis into three core subtasks: External Call Identification, State Update Identification, and Data Dependency Analysis. External Call Identification involves locating all instances where the contract interacts with other contracts or external addresses, which represent potential attack vectors. State Update Identification focuses on pinpointing modifications to the contract’s stored data, as these changes are crucial for understanding the contract’s behavior and identifying potential vulnerabilities related to incorrect state transitions. Finally, Data Dependency Analysis examines the relationships between external calls and state updates, tracing how data flows through the contract and highlighting areas where manipulated data could lead to exploits. These subtasks, when performed in conjunction, provide a granular understanding of the contract’s functionality and improve the accuracy of vulnerability detection.
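As a rough illustration of how the three subtasks partition the work, the hedged sketch below implements toy, regex-based versions of each over raw Solidity source. The real pipeline operates on compiler-derived structures rather than text patterns; the regexes and function names here are assumptions made for brevity.

import re

EXTERNAL_CALL = re.compile(r'\.\s*(call|delegatecall|send|transfer)\s*[({]')
STATE_UPDATE = re.compile(r'^\s*(\w+(\[[^\]]*\])?)\s*(=|\+=|-=)', re.MULTILINE)

def identify_external_calls(source):
    """Subtask 1: line numbers of statements that hand control to another address."""
    return [source[:m.start()].count("\n") + 1 for m in EXTERNAL_CALL.finditer(source)]

def identify_state_updates(source):
    """Subtask 2: line numbers of writes to (possibly mapped) variables."""
    return [source[:m.start()].count("\n") + 1 for m in STATE_UPDATE.finditer(source)]

def analyze_data_dependencies(calls, updates):
    """Subtask 3: flag the reentrancy-shaped ordering, a call before a write."""
    return [(c, u) for c in calls for u in updates if c < u]

src = """
function withdraw() public {
    uint amount = balances[msg.sender];
    (bool ok, ) = msg.sender.call{value: amount}("");
    balances[msg.sender] = 0;
}
"""
calls, updates = identify_external_calls(src), identify_state_updates(src)
print(analyze_data_dependencies(calls, updates))  # [(4, 5)]: call on line 4, write on line 5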

Decomposing smart contract vulnerability detection into atomic subtasks enables more efficient utilization of Large Language Models (LLMs). Rather than requiring LLMs to analyze entire contracts holistically, this approach isolates specific code characteristics – such as external function calls or state variable modifications – for focused analysis. This targeted application reduces the computational burden on the LLM, allowing it to dedicate its resources to identifying vulnerabilities within these well-defined scopes. Consequently, LLMs can achieve higher accuracy and reduced latency in vulnerability detection compared to processing entire contracts as a single unit, as the complexity of the input is significantly reduced.
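One way to realize this narrowing is a per-subtask prompt map, as in the sketch below; the wording is an assumption made for illustration, not the paper's actual instruction templates.

SUBTASK_PROMPTS = {
    "external_calls": "List every statement in the function below that transfers "
                      "control to another contract or address.",
    "state_updates": "List every statement in the function below that writes to "
                     "a storage (state) variable.",
    "data_dependencies": "Given the external calls and state writes identified "
                         "below, report any state write that occurs after an "
                         "external call it depends on.",
}

def build_prompt(subtask, code):
    """Compose a focused instruction-plus-code prompt for one atomic task."""
    return SUBTASK_PROMPTS[subtask] + "\n\n" + code

print(build_prompt("external_calls", "function pay() public { ... }"))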

A pipeline constructs synthetic smart contract data by collecting interfaces, generating valid and diverse statements, compiling them into minimal contracts, extracting control flow graphs with Slither, and assembling the results into instruction-answer templates for training.
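A hedged sketch of one stage of such a pipeline follows. The minimal-contract template, statement generator, and instruction-answer fields are illustrative stand-ins; the Slither invocation uses the tool's cfg printer, which emits Graphviz .dot files, though output naming can vary across versions and is worth verifying locally.

import json, pathlib, subprocess, tempfile

MINIMAL_CONTRACT = """// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Sample {{
    mapping(address => uint256) balances;
    function generated() public {{
        {statement}
    }}
}}
"""

def build_example(statement):
    workdir = pathlib.Path(tempfile.mkdtemp())
    source = workdir / "Sample.sol"
    source.write_text(MINIMAL_CONTRACT.format(statement=statement))
    # Compile and extract control-flow graphs via Slither's cfg printer.
    subprocess.run(["slither", str(source), "--print", "cfg"],
                   cwd=workdir, capture_output=True, check=False)
    cfgs = [p.read_text() for p in workdir.glob("*.dot")]
    return {  # instruction-answer record for supervised fine-tuning
        "instruction": "Identify any external call in the function below.",
        "input": statement,
        "cfg": "\n".join(cfgs),
        "answer": "external call" if ".call" in statement else "none",
    }

print(json.dumps(build_example('(bool ok, ) = msg.sender.call{value: 1}("");'), indent=2))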

Tracing Contract Behavior Through Data Dependencies

Data Dependency Analysis is employed to identify how data values are propagated and utilized throughout a smart contract’s execution. This process involves tracing the relationships between variables, functions, and storage locations to determine which data elements influence the outcome of contract operations. By constructing a data flow graph, the analysis reveals how inputs affect outputs, highlighting potential vulnerabilities arising from untrusted data sources or improper data handling. Specifically, it maps each variable’s definition, use, and modification points, enabling the detection of data-related flaws such as information leakage, incorrect state updates, and arithmetic errors. The resulting dependency map is crucial for pinpointing the root cause of vulnerabilities and assessing their potential impact on contract security.

Data Dependency Analysis leverages Control Flow Graphs (CFGs) and Data Flow Graphs (DFGs) to represent the structure of contract code and track data movement. CFGs depict the possible execution paths through the code, illustrating the order in which instructions are executed. DFGs, by contrast, focus specifically on how data values are defined and used, mapping the relationships between variables and expressions. Nodes in a DFG represent operations or variables, while edges indicate data dependencies – for example, an edge from variable ‘x’ to operation ‘y’ signifies that ‘y’ uses the value of ‘x’. By representing these dependencies explicitly, analysts can trace the flow of data from its source, through any transformations, to its ultimate use, aiding the identification of potential security vulnerabilities related to data manipulation and state changes.
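At its core, a DFG is a set of def-use edges. The minimal sketch below builds one from an ordered list of statements, each recorded as (variable defined, variables used); this statement encoding is an assumption made for illustration, since real tools derive it from compiler IR.

from collections import defaultdict

def build_dfg(statements):
    """statements: list of (defined_var_or_None, [used_vars]) in execution order."""
    last_def = {}              # variable -> index of its most recent definition
    edges = defaultdict(list)  # definition site -> list of use sites
    for i, (defined, used) in enumerate(statements):
        for var in used:
            if var in last_def:
                edges[last_def[var]].append((i, var))  # def-use edge
        if defined:
            last_def[defined] = i
    return edges

# withdraw(): amount := balances[sender]; call(amount); balances[sender] := 0
stmts = [
    ("amount", ["balances"]),   # 0: read state into a local
    (None, ["amount"]),         # 1: external call consumes amount
    ("balances", []),           # 2: state write, after the call
]
for site, uses in build_dfg(stmts).items():
    print(f"definition at statement {site} feeds uses {uses}")
# definition at statement 0 feeds uses [(1, 'amount')]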

The integration of Control Flow Graphs and Data Flow Graphs enables the identification of contract vulnerabilities stemming from anomalous data handling. Specifically, discrepancies between intended data usage, as defined by the control flow, and actual data dependencies, as mapped by the data flow graph, highlight potential issues. These include unintended modifications of contract state variables due to incorrect data propagation, unauthorized access to sensitive data, and logical errors arising from unexpected data interactions. Analyzing the combined graphs allows the origin and impact of data to be traced throughout the contract’s execution, revealing vulnerabilities that would not be apparent through either analysis alone.
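Concretely, the reentrancy signature reduces to an ordering query over the combined graphs: does a state write follow an external call along the same execution path? The sketch below assumes a simplified node encoding for illustration.

def reentrancy_suspects(path_nodes):
    """path_nodes: CFG nodes in execution order, each a dict with a 'kind'
    ('state_read', 'external_call', or 'state_write') and the variable involved."""
    suspects, seen_calls = [], []
    for i, node in enumerate(path_nodes):
        if node["kind"] == "external_call":
            seen_calls.append(i)
        elif node["kind"] == "state_write" and seen_calls:
            # A state write after an external call on the same path is the
            # anomalous ordering the combined analysis is looking for.
            suspects.append({"call_at": seen_calls[-1], "write_at": i,
                             "variable": node["var"]})
    return suspects

path = [
    {"kind": "state_read", "var": "balances"},
    {"kind": "external_call", "var": None},
    {"kind": "state_write", "var": "balances"},
]
print(reentrancy_suspects(path))
# [{'call_at': 1, 'write_at': 2, 'variable': 'balances'}]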

Adapting and Fusing Expertise for Robust Vulnerability Detection

Large Language Models, while powerful, often require extensive and costly retraining to adapt to specific tasks. This work leverages LoRA – Low-Rank Adaptation – adapters, a technique that circumvents full model retraining by introducing a smaller set of trainable parameters. These adapters function alongside the pre-trained model, allowing for focused fine-tuning without altering the original weights, dramatically reducing both computational expense and the time needed for adaptation. This efficient approach enables rapid customization of the model for vulnerability detection, making it practical to apply to evolving codebases and emerging threat landscapes, and paving the way for continuous improvement without prohibitive resource demands.
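For reference, a minimal LoRA setup with the Hugging Face PEFT library looks like the sketch below. The base model, rank, and target modules are illustrative choices rather than the paper's configuration, which additionally applies low-rank normalization.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling factor for the update
    target_modules=["q_proj", "v_proj"], # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()       # a small fraction of the 7B weights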

The system leverages a fusion mechanism to intelligently combine the strengths of multiple, specialized LoRA adapters. Rather than relying on a single, monolithic model, this approach decomposes vulnerability detection into distinct aspects, with each adapter trained to excel in a specific area – such as identifying improper access control or arithmetic overflows. The fusion mechanism then aggregates the outputs of these adapters, weighting their contributions based on their individual confidence and expertise. This allows the system to benefit from a diversity of perspectives, mitigating the risk of overlooking subtle vulnerabilities and ultimately leading to a more robust and accurate detection process. By synthesizing insights from multiple specialized modules, the fusion mechanism effectively enhances the overall performance and reliability of the vulnerability analysis.
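PEFT's add_weighted_adapter offers a simple linear analogue of this fusion; the paper's mechanism, which adds low-rank normalization, is more involved, so the sketch below, assuming per-subtask adapters saved under hypothetical paths and the base model from the previous sketch, should be read as an approximation.

from peft import PeftModel

# `base` is the pretrained model loaded in the previous sketch;
# adapter paths are hypothetical.
model = PeftModel.from_pretrained(base, "adapters/external_calls",
                                  adapter_name="external_calls")
model.load_adapter("adapters/state_updates", adapter_name="state_updates")
model.load_adapter("adapters/data_deps", adapter_name="data_deps")

# Blend the per-subtask specialists into one adapter with illustrative weights.
model.add_weighted_adapter(
    adapters=["external_calls", "state_updates", "data_deps"],
    weights=[0.4, 0.3, 0.3],
    adapter_name="fused",
    combination_type="linear",
)
model.set_adapter("fused")  # route inference through the fused adapter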

The system’s enhanced robustness and accuracy stem from this synergistic approach. By combining the specialized insights of multiple LoRA adapters through the fusion mechanism, it mitigates the limitations inherent in any single detection strategy, reducing both false positives and false negatives in the identification of reentrancy vulnerabilities. Empirically, the system achieves 98.2% detection accuracy and a 94.7% F1 score – a 16.8% gain over single-task LoRA adaptation and a 5.7% improvement over previous benchmarks – along with a 20% improvement over traditional analysis tools on real-world contract code. Its recall of 87.1% surpasses that of Slither, the leading traditional analyzer, by 23.77%, indicating a marked reduction in missed vulnerabilities and enhanced reliability.

The pursuit of compositional generalization, as demonstrated in this study of smart contract security, echoes a fundamental principle of system design. A complex vulnerability, like reentrancy, isn’t a monolithic failure, but the emergent behavior of interacting components. This necessitates breaking down reasoning into discrete, atomic tasks – a strategy for managing complexity. As John von Neumann observed, “There’s no describing the last digit of pi.” While seemingly unrelated, the sentiment applies here; attempting to directly assess the entirety of a smart contract’s security is often intractable. Instead, focusing on the fundamental building blocks – the atomic tasks – and their interactions provides a pathway towards robust and scalable data-driven security solutions, acknowledging that the system’s behavior is the sum of its parts.

Beyond the Patch: Charting a Course for Robustness

The pursuit of compositional generalization in large language models for smart contract security reveals a familiar truth: one cannot simply replace a faulty component without considering the integrity of the surrounding architecture. This work, by dissecting complex vulnerability detection into atomic tasks, offers a promising, yet provisional, step toward addressing the limitations of data-driven security. The observed gains, particularly in data-sparse environments, suggest that the manner of learning – the decomposition and fusion – is at least as crucial as the quantity of data itself. However, this is not a resolution, merely a refinement of the problem.

Future work must grapple with the fundamental question of what constitutes ‘understanding’ in the context of code. Current approaches treat vulnerabilities as patterns to be identified, but a truly robust system will require an ability to reason about the intent of the code, and to anticipate unforeseen interactions. Furthermore, the reliance on specific vulnerability classes – reentrancy being but one – highlights the need for a more holistic framework, capable of adapting to novel attack vectors.

Ultimately, the challenge lies not in building more sophisticated detectors, but in constructing systems that are inherently resistant to exploitation. The model, like the contracts it seeks to secure, is only as strong as its weakest link. The focus must shift from reactive patching to proactive design – a principle often overlooked in the rush to deploy.


Original article: https://arxiv.org/pdf/2601.06914.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
