From Vulnerability Reports to Working Code: Automating Security Checks

Author: Denis Avetisyan

A new framework automatically generates custom code analysis queries from vulnerability descriptions, bridging the gap between natural language reports and actionable security insights.

A vulnerability pattern distilled from existing CVE data by QLCoder can be reused as a CodeQL query for regression testing, variant analysis, and patch validation, so that lessons from past failures help harden a system against recurrence.

QLCoder leverages large language models and a vector database to synthesize CodeQL queries for improved static analysis of security vulnerabilities.

Despite the increasing reliance on static analysis for proactive security, crafting effective vulnerability queries remains a significant bottleneck that demands specialized expertise. To address this challenge, the paper introduces QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities, an agentic framework that automatically generates CodeQL queries directly from Common Vulnerabilities and Exposures (CVE) metadata. By embedding a large language model within an iterative refinement loop constrained by structured code analysis tools, QLCoder achieves a substantial improvement in query correctness: it synthesizes valid queries that detect the vulnerability in 53.4% of cases, compared to 10% with a standalone LLM. Could this approach unlock a new era of automated vulnerability discovery and remediation, enabling more scalable and responsive security practices?

The Inevitable Expansion of Vulnerability

Modern software development prioritizes rapid iteration and frequent releases. This agility, however, expands the attack surface and generates a continuous flow of potential vulnerabilities. The sheer volume of code changes introduces complexity that traditional security measures struggle to address. Conventional static analysis tools often fall short in this dynamic environment, producing high rates of false positives that overwhelm security teams and obscure genuine threats. Validating those findings and prioritizing remediation takes significant manual effort, and critical vulnerabilities can go undetected in the meantime. Automated, precise, and scalable vulnerability detection is therefore essential to maintaining software integrity: an effective solution must minimize false positives while still identifying genuine threats, so that security teams can focus on the most critical issues. Like the subtle accrual of technical debt, each simplification in a codebase creates a shadow cost, a future vulnerability waiting to be discovered, and time itself is the ultimate auditor.

QLCoder: Automating the Pursuit of Secure Code

QLCoder presents an agentic framework designed to automate the synthesis of CodeQL queries from natural language descriptions of vulnerabilities, leveraging information found in Common Vulnerabilities and Exposures (CVE) descriptions. The system employs Large Language Model (LLM) Agents to translate vulnerability descriptions into functional CodeQL code, reducing the need for manual query creation. A core component is its implementation of Structured Prompting, guiding the LLM through defined instructions and constraints to ensure accurate and high-quality CodeQL queries. These prompts focus the LLM’s attention on critical vulnerability aspects, such as affected code patterns and potential exploitation vectors. QLCoder incorporates an Iterative Refinement Loop to continuously improve query performance and correctness, evaluating proposed queries against a validation dataset and refining prompts accordingly. To further enhance semantic understanding and query quality, the system utilizes a Vector Database containing extensive CodeQL documentation and curated example queries.
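The refinement loop described above can be sketched in a few lines of Python. This is a minimal illustration, not QLCoder's actual implementation: the `Validation` record, the `refine` function, and the `generate` and `validate` callables are hypothetical names introduced here, and the acceptance rule (the query compiles, flags the vulnerable version, and stays silent on the patched one) is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Validation:
    """Outcome of running one candidate query (hypothetical structure)."""
    compiles: bool
    flags_vulnerable: bool  # query reports the flaw in the vulnerable version
    flags_patched: bool     # query still reports it after the fix
    feedback: str = ""      # compiler errors or result summary for the next prompt

    @property
    def ok(self) -> bool:
        # Assumed acceptance rule for this sketch.
        return self.compiles and self.flags_vulnerable and not self.flags_patched


def refine(
    cve_description: str,
    generate: Callable[[str, str], str],    # stand-in for the LLM agent
    validate: Callable[[str], Validation],  # stand-in for the query validator
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate a query, validate it, and feed the feedback into the next round."""
    feedback = ""
    for _ in range(max_rounds):
        query = generate(cve_description, feedback)
        result = validate(query)
        if result.ok:
            return query
        feedback = result.feedback
    return None  # no acceptable query within the round budget
```

In a real system, `generate` would wrap the prompt construction and model call, and `validate` would compile and run the query against the vulnerable and patched code; here both are left as parameters so the control flow stays visible.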

Measuring Performance: A Quantitative Assessment

QLCoder’s performance was assessed using Success Rate, Precision, Recall, and False Positive Rate, which together quantify how accurately its queries identify vulnerabilities. A CodeQL Validator plays a central role in the Iterative Refinement Loop: it evaluates the correctness and coverage of each generated query and helps keep false positives low enough for practical use. Ablation studies measured the contribution of individual components to overall performance. QLCoder outperforms baseline agents such as Codex and Gemini, achieving a success rate of 53.4% on a benchmark of 176 CVEs drawn from 111 Java projects.
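For readers who want the metrics spelled out, the sketch below computes them from confusion-matrix counts using the standard textbook definitions. The paper's exact definitions may differ, and the `Counts` class, the function names, and the toy numbers are all illustrative assumptions, not results from the evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one query's alerts (illustrative structure)."""
    tp: int  # alerts that point at genuinely vulnerable code
    fp: int  # alerts on code that is not vulnerable
    tn: int  # non-vulnerable code correctly left unflagged
    fn: int  # vulnerable code the query missed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: Counts) -> float:
    """Share of alerts that are real: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    """Share of real flaws that were found: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def false_positive_rate(c: Counts) -> float:
    """Share of clean code that was wrongly flagged: FP / (FP + TN)."""
    return _ratio(c.fp, c.fp + c.tn)


def success_rate(successful_cves: int, total_cves: int) -> float:
    """Share of CVEs for which an acceptable query was synthesized."""
    return _ratio(successful_cves, total_cves)


toy = Counts(tp=8, fp=2, tn=18, fn=2)  # made-up numbers, not from the paper
print(
    f"precision={precision(toy):.2f} "
    f"recall={recall(toy):.2f} "
    f"fpr={false_positive_rate(toy):.2f}"
)
```

Precision and false positive rate capture how noisy a query is, while recall captures how much it misses; success rate is computed per CVE, not per alert.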

Towards Proactive Security: Integrating Automation into the Workflow

QLCoder demonstrates a capacity for automated vulnerability query synthesis, potentially streamlining security reviews and lessening the workload on cybersecurity professionals. This automation addresses a critical need for scalability in software security assessment, particularly as codebases grow in complexity and the demand for frequent security checks increases. The system’s architecture uses the Language Server Protocol (LSP), the same protocol that powers diagnostics in integrated development environments (IDEs), to give the agent syntax guidance while it writes queries. This immediate feedback loop catches malformed queries early, much as in-editor checks let developers address flaws during coding and so reduce remediation costs. QLCoder’s underlying methodology is designed for adaptability, suggesting potential integration with other static analysis tools and programming languages. Future work will center on improving the system’s generalization capabilities and optimizing prompt engineering techniques. Like all systems, QLCoder represents a fleeting moment of order wrested from the inevitable entropy of software development, a testament to the enduring struggle against decay.

The pursuit of automated vulnerability detection, as exemplified by QLCoder, reveals a fundamental truth about complex systems. Every failure—a discovered vulnerability—is a signal from time, a testament to the inevitable entropy that affects even the most meticulously crafted code. QLCoder’s iterative refinement process, combining the generative power of LLMs with the precision of CodeQL, acknowledges this decay. It doesn’t seek to eliminate flaws, but to engage in a continuous dialogue with the past, adapting and strengthening defenses as new signals emerge. As G. H. Hardy observed, ‘The most beautiful and profound mathematical theories are those which arise from a single, simple idea.’ Similarly, QLCoder’s strength lies in its elegant synthesis of existing tools to address a persistent challenge, accepting that ongoing adaptation is the only true path to resilience.

What’s Next?

The synthesis of CodeQL queries from natural language vulnerability descriptions, as demonstrated by QLCoder, is not a solution, but a temporary stay of execution. Each successful query is merely a localized reduction of entropy, a brief ordering of the inevitable decay inherent in any complex software system. The true challenge lies not in detecting known vulnerabilities—those are echoes of past failures—but in anticipating the shape of future errors. Current approaches treat CVEs as definitive markers, yet these are post-mortem analyses, snapshots of systems already compromised.

Future work must address the inherent limitations of relying on past incidents as predictors of future states. A shift toward proactive error modeling, perhaps leveraging techniques from formal methods and adversarial testing, may yield more resilient systems. The vector database component, while effective for retrieval, remains a historical archive; its potential expands when coupled with predictive models that simulate system behavior under stress.

Ultimately, the field will be judged not by the volume of vulnerabilities detected, but by the grace with which systems degrade. Each incident is a step toward maturity, but only if the lessons are integrated into the core architecture, not simply patched onto the surface. The pursuit of perfect security is a delusion; the goal should be systems that fail predictably, and recover elegantly—accepting that time, not code, is the ultimate arbiter.

Original article: https://arxiv.org/pdf/2511.08462.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
