Negotiating for Better Requirements: An AI-Powered Approach

Author: Denis Avetisyan


A new framework uses artificial intelligence and multi-agent negotiation to balance competing priorities in the complex process of defining software and system requirements.

The QUARE framework establishes a five-phase pipeline (parallel agent-based generation, dialectical conflict negotiation, KAOS goal model integration with topology validation, multi-layer verification and compliance checking, and standardized engineering materials generation) to systematically construct and validate complex systems, ensuring adherence to defined objectives and constraints.

This paper introduces QUARE, a system leveraging large language models and formal methods to improve requirements engineering through explicit negotiation and verifiable models.

Balancing competing quality attributes remains a central challenge in requirements engineering, despite advancements in large language models. This paper introduces 'QUARE: Multi-Agent Negotiation for Balancing Quality Attributes in Requirements Engineering', a novel framework that formulates requirements analysis as a structured negotiation between specialized agents representing key qualities like safety, efficiency, and trustworthiness. Through iterative proposal, critique, and synthesis, QUARE generates verifiable and compliant KAOS goal models, achieving significant improvements in coverage and semantic preservation compared to existing approaches. Does this principled architectural decomposition and explicit interaction offer a more effective path toward automated requirements engineering than simply scaling model size?


The Erosion of Predictability in Requirements Engineering

Conventional requirements engineering, designed for predictable systems, now faces significant challenges with the advent of increasingly complex, autonomous technologies. These systems, capable of independent action and adaptation, introduce inherent uncertainty that traditional methods struggle to capture and manage effectively. The static, document-centric approaches historically employed often fail to adequately address emergent behaviors and the dynamic interplay between system components and their environments. Consequently, requirements become ambiguous, incomplete, or even contradictory as the system evolves, leading to costly rework, delays, and potentially unsafe outcomes. The core difficulty lies in shifting from specifying what a system should do to defining how it should behave under a vast range of unforeseen circumstances, demanding a fundamental rethinking of the requirements process.

The proliferation of AI-driven applications and autonomous systems is fundamentally challenging traditional requirements engineering practices. These systems, characterized by complex interactions with dynamic environments and the need for adaptive behavior, necessitate a shift towards more robust and scalable approaches. Simply defining static functional requirements proves insufficient; instead, engineers must model intricate behavioral constraints, safety protocols, and ethical considerations. This demand extends beyond merely capturing what a system should do, to precisely defining how it should respond to unforeseen circumstances and operate within acceptable boundaries. Consequently, research focuses on automating requirements elicitation, employing formal methods for verification, and leveraging machine learning to predict potential system failures – all crucial steps in building trustworthy and reliable autonomous technologies.

Traditional requirements engineering methodologies, such as KAOS, while foundational, frequently encounter limitations when applied to contemporary, large-scale projects. These methods, often reliant on manual specification and analysis, struggle to effectively capture the intricacies of complex systems, particularly those incorporating artificial intelligence and autonomous functionalities. The lack of automated reasoning and formal verification capabilities creates significant bottlenecks, hindering the ability to consistently validate requirements, manage changes, and ensure system correctness. Consequently, development teams experience increased effort, prolonged timelines, and a heightened risk of defects, ultimately impacting the overall success and reliability of the final product. Addressing these expressiveness and automation gaps is therefore critical for enabling the efficient and robust development of increasingly sophisticated systems.

QUARE: A Rational Framework for Multi-Agent Requirements Negotiation

The QUARE framework employs a multi-agent system to model stakeholders and facilitate requirements engineering through dialogical negotiation. Each agent represents a stakeholder with specific goals and knowledge, and interacts with other agents to collaboratively refine and validate requirements. This negotiation process isn’t simply a compromise; it’s a structured exchange of arguments and justifications, enabling the identification and resolution of conflicts based on rational discourse. The system’s architecture is designed to mimic real-world stakeholder interactions, promoting a more comprehensive and nuanced understanding of project needs than traditional, monolithic requirements gathering approaches.

The QUARE framework employs a Multi-Agent System (MAS) to concurrently investigate multiple facets of the requirements space. This parallel exploration is achieved by assigning distinct agents to represent stakeholder perspectives, quality attributes, or specific system functionalities. Each agent operates autonomously, generating and negotiating requirements based on its assigned goals and knowledge. This concurrent activity significantly reduces the time required for requirements elicitation and validation compared to traditional sequential methods. The MAS architecture allows for the simultaneous consideration of diverse viewpoints, identification of potential conflicts early in the process, and accelerated convergence towards a complete and consistent set of requirements.

QUARE utilizes Large Language Models (LLMs) through Retrieval-Augmented Generation (RAG) to enhance requirements justification and consistency. RAG enables the LLM to access and incorporate relevant information from a knowledge base during requirements generation and analysis, improving accuracy and reducing hallucination. Furthermore, the framework integrates principles of Argumentation Theory, specifically employing argument schemes and critical questions, to formally evaluate the rationale behind each requirement. This allows for the explicit representation of supporting evidence and potential counterarguments, facilitating a structured process for identifying and resolving inconsistencies or weaknesses in the requirements set, and ensuring each requirement is demonstrably well-justified.
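The retrieval step can be illustrated with a minimal sketch, using token overlap as a crude stand-in for the dense embedding similarity a real RAG pipeline would use; the function names and knowledge-base snippets below are hypothetical, and the "critical question" instruction is a nod to the argumentation-theoretic evaluation described above.

```python
def retrieve(query: str, kb: list[str], k: int = 2) -> list[str]:
    # Score each knowledge-base entry by token overlap with the query
    # (a toy proxy for embedding-based similarity search).
    q = set(query.lower().split())
    scored = sorted(kb, key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(requirement: str, kb: list[str]) -> str:
    # Ground the LLM's justification in retrieved context to curb hallucination.
    context = "\n".join(f"- {doc}" for doc in retrieve(requirement, kb))
    return (f"Context:\n{context}\n\n"
            f"Justify the following requirement against the context, "
            f"and list one critical question it must survive:\n{requirement}")

kb = ["ISO/IEC 25010 defines reliability as maturity, availability, "
      "fault tolerance, recoverability.",
      "Fail-safe braking requires redundant actuation paths.",
      "UI colour schemes should meet WCAG contrast ratios."]
prompt = build_prompt(
    "The braking system shall remain operable after a single actuator fault.", kb)
```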

The QUARE framework explicitly integrates quality attributes as defined by the ISO/IEC 25010 standard, encompassing characteristics such as functionality, performance, reliability, usability, security, maintainability, and portability. These attributes are not treated as post-development verification criteria, but are actively incorporated into the requirements elicitation and negotiation phases. Each requirement within QUARE is assessed and refined against these characteristics, ensuring traceability from initial stakeholder needs to concrete system properties. This proactive approach facilitates the development of systems demonstrably aligned with pre-defined quality expectations and enables systematic validation against recognized industry standards.
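One way to make this traceability concrete is a coverage check over the eight ISO/IEC 25010 product quality characteristics. The sketch below is hypothetical rather than part of QUARE; in practice the per-requirement attribute tags would come from the agents themselves.

```python
# The eight product quality characteristics of ISO/IEC 25010.
ISO25010 = {"functional suitability", "performance efficiency", "compatibility",
            "usability", "reliability", "security", "maintainability",
            "portability"}

def coverage_gaps(tagged_requirements: dict[str, set[str]]) -> set[str]:
    # Which ISO/IEC 25010 characteristics are not touched by any requirement?
    covered = (set().union(*tagged_requirements.values())
               if tagged_requirements else set())
    return ISO25010 - covered

# Hypothetical requirements, each tagged with the characteristics it addresses.
reqs = {
    "R1: recover from actuator fault within 50 ms": {"reliability",
                                                     "performance efficiency"},
    "R2: reject unauthenticated CAN frames":        {"security"},
}
gaps = coverage_gaps(reqs)  # characteristics still lacking any requirement
```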

Geometric and Semantic Quantification of Requirements Quality

QUARE utilizes Convex Hull Volume and Mean Distance to Centroid as quantifiable metrics for assessing the characteristics of a requirements set when represented within a multi-dimensional quality space. The Convex Hull Volume, calculated by encompassing all requirement points with the smallest possible convex set, provides a measure of the overall diversity of the requirements. Concurrently, the Mean Distance to Centroid, the average Euclidean distance of each requirement from the centroid of all requirements, quantifies the dispersion or spread of the requirements around their central tendency. These values, calculated over the quality attribute space, offer a numerical profile of requirement coverage and can indicate potential gaps or redundancies within the elicited set:

$\text{Mean Distance} = \frac{1}{n} \sum_{i=1}^{n} d(x_i, c)$

where $d$ is the Euclidean distance, $x_i$ represents each requirement point, and $c$ is the centroid.

The assessment of requirement set quality using geometric metrics relies on representing requirements as points within a multi-dimensional quality attribute space. Convex Hull Volume (CHV) calculates the volume encompassed by the outermost points of this set, indicating the breadth of coverage of the quality attribute space; a larger CHV suggests more comprehensive coverage. Mean Distance to Centroid (MDC) measures the average distance of each requirement from the centroid of the requirement set, providing an indication of the dispersion or spread of requirements; a greater MDC suggests a more diverse and less clustered set. Combined, CHV and MDC offer quantifiable values representing both the extent and distribution of requirements within the defined quality space, allowing for objective evaluation and comparison of requirement sets.
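Both metrics are straightforward to compute. The sketch below restricts Convex Hull Volume to its 2-D special case (hull area via Andrew's monotone chain plus the shoelace formula) and places four hypothetical requirements in a two-axis quality space; real usage would embed requirements in higher dimensions.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def mean_distance_to_centroid(points):
    # MDC = (1/n) * sum_i d(x_i, c): average Euclidean distance to the centroid c.
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def _cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_area(points):
    # 2-D Convex Hull Volume: Andrew's monotone chain, then the shoelace formula.
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def half_hull(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and _cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # endpoint repeats in the other half
    hull = half_hull(pts) + half_hull(pts[::-1])
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))) / 2

# Four hypothetical requirements in a 2-D (safety, efficiency) quality space.
reqs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
chv = convex_hull_area(reqs)           # area of the unit square
mdc = mean_distance_to_centroid(reqs)  # every point is sqrt(0.5) from (0.5, 0.5)
```

A tight cluster of near-duplicate requirements would drive both CHV and MDC toward zero, which is exactly the redundancy signal the metrics are meant to expose.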

BERTScore, a metric leveraging pre-trained language models, is incorporated into the QUARE framework to quantitatively assess the semantic similarity between elicited requirements and the originally stated stakeholder needs. This evaluation is performed by comparing contextual embeddings of requirement statements against those representing stakeholder needs, providing a precision-recall based similarity score. Testing has demonstrated a semantic preservation rate of 94.9%, indicating a high degree of correlation between the meaning of the elicited requirements and the intended stakeholder expectations as captured by the model. This allows for objective measurement of how well the requirements reflect the underlying needs, supporting identification of potential misinterpretations or omissions.
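At its core, BERTScore greedily matches each candidate token embedding to its most similar reference token. The numpy sketch below reproduces that matching over toy 2-D "embeddings"; the real metric additionally uses contextual BERT embeddings, IDF weighting, and baseline rescaling, all omitted here.

```python
import numpy as np

def bertscore_f1(cand: np.ndarray, ref: np.ndarray) -> float:
    # cand: (m, d) and ref: (n, d) token embedding matrices.
    c = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = c @ r.T                       # pairwise cosine similarities
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)

# Toy 2-D "embeddings": one slightly rotated token, one identical token.
ref = np.array([[1.0, 0.0], [0.0, 1.0]])
cand = np.array([[0.8, 0.6], [0.0, 1.0]])
score = bertscore_f1(cand, ref)
identical = bertscore_f1(ref, ref)  # identical sentences score 1.0
```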

Automated quality assessment, facilitated by the implemented geometric and semantic metrics, allows for iterative refinement of requirements sets by providing data-driven feedback on coverage and consistency. The system calculates Convex Hull Volume and Mean Distance to Centroid, alongside BERTScore for semantic similarity, and generates quantifiable scores representing the quality of the elicited requirements. These scores are then used in an iterative process where requirements can be modified or added to improve the overall quality, with subsequent metric recalculations providing a measurable assessment of the changes. This closed-loop process enables developers to proactively identify and address quality deficiencies, reducing the need for costly rework later in the development lifecycle and improving the alignment between requirements and stakeholder needs.

Empirical Validation via OpenReBench: Demonstrating QUARE’s Superiority

The challenge of objectively assessing requirements engineering frameworks necessitates a consistent and reproducible evaluation environment. To address this, the OpenReBench platform was developed as a standardized benchmark, allowing for direct comparison of tools like QUARE against established methodologies. This platform provides a common dataset and evaluation metrics, eliminating ambiguity and enabling researchers to rigorously test and validate the effectiveness of different approaches to requirements elicitation, analysis, and management. By offering a level playing field, OpenReBench fosters innovation and accelerates progress in the field, ensuring that advancements are based on empirical evidence rather than subjective assessments.

Evaluations utilizing the OpenReBench platform reveal that the QUARE framework demonstrably outperforms existing requirements engineering approaches. Specifically, QUARE achieves a 98.2% compliance coverage rate, representing a substantial 105% improvement when contrasted with both the MARE and iReDev frameworks. This heightened level of compliance suggests a greater ability to accurately capture and adhere to specified requirements, potentially leading to more reliable and successful project outcomes. The significant margin of improvement highlights QUARE’s effectiveness as a robust solution for ensuring thorough requirement fulfillment and minimizing the risk of costly errors or rework during development.

Agent specialization within the QUARE framework demonstrably enhances the scope of requirements coverage, as evidenced by a 53.6% improvement in Convex Hull Volume when contrasted with systems employing single-agent approaches. This metric, representing the region of the quality space enclosed by the identified requirements, suggests that distributing responsibility amongst specialized agents allows for a more thorough exploration of the problem space. Rather than a single agent attempting to address all facets of a requirement, the framework's architecture fosters focused expertise, leading to the discovery of a wider range of relevant considerations and, ultimately, a more comprehensive and robust set of elicited requirements. This increase in coverage is not simply about quantity; it indicates a superior ability to capture the full dimensionality of the problem, reducing the risk of overlooked edge cases and ensuring a more complete understanding of stakeholder needs.

Evaluations using OpenReBench indicate that the QUARE framework achieves a Quality-Axis Coverage of 0.20, a metric designed to assess the breadth and balance of requirements addressed. This lower score suggests a more even distribution of coverage across quality attributes than the competing frameworks: MARE registered 0.30 and iReDev 0.45, values that may indicate concentration on fewer, more specific attributes. Importantly, QUARE accomplished this balanced coverage within a runtime of 55.4 seconds, demonstrating an efficient approach to comprehensive requirements engineering and suggesting a viable solution for projects demanding both thoroughness and timeliness.

The presented QUARE framework embodies a commitment to rigorous, mathematically grounded requirements engineering. It moves beyond achieving functional correctness to explicitly balancing quality attributes, a pursuit that recalls Donald Knuth's warning that "Premature optimization is the root of all evil": optimize only what analysis shows matters. While QUARE leverages the power of LLMs, its core strength lies in its formal verification aspect, ensuring generated models are not merely plausible but provably compliant. This focus on provability, rather than superficial testing, reflects Knuth's emphasis on algorithmic purity and mathematical consistency: a solution's true elegance stems from consistent boundaries and predictable, demonstrably correct behavior.

Beyond Compromise: Future Directions

The presented framework, while a step toward formalizing the inherently messy process of requirements elicitation, does not dissolve the fundamental tension between competing quality attributes. It merely shifts the locus of compromise from tacit assumptions within a single model to explicit negotiation between agents. This is not a trivial gain, but it is not a resolution. The true challenge lies not in finding a balance, but in proving its optimality or, failing that, rigorously characterizing the nature of the trade-offs made. To claim 'compliance' without demonstrating the absence of unintended consequences is, at best, optimistic.

Further work must address the limitations inherent in relying on Large Language Models as rational agents. While LLMs can mimic negotiation, their internal reasoning remains opaque. Verification of agent behavior, ensuring that stated preferences align with actual actions, is crucial. Moreover, the formalization of 'quality attributes' themselves demands greater scrutiny. Are these truly atomic, or do they conceal hidden dependencies that render the entire negotiation space ill-defined? A focus on provable guarantees, rather than empirically observed performance, remains paramount.

Ultimately, the success of such systems will depend not on their ability to generate more requirements, but on their capacity to generate correct ones. Optimization without analysis is self-deception, a trap for the unwary engineer. The field must resist the allure of incremental improvements and instead strive for a mathematically grounded understanding of requirements engineering as a problem of formal specification and verifiable reasoning.


Original article: https://arxiv.org/pdf/2603.11890.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-14 04:28