Ask Your EHR: A New Approach to Clinical Question Answering

Author: Denis Avetisyan


Researchers have developed a system that directly queries electronic health records, offering a more efficient and reliable path to actionable insights than traditional text-based searches.

The construction of FHIRPath-QA proceeds through a defined process, establishing a framework for querying and validating healthcare data standards.

FHIRPath-QA introduces a benchmark and dataset for evaluating question answering systems over FHIR electronic health records using executable queries.

Despite increasing patient access to electronic health records (EHRs), reliably answering specific questions remains a challenge because of the limitations of current interfaces. The paper ‘FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records’ presents a new dataset and benchmark for patient-specific question answering that uses the open standard FHIRPath to query real-world clinical data. The authors show that shifting from free-text generation to FHIRPath query synthesis significantly improves efficiency and reduces reliance on large language models. Could this text-to-FHIRPath approach pave the way for safer, more interoperable consumer health applications and unlock new possibilities in clinical data analysis?


The Erosion of Access: Data Silos in Modern Healthcare

Accessing data within Electronic Health Records often presents a significant obstacle to timely clinical research and informed decision-making. Traditionally, retrieving specific patient information necessitates navigating complex data structures and employing Structured Query Language (SQL), a specialized programming language unfamiliar to many clinicians and researchers. This reliance on technical expertise creates bottlenecks, delaying investigations into patient outcomes, treatment efficacy, and population health trends. The cumbersome process not only limits the scope of inquiries but also increases the resources required, potentially diverting funding and personnel from direct patient care. Consequently, valuable insights remain locked within EHR systems, hindering the potential for data-driven improvements in healthcare delivery and innovation.

Electronic Health Records, while rich with patient information, make it hard to convert clinical questions into usable data. Their structure is often fragmented and non-standardized, varying widely between institutions and even between departments. Because of this lack of uniformity, a simple question, such as “How many patients with diabetes experienced a heart attack last year?”, requires complex parsing and translation into specific database queries. The absence of universally adopted querying interfaces also forces researchers and clinicians to become proficient in specialized languages like SQL, which raises the barrier to entry and slows knowledge discovery. A streamlined, natural language interface is therefore crucial to unlock the value held in these records.

Answer accuracy improves with larger training sets, demonstrating that both base and fine-tuned models benefit from increased patient data in the query-first generation pipeline.

The Promise of Linguistic Bridges: LLMs and Data Liberation

Large Language Models (LLMs) present a potential solution to the challenge of accessing data within Electronic Health Records (EHRs) by allowing queries formulated in natural language. Traditionally, extracting specific information from EHRs requires proficiency in Structured Query Language (SQL), a barrier for many healthcare professionals. LLMs circumvent this requirement by directly interpreting human language requests and translating them into actionable data retrievals. This capability democratizes access to EHR data, enabling clinicians, researchers, and administrators to independently explore patient information without relying on specialized database administrators or IT support. The use of natural language interfaces powered by LLMs aims to improve efficiency and facilitate data-driven decision-making within healthcare settings.

Large Language Models (LLMs), while powerful, are constrained by their finite context window, which limits how much input they can process for a single query. They can also “hallucinate,” generating responses that appear plausible but are factually incorrect or unsupported by the provided data. Retrieval-Augmented Generation (RAG) is one common mitigation. RAG first retrieves relevant documents or data snippets from a knowledge source based on the user’s query, then supplies these results to the LLM as context before it generates a response. Grounding the answer in retrieved evidence reduces reliance on the model’s potentially inaccurate internal parameters. Selecting only the most relevant material lets a large record be used within the fixed context window, though it does not enlarge that window. The trade-off is that every retrieved snippet is spent as prompt tokens on every question.
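The retrieve-then-prompt pattern can be sketched in a few lines. This is a minimal illustration rather than the paper's retrieval baseline. It assumes the record is a list of FHIR-shaped JSON resources. It ranks them by plain word overlap with the question, a deliberately crude stand-in for the embedding similarity most RAG systems use. The function names and the prompt layout are hypothetical.

```python
import json
import re


def tokenize(text: str) -> set[str]:
    # Lower-case alphanumeric word tokens; a crude stand-in for embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, resources: list[dict], k: int = 2) -> list[dict]:
    """Return the k resources sharing the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(json.dumps(r))), i, r) for i, r in enumerate(resources)]
    # Highest overlap first; ties keep the original record order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [r for _, _, r in scored[:k]]


def build_prompt(question: str, resources: list[dict], k: int = 2) -> str:
    # Retrieved resources are serialized into the prompt, so every one of
    # them costs prompt tokens each time a question is asked.
    context = "\n".join(json.dumps(r, sort_keys=True) for r in retrieve(question, resources, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical toy record, not real patient data.
record = [
    {"resourceType": "Patient", "id": "p1", "gender": "female"},
    {"resourceType": "MedicationRequest", "id": "m1", "status": "active",
     "medicationCodeableConcept": {"text": "metformin"}},
    {"resourceType": "Observation", "id": "o1", "code": {"text": "hemoglobin A1c"},
     "valueQuantity": {"value": 7.1, "unit": "%"}},
]
prompt = build_prompt("What was my most recent hemoglobin A1c?", record, k=1)
```

Larger records mean either a bigger `k`, which costs more tokens, or a greater risk of leaving out the one resource the answer depends on.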

Supervised fine-tuning improves Large Language Model (LLM) performance on Electronic Health Record (EHR) data by training the model on pairs of clinical questions and their correct target queries. This adapts the LLM to the vocabulary, phrasing, and data structures common in clinical contexts, and lets it learn the patterns in how healthcare questions are formulated. Evaluations show that fine-tuned LLMs reach approximately 80% accuracy on novel paraphrases of questions (questions reworded but with the same intended meaning). That result represents a substantial improvement over general-purpose LLMs and indicates greater robustness to variations in user input.
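Measuring accuracy on "novel paraphrases" implies that the test wording must not appear in training. One way to set this up is sketched below. It assumes each example carries a hypothetical `template_id` shared by all paraphrases of the same question. It also assumes a simple prompt/completion record format in which the model is trained to emit a FHIRPath query rather than an answer. Neither the field names nor the format are taken from the paper.

```python
import json
from collections import defaultdict

# Hypothetical examples: paraphrases of one question share a template_id
# and the same gold FHIRPath query.
examples = [
    {"template_id": "t1", "question": "When was I born?", "query": "Patient.birthDate"},
    {"template_id": "t1", "question": "What is my date of birth?", "query": "Patient.birthDate"},
    {"template_id": "t2", "question": "What is my recorded gender?", "query": "Patient.gender"},
    {"template_id": "t2", "question": "Which gender is on my record?", "query": "Patient.gender"},
]


def split_by_paraphrase(examples: list[dict], n_test_per_template: int = 1):
    """Hold out the last n paraphrases of every template for testing."""
    if n_test_per_template < 1:
        raise ValueError("n_test_per_template must be at least 1")
    groups: dict[str, list[dict]] = defaultdict(list)
    for ex in examples:
        groups[ex["template_id"]].append(ex)
    train, test = [], []
    for group in groups.values():
        train.extend(group[:-n_test_per_template])
        test.extend(group[-n_test_per_template:])
    return train, test


def to_sft_record(ex: dict) -> dict:
    # The completion is the query itself, not the answer it retrieves.
    return {"prompt": f"Question: {ex['question']}\nFHIRPath:", "completion": " " + ex["query"]}


train, test = split_by_paraphrase(examples)
train_jsonl = "\n".join(json.dumps(to_sft_record(ex)) for ex in train)
```

Every template still appears in training, so the test checks robustness to rewording rather than to unseen intents.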

A multi-stage query, involving resource type selection (blue), specimen filtering (green), temporal constraints (orange), and observation selection (purple), identifies whether organisms were detected in a patient’s recent serology/blood microbiology test, with a final constraint (grey) confirming a positive result.
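The stages in the figure can be mimicked over plain JSON to show how each one narrows the collection handed to the next. The sketch below is a hypothetical Python rendering, not the paper's FHIRPath query. The field choices (`specimen.display`, `code.text == "Organism"`, `valueString`) and the sample values are assumptions chosen to mirror the caption. They are not taken from MIMIC-IV.

```python
from datetime import date


def organisms_detected(resources: list[dict], specimen: str, since: date) -> bool:
    # Stage 1 (blue), resource type selection: keep Observation resources only.
    obs = [r for r in resources if r.get("resourceType") == "Observation"]
    # Stage 2 (green), specimen filtering: keep results from the requested specimen.
    obs = [r for r in obs if r.get("specimen", {}).get("display") == specimen]
    # Stage 3 (orange), temporal constraint: keep results on or after `since`.
    obs = [r for r in obs
           if "effectiveDateTime" in r
           and date.fromisoformat(r["effectiveDateTime"][:10]) >= since]
    # Stage 4 (purple), observation selection: take the organism result values.
    values = [r.get("valueString") for r in obs
              if r.get("code", {}).get("text") == "Organism"]
    # Final constraint (grey): was any selected result positive?
    return any(v == "POSITIVE" for v in values)


# Synthetic, FHIR-shaped sample (hypothetical values).
sample = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Observation", "id": "o1", "specimen": {"display": "SEROLOGY/BLOOD"},
     "effectiveDateTime": "2180-03-02", "code": {"text": "Organism"}, "valueString": "POSITIVE"},
    {"resourceType": "Observation", "id": "o2", "specimen": {"display": "URINE"},
     "effectiveDateTime": "2180-03-05", "code": {"text": "Organism"}, "valueString": "NEGATIVE"},
]
```

Each stage removes resources rather than transforming them. That is why a wrong filter at any stage silently yields an empty collection, and the final check then returns `False`.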

Text-to-FHIRPath: Establishing Deterministic Pathways to Data

Text-to-FHIRPath addresses the challenge of accessing data within FHIR (Fast Healthcare Interoperability Resources) systems by automating the translation of human-readable questions into executable FHIRPath queries. FHIRPath is a query language specifically designed for navigating and extracting information from FHIR resources, providing a standardized method for data retrieval. This automated conversion eliminates the need for manual query construction, enabling users to pose questions in natural language and receive structured data in return. The system leverages the inherent structure of FHIR resources to accurately map linguistic elements to appropriate FHIRPath expressions, thereby facilitating efficient and precise data access.
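A key property of FHIRPath is that every step operates on a collection and that nested lists are flattened along the way. As a result, `Patient.name.given` returns all given names across all names. The sketch below implements only plain dotted member access with that flattening behaviour. It has no functions (`where`, `exists`), operators, or choice-type handling, so it is a teaching subset, not a conformant FHIRPath engine. The function name `navigate` is hypothetical.

```python
from typing import Any


def navigate(resource: dict, path: str) -> list[Any]:
    """Evaluate a dotted path such as 'Patient.name.given' over a FHIR resource.

    Supports plain member access only, with FHIRPath-style list flattening.
    """
    parts = path.split(".")
    # A leading capitalised type name must match the resource's type.
    if parts and parts[0][:1].isupper():
        if resource.get("resourceType") != parts[0]:
            return []
        parts = parts[1:]
    collection: list[Any] = [resource]
    for name in parts:
        nxt: list[Any] = []
        for item in collection:
            if isinstance(item, dict) and name in item:
                value = item[name]
                # Lists are spliced into the result instead of being nested.
                nxt.extend(value if isinstance(value, list) else [value])
        collection = nxt  # a missing element simply yields an empty collection
    return collection


# Hypothetical example resource.
patient = {
    "resourceType": "Patient",
    "birthDate": "1980-04-01",
    "name": [{"given": ["Ana", "Maria"], "family": "Silva"},
             {"given": ["Ana"], "family": "Silva-Lopes"}],
}
```

With this resource, `navigate(patient, "Patient.name.given")` returns `["Ana", "Maria", "Ana"]`. A path that does not exist returns an empty list rather than raising an error, which mirrors FHIRPath's empty-collection semantics.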

Text-to-FHIRPath separates the probabilistic step from the data access step. A language model generates the query, but the answer is then obtained by executing that query deterministically against the record: the same query over the same data always returns the same result. This contrasts with approaches in which a large language model (LLM) writes the answer directly and may “hallucinate,” producing plausible but factually incorrect responses. The generated query is an explicit, inspectable artifact, so its logic can be reviewed and re-run. This makes errors easier to trace and the data retrieved from Electronic Health Records (EHRs) more dependable.
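The query-first split can be made concrete as a small pipeline that returns the answer together with the query that produced it. In this sketch the "model" is a hypothetical lookup table standing in for a fine-tuned text-to-FHIRPath model. The executor is a hypothetical minimal one that handles only `Type.field` queries. Only the execution step is guaranteed to be deterministic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Answer:
    question: str
    query: str          # the generated query, returned so it can be audited
    result: list[Any]   # what executing that query against the record produced


def execute_simple(record: dict, query: str) -> list[Any]:
    """Deterministic executor for 'Type.field' queries only (illustrative)."""
    rtype, _, field = query.partition(".")
    if record.get("resourceType") != rtype or field not in record:
        return []
    value = record[field]
    return value if isinstance(value, list) else [value]


# Hypothetical stand-in for a fine-tuned text-to-FHIRPath model.
_HYPOTHETICAL_MODEL = {"When was I born?": "Patient.birthDate"}


def fake_generate(question: str) -> str:
    return _HYPOTHETICAL_MODEL.get(question, "Patient.unknown")


def answer_question(question: str, record: dict,
                    generate_query: Callable[[str], str],
                    execute: Callable[[dict, str], list[Any]]) -> Answer:
    query = generate_query(question)   # the only model-dependent step
    result = execute(record, query)    # same query + same record -> same result
    return Answer(question, query, result)


patient = {"resourceType": "Patient", "birthDate": "1980-04-01"}
ans = answer_question("When was I born?", patient, fake_generate, execute_simple)
```

If the answer looks wrong, the reviewer can inspect `ans.query` directly and re-execute it, instead of guessing why a free-text response was produced.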

Using FHIRPath as the query language is central to enabling data interoperability and access within Electronic Health Records (EHRs). FHIRPath, a standardized path-based language for FHIR resources, allows precise and consistent extraction of structured clinical data from any system that exposes its records as FHIR resources. This standardization removes the need for the custom parsing or translation layers often required when querying disparate EHRs, facilitating data exchange and aggregation. Because the same expression can be run against the FHIR representation of clinical data wherever it is hosted, information can be retrieved consistently across different healthcare settings and applications.

Supervised fine-tuning of a model on question-query pairs yields significant efficiency gains compared to retrieval-based methods for converting natural language to FHIRPath. Specifically, this approach demonstrates a 391-fold reduction in token usage, lowering computational costs and improving processing speed. Despite this efficiency, the method maintains a high degree of accuracy, achieving nearly 80% success on novel paraphrases of questions not encountered during training, indicating strong generalization capabilities and robustness to variations in user input.
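Accuracy figures like the near-80% result depend on how an answer is judged correct. A common choice for executable question answering compares the answers that queries return rather than the query strings. This approach is assumed here, not confirmed as the paper's exact metric. It credits a differently written query that retrieves the same data. The sketch below treats each answer as an unordered collection. Both that normalisation and the function names are assumptions.

```python
from collections import Counter
from typing import Any


def answers_match(pred: list[Any], gold: list[Any]) -> bool:
    # Order-insensitive comparison: ['a', 'b'] matches ['b', 'a'].
    return Counter(map(str, pred)) == Counter(map(str, gold))


def answer_accuracy(predicted: list[list[Any]], gold: list[list[Any]]) -> float:
    """Fraction of questions whose predicted answer matches the gold answer."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be the same length")
    if not gold:
        return 0.0
    return sum(answers_match(p, g) for p, g in zip(predicted, gold)) / len(gold)
```

Note that two empty answers count as a match under this rule. A stricter benchmark might treat an empty result from a malformed query as a failure instead.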

Towards Adaptive Systems: Intelligent Agents and the Future of Healthcare

The convergence of large language models (LLMs) and Text-to-FHIRPath translation is fostering a new generation of intelligent agents poised to revolutionize clinical data handling. These agents move beyond simple data retrieval by interpreting natural language requests and converting them into precise FHIRPath queries – a standardized language for navigating electronic health records (EHRs). This capability allows for autonomous interaction with complex EHR systems, enabling agents to independently locate, extract, and analyze specific clinical information without manual intervention. Consequently, these systems can synthesize patient data, identify relevant trends, and potentially support clinical decision-making with increased efficiency and accuracy, all while adhering to established data standards and privacy protocols.

Agentic interaction, driven by the synergy of large language models and FHIRPath, promises a significant reshaping of clinical practice. This technology enables autonomous agents to execute complex tasks within electronic health record systems – tasks previously demanding considerable time from healthcare professionals. By automating repetitive processes such as data extraction, report generation, and even preliminary diagnosis support, these agents free clinicians to focus on direct patient care and more nuanced decision-making. The potential extends beyond simple automation; these systems can proactively identify relevant information, synthesize insights from disparate data sources, and present findings in a readily digestible format, ultimately boosting efficiency and reducing the administrative burden currently experienced throughout healthcare settings. This shift toward intelligent automation isn’t merely about speed, but about enabling a more focused and effective healthcare workforce.

Modern healthcare relies on a complex web of interconnected systems, and the Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a pivotal enabler of seamless data exchange between them. FHIR APIs provide a standardized method for accessing and sharing clinical information, moving away from the historically fragmented landscape of proprietary data formats. This interoperability is achieved through the use of RESTful APIs and widely adopted data formats, allowing different electronic health record (EHR) systems, mobile applications, and research platforms to communicate effectively. Consequently, FHIR not only streamlines data retrieval for clinicians and researchers but also unlocks opportunities for innovative applications, such as personalized medicine and population health management, by fostering a more connected and collaborative healthcare ecosystem.
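At the REST level, a FHIR client retrieves resources with search requests of the form `GET [base]/Observation?...`, using search parameters defined by the standard. Examples include `patient`, `category`, and `date`, where a prefix such as `ge` means "greater than or equal". The sketch below only builds such a URL. The server base address is a hypothetical placeholder, and no request is sent.

```python
from urllib.parse import urlencode


def observation_search_url(base: str, patient_id: str, category: str, since: str) -> str:
    """Build a FHIR search URL for a patient's Observations of one category since a date."""
    params = {
        "patient": patient_id,   # reference to the Patient resource
        "category": category,    # e.g. "laboratory"
        "date": f"ge{since}",    # 'ge' prefix: on or after the given date
    }
    return f"{base.rstrip('/')}/Observation?{urlencode(params)}"


# Hypothetical server address (example.org is reserved for documentation).
url = observation_search_url("https://fhir.example.org/r4/", "123", "laboratory", "2024-01-01")
```

A search like this returns a Bundle of matching resources, and a FHIRPath expression can then be evaluated over that Bundle to extract the specific value a question asks for.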

The MIMIC-IV dataset is a pivotal resource for healthcare artificial intelligence. It is a large, publicly available collection of de-identified clinical data from intensive care units, accessible to credentialed researchers. This lets researchers rigorously test and refine new technologies, such as those combining Large Language Models with FHIRPath, in a realistic clinical context. MIMIC-IV provides a substantial volume of patient data (including notes, medications, laboratory results, and vital signs), which supports the development of algorithms capable of interacting autonomously with Electronic Health Records. Crucially, the dataset’s scale enables statistically meaningful evaluations, moving beyond small pilot studies toward robust validation of agentic systems and automated workflows. That, in turn, accelerates the translation of innovative AI solutions into practical healthcare applications.

The pursuit of data interoperability, as exemplified by FHIRPath-QA, mirrors the inevitable entropy of all systems. Just as code requires constant refactoring to maintain functionality, healthcare data standards demand continuous adaptation to evolving clinical needs. This benchmark, with its emphasis on a query-first approach, isn’t merely about achieving accurate answers; it’s about building systems that age gracefully within the complex landscape of electronic health records. Paul Erdős observed, “A mathematician knows a lot of things, but a good mathematician knows what to ignore.” Similarly, FHIRPath-QA’s efficiency gains come from retrieving only the data a question needs, recognizing that not all information in a record is equally relevant to a given question.

What Lies Ahead?

The introduction of FHIRPath-QA marks a point on the timeline, not an arrival. The system’s chronicle, logged in this benchmark, reveals a preference for query-first methodologies, a pragmatic advantage in the face of data’s inherent entropy. Yet, the illusion of ‘solved’ interoperability should be resisted. FHIRPath, while a structured language, remains susceptible to the ambiguities of clinical expression – a limitation less about the path and more about the terrain it maps.

Future iterations will inevitably confront the issue of drift. Electronic health records are not static archives, but evolving narratives. A query effective today may be meaningless tomorrow as terminology shifts and data models mature. The true measure of any question-answering system isn’t just its current accuracy, but its resilience to these inevitable changes: its capacity to age gracefully.

Furthermore, the focus will likely shift from merely finding answers to understanding them. Systems capable of contextualizing responses, of recognizing nuance and uncertainty, will be paramount. This is not simply a matter of improving retrieval, but of building systems that can reason about clinical information, a far more challenging, and arguably more worthwhile, endeavor.


Original article: https://arxiv.org/pdf/2602.23479.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-03 04:12