Conversational diagnostic AI // helping and augmenting doctors

Enhance clinical communication, accessibility, and quality of care by creating an AI conversational partner for clinicians and patients

sbagency
5 min read · Jan 15, 2024
https://blog.research.google/2024/01/amie-research-ai-system-for-diagnostic_12.html

AMIE was optimized for diagnostic conversations, asking questions that help to reduce its uncertainty and improve diagnostic accuracy, while also balancing this with other requirements of effective clinical communication, such as empathy, fostering a relationship, and providing information clearly.

The blog post discusses the development and evaluation of Articulate Medical Intelligence Explorer (AMIE), an AI system based on large language models (LLMs) designed for diagnostic reasoning and conversations in the medical domain. The goal is to enhance clinical communication, accessibility, and quality of care by creating an AI conversational partner for clinicians and patients. The study describes the challenges of approximating clinicians’ expertise and introduces a self-play based simulated dialogue learning environment to train AMIE for diverse medical conditions.
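The paper does not release training code, but the self-play idea can be caricatured in a few lines. The sketch below is a toy stand-in, assuming a stub patient agent that reveals symptoms one turn at a time, a stub doctor agent, and an automated critic that supplies the feedback signal — every name, rule, and condition here is invented for illustration and is not AMIE's actual implementation.

```python
# Toy self-play dialogue loop with automated feedback. All components are
# illustrative stubs, not AMIE's actual models.
CONDITION = {"diagnosis": "migraine",
             "symptoms": ["throbbing headache", "nausea", "light sensitivity"]}

def patient_agent(condition, revealed):
    """Stub simulated patient: reveals one new symptom per turn."""
    remaining = [s for s in condition["symptoms"] if s not in revealed]
    return remaining[0] if remaining else "nothing else to report"

def doctor_agent(history):
    """Stub doctor: keeps asking until it has enough evidence, then commits."""
    if len(history) < 3:
        return "question: what else are you experiencing?"
    if "throbbing headache" in history and "light sensitivity" in history:
        return "diagnosis: migraine"
    return "diagnosis: unknown"

def critic(condition, final_turn):
    """Automated feedback: reward 1.0 if the committed diagnosis is correct."""
    return 1.0 if condition["diagnosis"] in final_turn else 0.0

def self_play_episode(condition, max_turns=4):
    """Run one simulated consultation and score it with the critic."""
    revealed, transcript = [], []
    for _ in range(max_turns):
        symptom = patient_agent(condition, revealed)
        revealed.append(symptom)
        transcript.append(f"patient: {symptom}")
        reply = doctor_agent(revealed)
        transcript.append(f"doctor: {reply}")
        if reply.startswith("diagnosis"):
            break
    return transcript, critic(condition, transcript[-1])
```

In the real system the critic's feedback would drive fine-tuning across many simulated conditions; here it only scores a single episode.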

AMIE is evaluated through a randomized, double-blind crossover study involving text-based consultations with trained actors simulating patients. The study assesses diagnostic conversations along several axes, including history-taking, diagnostic accuracy, clinical management, clinical communication skills, relationship fostering, and empathy. The results indicate that AMIE performs at least as well as primary care physicians (PCPs) in simulated diagnostic conversations, demonstrating greater diagnostic accuracy and superior performance on multiple axes.
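The headline "superior on N of M axes" result is just a per-axis tally of raters' scores. A toy illustration of that comparison (the ratings below are invented numbers, not study data):

```python
from statistics import mean

# Invented specialist ratings per axis for the two arms of the study
# (illustration only — not the actual OSCE data).
ratings = {
    "history-taking":       {"AMIE": [4, 5, 4], "PCP": [3, 4, 4]},
    "diagnostic accuracy":  {"AMIE": [5, 4, 5], "PCP": [4, 3, 4]},
    "empathy":              {"AMIE": [5, 5, 4], "PCP": [4, 4, 4]},
    "management reasoning": {"AMIE": [3, 4, 4], "PCP": [4, 4, 4]},
}

def axes_favoring(ratings, system="AMIE", baseline="PCP"):
    """Axes on which the system's mean rating exceeds the baseline's."""
    return [axis for axis, r in ratings.items()
            if mean(r[system]) > mean(r[baseline])]
```

With these made-up numbers, `axes_favoring(ratings)` reports 3 of the 4 axes in favor of the system; the paper's 28-of-32 and 24-of-26 figures come from the same kind of per-axis comparison, with proper statistical testing behind it.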

The research acknowledges limitations, such as the study design not fully replicating real-world clinical practices and the need for further research to address issues like health equity, fairness, privacy, and technology robustness. Additionally, the article highlights a separate study where AMIE demonstrated standalone performance exceeding unassisted clinicians in generating differential diagnoses for challenging medical cases.

The authors emphasize that AMIE is a research-only system, not a product, and caution that extensive further scientific studies are required to ensure the safety, helpfulness, and accessibility of conversational, empathic, and diagnostic AI systems in healthcare. The ultimate goal is to explore the possibilities of aligning AI systems with the attributes of skilled clinicians, recognizing the dedication to principles like safety, communication, partnership, trust, and professionalism.

https://arxiv.org/pdf/2401.05654.pdf

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians’ expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE’s performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.

LLMs can hallucinate, which makes their use for diagnosis look dangerous. Who will be responsible for mistakes?

In general, relying on conversational diagnostics alone may not be the most effective approach: symptoms described in patients’ own words, based on subjective feelings, can be unreliable.

But progress cannot be stopped: multi-factor, multi-modal diagnostic systems are the future of medicine. AMIE itself is a research-only system, not a product.

Today’s AI systems can be used to help and augment human doctors, not to replace them. Conversational AI systems can primarily be utilized to assist with routine and time-consuming tasks. In the long term, AI systems will likely replace humans in at least the most routine of those tasks.

https://arxiv.org/pdf/2310.02374.pdf

Conversational Health Agents (CHAs) are interactive systems that provide healthcare services, such as assistance, self-awareness, and diagnosis. Current CHAs, especially those utilizing Large Language Models (LLMs), primarily focus on conversation aspects. However, they offer limited agent capabilities, specifically lacking multi-step problem-solving, empathetic conversations, and multimodal data analysis. Our aim is to overcome these limitations. In this paper, we propose an LLM-powered framework to empower CHAs to generate a personalized response for users’ healthcare queries. This framework provides critical thinking, knowledge acquisition, and problem-solving abilities by integrating healthcare data sources, enabling multilingual and multimodal conversations, and interacting with various user data analysis tools. We illustrate the framework’s proficiency in handling complex healthcare tasks via a case study on stress level estimation, showcasing the agent’s cognitive and operational capabilities. Powered by our framework, the CHA can provide appropriate responses when the user inquires about their stress level. To achieve this, it learns to collect photoplethysmogram signals, converts them into heart rate variability, and interprets them as indicators of stress levels. We provide the open-source framework on GitHub for the community.
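The stress-estimation case study boils down to a pipeline: extract inter-beat intervals from the PPG signal, compute a heart rate variability (HRV) metric, and interpret it. A minimal sketch of the last two steps, assuming beat intervals have already been extracted from the raw PPG, using RMSSD as the HRV measure (the thresholds are illustrative, not clinically validated):

```python
import math

def rmssd(ibi_ms):
    """Root mean square of successive differences of inter-beat intervals
    (in ms) — a standard time-domain HRV measure."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def stress_level(ibi_ms, low=20.0, high=50.0):
    """Map HRV to a coarse stress label. Lower HRV is commonly associated
    with higher stress; the cutoffs here are illustrative placeholders."""
    hrv = rmssd(ibi_ms)
    if hrv < low:
        return "high stress"
    if hrv < high:
        return "moderate stress"
    return "low stress"
```

For example, nearly uniform intervals such as `[800, 810, 790, 805]` yield a low RMSSD and map to the "high stress" label, while highly variable intervals map to "low stress". Peak detection on the raw PPG waveform, which the paper's agent also handles, is omitted here.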
