The Benefits and Risks of Medical AI Chatbots

By Michael Awood

September 7, 2023

Large Language Models (LLMs) are gaining prevalence in healthcare. OpenAI, in collaboration with Microsoft Research, is exploring the use cases of such technology in healthcare and medical applications. They aim to understand the opportunities, limitations, and risks in this context. Google’s Language Model for Dialog Application or “LaMDA” (replaced by their Pre-training with Abstracted Langage Modeling “PaLM”) and OpenAI’s previous Generative Pretrained Transformer 3.5 (GPT-3.5) have also been under study for medical applications. Interestingly, these LLMs, although not specifically trained for healthcare, demonstrated competence in the medical field using open-source internet information.

These LLMs have been used to develop the next tool in a physician’s pocket – a medical AI chatbot. By integrating OpenAI’s GPT-4 with medical expertise, a chatbot that engages users conversationally was created. Users initiate a session by entering a query or “prompt” in natural language, and GPT-4 responds, creating a human-like conversation. The system’s ability to maintain the context of an ongoing conversation enhances its usability and natural feel.

However, its responses are sensitive to the prompt’s wording, necessitating careful development and testing of prompts. GPT-4 can accurately answer definitive prompts, but it can also engage in complex interactions with prompts that lack a single correct answer. It provides error checking, identifying mistakes in its work and human-generated content.

GPT-4’s medical knowledge can serve tasks such as consultation, diagnosis, and education. It can read medical research material and engage in an informed discussion about it. However, like human reasoning, GPT-4 is fallible. It makes mistakes, but it can also identify them. The medical AI chatbot can write medical notes based on exchanges between providers and patients, even making sense of the subjective, objective, assessment, and plan (SOAP) format. It also includes billing codes as necessary. The chatbot can understand authorisation information, and prescriptions, that comply with Health Level Seven (HL7) and Fast Healthcare Interoperability Resources (FHIR) standards.

However, problems such as false responses or “hallucinations” pose dangers in a medical context. For instance, the AI chatbot created a medical note recording a body-mass index (BMI) without any related detail entered into the system. In another instance, the chatbot indicated no problems for the patient, but the clinician identified signs of medical complications.

While these tools can significantly enhance the consultation process and assist both the provider and patient, they are not without flaws and risks. The article speaks to a solution where the chatbot re-reads its information and it correctly identified these errors.

The article also highlights other improvements needed in the LLMs and chatbots. This is just the beginning of new possibilities and new risks. But, there is no denying that these tools hold the potential to optimise healthcare services.

Reference url

Recent Posts

Donanemab Alzheimer’s Approval: A New Era in Treatment

By João L. Carapinha

August 8, 2025

The EMA granted marketing authorization for donanemab, a monoclonal antibody targeting beta-amyloid plaques. It treats early symptomatic Alzheimer's disease (mild cognitive impairment or mild dementia with confirmed amyloid pathology). This approval highlights donanemab's role in Alzheimer's trea...
Building a Better Ecosystem for Digitally Enabled Diabetes Care in Europe
Digitally enabled diabetes care is rapidly transforming the management of Type 2 diabetes (T2D) across Europe. How does expanding access to these digital solutions improve health outcomes and reduce costs for individuals and health systems? Early evidence and recent policy recommendations (publis...
AI in Real-World Evidence: Insights from Mitch Higashi

By Staff Writer

August 7, 2025

In this update, we review the Q&A with Mitch Higashi, PhD, ISPOR’s Associate Chief Science Officer published on Pharmalive. It discusses AI in Real-World Evidence (RWE) and health technology assessments (HTA). Key themes include AI-driven early disease detection and NLP tackling unstructured ...