
Application of ChatGPT in Routine Diagnostic Pathology: Promises, Pitfalls, and Potential Future Directions

Large Language Models are forms of artificial intelligence that use deep learning algorithms to learn from large amounts of text and exhibit strong capabilities such as question answering and translation. Recently, Large Language Models have entered medical and academic discussion, given their potential for widespread application to improve patient care and provider workflow.

Table of Contents
1. What Is ChatGPT?
2. How ChatGPT Works
3. Technical Limitations of ChatGPT
4. Promises, Pitfalls, and Potential Future Directions of Application in Routine Diagnostic Pathology

One application that has gained notable recognition in the literature is ChatGPT, a natural language processing "chatbot" developed by the artificial intelligence company OpenAI. It learns from large amounts of text data to generate automated responses to inquiries within seconds. In health care and academia, chatbot systems like ChatGPT have recently gained much recognition, given their potential to become functional, reliable virtual assistants. However, much research is required to determine the accuracy, validity, and ethical implications of integrating ChatGPT and other chatbots into everyday practice. One field where little information and research on the matter currently exists is pathology.

Herein, we present a literature review of pertinent articles regarding the current status and understanding of ChatGPT and its potential application in routine diagnostic pathology. In this review, we address the promises, possible pitfalls, and future potential of this application. We provide examples of actual conversations conducted with the chatbot technology that mimic hypothetical but practical diagnostic pathology scenarios that may be encountered in routine clinical practice.

On the basis of this experience, we observe that ChatGPT and other chatbots already have a remarkable ability to distill and summarize, within seconds, vast amounts of publicly available data and information to assist in laying a foundation of knowledge on a specific topic. We emphasize that, at this time, any use of such knowledge at the patient care level in clinical medicine must be carefully vetted against established sources of medical information and expertise. We anticipate that, given the ever-expanding knowledge base required to reliably practice personalized, precision anatomic pathology, improved technologies such as future versions of ChatGPT and other chatbots, enabled by expanded access to reliable and diverse data, may serve as a key ally to the diagnostician.

Such technology has real potential to further empower the time-honored paradigm of histopathologic diagnoses based on the integrative cognitive assessment of clinical, gross, and microscopic findings and ancillary immunohistochemical and molecular studies at a time of exploding biomedical knowledge.


There has recently been an explosion of progress in the capability and adoption of Large Language Models (LLMs), artificial intelligence (AI) models that have been trained on large amounts of text data using deep learning algorithms and exhibit strong emergent capabilities, including question answering, translation, and computer programming.1,2 There has been a wave of product releases built around these LLMs, including PaLM API3 and Bard4 from Google, New Bing5 from Microsoft, Claude6 from Anthropic, and ChatGPT7 from OpenAI. In addition to product launches, there has been astonishing progress in the underlying models powering these products, with recent announcements of PaLM 28 from Google, LLaMa9 from Meta, and GPT-410 from OpenAI, which have much stronger capabilities than previous generations of models. Moreover, there have been incredible advances in open-sourced models, including the release of Meta's LLaMa, open-assistant,11 OpenLLaMA,12 StableLM,13 and many others. Improvements in Parameter Efficient Fine-Tuning methods have also been noted,14 which make it dramatically easier for anyone to give new knowledge and capabilities to these and other open-sourced models.
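
To make this concrete, the sketch below attaches LoRA adapters (one popular Parameter Efficient Fine-Tuning method) to an open-source causal language model using the Hugging Face peft library. This is a minimal sketch only; the base model and hyperparameters are our own illustrative assumptions, not recommendations drawn from the references above.

```python
# A minimal sketch of parameter-efficient fine-tuning with LoRA adapters,
# using the Hugging Face "peft" library. Model choice and hyperparameters
# are illustrative assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "openlm-research/open_llama_3b"  # hypothetical open-source base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all model weights,
# which is what makes this kind of fine-tuning cheap enough for individuals.
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```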


As recent scholarship regarding potential applications of ChatGPT and similar chatbot technologies (or, simply, "chatbots") in various facets of health care has grown,15 their relevance in pathology has begun to be explored. Recent literature, albeit very limited, has begun to pave the path toward roles that chatbots like ChatGPT may play in the field. However, little to no scholarship to date has illustrated the actual application of chatbots in routine diagnostic pathology. In this review, we use ChatGPT (built on the GPT-3.5 base model), without plugins or other accessories, as a paradigm for this application to pathology. This model meets three desired criteria: (1) it has strong performance, (2) it is freely accessible for public use, and (3) it is not augmented with internet search or other external knowledge, which allows us to directly probe the knowledge of the model rather than any database it may be searching over.


Furthermore, we hope to address the promises, pitfalls, and possible future directions of ChatGPT's utility in diagnostic anatomic pathology by providing examples of question-and-answer scenarios that illustrate the remarkable capabilities of current technology. Along the way, relevant literature will be referenced, and a brief discussion of the limitations of our methodologies will be offered.


What Is ChatGPT?

ChatGPT (Generative Pretrained Transformer) is an AI tool developed by OpenAI (parent company: OpenAI, LLC).16 It learns a statistical model (an "artificial neural network") from large amounts of text data and generates creative responses to inquiries, simulating human conversation. Recently, ChatGPT has been gaining recognition as the "next big thing"17 in both health care18 and academia.19 Kung et al20 pitted ChatGPT's previous model, GPT-3, against 305 publicly available US Medical Licensing Examination (USMLE) practice Step exam question prompts (Step 1: n=93; Step 2 CK: n=99; Step 3: n=113) to observe how its responses would hold up. They found that "ChatGPT performed at >50% accuracy across all exams, [while] exceeding 60% in most analyses."20


According to the USMLE website, "examinees typically must answer approximately 60% of items correctly to achieve a passing score" on any of the USMLE Step exams from year to year.21 Thus, ChatGPT appears to have cleared the passing threshold of one of the most difficult series of credentialing examinations in the United States. Since then, additional progress has been made with Google's Med-PaLM 2,22 a fine-tuned LLM that consistently performed at an "expert" doctor level on medical exam prompts, scoring 85% on USMLE-style questions. GPT-4 has excelled at a variety of similar human benchmarks, including scoring at an estimated 75th percentile on the Medical Knowledge Self-Assessment Program.10


This application has both free and paid tiers that offer access to an automated AI natural language processing chatbot system intended to help save users time in their daily routines. It can be thought of as a “virtual assistant.”23 To use the free version of this publicly available platform, users sign up to create an account via this link: https://chat.openai.com/auth/login.24


After creating an account, a user can type his or her questions/inquiries into designated text areas using a variety of formats (such as spreadsheets, large-text summaries, etc). Once the user prompts the software (ie, hits the "ENTER" key), ChatGPT generates an automated response, and the user can ask follow-up questions, as in a real conversation. These "conversations" (or chats) are then saved in the user's profile history, where they can be accessed again later. Although it is uncertain whether the free version will exist forever, paid versions of ChatGPT and other chatbots may be of minimal concern to the public as long as they remain affordable and of great value.
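
The same prompt-and-response loop is also exposed programmatically. Purely as an illustration, the sketch below sends a question and a follow-up through OpenAI's Python client; the model name and message contents are our own assumptions, and, as discussed later, no patient data should ever be submitted this way.

```python
# A minimal sketch of the conversational exchange described above, using
# OpenAI's Python client (openai>=1.0). Model name and message contents
# are illustrative assumptions; never submit patient data this way.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{
    "role": "user",
    "content": "Summarize the key immunohistochemical markers used to "
               "distinguish adenocarcinoma from squamous cell carcinoma.",
}]
reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(reply.choices[0].message.content)

# A follow-up question simply extends the same message history, which is
# what gives the exchange its conversational character.
messages.append({"role": "assistant", "content": reply.choices[0].message.content})
messages.append({"role": "user", "content": "Which of those markers is most specific?"})
reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(reply.choices[0].message.content)
```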


How ChatGPT Works

ChatGPT is a product that allows conversing with (ie, is powered by) an LLM25 built on the GPT series of Transformer Artificial Neural Network26 models from OpenAI. These GPT models start with no understanding of language and learn to "autocomplete," or predict the next piece of text, over an extremely large set of documents and webpages from the internet. As the model becomes increasingly accurate at predicting what text is likely to come next, it develops an increasingly rich understanding of language and strong performance across a broad range of tasks, including question answering, translation, and computer programming.1,2 The result of this training is a "foundation model,"26 or a statistical model of human language, that distills a large amount of knowledge from the internet at the time of its training.27 ChatGPT is a chatbot built on top of these foundation models, with additional training on feedback from humans to improve its truthfulness, reduce toxicity, and more reliably follow instructions.28 The end result is a chatbot that one can converse with on a broad range of topics.29
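
To make the "autocomplete" training objective concrete, the toy sketch below greedily predicts one token at a time using GPT-2, a small, freely downloadable ancestor of the models discussed here; the prompt is an illustrative assumption.

```python
# A toy illustration of next-token prediction, the training objective
# described above. GPT-2 stands in for far larger models like GPT-3.5.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The most common malignant tumor of the liver is"
ids = tokenizer(text, return_tensors="pt").input_ids
for _ in range(10):  # greedily append the single most likely next token
    with torch.no_grad():
        logits = model(ids).logits
    next_id = logits[0, -1].argmax()
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0]))
```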


Technical Limitations of ChatGPT

As a result of its technical design, the version of ChatGPT we explore (1) has no knowledge of events beyond the time of its training (early 2022),7 (2) cannot access external data that are not directly provided by the user in the conversation (although a range of other products, including New Bing, Google's Bard, and ChatGPT plugins, allow the model to search the internet directly30), and (3) may "hallucinate" plausible-sounding text that is not actually true. These models can also repeat common misconceptions or replicate undesirable biases in their training data.31 As such, it is crucial that humans vet the reliability of these chatbot outputs; this is particularly relevant to our discussion of their potential use in routine diagnostic pathology whenever any aspect of the chat output may be used for patient care. In addition, the specific way of asking questions ("prompts") of these models can be very important. For example, asking models to "think step-by-step" can reduce errors on mathematical reasoning tasks by nearly 75% and make the results more easily interpretable by humans.32 In this work, we have not optimized our question format for ChatGPT, but it is plausible that reliability and interpretability could be improved by more carefully questioning the model or asking it to refine its outputs in light of errors we find in its initial answers, as sketched below.
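
As an illustration of how sensitive these models can be to prompt wording, the sketch below contrasts a bare question with a "think step-by-step" variant, reusing the client pattern from the earlier sketch; the question and the ask() helper are hypothetical.

```python
# A sketch contrasting a bare prompt with a step-by-step prompt.
# The ask() helper, question, and model name are hypothetical illustrations.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

question = (
    "A lab processes 120 biopsies per day and mislabels 0.5% of them. "
    "How many mislabeled biopsies are expected over a 5-day week?"
)

print(ask(question))                                 # bare prompt
print(ask(question + " Let's think step by step."))  # often more reliable
```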


Promises, Pitfalls, and Potential Future Directions of Application in Routine Diagnostic Pathology

According to a recent review by Sallam,33 potential benefits of ChatGPT in health care include (1) improved scientific writing; (2) elevated research equity, versatility, and efficiency; (3) improved day-to-day practice (eg, cost savings, streamlined workflow, patient health literacy); and (4) enhanced education (eg, personalized learning). This potential can even be expanded to public health, where ChatGPT may be able to support patients and communities in making informed health decisions.34 However, while ChatGPT and other chatbots may figure prominently in the future of health care and medical writing, limitations exist that must be considered before widespread, mainstream use.


Before Sallam, Biswas35 and Kitamura36 each individually noted some of these concerns. First, AI use in writing raises ethical questions regarding authorship, accountability, and authenticity. Second, legal issues must be considered, such as copyright infringement, health care regulatory compliance, and frameworks for ensuring the privacy, quality, and validity of documentation in patients' health records. Third, because these automated language technologies require large amounts of previously reported data to generate their responses, they may suffer from a lack of innovation and accuracy and, ultimately, from bias, unless newer data (eg, medical knowledge) are included in the training or human feedback is used to improve model performance. (It is expected by many within the machine learning (ML) community that these models will continue to improve at a rapid pace. For example, jumps between models in the GPT series (ie, GPT-1 to GPT-2 to GPT-3 to GPT-4) have all been accompanied by massive qualitative jumps in capability.) In fact, some studies have shown that AI may overlook certain social or cultural aspects of health care (eg, the social determinants of health).37 Last, but not least, AI natural language processing–generated medical references must be made transparent and clear to readers whenever they are used.


Thus, what are some pitfalls of incorporating ChatGPT and other AI-automated processing systems into routine diagnostic practice? In March 2023, Nakagawa et al38 addressed this question by simply asking ChatGPT: "Pathology and AI: What could possibly go wrong?" Its responses were in line with the concerns and findings of Sallam, Biswas, and Kitamura. ChatGPT acknowledged that current limitations and possible pitfalls include (1) lack of data diversity, (2) perpetuating pre-existing data biases, (3) lack of diagnosis understanding/interpretation, (4) protecting patient health data from unauthorized access, (5) compliance with health care regulations, and (6) validation and accreditation of routine AI usage by regulatory bodies.


These concerns are all valid, as Nakagawa and colleagues acknowledge that much uncertainty currently exists regarding the impact of AI on pathology. Some of these concerns may also arise from the "black box" nature of AI, in which ML software like ChatGPT has complex decision-making processes that are not fully understood by humans.39,40 This complexity often leads to a lack of trust by users, which can be ameliorated by improved ML algorithms trained with increased biological explanation and clinical experience.41


However, despite these hurdles, Nakagawa and colleagues also state that if AI technologies are widely accepted, they may help improve efficiency and provide support for pathologists. As Nakagawa and colleagues appropriately identified, humans are best suited to establishing new ideas, approaches, and diagnostics and will always need to review AI-generated content for approval. But as workloads increase and AI-generated content becomes more prevalent, accurate, and efficient, systems such as ChatGPT will become essential aids. ChatGPT's accuracy can already be seen when solving higher-level pathology problems, as determined by a cross-sectional study from Sinha et al42 in February 2023. In training institutions, ChatGPT and other chatbots may be able to help attending pathologists teach trainees how to write brief, properly worded, and accurate pathology reports that completely cover all aspects of diagnoses. Rather than patients asking "Dr. Google"43 (ie, searching for health information online) to find out more about the diagnoses in their pathology reports, ChatGPT and other chatbots may prove to be a more appropriate outlet when patients cannot access their own doctors because of provider time constraints and the like.


We also note that despite the generally superior performance44 of models like the GPT series, patient data privacy considerations may make the use of such models for diagnosis impossible. Currently, chat histories are, by default, stored and usable for training by OpenAI, which creates a serious privacy risk if patient data were to be input. Conversely, if this feature could theoretically be adapted to accommodate electronic medical records (EMRs), chatbots like ChatGPT might be able to swiftly search and summarize relevant patient history (eg, prior diagnoses with tumor grades), saving much time for pathologists. Regardless, for this and other reasons, it may be preferable to use open-sourced models that can be run locally, to avoid leaking confidential information and to have more direct control over the model's behavior, until widespread EMRs (including via remote access) can be safely, succinctly, and accurately navigated by chatbots.
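
As a sketch of that locally run alternative: an open model such as OpenLLaMA (cited earlier) can be downloaded once and then queried entirely offline, so no text ever leaves the institution's hardware. The model choice, generation settings, and prompt below are illustrative assumptions only.

```python
# A minimal sketch of running an open-source model entirely on local
# hardware, so confidential text never leaves the machine. Model choice,
# generation settings, and prompt are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"  # one of the open models cited earlier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Summarize the prior diagnoses in this de-identified history: ..."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```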
