AI in Medicine: Can a Chatbot Beat a Human in a Cognitive Test?
A Groundbreaking Study on Artificial Intelligence’s Cognitive Limitations in Healthcare
Table of Contents
- A Groundbreaking Study on Artificial Intelligence’s Cognitive Limitations in Healthcare
- Exploring Artificial Intelligence’s Cognitive Limitations in Healthcare
- What Cognitive Limitations Does AI Exhibit in Healthcare?
- Why Are Visuospatial Tasks Challenging for AI Models?
- What Is the Impact of AI’s Lack of Empathy in Clinical Settings?
- What Are the Expert Opinions on AI’s Current Role and Potential in Healthcare?
- How Does Human Empathy Contribute to Patient Recovery?
- What Are the Risks Associated with Medical Errors in AI?
- What Conclusions Have Been Drawn from the AI Study?
In Israel, a consortium of neurologists subjected some of the top AI chatbots to cognitive tests designed for humans, revealing a series of unexpected pitfalls in the technology that is increasingly integral to clinical decision-making. The study, led by neurologist Roy Dayan, casts light on the limitations of artificial intelligence in healthcare and the irreplaceable importance of human empathy in clinical practice.
The Study and Its Findings
The evaluation, published in the Christmas edition of The BMJ in December 2024, analyzed the cognitive capabilities of five leading AI models: ChatGPT-4, GPT-4o, Claude, and two versions of Google’s Gemini (Gemini 1 and 1.5). At the core of the investigation was the Montreal Cognitive Assessment (MoCA), a test commonly used to detect cognitive decline in humans.
The Montreal Cognitive Assessment is widely used in the U.S. to help clinicians distinguish between normal and pathological aging. The test evaluates various cognitive functions, including visuospatial skills, naming, memory, attention, language, abstraction, delayed recall, and orientation. To adapt such tasks to AI, Sebastian Sademoniraju, a computer scientist at Carnegie Mellon University, had previously trained an AI model to simulate human-like cognitive tasks, an approach that found some success while also exposing several flaws.
The MoCA includes tasks such as copying a drawing of a cube, generating words that begin with the same letter, and performing simple mathematical calculations. To the researchers’ surprise, none of the AI models achieved the maximum score of 30 points.
The models excelled in memory and attention tests but faltered in visuospatial tasks, such as the graphic representation of objects and spatial orientation. Particularly noteworthy was the evaluation of empathy, which used the “Cookie Theft” picture from the Boston Diagnostic Aphasia Examination: a scene in which a child is about to fall while stealing cookies from a jar.
[Image: the “Cookie Theft” scene, in which a child reaches for a cookie jar]
While each model detailed the elements of the image, none identified the imminent danger to the child. This lack of risk perception aligns with symptoms of frontotemporal dementia, which impacts decision-making and empathy. According to the researchers, “AI tools don’t perceive the world as humans do.”
The Limitations of AI in Health Care
Research conducted in the U.S. indicates that AI tools can recognize complex medical images and offer highly accurate diagnoses. However, they struggle to decode vital indicators of human behavior. ChatGPT, for example, has demonstrated the potential to offer significant assistance in clinical decision-making, but “cannot perceive tension in a patient’s voice or detect subtle signals in the patient’s position.”
The Perspective of Experts
That AI, despite its advancements, struggles with these essential clinical tasks echoes the sentiment of Thomas Thesen, a neuroscientist at the Dartmouth School of Medicine. “It’s like testing a calculator’s ability to lift weights,” Thesen says, to illustrate why AI falters when tested outside its usual domain. In his view, the key to leveraging AI effectively in medical training lies in adopting it for simulated patient interactions.
The study highlights a fundamental issue: AI cannot be evaluated with the exact tools meant for humans.
Dr. Thomas Thesen, neuroscientist at the Dartmouth School of Medicine
Dr. Robert Pearl, a former executive director of the Permanente Medical Group and professor at Stanford University, acknowledges AI’s limitations but sees potential in its evolution. He believes
AI is progressing like a young medical student: useful for data analysis, invaluable for learning, but not yet reliable enough to diagnose and treat patients without supervision.
He reiterates that while AI’s precision with medical data is invaluable, it is not yet dependable enough to stand on its own in clinical practice.
“If ChatGPT has this level of intelligence two years after its launch, let’s imagine its potential in five years.”
Dr. Robert Pearl, former director of Permanente Medical Group and Professor at Stanford University
The Voice of Human Interaction
Empathy, another irreplaceable aspect, remains a central focus. In a study published in 2024, empathy from healthcare providers proved more effective at promoting patient recovery than opioids, indicating that care should address not only the medical condition but also the patient’s human need for empathy.
Medical empathy extends beyond the recognition of suffering; it aims to relieve it. Commenting specifically on AI’s limitations, Dr. Roshini Pinto-Powell of the Dartmouth Medical School states:
“AI currently lacks the fundamental human factor in clinical care.”
Dr. Roshini Pinto-Powell, Dartmouth Medical School
The Risk of Medical Errors
Medical errors kill over 250,000 Americans annually, according to reports published in the Journal of Patient Safety. As reported by STAT, AI errors in 2023, notably affecting psoriasis diagnoses, disrupted medical practice. An AI that is blind to the quality of care cannot substitute for compassion.
Exploring Artificial Intelligence’s Cognitive Limitations in Healthcare
What Cognitive Limitations Does AI Exhibit in Healthcare?
Cognitive tests designed for humans, such as the Montreal Cognitive Assessment (MoCA), were used to evaluate AI in a seminal study. Published in 2024, this assessment uncovered significant limitations in AI’s abilities, particularly concerning visuospatial tasks and empathy. AI excels in areas like memory and attention but fails to perceive risks or emotional nuance, both vital for clinical decision-making.
Why are Visuospatial Tasks Challenging for AI Models?
The AI models tested, including ChatGPT-4 and Google’s Gemini, struggled with visuospatial tasks, such as the graphic representation of objects and spatial orientation. These skills are crucial for tasks such as accurately interpreting medical imaging or understanding spatial contexts in patient interactions.
What Is the Impact of AI’s Lack of Empathy in Clinical Settings?
Empathy is a non-negotiable component of effective healthcare. AI’s inability to comprehend emotional contexts or detect human signals, such as tension in a patient’s voice or subtle behaviors, considerably hinders its application in clinical practice. This limitation was highlighted when AI failed to identify danger in a scenario depicting a child reaching for a jar of cookies.
What Are the Expert Opinions on AI’s Current Role and Potential in Healthcare?
Experts highlight the potential of AI but emphasize current limitations. Dr. Thomas Thesen compares AI’s cognitive limitations to “testing a calculator’s ability to lift weights,” suggesting AI’s current utility is in data analysis and learning rather than independent decision-making. Dr. Robert Pearl likens AI to a medical student, useful but not yet reliable without human oversight.
How Does Human Empathy Contribute to Patient Recovery?
Human empathy plays a crucial role in patient care, surpassing pharmacological interventions like opioids in effectiveness. Empathy fosters healing by recognizing and alleviating suffering, reinforcing the notion that AI cannot replace the intangible human elements in healthcare.
What Are the Risks Associated with Medical Errors in AI?
The reliance on AI in diagnostics carries risks. AI errors, notably those affecting psoriasis diagnoses in 2023, serve as a reminder that without human supervision, AI can cause significant disruptions in medical practice, underscoring the necessity of human oversight.
What Conclusions Have Been Drawn from the AI Study?
The study led by Dr. Roy Dayan reveals AI’s potential to advance medical technology while cautioning against overreliance. The consensus is that while AI offers significant support, human intuition and empathy remain crucial, irreplaceable factors in clinical success.
