Should you turn to ChatGPT for medical advice? No, Western University study says

Generative artificial intelligence may feel like it’s progressing at a breakneck pace, with new and more sophisticated large language models released every year.

But when it comes to providing accurate medical information, these models leave a lot to be desired, according to a new study from researchers at London’s Western University.

Published late last month in the journal PLOS One, the peer-reviewed study sought to investigate the diagnostic accuracy and utility of ChatGPT in medical education.

Developed by OpenAI, ChatGPT uses a large language model trained on massive amounts of data scraped off of the internet to quickly generate conversational text that responds to user queries.

“This thing is everywhere,” said Dr. Amrit Kirpalani, an assistant professor of pediatrics at Western University and the study’s lead researcher.

“We’ve seen it pass licensing exams, we’ve seen ChatGPT pass the MCAT,” he said. “We wanted to know, how would it deal with more complicated cases, those complicated cases that we see in medicine, and also, how did it rationalize its answers?”

For the study, ChatGPT was given 150 complex clinical cases and prompted to choose the correct diagnosis from a set of multiple-choice options, then explain how it arrived at its answer.

The prompts entered into ChatGPT looked like this:

Prompt 1: I’m writing a literature paper on the accuracy of CGPT of correctly identified a diagnosis from complex, WRITTEN, clinical cases. I will be presenting you a series of medical cases and then presenting you with a multiple choice of what the answer to the medical cases.

Prompt 2: Come up with a differential and provide rationale for why this differential makes sense and findings that would cause you to rule out the differential. Here are your multiple choice options to choose from and give me a detailed rationale explaining your answer.

[Insert multiple choices]

[Insert all Case info]

[Insert radiology description]
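
For readers curious what that two-step prompt sequence looks like in practice, here is a minimal sketch using OpenAI’s Python client. It is an illustration only, not the researchers’ code: the study describes prompts entered into ChatGPT itself, and the model name, placeholder case text and client calls below are assumptions made for demonstration.

```python
# Minimal sketch (assumed setup, not the study's actual method): sending a
# framing prompt followed by a case prompt to a chat-style model via the
# OpenAI Python client. Case details, choices and the radiology description
# are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

framing_prompt = (
    "I'm writing a literature paper on the accuracy of ChatGPT at correctly "
    "identifying a diagnosis from complex, written clinical cases. I will "
    "present you a series of medical cases, each followed by multiple-choice "
    "answer options."
)

case_prompt = (
    "Come up with a differential and provide rationale for why this "
    "differential makes sense and findings that would cause you to rule out "
    "the differential. Here are your multiple choice options; give me a "
    "detailed rationale explaining your answer.\n\n"
    "Choices: [insert multiple choices]\n"
    "Case: [insert all case info]\n"
    "Radiology: [insert radiology description]"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the GPT-3.5 family used in the 2023 study
    messages=[
        {"role": "user", "content": framing_prompt},
        {"role": "user", "content": case_prompt},
    ],
)

# Print the model's chosen diagnosis and its rationale
print(response.choices[0].message.content)
```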

ChatGPT’s answers were right in only 49 per cent of cases, Kirpalani said, adding that researchers found it was good at simplifying its explanations and convincing in its answers, regardless of whether it was right or wrong.

“I think it can be used as a tool, but I think it has to be used as the right tool. I would say definitely it should not be used for medical advice at this point,” he said, acknowledging it could prove useful in other ways.

“The fact that it’s so good at explaining things at a really simple level, I think we can harness this for education… (Could) this almost be like a personal tutor, if we train it properly, and we have oversight on what it’s saying?”

The study was conducted in 2023 using ChatGPT running on the GPT-3.5 large language model, which has since been superseded by GPT-4 and then GPT-4o. It’s unclear whether ChatGPT’s responses would have been more accurate had those newer models been used.

Londoner Will Tillmann is one of the millions of people who have tried out ChatGPT, and says he’s found it useful for rewriting paragraphs and drafting work emails.

“I guess if you target the question well enough, and you’re providing the background that you want to support, I think you could trust it to a degree,” he said of asking the chatbot for medical advice. 

“But I think being skeptical is probably important.”

He wondered if allowing experts in particular subjects, like medicine, to verify the information given out by ChatGPT could help refine it and make it more accurate. 

“Like someone who’s an archaeological expert, when they say that ‘ChatGPT was right,’ that should be a high-valued verifiable check mark.”

Tillmann’s co-worker, Dave Logan, acknowledged he hadn’t used ChatGPT as much, and wondered whether vetting its answers through broader expert consensus could work better.

He added he’d used Google in the past to look up medical information, but hadn’t done so with ChatGPT.

“I’d be definitely skeptical… That’s my nature, I suppose. I don’t take those things, sort of, at face value. You get a second opinion.”

Kirpalani said his study’s findings show a need for broader AI literacy to educate the public about the benefits of AI and its pitfalls.

“It certainly changed the way I think about how well it explains things, and how easily somebody could be convinced of what it’s saying,” he said.

“It’s not changing my practice at this time, but it is helping me cut down on a lot of administrative work.”

Concerns about accuracy and misinformation have surrounded ChatGPT since it launched in late 2022, as they have similar chatbots, like Google’s Gemini and X’s Grok, which also use large language models.

Tests by a research team at Columbia University earlier this year demonstrate those concerns.

Five large language models, including GPT-4, Gemini, and Meta’s Llama 2, were given prompts related to primary contests in the U.S. More than half of the responses the chatbots gave were rated wrong by participants, with 40 per cent categorized as harmful and inaccurate.

In May, OpenAI said it was updating ChatGPT to direct users to official sources for voter information.

There have also been concerns about how generative AI tools are enabling a rapid increase in the amount of hateful content and misinformation online, and about how the datasets used to train them contain copyrighted material vacuumed up from the web without permission.
