ChatGPT Fails at Recommending Cancer Treatment, Study Finds

(Bloomberg) — As close and tempting as they sound, cancer treatment regimens designed by artificial intelligence tools are riddled with errors and still likely years away.

Computer-designed cancer treatment plans generated by OpenAI Inc.’s ChatGPT were plagued with a familiar problem, researchers at Brigham and Women’s Hospital found: Inappropriate treatment recommendations were intermingled with correct ones, making them especially hard to distinguish, according to the study published Thursday in the journal JAMA Oncology.

ChatGPT “speaks oftentimes in a very sure way that seems to make sense, and the way that it can mix incorrect and correct information is potentially dangerous,” said Danielle Bitterman, an oncologist and study coauthor at the Artificial Intelligence in Medicine program of the Mass General Brigham health system. “It’s hard even for an expert to identify which is the incorrect recommendation.”

While nearly all of the responses from ChatGPT included at least one recommendation in line with National Comprehensive Cancer Network guidelines, the researchers found, about a third of them also contained incorrect suggestions. About 12% of responses had “hallucinations,” recommendations that appear nowhere in the guidelines, the researchers said.

Although generative AI tools may not yet be accurate enough to rely on for cancer treatment plans, the technology is more widely discussed as a way to help detect cancer early, when it’s more likely to be treated successfully. OpenAI has stressed that ChatGPT can be unreliable and make up information, and that it requires “great care,” especially for “high-stakes contexts.”

Clinicians are hopeful that artificial intelligence can help lighten their administrative loads, Bitterman said, but concerns over accuracy and privacy mean that large language models like ChatGPT are years away from being widely adopted in doctors’ offices and hospitals.

“It’s impressive that in almost all cases it did include a guideline requirement recommendation,” she said. Given its broad base of knowledge, it’s almost as if ChatGPT went to medical school, she said, “but it didn’t go to residency,” the advanced clinical training that doctors get after graduating.

–With assistance from Seth Fiegerman.
