Image: Chayanan via Getty Images
Artificial intelligence has already proven it can perform specific medical tasks, such as interpreting X-rays or flagging risks in patient data. But caring for patients is a dynamic process that unfolds over time, requiring clinicians to interpret signals from multiple sources and intervene as a patient’s condition changes. Stabilizing a patient may require a physician to synthesize lab values and medical images, listen to lung or heart sounds, observe physical responses, and decide when to escalate care—often under severe time pressure.
Given that complexity, how far can modern AI systems really go? More specifically, can a large language model manage an entire clinical decision-making workflow, rather than just individual tasks within it?
That question is the focus of a new white paper by Mack Institute co-director and Wharton professor of operations, information, and decisions Christian Terwiesch, Mack Institute pre-doctoral fellow Lennart Meincke, and Arnd Huchzermeier of WHU’s Otto Beisheim School of Management. The paper is the latest in a series of generative AI experiments by Terwiesch and colleagues, supported by Wharton’s Mack Institute for Innovation Management.
To explore this question, the researchers placed a multimodal large language model inside a realistic medical training simulation—the same type of system used to evaluate medical students and practicing clinicians. On screen, a virtual patient’s condition evolves in real time: vital signs change, test results arrive with delays, and inaction has consequences.
Rather than responding to a written prompt (such as “a 50-year-old male presents with chest pain”), the AI must decide what to do next at every step. It can question the patient, turn on monitors, order lab tests or imaging, administer treatments, and escalate care—all while the clock is running and the patient’s condition may be improving or deteriorating. In effect, the system is evaluated not on a single answer, but on whether it can manage an entire clinical encounter from start to finish.
The researchers evaluated the AI across four acute care scenarios, ranging from a simple at-home hypoglycemia case to complex emergency room situations involving pneumonia, stroke, and congestive heart failure.
Across scenarios, the AI consistently stabilized patients and completed cases at rates comparable to—and in some cases higher than—medical students. It also completed cases substantially faster. Overall diagnostic accuracy was similar, and in many instances the AI’s sequence of actions closely resembled expert clinical practice.
Read more at Knowledge at Wharton.