How did a proprietary AI get into hundreds of hospitals - without extensive peer reviews? The concerning story of Epic's Deterioration Index

By Neil Raden, September 2, 2021
Summary:
How is it possible for proprietary AI models to enter patient care, without extensive peer reviews for algorithmic transparency? That's a question we should be asking about Epic's Deterioration Index, which has been utilized for several use cases, including COVID-19 patient risk models.


The conversation about machine learning development largely centers on how individual organizations proceed - and whether they use adequate data, methods, algorithms, transparency, and a process that guarantees models do not go into production until they are tested and vetted.

At the other end of the spectrum are AI models developed by Google and Facebook that are completely opaque, which is a different problem. But a third level between the two doesn't get the scrutiny it should - custom models developed by a third party, and given as an "enhancement" to existing licensees.

Epic Systems Corporation, or Epic, is a privately held healthcare software company. According to the company, hospitals that use its software held medical records of 54% of patients in the United States and 2.5% of patients worldwide in 2015. In terms of market share, as per Wikipedia, Epic holds 31% of the US EHR (Electronic Health Records) share for acute care hospitals, ahead of all others, including Cerner at 25% (as of May 2020). More than 250 million patients have electronic records in Epic.

Epic celebrated its 40th anniversary in March 2019. It had around 10,000 employees globally and generated about $2.9 billion in annual revenue. The company reports 40 percent of operating expenses are invested in research and development.

Epic founder and CEO Judy Faulkner unveiled a new data research initiative and software during the Epic User Group annual meeting on Aug. 27, 2020. She highlighted the Cosmos program, which is designed to mine data from millions of patient medical records to improve research into treatments. The program gathers de-identified patient data from 8 million patients at nine health systems, and 31 more organizations have signed on to participate. The company also announced new products focused on letting physicians write shorter notes, as well as voice recognition software.

What has happened with Cosmos since then is something of a mystery. We have been unable to find any 2021 mentions of Cosmos, other than this: "Epic will also feature updates at the conference from its Epic Health Research Network, which publishes insights from its HIPAA-limited Cosmos data set from more than 113 million patients." The growth from 8 million to 113 million is mentioned with no explanation.

Epic embarked on a program of AI prediction models as early as 2014. One of the first was called the Epic Sepsis Model. However, its usefulness is debatable. The predictive value of the index it produces is most significant for those who are hardly sick and those who are deathly ill. That means, "This person isn't sick, or this person is circling the drain." It raises the question, "How useful is this?" Surely any clinician can apply the valuable heuristics of triage for the next steps with patients who are not ill, or who are not likely to survive. The vast majority of decisions would seem to occur for all of those in the middle. This index is called the Epic Deterioration Index. Via Fast Company:

Loosely speaking, triage is an act of determining how sick a patient is at any given moment to prioritize treatment and limited resources. But historically, these tools have been used only after a rigorous peer review of the raw data and statistical analyses used to develop them. Epic's Deterioration Index, on the other hand, remains proprietary despite its widespread deployment, leaving no direct access to the equations underlying it, and no avenue for further external inquiry.

More problems were documented via a Michigan-based study: Epic's widely used sepsis prediction model falls short among Michigan Medicine patients. The tool is included as part of Epic's electronic health record platform. According to the company, it calculates and indicates "the probability of a likelihood of sepsis" to help clinicians identify hard-to-spot cases. While some providers have reported success with the tool, as noted, researchers affiliated with the University of Michigan Medical School in Ann Arbor found its output to be "substantially worse" than what was reported by the vendor when applied to a large retrospective sample of more than 27,000 adult Michigan Medicine patients. The researchers highlighted the wider issues surrounding such proprietary models. They wrote in JAMA Internal Medicine:
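The Michigan study's core move - scoring a local retrospective cohort with the vendor's model and comparing observed discrimination against the vendor-reported figure - can be sketched in a few lines. The sketch below is purely illustrative: the scores, outcomes, and the vendor-reported number are synthetic placeholders, since the actual Epic model and its performance documents are proprietary.

```python
# Hypothetical sketch of an external validation like the Michigan Medicine
# study: take the (opaque) vendor model's risk scores for each encounter,
# then measure discrimination against locally observed outcomes.
# All data below is synthetic; the real model and cohort are proprietary.

def auroc(scores, labels):
    """Rank-based AUROC: probability a random positive case outranks a
    random negative case (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-ins for vendor risk scores and observed sepsis outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   0,   1,   0,   1,    0,   0,   1,   0,   0]

observed_auc = auroc(scores, labels)
vendor_reported_auc = 0.76  # illustrative placeholder, not Epic's actual figure
print(f"observed AUROC {observed_auc:.2f} vs vendor-reported {vendor_reported_auc:.2f}")
```

The point of the exercise is that none of it requires access to the model's internals - only its scores and the ground-truth outcomes - which is exactly the kind of independent check the Michigan researchers argue should precede deployment.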

Our study has important national implications... The increase and growth in deployment of proprietary models have led to an underbelly of confidential, non-peer-reviewed model performance documents that may not accurately reflect real-world model performance.

When AI goes sideways in an e-commerce context, and the online retailer sends you two left-foot shoes, it isn't the end of the world. After all, technology is never perfect. But when the biggest EHR (Electronic Health Records) provider, Epic, provides Machine Learning algorithms to predict care in a clinical setting, errors are a lot less tolerable, especially when the algorithms are not disclosed, nor is the model's development data explained. 

I hate to come back to that old canard about ethics, but when a doctor using the tool admits that "Nobody has amassed the numbers to do a statistically valid" test, it's more than troubling. As STAT put it in AI used to predict COVID-19 patients' decline before proven to work:

'Nobody has amassed the numbers to do a statistically valid' test of the AI, said Mark Pierce, a physician and chief medical informatics officer at Parkview Health, a nine-hospital health system in Indiana and Ohio that is using Epic's tool. 'But in times like this that are unprecedented in U.S. health care, you do the best you can with the numbers you have and err on the side of patient care.'

It's even more troubling when you take into account that Epic PAYS hospitals as much as $1 million to use the tool. I have not been able to determine the business case for this (The Verge: Health record company pays hospitals that use its algorithms).

A detailed study of the efficacy of Epic's Deterioration Index (EDI) for identifying at-risk COVID-19 patients concluded:

We found that EDI identifies small subsets of high- and low-risk patients with COVID-19 with sound discrimination. However, its clinical use as an early warning system is limited by low sensitivity. Studies of Epic's Deterioration Index for COVID-19 have been primarily negative. These findings highlight the importance of independent evaluation of proprietary models before widespread operational use among patients with COVID-19.
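The study's distinction between "sound discrimination" and "low sensitivity" is worth unpacking: a model can rank extreme cases correctly and still miss most deteriorations if alerts only fire above a high score cutoff. The toy numbers below are invented to illustrate that gap; they are not EDI scores or real patient data.

```python
# Sketch of why good discrimination can coexist with low sensitivity as an
# early-warning system: if alerts fire only at a high score cutoff, most
# patients who deteriorate are never flagged. Data is synthetic/illustrative.

def sensitivity(scores, labels, threshold):
    """Fraction of true deteriorations flagged at or above the alert threshold."""
    flagged = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return flagged / sum(labels)

# Hypothetical EDI-style scores (0-100) for patients who did (1) / did not (0)
# deteriorate; positives tend to score higher, but many sit in the middle band.
scores = [95, 88, 72, 65, 58, 52, 45, 40, 30, 20]
labels = [1,  1,  1,  0,  1,  1,  0,  0,  0,  0]

print(sensitivity(scores, labels, threshold=85))  # → 0.4: only extreme scores alert
```

Lowering the threshold raises sensitivity but floods clinicians with false alarms - precisely the trade-off an independent evaluation is supposed to quantify before a tool is deployed at the bedside.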

The central concern surrounding this practice is its opacity. It is a proprietary system: what data was used, what data preparation methods were applied, what algorithms were chosen - none of this is known. But for many, the glaring ethical problem is: why does Epic pay clients to use it?

At HIMSS21 Digital, John Halamka, president of Mayo Clinic Platform, spoke on the four significant challenges to AI adoption in healthcare. "Augmentation of human decision making is going to be greatly beneficial," he said - but some hurdles need to be overcome first.

However, one key issue that must be solved first is ensuring equity and combating bias that can be "baked in" to AI. "The AI algorithms are only as good as the underlying data. And yet, we don't publish statistics describing how these algorithms are developed." The solution, he said, is greater transparency - spelling out and sharing via technology the ethnicity, race, gender, education, income and other details that go into an algorithm.

Halamka points to what he calls the four grand challenges to AI adoption in healthcare:

  1. Gathering valuable novel data - such as GPS information from phones and other devices people carry as well as wearable technology - and incorporating it into algorithms.
  2. Creating discovery at an institutional level so that everyone - including those without AI experience - feels empowered and engaged in algorithm development.
  3. Validating an algorithm to ensure, across organizations and geographies, that it's fit for purpose, as well as labeled appropriately both as a product and as described in the academic literature.
  4. Workflow and delivery - getting information and advice to physicians instantly while they're in front of patients.

My take

Lots of opinions, but here is my take:

  1. We have no idea how the Epic Deterioration Index model was built. For instance, in the US, healthcare data is riddled with biased observations.
  2. No description of the methodology of Epic's Deterioration Index is published, and no account of de-biasing, if any, is given. Biases in medicine, inside and outside of AI systems, are prevalent. As per The Washington Post, it is well-documented that one-half of white medical school students and residents still believe that African Americans have a higher threshold of pain (and you don't have to be a genius to figure out the origin of that), alongside biases about gender, race, age, etc. These biases leak into medical records, and no program should be taken at face value unless it discloses how this is rectified.
  3. EHR data is full of doctors' unedited narrative notes.
  4. There is no viable peer-reviewed assessment of the Deterioration Index model that I know of. We have no idea if the training data represents a fair cross-section of the population.
  5. In addition, for something as complicated as assessing COVID-19 patients in real-time, Epic seems to have rushed this to market, delivering in just six months from the "first wave." One has to presume that the knowledge of the etiology of COVID-19 was only beginning to be understood when this was rolled out.

The fact that Epic pays hospitals to adopt it also needs a detailed explanation. ("Verona, Wis.-based EHR giant Epic gives financial incentives to hospitals and health systems that use its artificial intelligence algorithms, which can provide false predictions," according to a July 26 STAT News investigation).

As per The Verge, Epic provided the following explanation for these financial incentives: 

'Epic's Honor Roll is a voluntary program which encourages the adoption of features that help save lives, increase data exchange, and improve satisfaction for patients, physicians, and health systems,' Epic said in a statement to Stat News.

I must have missed the part where they justify paying hospitals to use their model. The byzantine healthcare system in the US doesn't always seem logical, but what we know for sure is that incentives drive the system.

Whatever the explanation, Epic's Deterioration Index is now widely used. Fast Company already raised a similar question: How did a largely untested AI creep into hundreds of hospitals? As the authors wrote:

Even now, there have been, to our knowledge, only two peer-reviewed published studies of the index. The deployment of a largely untested proprietary algorithm into clinical practice - with minimal understanding of the potential unintended consequences for patients or clinicians - raises a host of issues.

It remains unclear, for instance, what biases may be encoded into the index. Medicine already has a fraught history with race and gender disparities and biases... Some clinical scores, including calculations commonly used to assess kidney and lung function, have traditionally been adjusted based on a patient's race - a practice many in the medical community now oppose. Without direct access to the equations underlying Epic's Deterioration Index, or further external inquiry, it is impossible to know whether the index incorporates such race-adjusted scores in its algorithm, potentially propagating biases.

What I find distressing is that Epic would develop a model, precise or not, that reduces a human being's course of treatment for a potentially deadly disease with a simple index of 1-10. I understand that clinicians may find this a helpful tool, but I'm an advocate for the patients. They didn't ask for this disease; clinicians chose this profession. They shouldn't be looking for shortcuts or cookie-cutter treatment.