The blush is off the rose of Machine Learning…maybe

By Neil Raden, April 25, 2019
Geeky reviews of two ML studies - and something nice to say about Tom Davenport!

We already know there are drawbacks and even dangers with Machine Learning (ML). The literature is replete with examples of bias, not necessarily from the algorithms themselves, but from biased datasets, biased selection of training data, and all sorts of insidious bias picked up in the data, hiding in plain sight. And of course, no human being operates without some kind of bias, which can easily bleed into their models.

Then there is bias in the interpretation of the results, and the deterioration of a model's performance when biased information is deliberately force-fed to it while it is in operation. (It took only 24 hours for a Microsoft AI chatbot project to turn from a tweeting teenage girl into a hate-spewing, offensive public relations debacle, thanks to coordinated attacks on the learning software.) These are all well-known stories.

But something came to my attention today that is quite a bit different. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models.

This gets a little geeky, so let me try to flatten it out. Logistic Regression (LR) is a widely used statistical technique for predicting probabilities. One can't simply compare LR with ML, because ML can, and often does, use LR. In that case, the difference is in the calculation process: ML implementations typically fit the model with gradient-based optimization, while statistical packages typically solve the maximum-likelihood equations with Newton-type methods (iteratively reweighted least squares).
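For the geeks, here is a minimal sketch of that difference, using only the Python standard library. It fits the same unregularized, one-predictor logistic regression twice on a small made-up dataset: once by plain gradient ascent on the log-likelihood (the ML-style approach) and once by Newton-Raphson, which for logistic regression is the same thing as iteratively reweighted least squares (the approach classic statistics packages typically take). The data, step size, and iteration counts are my own illustrative assumptions, not anything from the studies discussed here.

```python
import math

# Hypothetical toy data (not from either study): one predictor x and a 0/1 outcome y.
# The two classes overlap, so the maximum-likelihood estimate is finite.
X = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
Y = [0, 0, 1, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_gradient_ascent(step=0.5, iters=20000):
    """ML-style fit: many small steps uphill along the gradient of the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(X)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(X, Y):
            r = y - sigmoid(b0 + b1 * x)   # residual on the probability scale
            g0 += r
            g1 += r * x
        b0 += step * g0 / n
        b1 += step * g1 / n
    return b0, b1

def fit_newton(iters=25):
    """Statistics-style fit: Newton-Raphson on the likelihood equations (equivalent to IRLS here)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for x, y in zip(X, Y):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)              # IRLS weight for this observation
            s0 += y - p                    # score (gradient) components
            s1 += (y - p) * x
            h00 += w                       # information matrix X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X^T W X) * delta = score and take the full Newton step.
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1

print("gradient ascent:", fit_gradient_ascent())
print("Newton / IRLS:  ", fit_newton())
```

Both routines should print essentially the same intercept and slope. Newton-Raphson gets there in a handful of iterations, while gradient ascent needs many more small steps; the fitted model is the same, and only the calculation process differs.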

But this study did not compare LR with an ML-fitted logistic regression. It compared LR with various other algorithms, such as classification trees (30 studies), random forests (28), artificial neural networks (ANN) (26), and support vector machines (SVM). I am not going to go into how they used the Area Under the Curve (AUC) to compare results, but their conclusion was:

We found no evidence of superior performance of ML over LR for clinical prediction modeling, but improvements in methodology and reporting are needed for studies that compare modeling algorithms.
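Since AUC carries the weight of both papers' conclusions, a tiny sketch may help anyway. One standard reading of AUC is the probability that a model assigns a higher predicted risk to a randomly chosen patient who had the outcome than to one who did not, with ties counted as half. So 0.5 is a coin flip and 1.0 is perfect ranking, and unlike an accuracy rate it does not depend on choosing a cutoff for calling a prediction positive. The function below computes it that way by brute force; the patients and predicted risks are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via its probabilistic reading:
    P(score of a random positive > score of a random negative), ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # pair ranked correctly
            elif p == n:
                wins += 0.5      # tie: half credit
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = had the outcome) and one model's predicted risks.
labels = [1, 1, 1, 0, 0, 0]
risks = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc(risks, labels))
```

For these made-up numbers the AUC is 8/9, about 0.89: the model ranks eight of the nine positive/negative pairs correctly, and only the pair (0.4 vs. 0.5) is out of order.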

Then there is another paper, published at the same time (February 2019), that claims exactly the opposite: Comparison of artificial neural network and logistic regression models for prediction of outcomes in trauma patients: A systematic review and meta-analysis.

From the abstract:

The aim of this study was to compare the ANN and LR models in prediction of Health-related outcomes in traumatic patients using a systematic review.

In plain English, this review also compared LR with ML, but rather than considering a potpourri of ML algorithms, it considered only artificial neural networks (ANN), the family of techniques that Deep Learning is built on. Curiously, their conclusion was:

The results of our study showed that ANN has better performance than LR in predicting the terminal outcomes of traumatic patients in both the AUC and accuracy rate. Using an ANN to predict the final implications of trauma patients can provide more accurate clinical decisions.
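Before getting to the numbers, it is worth seeing why ANN versus LR is less of an apples-to-oranges contest than it sounds. Here is a rough sketch of forward passes only (no training): a logistic regression is a single sigmoid unit applied to the inputs, and a small artificial neural network stacks a hidden layer of such units and feeds their outputs into one more sigmoid unit. The function names, the patient, and all weights are hypothetical illustrations, not the architecture used in any of the reviewed trauma studies.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """P(outcome = 1 | x): a weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def tiny_ann(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """A hypothetical one-hidden-layer network with sigmoid activations.

    Each hidden unit is a logistic-regression-style unit on the raw inputs;
    the output unit is a logistic regression on the hidden units' outputs.
    """
    hidden = [logistic_regression(x, w, b)
              for w, b in zip(hidden_weights, hidden_biases)]
    return logistic_regression(hidden, out_weights, out_bias)

# Hypothetical patient with two standardized predictors; all weights are made up.
patient = [0.8, -1.2]
print(logistic_regression(patient, weights=[1.5, -0.7], bias=-0.3))
print(tiny_ann(patient,
               hidden_weights=[[2.0, -1.0], [-1.5, 0.5]],
               hidden_biases=[0.1, -0.2],
               out_weights=[1.2, -2.3],
               out_bias=0.4))
```

Remove the hidden layer and `tiny_ann` collapses back into `logistic_regression`. The extra layer is what lets an ANN pick up nonlinear patterns, and also what makes its weights so much harder to explain than LR coefficients.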

But the AUC statistics were almost the same, within the same confidence interval. The ANN did perform a little better on accuracy rate in the random-effects models (a way of pooling results across studies that allows each study's true effect to differ; too complicated to explain here). Still, this was just one study, and it is hardly convincing, at least to me, that ML, especially Deep Learning approaches like ANN, is superior enough to justify its extreme cost and effort compared with tried-and-true LR, which every statistician understands (and can explain, unlike ANN). Of course, this is just one type of study.
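For anyone curious about what a random-effects model actually does, here is a minimal sketch of one common method, DerSimonian-Laird pooling; I am not claiming it is the exact method the trauma meta-analysis used. Each study contributes an estimate (say, an AUC) and its variance. A fixed-effect model assumes every study measures one true value; a random-effects model assumes the true values differ from study to study, estimates that between-study variance (tau squared), and adds it to every study's own variance before weighting. The three study results in the example are made up, and pooling AUCs on the raw scale is a simplification.

```python
import math

def random_effects_pool(estimates, variances, z=1.96):
    """DerSimonian-Laird random-effects pooling.

    estimates: per-study effect estimates (e.g. AUCs); variances: their squared SEs.
    Returns (pooled estimate, lower bound, upper bound, tau2) for a ~95% CI.
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies and one variance per estimate")
    w = [1.0 / v for v in variances]                               # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum_w
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))  # Cochran's Q heterogeneity statistic
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                             # estimated between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]                   # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se, pooled + z * se, tau2

# Three made-up studies, each reporting an AUC and its variance.
print(random_effects_pool([0.85, 0.90, 0.93], [0.0004, 0.0009, 0.0001]))
```

When the studies disagree more than their own error bars would explain, tau squared grows, the study weights even out, and the pooled confidence interval widens. That widening is the main practical difference from a fixed-effect model.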

Tom Davenport has written a thousand books, blogs, articles, papers, presentations, and lectures, and I haven't read them all, but I've read a lot of them. One thing he wrote in Fast Company in 1995 (not a typo) has always stuck with me. It was in reference to his dim view of what had become of reengineering, but it is sound advice for any new innovation:

When the Next Big Thing in management hits, try to remember the lessons of reengineering. Don’t drop all your ongoing approaches to change in favor of the handsome newcomer. Don’t listen to the new approach’s most charismatic advocates, but only to the most reasoned. Talk softly about what you’re doing and carry a big ruler to measure real results.

My take

Machine Learning, and AI in general, has already become fetishized. This is exactly what Davenport was talking about twenty-four years ago. You don't have to throw your existing methods under the bus and jump on the hype wagon. AI will be the most important thing ever to happen to technology, but not this week. The fact that these two studies came to different conclusions is a good sign that people are beginning to take a close look at it and separate the wheat from the chaff.