My piece, "The last mile in AI deployment - where the biggest risks (and payoffs) happen," provoked dialogue.
At a recent big data and analytics event, I had the chance to field live questions on the topic. Here are some of the top questions, as posed to me by IT World Canada's Jim Love.
Would you care to run through the "last mile in AI deployment" concept with us?
ML development is an iterative development and testing effort, carried out not in isolation but in the context of the model co-existing with everything else in your ecosystem. A model's behavior isn't always predictable. Machine learning is a probabilistic method, and is not ideally suited to deterministic policies and rules. Applying business rules after ML-based predictions or classifications will deliver better conformance and transparency.
The "AI last mile" may represent the most significant risk to the enterprise. There is complexity in putting AI into production. Perhaps IT principals are mesmerized by the terms "AI" and "AIOps," presuming it's just another case of project management and governance. AI may not only disrupt existing business processes when it is integrated into them, but it can also do so in mysterious ways that escape scrutiny.
You talk about machine learning as a probabilistic method. Can you elaborate a little on that?
We use the terms AI and ML without much precision. It isn't intelligent. Even GPT-3, which can write readable novels, has no idea what it's writing. So let's call it what it is, as Judea Pearl did: "It's just curve fitting." Think about an ordinary ML model. It has features containing independent variables and predictors, in the form of a matrix. Let's call it X. It has a vector of outcomes, the dependent response. Let's call it Y. Together, they form a joint distribution. What the model tries to do is derive a function that predicts Y based on X. This is why Pearl wryly calls it curve fitting.
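To make the "curve fitting" point concrete, here is a minimal sketch in plain Python. It assumes the simplest possible case, a single feature rather than a full matrix X, and the numbers and the name `fit_line` are made up for illustration. It derives a function that predicts Y from X by ordinary least squares, and nothing more.

```python
# Illustrative only: one feature instead of a matrix X, and invented data.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: X is the predictor, Y is the response.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(X, Y)


def predict(x):
    """The 'model': a fitted curve, with no notion of what x or y mean."""
    return slope * x + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for x=6: {predict(6.0):.2f}")
```

The fitted function will happily produce a number for any input, including inputs far outside the data it was fitted on, which is one way to see why its output needs surrounding context.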
The danger is putting too much confidence in its inference. It lacks context as we understand it. That requires building infrastructure around it to apply its inferences in a context that makes sense.
Probability tends to be a little mysterious. Suppose there are 30 people in a gathering. What do you suppose the probability is that two or more people have the same birthday? You might think that 30 people over 365 days would be about 1 in 10 or 11. That's wrong by a wide margin. To solve the problem, you frame it this way: the answer equals 1 minus the probability that every person has a unique birthday. (Probabilities are always stated between zero and one.) Solve this simple problem by iterative combinatorics, and the answer may surprise you: the probability is about 70%.
As the birthday example shows, probability isn't very intuitive most of the time. One reason is that it isn't purely mathematical. Probability uses math for efficiency, but the fit is imperfect. Solving the problem is a matter of imagining the steps.
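The birthday calculation can be sketched in a few lines of Python. This assumes 365 equally likely birthdays and ignores leap years; the function name `prob_shared_birthday` is just for illustration. It follows the framing above: multiply out the probability that each successive person has a birthday nobody else has taken, then subtract from 1.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    p_all_unique = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_unique *= (days - i) / days
    return 1.0 - p_all_unique


for group_size in (10, 23, 30):
    print(group_size, round(prob_shared_birthday(group_size), 3))
```

For 30 people this comes out at roughly 0.706, and it already passes one half at 23 people.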
Beyond that misunderstanding of how AI works, what are some of the other traps that companies fall into?
This is where we talk about bias, and even more severe problems. There are a host of things people do that don't work. The first is not getting a handle on the semantics of the data they are using. Lack of skill, amateurish development, and an "aching desire," as John Tukey put it, all add to the problem. For example, an ML model may not converge on the cost function, so it will look outside the features to the training data and find something that works, right or wrong: labeling cats as dogs because they have a leash, for instance. This is shortcut learning, and it is insidious and not transparent. Amateurs won't grasp this. Another problem is that learning models can be subject to disruptive perturbation, as happened with Microsoft's Tay. Immutability is a big deal.
What are some of the other risks?
The aggregation of data by a few large firms is very troubling. Frank Lloyd Wright famously observed that "Route 66 is a giant chute, through which everything in the middle of the country is falling to southern California." Today, it's the one-way transfer of our data to the elephant tech companies and the application of automated, predictive solutions to everything we do, personally, collectively, and politically, supporting a monopoly. Any AI system that affects people's lives must be subject to protest, account, and redress.
- When your organization pressures you to do things that may not seem ethical
- You adopt an "it's only the math" excuse or "that's how we do it." You engage in fairwashing, which is when malicious decision-makers give fake explanations for their unfair decisions.
- You don't know that you're doing these things
- The whole process is so complicated that there is opacity in both the operation and the result
- Introspection is not typically followed for these sorts of applications before you embark on a solution
- There is an "aching desire" to do something cool that obscures your judgment
- There are four opposing alternatives in your approach:
  - Deploying models for customer efficiency versus preserving their privacy
  - Improving the precision of predictions versus keeping fairness and non-discrimination
  - Pressing the boundaries of personalization versus maintaining community and citizenship
  - Using automation to make life more convenient versus de-humanizing interactions
Recurrent or adversarial neural networks. Iteration, Monte Carlo simulation, Bayesian methods, all at scale. There are massive data issues, lack of transparency, and algorithms with names that aren't understood, like lassoing, boosting, and bagging. A damaging problem to the whole effort is a loss of confidence and funding from the organization, damage to the brand, or worse. As a first step, many organizations attempt a POC. The problem is, they often pick a subject that has already been solved with prior technology, and it winds up proving nothing and eroding confidence in the technology.
It can cause actual harm, intentional or otherwise. But keep in mind, AI didn't invent bias. The use of quantitative methods to cause harm has been around for centuries. Solving the bias problem is not a technology effort. We have to solve it ourselves.
Your article lists 15 steps that an AI project goes through. How are these different from other IT, digital or development projects?
They are the same in many ways, but the unique problems are data and the black box problem, where you don't know what the model is doing or can't explain its conclusions. Insidious bias is reflected in the data, the developer, and the organization's strategy and values, which are not always golden. There is anthropomorphizing, which misleads you into thinking the model is learning or thinking. And there is blamestorming: you call the model Oscar, and when it goes awry, you say, "Well, Oscar left the farm," when in fact, you did. Then there is the blend of skills needed to go from the first mile to the last mile. ML output is probabilistic, not deterministic. Most managers cannot absorb probabilities, so you need to wrap that output in deterministic models, rules and procedures.
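One way to picture wrapping probabilistic output in deterministic rules is the sketch below. Everything in it is hypothetical: the fraud-scoring scenario, the thresholds, the amount limit, and the function name `decide` are invented for illustration, and real values would come from the business, not from the model.

```python
# Hypothetical policy values, set by the business rather than learned.
BLOCK_THRESHOLD = 0.90       # at or above this score, block outright
ALLOW_THRESHOLD = 0.20       # at or below this score, allow outright
REVIEW_AMOUNT_LIMIT = 10_000.0  # large amounts always get a human look


def decide(p_fraud: float, amount: float) -> str:
    """Turn a model's probability into one of three fixed outcomes."""
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must be a probability between 0 and 1")
    # Hard business rule: applies regardless of what the model says.
    if amount > REVIEW_AMOUNT_LIMIT:
        return "manual_review"
    if p_fraud >= BLOCK_THRESHOLD:
        return "block"
    if p_fraud <= ALLOW_THRESHOLD:
        return "allow"
    # The model is unsure, so a person decides.
    return "manual_review"


print(decide(0.95, 120.0))     # block
print(decide(0.05, 120.0))     # allow
print(decide(0.50, 120.0))     # manual_review
print(decide(0.05, 50_000.0))  # manual_review
```

A manager reading the result sees a rule that can be audited and explained, while the probability stays an input to that rule rather than the decision itself.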
What are some of the key lessons that you've learned or that others have learned about applying risk management/mitigation?
The assumption is that you don't really need to understand math to understand the algorithms. That's a mistake. Underestimating the number of iterations, the extensive time to get the data right will add to the unanticipated cost. Not understanding what the model is telling you. Something that spits out vectors, matrices, and tensor models can be easily misinterpreted.
Number 1, be realistic. Data is key. Do you have it, or is it spread across multiple data centers and clouds? Seek the advice of qualified third parties, don't try to go it alone. Use tools instead of being a coding hero. Use the wisdom of crowds in your organization to rank the most critical problems to solve and rank them again on the likelihood of being successful.
What are some of the best ideas and best practices?
Allowing ML developers to make their own decisions about the problem to attack is the wrong approach. AI/ML has to be a part of an organization's strategy and values. That means that management has to drive these efforts by combining 1) company purpose and values, 2) business model and strategy, 3) design process, 4) funding, development and support, and 5) continuing enhancement.
Are you optimistic about our ability to get better and better at managing risks and outcomes?
Sure. What will happen, and is happening, is that software developers will take a lot of the introspection, work, and risk out of the process: data management in particular, de-biasing data sets, clean third-party labeled data, and the development of fairness metrics and algorithms. Systems must operate under principles that benefit society, like fairness, transparency and explainability, and avoid issues with bias. However, it's naive to assume that all development will toe that line.
If you had one piece of advice for someone embarking on an AI project, what would it be?
Only one? Don't be enamored of the technology. ML isn't a magical oyster exuding pearls.
But it's not all bad. Recently, satellite imagery has been able to spot the movement of refugees and to monitor ecologically sensitive areas on the planet. Used judiciously and without prejudice, image recognition can spot rights abuses. In addition, forensics capabilities are vastly improved, helping to reconstruct crime scenes and hold perpetrators accountable. So, AI isn't the culprit. We are.
As Paul Virilio said, "The invention of the ship was also the invention of the shipwreck."
This content derives from my interview at the Big Data and Analytics West Annual Summit, December 2021. I was interviewed by Jim Love, CIO and Chief Content Officer, ITWC, publisher of ITWorldCanada.com.