Fail early, fail often - a manufacturing mantra for an AI age

By Martin Banks October 8, 2019
AI in manufacturing - a healthy counterpoint to some of the hype and the fear-and-loathing.

IIoT manufacturing factory concept ©  wilaporn1973 - shutterstock

The application of AI in sectors such as retail and healthcare is already well charted, and will continue to be, but other areas, such as manufacturing, are set to be just as important. A panel session at the recent Microsoft Future Decoded conference at London’s ExCeL Centre gave delegates a direct chance to ask questions, seek advice and challenge some of the myths about what AI can offer manufacturing industries.

It has to be said from the outset that the panellists deserve some acknowledgement for their courage. Normally, such sessions start with some relatively ‘soft’, moderator-driven questions that break the panellists in gently. Not this time: it was straight in with questions from the delegates.

The panel was also a good mix of experience. Moderator Richard King, Director of Manufacturing at Microsoft, was joined by two Microsoft specialists, Cross-Sector Outcomes Specialist Matthew Hyams and Solutions Architect Megha Agarwal, matched with Nick Glazier of Rolls Royce and Dave Myers of Ricoh, and topped off with Richard Scott of Sheffield University providing some cross-cultural academia/industry insights.

Some of the answers they delivered gave a slightly different perspective on modern industrial thinking than might have been expected. For example, the very first question concerned the importance of data veracity when the outcomes of machine learning and AI systems are based on that data as their input. Basically, don’t AI and ML demand even greater adherence to the need to get the right data and keep it right?

Hyams jumped straight in with a counter-opinion, one which actually takes into account the potential for AI to work on the issue of data veracity and quality itself – or at least to make appropriate allowances for it:

Because of the performance and speed capabilities of systems now being used, failure is now not the issue it used to be. Failure is now OK and the systems can learn from it.  About 60% of the data that gets collected will, in the end, be thrown away.

Others observed that it was also now possible to measure incoming data in a much more structured way. In addition, there is far less chance that it can be corrupted in any complex transit process because most of it is now collected at source, and is therefore clean from the outset. This means that it is now becoming possible to have the best of both worlds: raw data straight from the data lake, and curated data from the data warehouse.

It was also stressed that, here, the ‘A’ of AI most definitely meant ‘Augmented’, and there is still a great need for the ‘nose’ of the engineer to be fully engaged in the process. This is because it is still early days in the relationship between AI, engineering and manufacturing, and so a good deal of effort is still required to develop trust in the data being used. There needs to be a build-up of reported data and of what it really means, the very place where the experienced engineering ‘nose’ can interpret the data and suggest ‘it’s about right, in the right ball park’, or ‘that is way off’.

The nose 

In fact, it was suggested that any attempt to clean data in any way could have the effect of introducing bias into the operation of the AI systems. This is where the engineer’s ‘nose’ can become a hindrance, for the cleaning process can introduce human value judgements along the lines of ‘the data should really say xxx’, which then introduces a bias into any result arrived at by the AI systems.
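A minimal sketch of that bias mechanism (all names and values here are invented for illustration): an engineer who replaces readings that ‘should really say’ something with the expected value erases genuine anomalies, and a statistic computed downstream quietly shifts.

```python
# Hypothetical sensor trace: mostly ~71°C, with two genuine process spikes.
raw_temps = [71.2, 70.8, 95.4, 71.0, 96.1, 70.5, 70.9]

def clean(readings, expected=71.0, tolerance=5.0):
    """'Clean' the data by replacing anything far from the expected value
    with the expected value -- encoding the human judgement that
    'the data should really say ~71', and erasing the real spikes."""
    return [r if abs(r - expected) <= tolerance else expected for r in readings]

def mean(xs):
    return sum(xs) / len(xs)

raw_mean = mean(raw_temps)           # reflects the real spikes
clean_mean = mean(clean(raw_temps))  # spikes gone: the estimate is biased low
```

Any model or alert threshold trained on `clean_mean` will understate how hot the process actually runs, which is exactly the kind of silent bias the panel warned about.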

One important area here that several delegates asked about was the inevitable use of third-party components in systems, and their impact on the ability to provide effective predictive maintenance on a complete system. As one put it:

We won’t know what third party components might do, how they might behave, when they fail, so it makes effective predictive maintenance impossible.

This was acknowledged as an issue, and the panel suggested that conducting destructive testing, closely monitored for all pre-failure indicators, can be an essential step.

This did also raise the issue that with the advance of AI systems for predictive maintenance, all component vendors should now be able to supply detailed reports on fully monitored destructive tests of all the products they sell. This data can then be incorporated into the curated data available to the system.

One interesting suggestion concerned compute resources out at the edge of the network, as the edge develops into an important tool, especially for mixing AI, IoT and compute functions into local process management systems. This was the idea of developing on-chip machine learning modules that could be built into Field Programmable Gate Arrays (FPGAs) embedded in the edge systems. Each FPGA could then monitor and learn one small part of a process, with the entire process then monitored as a federated learning model.
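The federated pattern described above can be sketched in a few lines (everything here is illustrative, not a real FPGA toolchain): each edge node, standing in for one FPGA watching one process step, fits a tiny local model on its own data, and only the model parameters, never the raw readings, are averaged centrally, weighted by how much data each node saw.

```python
def local_fit(readings):
    # Trivial stand-in for a local model: the mean of this node's readings.
    return sum(readings) / len(readings)

def federated_average(local_params, sample_counts):
    # Federated averaging: combine per-node parameters, weighted by the
    # number of samples each node trained on. Raw data never leaves a node.
    total = sum(sample_counts)
    return sum(p * n for p, n in zip(local_params, sample_counts)) / total

# Three edge nodes, each monitoring one small part of the process.
node_data = [
    [1.0, 1.2, 0.9],        # node A
    [2.1, 1.9],             # node B
    [1.5, 1.4, 1.6, 1.5],   # node C
]

params = [local_fit(d) for d in node_data]
counts = [len(d) for d in node_data]
global_param = federated_average(params, counts)
```

For this toy mean-only model the weighted average of the local means equals the mean over all the data pooled together, which is the basic promise of federated averaging: a global view without centralising the raw measurements.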

A question was also asked about moving an AI/ML environment from pilot project to production system, for this can involve some significant scaling. What is more, users need to understand the implications of the different types of scaling, both out and up.

The panel agreed that this was a significant issue, not least because it is not always possible to tell quite where a pilot project might end up leading, as Glazier observed:

It is quite likely that new, and quite complex services spring off of what started life as a small, targeted process.

Here, the scaling requirement may at first have seemed limited, while the final result is huge and therefore requires a major redesign. This is where the ‘out’ and ‘up’ scaling distinction comes into play, for scaling out can turn out to be quite straightforward. The NHS is a good example of scaling out: there is an optimum size for hospitals, so scaling AI resources often requires just the repetition of the same system and resources across each hospital.

Scaling up a pilot project from, say, one covering several hundred users to one handling several million, with a high percentage of simultaneous users, can rapidly demonstrate that the original design is completely unsuited to the scaled-up task. As Ricoh’s Dave Myers observed, it is now time to think of every project as a potential production process rather than just an ‘experiment’:

The need now is to assume that scale-up will happen, and therefore to build the need for it into the original pilot project design.

My take

Manufacturing engineers are practical people, and their reaction to AI is all that might be expected. It is also interesting to see the acknowledgement that AI can learn from mistakes just as humans can, so the new paradigm - fail early and fail often - is in fact meat and drink for the new technology.