Edge analytics hype versus reality - a discerning look at the pros and cons
- Edge analytics is promoted as the solution to the analytics problems posed by edge devices. Let's take a harder look at the pros and cons, including security, cost, and integration with cloud analytics. Can we even call these use cases "analytics" at all?
Is it analytics?: Edge analytics is a top-five topic in digital technology. Though it has many strong use cases, calling it "analytics" is a bit of a stretch. In simple terms, edge analytics is the ability of remote devices (sensors) to collect data, analyze it and make decisions based on embedded rules and algorithms, though that is changing.
The motivation for edge analytics is faster, distributed (but limited) decision-making, especially in low-bandwidth cases.
Analytics itself is a vast discipline that includes four primary types (descriptive, diagnostic, predictive and prescriptive) and a multitude of tools and methodologies.
According to Techopedia:
Analytics is the scientific process of discovering and communicating the meaningful patterns which can be found in data.
It is concerned with turning raw data into insight for making better decisions. Analytics relies on the application of statistics, computer programming, and operations research in order to quantify and gain insight to the meanings of data. It is especially useful in areas which record a lot of data or information.
In the world of business, organizations would usually apply analytics in order to describe, predict and then improve the business performance of the company. Specifically it would help in the following areas:
- Web analytics
- Fraud analysis
- Risk analysis
- Advertisement and marketing
- Enterprise decision management
- Market optimization
- Market modeling
To understand edge analytics, we need to know where the data comes from. So, what is a sensor?
- Analog gauges on equipment (yes, they are still used)
- Simple digital devices
- Digital sensing devices with the ability to capture, send and analyze data in-place
- Some sensors today act more like intelligent devices, with the ability to perform more complex analyses or even operate as a swarm, combining data and methods with other sensors.
Some edge analytics pros and cons:
- Analytics for what purpose? Example: adjustments in device controls.
- Will the data be combined with other sensors?
- Swarm analytics - not the vendor. More on this use case below.
- Analyzing data as it's generated decreases latency in the decision-making process. For example, if an individual system component suffers a failure, the algorithm interprets that data and automatically shuts the component down. This may save the time otherwise spent transporting data to a centralized store, and it may reduce or avoid equipment downtime. As analytics go, this is pretty thin.
- Businesses should consider whether or not it makes sense to invest in edge analytics, since it's best suited for scenarios that need to optimize for speed, security or efficiency. There remain some engineering obstacles to successfully deploying an edge analytics application, as with any new architecture.
In "Edge Analytics in 2022: What it is, Why it matters & Use Cases," the typical proposed advantages of edge analytics (each followed by my POV) include:
- Faster, autonomous decision making since insights are identified at the data source, preventing latency
But what is the scope of edge analytics?
- Lower cost of central data storage and management since less data is stored centrally
This is controversial.
- Lower cost of data transmission since less data is communicated to the central data warehouse
This is also controversial.
- Better security/privacy since the most granular data, such as video footage, is not stored or communicated
Edge almost always involves data transmission, leaving the door open for breaches.
Edge topologies began as a data collection and analysis method that used an automated analytical computation performed at a sensor or other device. There are millions of sensors operating today. Historically (ten or more years ago), sensors had no analytical capability. They were capable only of reacting in a prescribed way to the data streams they monitored.
Edge analytics rose to popularity in the Industrial Internet of Things (IIoT). Suppose a sensor in a remote oil field well transmitted data that was interpreted as a potential failure of a part. An engineer would be dispatched, usually over a great distance, to resolve the problem. Two days later, a similar situation arose in a well near the first one. Today, the central engineering office can scan all wells for the specific problem, and having this information saves time and potential costs. But it's not perfect: sensors fail, their batteries die, or they were installed improperly.
There are various edge devices and an inflated depiction of analytics. For example, if the coolant temperature sending unit (a sensor) in a car registers a critical temperature, the heat gauge on the dashboard displays an alert. I shut the car off, avoiding a blown head gasket and irreparable damage to the engine. I wouldn't call that analytics, just sense and respond. But sensors have matured quite a bit and now have algorithms embedded in them to react to different situations and different levels of severity. This is not analytics either, but more recent developments deserve scrutiny.
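To make the "sense and respond" point concrete, here is a minimal sketch of the kind of hard-coded, severity-graded rule a sensor might carry. The threshold values and the names `WARN_C`, `CRITICAL_C` and `coolant_action` are assumptions for illustration only, not taken from any real vehicle.

```python
# Hypothetical thresholds in degrees Celsius; real values depend on the engine.
WARN_C = 105.0
CRITICAL_C = 120.0


def coolant_action(temp_c: float) -> str:
    """Map a single coolant reading to a prescribed response."""
    if temp_c >= CRITICAL_C:
        return "shutdown"  # most severe: stop before the engine is damaged
    if temp_c >= WARN_C:
        return "alert"     # less severe: light up the dashboard gauge
    return "ok"            # nothing to report


for reading in (90.0, 110.0, 125.0):
    print(reading, coolant_action(reading))
```

Nothing here discovers a pattern in data; the rule only compares one reading to fixed limits, which is why "analytics" feels like a generous label.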
Swarm analytics: consider a commercial airliner. GE jet engines collect information at 5,000 data points per second; a Boeing 787 generates an average of 500GB of system data per flight; an Airbus A380 is (or was) fitted with as many as 25,000 sensors. It would be uncomfortable to see warning lights for 25,000 sensors in the cockpit. Instead, the data flowing from the sensors is combined with data from other, "smarter" sensors, and complex models (analytics) are calculated in real time. The analytics are not exploratory or ad hoc, nor do they provide visualizations. They do consider integrated temporal data. In that sense, they are analytical.
Another example: consider an autonomous vehicle. There is a bi-directional conversation between the vehicle and some remote device that is 1) continuously transmitting software updates, 2) receiving data from the vehicle, and 3) running more complex analytics than the on-board "computers" can. The sensor (edge) analytics are not self-contained.
Edge analytics is a data collection and analysis method that uses an automated analytical data computation performed at a sensor or other device. Generally, the rules or algorithms are hard-coded in this device. This is accomplished before the data is sent to a centralized store. A controversy in recent years is whether it is necessary to send that data back to the data center or cloud at all. Streaming data from disparate IoT sources creates a massive store of data.
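As a rough illustration of what combining sensors over time might look like, the sketch below fuses several readings per tick with a median (so one faulty sensor cannot dominate) and raises a single alert only when the rolling average drifts past a limit. The class name `FusedMonitor`, the window size and the limit are all hypothetical; real avionics models are far more elaborate.

```python
from collections import deque
from statistics import fmean, median


class FusedMonitor:
    """Fuse several sensors per tick and watch the trend over a time window."""

    def __init__(self, window: int, limit: float) -> None:
        self.history = deque(maxlen=window)  # keeps only the last `window` fused values
        self.limit = limit

    def update(self, readings: list[float]) -> bool:
        """Add one tick of readings; return True if a single alert should fire."""
        self.history.append(median(readings))  # robust to one outlying sensor
        window_full = len(self.history) == self.history.maxlen
        return window_full and fmean(self.history) > self.limit


monitor = FusedMonitor(window=3, limit=100.0)
print(monitor.update([90.0, 91.0, 500.0]))    # one wild sensor, window not yet full
print(monitor.update([101.0, 102.0, 103.0]))
print(monitor.update([110.0, 111.0, 112.0]))  # sustained rise across the window
```

The design choice worth noting is that the alert depends on several sensors and several points in time at once, which is the "integrated temporal data" that separates this from a single-threshold rule.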
Communication and cloud services costs: There is a debate about whether 100% of the data generated by the edge should, either immediately or after a reasonable period of time, be collected and stored by a central repository, such as a cloud data lake, for more comprehensive analysis. One school of thought is that only the "important" information should be sent, as most of the data from the sensors are not very interesting. Transmission across common carriers of massive amounts of data is very costly.
This raises the question: what is interesting? Hypothetically, suppose a device had two sensors that would both fire under a certain circumstance. Suppose one sensor reports "normal" every ten seconds, but the other reports an abnormal state. Data reduction schemes would have the "normals" discarded, but wouldn't someone want to know why one sensor was reporting normal when the other was not?
If the decision is made to send only certain kinds of analytical results to the cloud, how will heavy cloud-based analytics perform? Another thing to consider is that cloud storage may appear inexpensive, but cloud operators charge fees around ingesting and retrieving data, and egress charges in particular can be hefty.
Privacy and breaches: because sensor data is often broadcast via WiFi, Bluetooth, or other wireless networks, the risk of leakage or data breaches is high. Data breaches or hacks are an increasingly severe problem. At one of the US national labs, the newest $700 million supercomputer is air-gapped.
Not all edge sensors are about equipment or engineered devices. Personal wearable devices number over 300 million (probably many more), a figure dwarfed by the number of smartphones, which is approaching seven billion worldwide. Rather than transmitting telemetry, these devices capture, analyze and send very personal information. For every security scheme, ten hackers are trying to break it.
In edge analytics involving personal data, organizations strive to know very personal things about the people in their data, such as preferences for travel, bank transactions, or hospital clinical data. A typical application is for a company to review whether its targeting strategies are performing. Federated learning, which is finding applications in healthcare, keeps the person's data on the device: the device receives the model and its updates and transmits only the model's results to the central hub, preserving the sender's privacy. It solves the problem of sending millions of data streams by having the device perform the analytical work and send only an encrypted result, typically in a tensor or matrix that only the central hub can unlock.
Differential privacy is a probabilistic approach built on a mathematical definition of privacy, one that treats privacy as a random variable in order to quantify the level of security. Differential privacy gives investigators access to the most private data without revealing the individuals' actual identities. Other anonymization techniques remove relevant data to protect privacy but diminish the strength of the dataset.
Differential Privacy (DP) algorithms offer a reliable alternative by injecting random noise into the data and still resolving queries with high accuracy by applying probability. The problem is that it is hard to explain. Probability can be very counter-intuitive. If you flip a coin, the probability of it turning up heads is 50%. But once joint probability or conditional (Bayesian) probability gets into the mix, intuition falters, and our decision-making rarely involves probability at all, relying instead on heuristics or deterministic models (think about the annual budget). When asked for the results of your investigation, does it matter if you say 85% probability or 95%? Our management gestalt doesn't work that way.
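One way to picture the two-sensor problem is a reduction rule that discards a "normal" only when both sensors agree, and keeps the whole pair whenever they disagree. This is a sketch under the assumption that each interval yields a pair of string states; the function name `reduce_readings` is made up for illustration.

```python
def reduce_readings(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the intervals worth transmitting.

    `pairs` holds (sensor_a_state, sensor_b_state) for each reporting interval.
    """
    kept = []
    for state_a, state_b in pairs:
        if state_a == "normal" and state_b == "normal":
            continue  # both agree nothing happened: safe to discard
        # Any abnormal state, including a disagreement, keeps BOTH readings,
        # so the surviving "normal" can be investigated later.
        kept.append((state_a, state_b))
    return kept


stream = [("normal", "normal"), ("normal", "abnormal"), ("abnormal", "abnormal")]
print(reduce_readings(stream))
```

A naive per-sensor filter would have thrown away the "normal" in the second interval, which is exactly the reading someone would later want to explain.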
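The round trip described above can be sketched with a toy model. The assumptions are loud ones: the "result" each device sends back is an updated model weight, the hub combines them with a sample-weighted average (the usual federated averaging scheme), the model is a one-parameter line y ≈ w·x, and encryption is left out entirely. All function names are hypothetical.

```python
def local_update(w: float, data: list[tuple[float, float]], lr: float) -> tuple[float, int]:
    """Run one gradient step for y ~ w*x on a device's private data.

    Only the new weight and the sample count leave the device, never `data`.
    """
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad, len(data)


def federated_average(updates: list[tuple[float, int]]) -> float:
    """Hub side: average the device weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(devices: list[list[tuple[float, float]]], rounds: int = 50, lr: float = 0.05) -> float:
    w = 0.0  # the hub's starting model
    for _ in range(rounds):
        updates = [local_update(w, data, lr) for data in devices]  # runs on each device
        w = federated_average(updates)                             # runs on the hub
    return w


# Two devices whose private data both follow y = 2x.
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
print(round(train(devices), 4))
```

The hub ends up with a usable model without ever seeing a single raw data point, which is the privacy argument in miniature.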
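For readers who prefer to see the random-noise idea rather than take it on faith, here is a minimal sketch of one common DP building block, the Laplace mechanism, applied to a counting query. The function names, the `epsilon` value and the sample data are assumptions for illustration; a production system would also track a privacy budget across queries.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer 'how many values satisfy predicate?' with noise added.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)            # seeded only to make the demo repeatable
ages = list(range(100))           # hypothetical data: ages 0..99
answers = [private_count(ages, lambda a: a >= 50, 1.0, rng) for _ in range(5)]
print([round(a, 2) for a in answers])  # each answer is near 50, none exactly 50
```

Each individual answer is slightly wrong by design, yet answers averaged over many queries or many people stay accurate, which is the counter-intuitive part that is so hard to explain to management.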
The term “Edge Analytics” is only partially misleading. “Edge” was a good choice, “Analytics” not so much. It’s typical in our industry: Business Intelligence, Artificial Intelligence. Good disciplines, but hardly intelligent.