Micron Automata aims to find answers to big data questions

SUMMARY:

Built for a world where pattern matching across large volumes of data is increasingly important, Micron Automata aims to answer big data questions super fast.

Automata car park example of pattern matching - from CAP video

Our ability to collect and analyse increasingly large volumes of data is creating demand for new approaches to server hardware, as Kurt Marko has recently described. One of the most valuable operations when analysing real-time streams of data is pattern matching. Memory chipmaker Micron Technology believes this may create a market opportunity for a completely new processor design called Automata.

Built to analyze huge datasets in parallel, Automata has a reconfigurable processing architecture that Micron believes has applications across graph analysis, pattern matching, and data analytics. Whereas conventional parallelism consists of a single instruction applied to many chunks of data, Automata consists of a DRAM-like fabric of tiny interconnected processing elements that can focus a vast number of instructions at a targeted problem as data is streamed across the chip.

But how do you generate demand for a completely new technology that’s still in beta? That is especially hard when specialist knowledge is needed to set it up to do anything. Automata isn’t like a conventional microprocessor that executes a predetermined set of fixed instructions. It’s more akin to an FPGA in that it can be programmed with whatever patterns are to be matched.

Finding applications

Evidently, the technology needs some pioneers who can identify the most useful applications and develop ways of working with it that others can emulate. With that in mind, Micron has provided funding to help set up the Center for Automata Processing (CAP) at the University of Virginia, where a team of computer and data scientists have begun testing out the hardware and developing algorithms. The Center has a membership model and is inviting other academic institutions, research organizations and industry to participate.

Speaking at a London seminar hosted by Micron and its UK distributor Boston last month, the Center’s managing director, Tho Nguyen, said that whereas there are many sources of funding for software research, opportunities with hardware are much less common, which makes the CAP an attractive project for academics to get involved with.

You don’t get so much access to hardware.

One way to think of Automata is as the world’s fastest, most efficient regular expression matcher. It aims to bring a new level of performance to pattern matching, which has applications across a wide variety of disciplines, from speech recognition and fraud detection to DNA research and particle physics. Tests by researchers at CAP have shown some operations being performed hundreds of times faster than with a conventional CPU.

For example, a process called Brill tagging, which is used in semantic analysis, shows a 276-fold speed advantage when analyzing text against 1729 different rules in parallel. Nguyen marvels:

It’s just simply unfair. It’s quite amazing the efficiency of this.
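To make the regular-expression view above more concrete, here is a minimal software sketch of the general idea: many patterns are kept "in flight" at once and every one of them advances as each symbol of the stream arrives. It is not Micron's programming interface or toolchain, and it runs sequentially on a CPU. The function name `stream_match`, the `(pattern, position)` state representation and the `.` wildcard convention are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

WILDCARD = "."  # assumed convention: matches any single symbol


def stream_match(patterns: List[str], stream: Iterable[str]) -> List[Tuple[int, str]]:
    """Return sorted (end_index, pattern) pairs for every match in the stream.

    Each pattern is a fixed-length string in which "." matches any symbol.
    An active state (pattern_id, position) means that pattern has matched
    `position` symbols so far.
    """
    if any(len(p) == 0 for p in patterns):
        raise ValueError("patterns must be non-empty")

    active: Set[Tuple[int, int]] = set()
    hits: List[Tuple[int, str]] = []
    for index, symbol in enumerate(stream):
        # Every pattern is allowed to start afresh at every input position.
        candidates = active | {(pid, 0) for pid in range(len(patterns))}
        next_active: Set[Tuple[int, int]] = set()
        # Software visits the states one by one; the hardware described in
        # the article is said to evaluate its elements simultaneously.
        for pid, pos in candidates:
            expected = patterns[pid][pos]
            if expected == WILDCARD or expected == symbol:
                if pos + 1 == len(patterns[pid]):
                    hits.append((index, patterns[pid]))
                else:
                    next_active.add((pid, pos + 1))
        active = next_active
    return sorted(hits)


if __name__ == "__main__":
    for end, pattern in stream_match(["cat", "c.t", "at"], "a cat cut"):
        print(f"pattern {pattern!r} matched, ending at index {end}")
```

The input is read exactly once, however many patterns are loaded. In this software version the work per symbol still grows with the number of active states, which is the cost a parallel fabric is meant to remove.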

Fuzzy patterns

The most accessible explanation of how Automata works is a car park analogy, which can be viewed in a video on the CAP website. Imagine you were looking for a single license plate in a car park that had a million cars randomly parked. A traditional processor checks each car’s plate sequentially, but Automata has a comparator in each space, which allows it to check them all in parallel and find the license plate instantly.

That’s not all. The spaces can also talk to those around them, which means you can look for cars parked in a pattern, for example two green cars next to each other. And the pattern matching can be fuzzy, allowing for variations within the pattern.
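The car park analogy can be sketched in a few lines of Python. This is only a model of the analogy, not of the chip: the `Car` record, the function names and the use of a character-mismatch count as the "fuzziness" measure are assumptions chosen for illustration, and the code visits the spaces one by one where the hardware would consult every comparator at once.

```python
from typing import List, Tuple

Car = Tuple[str, str]  # (plate, color): a hypothetical record for the analogy


def mismatches(a: str, b: str) -> int:
    """Count differing characters; any length difference also counts."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def find_plate(car_park: List[Car], target: str, max_mismatches: int = 0) -> List[int]:
    """Spaces whose plate matches `target`, tolerating a few wrong characters."""
    # Think of each list element as a parking space with its own comparator.
    return [i for i, (plate, _) in enumerate(car_park)
            if mismatches(plate, target) <= max_mismatches]


def adjacent_color_pairs(car_park: List[Car], color: str) -> List[int]:
    """First space of every pair of neighboring cars that share `color`."""
    # Each space "talks" to the space next to it.
    return [i for i in range(len(car_park) - 1)
            if car_park[i][1] == color and car_park[i + 1][1] == color]


car_park = [
    ("ABC123", "red"),
    ("XYZ789", "green"),
    ("ABD123", "green"),
    ("QRS456", "blue"),
]

print(find_plate(car_park, "ABC123"))                    # exact match: [0]
print(find_plate(car_park, "ABC123", max_mismatches=1))  # fuzzy match: [0, 2]
print(adjacent_color_pairs(car_park, "green"))           # two green cars: [1]
```

Allowing one mismatched character is just one simple way to model a fuzzy pattern; the article does not say which kinds of variation the hardware tolerates.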

It’s that fuzzy pattern matching that makes the technology so promising in fields as varied as bioinformatics and social media analysis. One early adopter is the US particle physics laboratory Fermilab, which sees great potential in experimental high energy physics.

The search for high energy particles such as the Higgs boson — even dark matter — depends on being able to detect the distinctive energy patterns these particles generate. But the data analysis challenges are enormous. To see one Higgs means analyzing a hundred trillion collisions, and if you don’t notice within ten microseconds, you’ll miss the chance to collect additional data.

Automata is useful because it’s good at separating out the data associated with each of the many colliding particles, akin to separating a jumbled collection of jigsaw puzzles all thrown up in the air together. That in turn makes it easier to spot the one occurrence that’s different from the rest. “This is exactly how you look for the unknown,” remarked one physics expert attending the seminar.

My take

It’s early days for this technology, but as we collect more and more data, the ability to analyze those data streams to detect telling patterns is becoming increasingly important. Micron has a lot of work to do to bring Automata into the mainstream, but it seems to be a smart move to enlist the help of the academics at the University of Virginia. Whether we’ll see this hardware routinely deployed in enterprise datacenters remains to be seen, but its existence is symptomatic of our need for new solutions to the computing challenges of a connected digital world.

Image credit - Screengrab from CAP video

Disclosure - Boston is an occasional consulting client of the author