If you built a house in a prime location as rental accommodation that had been continuously occupied since 2000, chances are you’d see a pretty high turnover of tenants. And if you had a set of TripAdvisor reviews for all those 19 years, again, chances are you’d have quite a big dataset.
And even if the ‘house’ we’re talking about floats 240 miles (400 kilometres) above the Earth, cost $100 billion to build, and zips round our planet at 4.8 miles (7.7 km) a second, the logic still applies. How do you properly analyse all those ‘visitor reviews’ to get useful insights into what you’re doing right or wrong vis-à-vis your ‘tenants’?
That’s the situation the US National Aeronautics and Space Administration (NASA) finds itself in, faced with an asset you might have heard of: the International Space Station (ISS), which, as of March, has been visited by over 236 travellers from 18 countries (many of them multiple times). The person tasked with working out how to turn all that text and unstructured data (the Space Station’s ‘TripAdvisor’ remarks) into useful insight is David Meza, NASA’s Chief Knowledge Architect at the Johnson Space Center in Houston, Texas.
Meza told diginomica/government:
The primary function of ISS is to do science, but in order for us to be able to do that, we have to understand the environment and its habitability, from using all the equipment to the sleeping arrangements, the food, the computer and scientific information that's there. We need to understand anything and everything that's up there and see if it's useful in providing what the astronauts need. So we need to decipher what the astronauts say and then group and categorise their comments into themes, so our Human Factors and Habitability Engineers can improve life on the International Space Station based on these comments.
Let's say a team wants to understand what the astronauts thought of a particular piece of exercise equipment over a certain span of time. They would have to go into the current database, which was just a traditional SQL database, and try to pull out the information based on a couple of keywords. Then they would have to put it into an Excel spreadsheet or a bunch of documents, read through it all, group it into themes, and come up with their impression of what the astronauts were saying about this particular equipment, then provide that answer back to the project team that originally asked the question.
That would take anywhere in the neighbourhood of three weeks to maybe a month, so it was taking some time.
It sounds like a problem crying out for a technology fix, and that, it turns out, is his job:
My primary role is to develop and implement a technological road map that allows us to turn our data into actionable knowledge. I take a look at all our disparate data sources, at all the information we have (the more than 50 years of information we now have), and try to find better ways to extract the ‘golden nuggets’ of knowledge that users are looking for to help them continue on with the overall NASA mission.
It’s time to get a bit more specific about these ‘TripAdvisor’ reviews, which of course they aren’t. After each mission, participants may go through as many as 25 structured debriefs, a process that has accumulated 90,000 crew comments averaging about 114 words each. Printed out, they would be the page equivalent of 90 copies of Harper Lee’s To Kill A Mockingbird.
You probably won’t be too surprised to learn that various Sentiment Analysis techniques are what Meza and his team are primarily using to make sense of this heap of words. What you may be more intrigued to learn is that he is using a less conventional way of working with data, graph, to visualise and manipulate what comes out of all that sophisticated maths:
I came across graph for the first time seven or eight years ago, and soon decided that the concept of developing relationships across different types of nodes was an excellent way not only to store information but also to search through it a lot faster.
At the speed of light
Meza says bringing Sentiment Analysis to bear did indeed speed things up: he was able to shorten the analysis period from that month to three days. That is where graph comes in, as he has found the approach ideal for helping his engineer colleagues visualise their results:
If I have a comment that was spoken by a certain astronaut about a certain mission during a certain timeframe and about a particular piece of equipment or hardware on the International Space Station, we have built a model of all that with graph, with relationships defined in that model, so we can start applying different types of algorithms and offer different search capabilities. This allows us to do the analysis we want a lot faster than having to pull the data out, do the analysis and then put the information back in.
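To make the idea of a comment ‘connected’ to an astronaut, a mission, a date and a piece of hardware a little more concrete, here is a minimal sketch of such a model as a tiny in-memory property graph. It is illustrative only: the node labels (Astronaut, Mission, Equipment, Comment), the relationship names (SPOKE, DURING, ABOUT), the `Graph` class and the sample data are all hypothetical, not NASA’s actual schema, and the code does not use any Neo4j API. It shows how a question such as ‘what was said about this equipment in this period?’ becomes a walk along relationships rather than a keyword search.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Graph:
    """Hypothetical in-memory property graph: labelled nodes plus typed, directed edges."""
    nodes: dict = field(default_factory=dict)  # node_id -> (label, properties)
    out_edges: dict = field(default_factory=lambda: defaultdict(list))  # node_id -> [(rel, target)]
    in_edges: dict = field(default_factory=lambda: defaultdict(list))   # node_id -> [(rel, source)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, rel, target):
        # Store each edge in both directions so traversals are cheap either way.
        self.out_edges[source].append((rel, target))
        self.in_edges[target].append((rel, source))

    def incoming(self, node_id, rel):
        return [src for r, src in self.in_edges[node_id] if r == rel]

    def outgoing(self, node_id, rel):
        return [dst for r, dst in self.out_edges[node_id] if r == rel]


def comments_about(graph, equipment_id, start, end):
    """Follow Comment -ABOUT-> Equipment edges backwards, keeping comments dated within [start, end]."""
    results = []
    for comment_id in graph.incoming(equipment_id, "ABOUT"):
        _, props = graph.nodes[comment_id]
        if start <= props["date"] <= end:
            speakers = graph.incoming(comment_id, "SPOKE")   # Astronaut -SPOKE-> Comment
            missions = graph.outgoing(comment_id, "DURING")  # Comment -DURING-> Mission
            results.append((comment_id, speakers, missions, props["text"]))
    return sorted(results)


def build_example_graph():
    """Hypothetical sample data for illustration only."""
    g = Graph()
    g.add_node("a1", "Astronaut", name="Astronaut A")
    g.add_node("m1", "Mission", name="Expedition X")
    g.add_node("m2", "Mission", name="Expedition Y")
    g.add_node("e1", "Equipment", name="Treadmill")
    g.add_node("c1", "Comment", text="Treadmill worked well.", date=date(2015, 3, 1))
    g.add_node("c2", "Comment", text="Treadmill harness chafed.", date=date(2018, 6, 1))
    for comment_id, mission_id in (("c1", "m1"), ("c2", "m2")):
        g.add_edge("a1", "SPOKE", comment_id)
        g.add_edge(comment_id, "DURING", mission_id)
        g.add_edge(comment_id, "ABOUT", "e1")
    return g
```

Calling `comments_about(build_example_graph(), "e1", date(2014, 1, 1), date(2016, 12, 31))` returns only the 2015 comment, together with who said it and on which mission, without any join tables. A real graph database would add indexing, persistence and a query language on top of this basic structure.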
As a result, all 90,000 comments are stored in his database and can be rapidly accessed, queried and analysed, Meza told us.
A next step, he says, though it is early days, is to look at contradictions in the statements. If one comment says a piece of exercise equipment was great but another says it had major problems, Meza is building a way to show that one statement contradicts the other, which he thinks he can do a lot faster than he could in a traditional relational database. He is also working on tools to help end users create their own queries to find information and run their own analytics.
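As a rough illustration of what ‘showing that one statement contradicts another’ could mean at its very simplest, the sketch below assumes each comment has already been tagged with the equipment it mentions and a sentiment label from some earlier analysis step, and flags every pair of comments about the same equipment whose labels disagree. This is a deliberately naive heuristic of our own, not Meza’s method; the function name, the tuple layout and the ‘positive’/‘negative’/‘neutral’ labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_contradictions(comments):
    """Flag pairs of comments about the same equipment with opposite sentiment.

    comments: iterable of (comment_id, equipment, sentiment) tuples, where sentiment
    is 'positive', 'negative' or 'neutral' (hypothetical labels from a prior step).
    Returns a list of (comment_id_a, comment_id_b, equipment) tuples.
    """
    by_equipment = defaultdict(list)
    for comment_id, equipment, sentiment in comments:
        # Under this simple rule, neutral comments cannot contradict anything.
        if sentiment in ("positive", "negative"):
            by_equipment[equipment].append((comment_id, sentiment))

    contradictions = []
    for equipment, items in sorted(by_equipment.items()):
        # Compare every pair of opinionated comments about this equipment.
        for (id_a, s_a), (id_b, s_b) in combinations(items, 2):
            if s_a != s_b:
                contradictions.append((id_a, id_b, equipment))
    return contradictions


sample = [
    ("c1", "treadmill", "positive"),
    ("c2", "treadmill", "negative"),
    ("c3", "treadmill", "positive"),
    ("c4", "galley", "neutral"),
    ("c5", "galley", "negative"),
]
print(find_contradictions(sample))
```

Here the pairs (c1, c2) and (c2, c3) are flagged for the treadmill, while the galley comments are not, since only one of them carries an opinion. In a graph model, each flagged pair could be stored as a relationship between the two comment nodes so that contradictions become directly queryable; the pairwise comparison is quadratic per item of equipment, which is acceptable for a sketch but not for large volumes.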
And he also wants to go beyond the ISS data resource:
I’d like to be able to connect our old data, such as from our Apollo missions and some of our Shuttle missions, a lot faster and more easily to the things we're doing now, as we head to the Moon and eventually to Mars.
I may not personally be working on the rocket itself, or the [space] suits, but the work that I do does help, in the long run, to achieve our mission of helping get astronauts back out into space.
Meza is using technology from Neo4j for this work.