How Open Cosmos is using open source tools to enable alert monitoring in space

Gary Flood, September 15, 2022
Open source and cloud technologies have been key for Open Cosmos to customize their way of handling space-based equipment alerts

The use of open source time series database monitoring has helped a European satellite firm, Open Cosmos, meet a unique challenge: supporting a computer in orbit, where you can’t send an engineer when something goes wrong.

Based in the UK, France and Spain, Open Cosmos designs, manufactures and operates satellites. It also tests and launches in-orbit exploitation services with clients like the European Space Agency. Open Cosmos launched its first satellite into low Earth orbit (LEO) in May 2021, with plans to raise that number to six by the end of 2023.

After collating data - particularly imagery of the Earth - from these devices, the company sells it on to both the owners of the satellites and other interested third parties. It recently released a space industry data platform called DataCosmos.

Pep Rodeja, the organization’s Ground Segment Technical Lead, says:

Customers don’t come with the idea of ‘I want a satellite with such and such specs;’ they come to us with a problem - like needing to monitor a specific region of the world, or they’ve got a scientific payload that they need to test in space.

What they want from us is to solve the problem that they have. So, what we offer is not only the satellite that we design and manufacture, but also all the licensing necessary to launch it. We also provide operational services, where we have a team in house that will continuously monitor and operate your satellite for you, all the way to end of life and ensure it's disposed of appropriately. We don't sell satellites, but all the end-to-end service that will solve your specific satellite need.

Doing this effectively, says Rodeja, depends on reliable access to the telemetry that each satellite sends back to report its current ‘health’. Operational fitness is measured by several key metrics, generated by the main mission control software system.

At the core of this monitoring is what Rodeja calls ‘alerts’, which are downloaded from the satellite to ground-level monitoring systems that are scattered across different labs and test facilities.

The monitoring needs become clearer, says Rodeja, if you think of the satellite as a server in the sky sending back data.

Unfortunately, because this ‘server’ is travelling in orbit, the IT team supporting it only has intermittent access to information about its current performance - i.e., only when data is downloaded in short bursts as the satellite passes over a monitoring station in the Open Cosmos network. This data is also not downloaded at fiber optic speeds, so efficient compression is required.
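To illustrate why regularly sampled telemetry compresses well, here is a minimal Python sketch of delta-of-delta encoding for timestamps. This is a generic textbook technique shown for illustration only, not a description of how Open Cosmos or any particular database compresses its data; the function names are hypothetical.

```python
from typing import List


def delta_of_delta(timestamps: List[int]) -> List[int]:
    """Encode timestamps as: first value, then the change in the gap between samples.

    Illustrative only: for telemetry sampled at a steady rate, most encoded
    values are 0, which a later entropy-coding stage can store very cheaply.
    """
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_ts = timestamps[0]
    prev_delta = 0
    for ts in timestamps[1:]:
        delta = ts - prev_ts
        encoded.append(delta - prev_delta)
        prev_ts, prev_delta = ts, delta
    return encoded


def restore(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta and recover the original timestamps."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps
```

With samples taken every 10 seconds and one arriving a second late, `delta_of_delta([1000, 1010, 1020, 1030, 1041])` yields `[1000, 10, 0, 0, 1]`: after the first two entries, only the deviations from the regular cadence remain.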

Rodeja says:

Before we launched our first satellite, we were looking for a way to store the telemetry that we would get back, but we also wanted to set alerts that automatically trigger a fix if certain conditions are reported, or perhaps put the satellite into a contingency mode if we have to.

The challenge: Rodeja and his colleagues would only get that telemetry for 10 minutes every six to 10 hours.

During those 10 minutes, the firm would need to make sure that the satellite is okay, but also that nothing happened during the last six hours when it was out of communication.

Alerts therefore have to tell mission control not only about the current state, but also about everything that happened while the satellite was out of contact.

The space community has built a lot of tools for observability and monitoring, he adds, but none that address the problem of a 10-minute pass every six hours, which Rodeja says is something unique to Open Cosmos.

It meant a new approach, as so much alert monitoring is real-time oriented. He explains:

When you have an IoT device or satellite out there in space, no one can go and fix it straight away. So, for this type of situation, we need ‘retrospective replay’ of the alerting rules, which goes into the past and evaluates everything like it is in real time and shows you what happened in the past.
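The idea of retrospective replay can be sketched in a few lines of Python. This is a simplified illustration, not Open Cosmos' implementation: the `Sample` and `Rule` types, the temperature threshold and the hold time are all assumptions made up for the example. The sketch walks through telemetry in timestamp order, as if each sample were arriving live, and records when a rule would have fired and when it would have cleared.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    ts: int        # sample time in seconds (hypothetical telemetry point)
    value: float


@dataclass
class Rule:
    name: str
    condition: Callable[[float], bool]  # True when the value is unhealthy
    for_seconds: int                    # how long it must stay unhealthy to fire


def replay(rule: Rule, samples: List[Sample]) -> List[Tuple[int, Optional[int]]]:
    """Evaluate a rule over past samples; return (fired_at, resolved_at) pairs.

    resolved_at is None if the alert was still firing at the last sample.
    """
    intervals: List[Tuple[int, Optional[int]]] = []
    pending_since: Optional[int] = None
    fired_at: Optional[int] = None
    for s in sorted(samples, key=lambda x: x.ts):
        if rule.condition(s.value):
            if pending_since is None:
                pending_since = s.ts
            if fired_at is None and s.ts - pending_since >= rule.for_seconds:
                fired_at = s.ts
        else:
            if fired_at is not None:
                intervals.append((fired_at, s.ts))
            pending_since = None
            fired_at = None
    if fired_at is not None:
        intervals.append((fired_at, None))
    return intervals
```

For example, given one sample per minute with values 20, 20, 45, 46, 47, 48, 20, 20 and a rule of "above 40 for at least 120 seconds", the replay reports a single alert that fired at the 240-second mark and cleared at 360 seconds, even though nobody was watching at the time.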

Finding a solution in space

Rodeja selected the enterprise version of Ukraine-based data monitoring and observability platform VictoriaMetrics to meet these challenges.

Founded by ex-engineers from Google, Cloudflare and Lyft, the company offers a metrics collection component called ‘vmagent’.

This is being used by Open Cosmos to ingest metrics from both its satellite fleet and its ground equipment, which are then uploaded into its main IT system.
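Because the telemetry arrives in batches long after it was recorded, each sample has to carry its original timestamp when it is ingested. As a rough sketch of what that can look like, the Python below formats samples in the Prometheus text exposition format, where an optional millisecond timestamp follows the value. The metric name, label and helper function are hypothetical, and the article does not describe the format Open Cosmos actually uses.

```python
from typing import Dict, List, Tuple


def _escape(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition_lines(
    metric: str,
    labels: Dict[str, str],
    samples: List[Tuple[int, float]],
) -> List[str]:
    """Render (timestamp_ms, value) pairs as one exposition line per sample.

    The timestamp is the time the satellite recorded the sample, not the
    time it reached the ground.
    """
    label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
    prefix = f"{metric}{{{label_str}}}" if label_str else metric
    return [f"{prefix} {value!r} {ts_ms}" for ts_ms, value in samples]
```

Calling `to_exposition_lines("sat_battery_voltage", {"satellite": "demo-1"}, [(1663200000000, 7.4)])` produces the line `sat_battery_voltage{satellite="demo-1"} 7.4 1663200000000`.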

An important selection criterion, says Rodeja, was the software’s integration with the Prometheus API - Prometheus being a widely-used open-source monitoring system, originally developed at SoundCloud.

Open Cosmos originally planned to just use Prometheus, says Rodeja, but found there were issues with the way it handled alerts. A system that could do alerts the way he wanted while remaining fully compatible with Prometheus offered a strong advantage. He says:

When we found this tool was compatible with Prometheus, that was great, because it meant we could still use all the tools that have been built out there for Prometheus. We also found that it had better compression than alternative solutions, which meant that we could download data more efficiently from an Open Cosmos satellite.

Also important was the strong developer community around the open source version of the tool, he adds. Being 100% cloud was also seen as a bonus. Rodeja says:

We wanted to take what the cloud companies are doing and leverage all that great investment and experience they have been accumulating.

We also found that the code was really nice to work with and was written in Go, which is the language that we already use internally, so we are always able to do the modifications that we need in order to add new functionality or solve a problem.

Since implementation, the software has allowed Rodeja to run alert rules on the ground that confirm both that the satellite is working fine and that no big issues have arisen since the last check-up.

Rodeja also likes the ability to query the database with Grafana, the open source analytics and interactive visualization web application.

Summing up, Rodeja says that by using open source he can now ingest all the telemetry he needs from Open Cosmos’ growing satellite asset base with a single command. Combined with the specific form of alerting he needed, this gives his team a full, open source, enterprise-level monitoring and alerting platform.
