
US Citizenship and Immigration Services moves to real-time data sharing with Confluent

Derek du Preez, October 11, 2022
The Department of Homeland Security agency is hoping that it can move from spreadsheets and email exchanges to real-time event streaming with Confluent, in order to better provide immigration services.

An image of Rob Brown, CTO of US CIS
(Image taken by author)

US Citizenship and Immigration Services (US CIS), which sits within Homeland Security, is working with Confluent to build an event-driven data architecture, so that it can provide live data streams to the external organizations that work with it to provide immigration and border benefits and services.

Rob Brown, Chief Technology Officer at US CIS, was speaking at Confluent’s user event in Austin last week, where he explained that sharing data with organizations such as Customs and Border Protection will mean that the US government can build a more rounded view of a person’s immigration journey across those organizations. Commenting on the role of US CIS, Brown said:

I like to think that we provide the American Dream for folks. We have a lot of people that rely on immigration benefits, but also really on the humanitarian side, we work a lot with refugees. We try to help a lot of these folks and try to give them lawful immigration rights in the United States. We’ve got about 20,000 employees and we’ve got about 350 offices around the globe. 

So, we’ve got a lot of people, a lot of offices, doing a lot of work, and a lot of those adjudication activities rely on a lot of data and information from a lot of other organizations - from our friends at Customs and Border Protection, to our colleagues over at ICE, and other folks like the Department of State. 

US CIS already provides data to other organizations upon request, such as to DMVs across the United States that need the status of an individual in order to provide a driver’s licence. However, this has usually been orchestrated via a complicated data architecture. Brown said: 

We traditionally had a spaghetti mess of how we were sharing data with various other business units. We’ve had to rely on spreadsheets and email exchanges. So taking a step back over the past few years, we’ve really been trying to take note of: what does that mean? How do we start to centralize a lot of our integration activities? What new technologies exist? And how can we start to deploy them not just internally for ourselves, but also externally for our partners too? Both in the consumption and the production of data. 

Brown said that there were no options for filtering, transformation, or native streaming analytics, and that most data sharing required handing over an entire copy of a dataset. However, moving to a streaming architecture also posed challenges - mainly that US CIS had to build an on-premises system, due to federal security restrictions on cloud services. Brown added: 

Some of the challenges we have include moving forward with standards - how can we start to think about, not just internally but externally too, technical standards and governance? The other thing is a lot of these things can be accomplished by leveraging a PaaS or SaaS service. But being a federal agency, we rely heavily on following security and privacy mandates. So if you have a service out there that’s not FedRAMPED that’s a big blocker for us. 

There’s lots of different providers that we would love to use, but we can’t. So what does that mean? It means we typically have to use some sort of on-prem manifestation of that service. And then we usually have to build a cadre of other services around it so that it meets our security requirements, as well as it being interoperable with the rest of our services. 

A data in motion model

US CIS currently uses REST APIs, which Brown described as antiquated, and is now pursuing a ‘data in motion’ model using Confluent. The chart below shows the stages the department is moving through: from REST APIs, to direct access, towards data in motion, which will allow organizations to keep data in sync across different data stores and agencies. 

An image showing how US CIS is moving from REST APIs to data mesh architecture
(Image taken by author)
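To make the contrast concrete, the ‘data in motion’ pattern can be thought of as an append-only event log that downstream consumers replay at their own pace, instead of requesting full dataset copies on demand. The toy Python sketch below is purely illustrative - the `EventLog`, `Consumer`, and DMV names are hypothetical, not US CIS’s actual system - but it shows the core idea that a Kafka topic and consumer group implement at scale: each consumer keeps its own store in sync by tracking an offset into a shared log.

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """A toy append-only log, standing in for a Kafka topic."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)

class Consumer:
    """A downstream agency keeps its own store in sync by
    tracking an offset into the shared log."""
    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.store = {}

    def poll(self):
        # Apply every event published since the last poll.
        while self.offset < len(self.log.events):
            case_id, status = self.log.events[self.offset]
            self.store[case_id] = status
            self.offset += 1

# The producing agency publishes status changes as they happen...
log = EventLog()
dmv = Consumer(log)  # hypothetical downstream consumer, e.g. a DMV
log.append(("case-001", "approved"))
log.append(("case-002", "pending"))
dmv.poll()
# ...and each consumer converges on the same view without
# ever receiving a bulk copy of the whole dataset.
```

Because consumers track their own offsets, a new partner agency can join later and replay the log from the beginning - the property that distinguishes this model from request-response APIs.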

Brown said: 

We’ve built a cadre of API products internally and we are still trying to mature that into a centralized management layer that can be served up as products within US CIS. At the same time, within the past four months, we've now exposed an external portal for businesses and the public to start to do business with US CIS. And that means getting public data so they can do their own statistics, to actually build their own products. So that excites us quite a bit. 

And that's a great way to open up the aperture to really true digitization, as opposed to what we've been doing in the past, which is a lot of paper applications being filled out and submitted and then hiring somebody else to digitize that. So to me, this is very exciting as it relates to how we change the way we do business and do a lot of that external data sharing with some of our business partners.

And this new model will support the creation of new immigration services for other organizations such as Customs and Border Protection (CBP), which is thinking about how to deploy a unified immigration portal. Brown said: 

How we are leveraging Kafka, and specifically Confluent and cluster linking, is to actually create a true data mesh, or service mesh, building out that sort of more robust domain driven data mesh between and across the Department of Homeland Security - and ultimately with other business partners that are part of our immigration benefits providing mission. 

We’ve just started this work, we're about maybe three months or four months into actually making this a reality, and we are working through that byte-to-byte, real interaction of exchanging data and how does that start to change operationally, as opposed to just putting data into somebody else's warehouse? Or possibly somebody else's transactional data store that has pretty high latency? 

So we've got a lot of requirements to make this happen. There are some very good use cases that can manifest into a programme over at CBP called the Unified Immigration Portal, where we are going to have a picture, or a journey, of everybody who's, let's say, crossing the southwest border. 

So having some of this data in motion, real time operational transactional data, that is truly event driven, and then starting to change the applications, not just the reporting dashboard, but start to change those applications so that they are truly event driven. It really sets the foundation that we hope we can start to make patterns, reproducible patterns and start to hit other facets of CBP, other facets of ICE. 
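Cluster linking, which Brown refers to above, continuously mirrors topics from one Kafka cluster into another, so each agency can consume events from a local, read-only replica rather than reaching across an organizational boundary. As a rough, hypothetical sketch of that pattern (not Confluent’s actual implementation, which replicates records byte-for-byte with offsets preserved - the `Topic` class and agency names here are invented for illustration), mirroring amounts to copying any new records from a source topic into a destination topic:

```python
class Topic:
    """A toy stand-in for a Kafka topic: an ordered list of records."""
    def __init__(self):
        self.records = []

    def produce(self, record):
        self.records.append(record)

def mirror(source: Topic, dest: Topic, dest_offset: int) -> int:
    """Copy records the destination has not yet seen; return the new offset.
    A cluster link does this continuously rather than on demand."""
    new_records = source.records[dest_offset:]
    dest.records.extend(new_records)
    return dest_offset + len(new_records)

# The US CIS cluster publishes case events; the linked CBP cluster
# receives a read-only mirror it can consume locally.
uscis_topic = Topic()
cbp_mirror = Topic()
offset = 0
uscis_topic.produce({"case": "A-123", "event": "biometrics_received"})
offset = mirror(uscis_topic, cbp_mirror, offset)
```

Because the mirror preserves record order, downstream applications at the linked cluster can be event driven in exactly the way Brown describes, reacting to each record as it arrives rather than polling a warehouse.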

Brown said that US CIS is able to take this approach because of the domain-driven design mentality it adopted back in 2016/17, when it started building out domains to map its business processes and assigned the right technical and business people to those areas. It then built on this with the adoption of microservices and serverless, and started to aggregate teams into common Kubernetes clusters, which means it is now in a position to take advantage of an event-driven architecture. He added: 

Those were the foundational elements from our perspective, and we then moved, using that same sort of construct from a domain driven perspective, into leveraging it for a lot of our data. And we're going through that journey now. So I'm hoping as we continue to build these patterns from a Kafka perspective and a cluster linking perspective, we can start to build that same construct across DHS. Starting small, we'll be doing that with CBP.

The benefits we’ve already realized have been pretty significant. We are starting to see those efficiencies and how we can exchange data pretty quickly, with that direct access. 
