Assessing thousands of claims for unemployment benefits is a taxing task at the best of times - but in the middle of a pandemic, when job losses soar, the scale and complexity are unprecedented. This is the situation that the State of New Jersey found itself in during the onset of COVID-19, when it had to figure out a way to use data to assess the validity of claims for state relief from people faced with job losses.
The crisis that unfolded meant that government support was an easy target for bad actors that saw an opportunity to conduct fraudulent activities - which we have since seen reported by tax authorities around the world. However, the State of New Jersey found a solution in Splunk, a data observability platform, which enabled it to use data to identify suspicious activity and report unverified claims to the New Jersey Department of Labour.
Speaking at a recent Splunk event, Tom DeHaan, Network Administrator at the State of New Jersey, said that the project was a collaboration between the Office of Information Technology (OIT) and the New Jersey Department of Labour and Workforce Development. The Splunk Enterprise installation is hosted in the data center of OIT, which provides IT services to the State of New Jersey and partner agencies.
DeHaan explained why a solution was needed and said:
In March 2020, as the pandemic was spreading, the Department of Labour saw a greater than tenfold increase in unemployment insurance claims. Splunk was brought in to assist in the diagnosis of performance problems on [our] systems. But after that, the existing systems were never resourced to handle this enormous increase in usage.
This led to a request from the Department of Labour team: can Splunk be used to assist in the detection of fraudulent activity? The answer being a definite yes. Since the sources of data are very similar, expanding our effort into the unemployment insurance fraud detection was a natural extension.
After assessing the data, OIT identified 13 different types of suspicious behaviours. DeHaan said that he couldn’t reveal all of them, as this would give away the system’s ‘secret sauce’, but there are three that are obvious indicators for fraudulent activity.
Firstly, Splunk picks up if an email address has multiple full stops in the part before the @ sign. Secondly, a claim made from an IP address outside of the United States will flag a warning in the system. And thirdly, the system looks at the residential address itself. Whilst it is more forgiving of addresses in neighbouring states, attention is given to those from states further afield. DeHaan added:
There are many other behaviours that contribute to the suspicious claims and we use all of them to paint a complete picture of subsequent fraud behaviour.
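As a rough illustration of the three indicators DeHaan described, the checks could be sketched in Python as follows. This is not New Jersey's actual logic, which runs as Splunk searches and was not disclosed; the `Claim` fields, the `NEIGHBOURING_STATES` set and the `max_dots` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical list of states treated as "nearby"; the real list was not given.
NEIGHBOURING_STATES = {"NJ", "NY", "PA", "DE"}


@dataclass
class Claim:
    email: str
    ip_country: str      # country code the claimant's IP address maps to
    address_state: str   # state taken from the residential address


def suspicious_indicators(claim: Claim, max_dots: int = 1) -> list[str]:
    """Return the names of the indicators this claim trips (possibly none)."""
    flags = []
    # Indicator 1: several full stops in the part of the email before the @ sign.
    local_part = claim.email.split("@", 1)[0]
    if local_part.count(".") > max_dots:
        flags.append("email_many_dots")
    # Indicator 2: the claim was filed from an IP address outside the United States.
    if claim.ip_country != "US":
        flags.append("foreign_ip")
    # Indicator 3: the residential address is in a state further afield.
    if claim.address_state not in NEIGHBOURING_STATES:
        flags.append("distant_state")
    return flags
```

A flag here only marks a claim as worth a closer look; as DeHaan noted, it is the combination of behaviours that builds the picture.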
Building a detailed picture of suspicious behaviour
Jason Snyder, also a Network Administrator at the State of New Jersey, explained that before OIT could go looking for fraud, it had to get a handle on the data itself. The unemployment insurance programme, he said, has a lot of data behind it, including information about the claimants themselves, their personal data, but also information about the banks and employers too.
When someone first becomes unemployed in the State of New Jersey, they are able to file a claim, and if that claim is approved, can certify weekly to get paid their benefits. Anytime during this process people can update information, including things like their email address and banking information. Snyder said:
Each [claim] consists of literally dozens of data points. And unfortunately, these data points are spread over a large amount of technical sources. We have things like log files, the web server that hosts the Unemployment Insurance Application, along with the application itself - they both produce extensive log files. We have several databases that house information. And there's even a mainframe in the background. And there's lots of connections between them. And it makes it very difficult to actually get a grip on this whole thing.
And in order to search, you're going to need to understand all the relationships between these entities. And even if you understand the relationships, you need to know the SPL [search processing language] to do all the joins, lookups, to bring all this data together in a way that allows you to search them for suspicious behaviour.
The other problem is these searches are often long running. We average about 100,000 certifications every single day. And if you want to do a search that runs back to the beginning of the pandemic when the fraud became very widespread, these searches can take a long, long time.
There's lots of network activity involved and just a lot of data to parse through. The moral of the story is this system was designed over 30 years ago, there are some parts that are COBOL that are still running. It just wasn't designed with fraud detection and fraud mitigation in mind.
New Jersey’s solution, using Splunk, consists of three core steps. Each step is a set of scheduled searches that run to identify various factors. The first collects all the disparate data points into single events, which are housed within Splunk. Each event is a list of field-value pairs, which makes it easier to search through. Step two is to run periodic searches across these events to look for suspicious activity, to help identify things that don’t seem quite right. That doesn’t necessarily mean fraudulent activity, but the system flags unusual events. Finally, the third step is to identify the fraudulent claims involved. Snyder said:
A single behaviour might not indicate fraud, but various combinations probably do.
The team at OIT wrote searches that collect all the data into indexes within Splunk, which solves two big problems. Firstly, it brings all the data together in one place, which makes it easier to analyze and search through, rather than having to memorize the databases involved and the lookups needed. Snyder said:
All they have to say is ‘tag equals claim’ and they can see all the information associated with all the claims using the time period.
It also speeds up the searches significantly. The application logs and the web server logs have lots of noise in them, Snyder added, which the team is not really interested in when looking for fraud. It can now skip out all those and then for each claim or each certification, the team does all the database lookups at one time. This reduced searches that would typically run for hours and hours, sometimes even days, to just a few seconds to find the information needed.
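The idea behind this first step, merging records scattered across several sources into one event of field-value pairs per claim, can be sketched in Python. The team does this with scheduled Splunk searches; the source names, field names and the `claim_id` key below are hypothetical.

```python
from collections import defaultdict


def consolidate(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge records from several sources into one flat event per claim_id.

    `sources` maps a source name (for example "web_log" or "database") to a
    list of records, each of which carries a `claim_id` field.
    """
    events: dict[str, dict] = defaultdict(dict)
    for records in sources.values():
        for record in records:
            # Fold every field of the record into the event for its claim.
            events[record["claim_id"]].update(record)
    return dict(events)
```

Once the merge has been done up front, a later search only has to read one event per claim instead of repeating the joins and lookups each time, which is where the reported speed-up comes from.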
It’s the relationships that matter
Snyder explained that it’s the second set of searches that really matter to the team at New Jersey, as this is the main component of its fraud detection system. He said:
These are really looking for suspicious behaviours. And again, it's just some sort of suspicious activity, regardless of how strong an indicator of fraud it might be. Now that could be as simple as a single data point, maybe their email domain is from a disposable email address, people typically don't use those for official business, so someone using it for unemployment insurance raises suspicion.
But it could also be many data points or relationships between data points. An obvious example is many claims coming from the same IP address. Now this could just be a few family members filing claims, but it also could be someone who maybe stole a bunch of PI on the internet and is trying to see which of those social security numbers they can file a claim against. So either way we mark the claim as suspicious.
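Snyder's example of many claims arriving from the same IP address is a relationship between events rather than a property of a single one. A minimal Python sketch, assuming each event has `claim_id` and `ip` fields and using an arbitrary `threshold`, might look like this:

```python
from collections import Counter


def claims_sharing_ip(events: list[dict], threshold: int = 3) -> list[str]:
    """Return the ids of claims filed from an IP address used by at least
    `threshold` claims. The threshold value is an assumption for this sketch."""
    counts = Counter(event["ip"] for event in events)
    return [event["claim_id"] for event in events if counts[event["ip"]] >= threshold]
```

As in the talk, a hit only marks the claim as suspicious; a household filing a few claims from one connection would look the same at this stage.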
New Jersey OIT also scores a lot of these results, using factors such as where the IP address is located, the address that the person has listed on their claim and the number of claims. All these factors come together to provide the team with an overall risk score for each claim.
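How the individual factors are weighted was not disclosed. One simple way to picture an overall risk score is a weighted sum over the behaviours flagged on a claim; the behaviour names and weights below are invented for illustration only.

```python
# Hypothetical weights per suspicious behaviour; the real values are not public.
WEIGHTS = {
    "foreign_ip": 40,
    "shared_ip": 25,
    "disposable_email": 20,
    "distant_state": 15,
}


def risk_score(flags: list[str]) -> int:
    """Sum the weights of the distinct behaviours flagged on a claim.

    Unknown behaviours contribute nothing, and duplicates are counted once.
    """
    return sum(WEIGHTS.get(flag, 0) for flag in set(flags))
```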
Finally, Snyder said:
The third step to our solution is to actually look for fraudulent claims. We call these fraud incident searches. We have various ways of determining whether something actually is fraudulent or not. So, is the bank that you provided on your claim on one of our blacklists? You go directly to pending status, you do not pass go, and you do not collect your unemployment benefits. On the other hand, certain banks are sometimes used for fraud, but also very frequently used legitimately.
If the case is that you have one of those banks listed on your claim, but there's nothing else wrong with it, that's probably okay. Your claim will get processed and you can go ahead and collect your benefits. Now in addition to this, we also look for much more complex things like specific combinations of fraudulent behaviours, or fraud scores of various thresholds.
So just as an arbitrary example, you may discover that people frequently show fraud behaviours A, B and F on the same claim, with a fraud score above X. We may have found that this is a pretty reliable determinant of a particular type of fraud.
In that case, we would write a new search to identify this, we would add it to our list of fraud incident searches, and then each day that search would run. All these searches run every day, and at the end of it, what we get is a final output csv file that lists all of the claims that we believe are fraudulent.
We send this over to our partners at the Department of Labour, and they actually shut these down and make sure that people don't get paid.
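The fraud incident searches Snyder describes combine hard rules (a blacklisted bank) with learned combinations of behaviours and score thresholds, and end in a daily CSV file. The Python sketch below mirrors that shape under stated assumptions: the bank identifier, the rule `({"A", "B", "F"}, 70)` and the claim fields are placeholders, not the State's real rules.

```python
import csv
import io

# Hypothetical blacklist and rules; the real ones are not public.
BLACKLISTED_BANKS = {"BANK-0001"}
# Each rule: (behaviours that must all be present, score the claim must exceed).
INCIDENT_RULES = [({"A", "B", "F"}, 70)]


def is_fraud_incident(claim: dict) -> bool:
    """Apply the blacklist first, then the behaviour-combination rules."""
    if claim["bank_id"] in BLACKLISTED_BANKS:
        return True
    behaviours = set(claim["behaviours"])
    return any(
        required <= behaviours and claim["score"] > min_score
        for required, min_score in INCIDENT_RULES
    )


def daily_report(claims: list[dict]) -> str:
    """Return CSV text listing the ids of claims judged to be fraudulent."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["claim_id"])
    for claim in claims:
        if is_fraud_incident(claim):
            writer.writerow([claim["claim_id"]])
    return buffer.getvalue()
```

Keeping the rules in a list reflects the workflow in the talk: when a new reliable combination is found, a new search is added and runs with the rest each day.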
Along with this list that is sent to the Department of Labour each day, OIT has also created a dashboard that provides an overview of the claim activity and all the fraudulent and suspicious activity being observed. This shows things like the total number of claims, the proportion that were suspicious or fraudulent, and the dollar amounts associated with them.
The team at the State of New Jersey noted that Splunk has been an integral part of its anti-fraud effort and that many thousands of fraudulent claims have been stopped, resulting in savings that amount to billions of dollars.