Remember data warehouses? Yes, they are still relevant
Summary: Data warehouses may seem like a trip down memory lane, or just a clunky data store. But if we look closely at data warehouse projects, the goals - and results - remain relevant.
Most clients I speak with tell me they have no intention of shutting down their on-premises data warehouse. Others have migrated to the cloud, but onto a cloud data warehouse platform.
Sometimes I get wary of the latest thing. Today I want to take a trip a few years down memory lane and talk about data warehouses. The term "data warehouse" can create the impression that it is merely a place to store lots of data, which raises the question: why not just keep the data in a data lake or native object store?
An active data warehouse performs many more essential functions. It represents the single source of integrated, curated data, both historical and current. It supports, and often performs, complex analytics and model processing for data scientists. Because of its relational nature, it stores and processes data at the atomic level. The Hadoop Distributed File System, HDFS (though I don't know if anyone still uses it), stores data as files, so operating on the contents of those files requires more work to decompose them and pass results off to other modules.
Prevalent thinking about data warehouses is that data is ingested, either cleaned up beforehand via ETL or staged in the data warehouse and cleaned there (ELT). Either way, the warehouse is positioned as a source of data (and metadata) for queries. But in many cases, it is not a one-way flow: the data warehouse can feed models whose results, in turn, flow back into it.
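To make that round trip concrete, here is a minimal, hypothetical sketch in Python, using pandas with SQLite as a stand-in for a real warehouse; the file, table and column names, and the "model", are all invented for illustration. Raw extracts are landed as-is, transformed inside the warehouse (the ELT pattern), and model results are written back alongside the curated data.

```python
import sqlite3
import pandas as pd

# Stand-in "warehouse" -- in practice this would be a connection to a real warehouse.
conn = sqlite3.connect("warehouse.db")

# 1. Ingest: land the raw extract as-is into a staging table (the "EL" of ELT).
raw = pd.read_csv("policy_extract.csv")          # hypothetical source extract
raw.to_sql("stg_policies", conn, if_exists="replace", index=False)

# 2. Transform inside the warehouse: clean and conform into a curated table.
conn.executescript("""
    DROP TABLE IF EXISTS dim_policy;
    CREATE TABLE dim_policy AS
    SELECT policy_id,
           UPPER(TRIM(product_code)) AS product_code,
           CAST(face_amount AS REAL) AS face_amount,
           DATE(issue_date)          AS issue_date
    FROM stg_policies
    WHERE policy_id IS NOT NULL;
""")

# 3. Round trip: pull curated data, run a (toy) model, write results back.
policies = pd.read_sql("SELECT * FROM dim_policy", conn)
policies["projected_reserve"] = policies["face_amount"] * 0.02   # placeholder model
policies[["policy_id", "projected_reserve"]].to_sql(
    "fact_model_results", conn, if_exists="append", index=False)
```

The point is not the toy model but the shape of the flow: the warehouse is both the source of the model's inputs and the home for its outputs.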
For example, I'm going to describe a data warehouse that curated source data from twenty different internal systems at a life insurance company. Extracts were pulled from the data warehouse to feed a valuation system, which generated large volumes of cash flow projections based on interest rate, mortality and lapse assumptions. In one case, the fine-grained time-series cash flows from hundreds of valuation runs were analyzed and used to produce solvency reports and other statutory filings.
Life insurance company
Regulators of life insurance companies are primarily concerned with solvency. Since life insurers typically issue contracts with liabilities not due for years or decades, it is critical that regulators carefully watch insurers' business practices, investment portfolios and experience. This particular company had caused some concern among the regulators for several reasons I won't go into, but management chaos was at the top of the list.
The basis of a solvency study is a portfolio valuation. Most of the time, the valuation is done at a slightly aggregated level rather than for each policy, for example ten-year level term grouped into 10-year age brackets, and possibly by some other variables. However, for this company, the regulators demanded a full seriatim valuation - not just each policy, but each policy coverage: the death benefit, accidental death benefit and disability waiver benefit, for example. In addition, instead of the valuation and all associated studies being provided annually, they demanded them quarterly. They gave the company until the end of the following year to comply, about eighteen months. If it did not comply, the regulators would revoke its license to sell new policies, a death sentence for an insurance company.
The actuarial department was solely responsible for meeting these regulatory requirements, and it had a few problems. The existing process was highly manual, extracting data from over 20 source systems, many of them with poor data quality. The actuaries in the department struggled with obsolete tools such as COBOL, Fortran, APL, Easytrieve and others to get the data, and then tried to pull it together manually with spreadsheets. A non-seriatim valuation took five months, so finding a way to produce a full seriatim valuation every three months seemed impossible. Morale was poor because the work was not challenging to actuaries, who are highly skilled in actuarial science, not data wrangling. Turnover in the department was almost 50%, further exacerbating the problem.
Luckily for the company, the chief actuary was something of a visionary and a charismatic leader. He sought the advice of the usual large consultancies but instead hired a small boutique firm (mine) with a reputation for solving these kinds of problems. We architected and implemented an ETL/data warehouse/BI platform in less than six months. We tightly integrated the data warehouse with a third-party valuation tool, GGY AXIS, now part of Moody's, enabling:
- full seriatim valuation
- integration of multiple jurisdictions
- integration of valuation with scenario testing
- a shift in the valuation cycle from annual to quarterly, and
- a reduction in actual cycle time from five months to two weeks.
All advanced studies of experience, lapses, interest-rate risk, and so on could be generated easily with the BI tool because the relational database schema was designed for extreme flexibility. Over a year, the company ran hundreds of variations of valuations and sent the detailed cash flows back to the data warehouse, where mixed-scenario and what-if analyses could be performed easily. Increased confidence in the data allowed the release of $20m of reserves, while better experience studies further improved profits. Other benefits included reduced turnover in the actuarial department and a greatly improved process flow for all financial actuarial work.
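As an illustration of the kind of what-if analysis that becomes trivial once every run's cash flows land back in the warehouse, here is a small Python sketch. The table and column names, and the "baseline" scenario, are invented for the example, not the company's actual schema.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("warehouse.db")

# Hypothetical fact table: one cash flow row per scenario and projection month.
cash_flows = pd.read_sql(
    """
    SELECT scenario_id, projection_month, SUM(net_cash_flow) AS total_cf
    FROM fact_cash_flows
    GROUP BY scenario_id, projection_month
    """,
    conn,
)

# Pivot so each scenario becomes a column, then compare every run to a baseline.
by_scenario = cash_flows.pivot(index="projection_month",
                               columns="scenario_id",
                               values="total_cf")
deltas = by_scenario.sub(by_scenario["baseline"], axis=0)   # assumes a "baseline" run exists
print(deltas.describe())
```

Because the detailed cash flows live in ordinary relational tables, comparing hundreds of scenarios is a query and a pivot, not another five-month project.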
Once the emergency was past, the organization looked for other ways to apply the technology and methods. The IT department, less than cooperative during the implementation, took ownership of the data warehouse and chose to integrate it with other "digital transformation" initiatives. They decided to re-platform the data warehouse onto their "standard" database, which temporarily disrupted all upstream and downstream processes. Still, within a year, and with access to much external data that hadn't been available in the original data warehouse, the actuaries were able to make great strides in product design, pricing and underwriting. It wasn't a smooth transition, but it worked.
The moral of the story is that technology isn't everything. Even the best-executed implementation can be subject to atrophy. It takes diligence to keep a good thing going.
My take
Discussions about data warehouses tend to focus on technology, the rigidity of conformance to a schema, and the time and cost to change them. All true. But you rarely hear about how valuable they can be.