B.I.S.S Research White Papers

The Myth of Data Centralization – Re-imagining Data Consolidation and Costs

Richard Robinson

Data is in vogue again – or perhaps, for the first time – within the Financial Services Industry. Regulation has been a primary driver of this trend, as firms incur ever-escalating costs to create new reports for each new regulation – reports that essentially draw on the same underlying data.

The progressive use of the Legal Entity Identifier, regulatory mandates, security masters, the availability of historical market data, and issues around data cost and licensing have begun to converge, demanding far greater attention. No longer can these issues be pushed to the side, addressed as simple technology implementations, or treated as an afterthought in response to a tactical need. The risk to institutions is too great – from meeting regulatory requirements, to calculating exposures at the enterprise level, to accurately processing transactions. Under this larger umbrella, financial institutions need to find a way to manage their data better in order to manage the firm’s risk and, more importantly, to harness the inherent value that resides in their own data.

One of the most common findings in any data-related effort is the proliferation of multiple data sets – whether in localized databases, multiple ‘global’ or ‘enterprise’ databases or tables, or specialized data repositories built to satisfy a quick need, from regulatory reports to trading books. In the past, this challenge has been viewed as a technology issue. There has long been a rallying cry to centralize key reference data sets – typically client/account and security masters – and the nirvana was viewed as a pure ‘golden source’ of data that the enterprise would keep clean and nurture for all needs.

Most data initiatives end up being long, strategic, and costly projects, spanning multiple years. Ultimately, they do not achieve the original goals, or expectations are scaled back to cover a limited set of asset classes or only a segment of entity/client data. Value is lost in cost overruns as well as in the failure to achieve goals. Additionally, the inherent value of being able to transform data into information is never realized.

The False Grail of Centralization

Most financial institutions have failed in their quest to centralize data. Current approaches, from Big Data to popular vendor solutions, neither address the underlying issues nor enable firms to implement a solution that will stand the test of time, scale, and change. At best, current vendor solutions address the traditional technology challenge of organizing data, or are niche solutions to very specific problems. Big Data is an excellent tool and one of the most innovative approaches to data for analytics purposes, but it is not core data management. Vendor solutions often have sound business cases as well, but purchasing firms frequently stretch them past their intended purpose in trying to serve too many different business functions and processes within a single firm.

While the data universe continues to expand and become more complex, intensifying cost pressures have led to the scaling back of data initiatives. According to research by CEB TowerGroup (report “Transforming Attitudes to Data Management: Three Pitfalls of Underinvestment in Core Data Projects” – webinar, 25 June 2014), capital markets firms store over four times as much data as the average firm in other industries, and employees spend over 50% of their working time on decision-making and research without finding value. This expanding data volume also drives rapidly expanding costs and puts significant pressure on the simple storage solutions already in place.

We learnt long ago that providers who try to be everything to everyone suffer in quality, flexibility, and depth. At the opposite end of the spectrum is the niche provider that targets a very specific fit or purpose. The challenge there is re-usability – leveraging the investment for other purposes that require the same data, but in a different format.

The industry revolves around one simple function – trading an asset with another entity. Yet client data remains the top global issue (according to the same CEB TowerGroup report), with product data close behind. To date, the majority of efforts have produced reports and dashboards that the business and management cannot rely on.

For over a decade, a force has been building in financial services to create the ultimate ‘golden source’ for a firm’s reference data as the answer to these issues. Indeed, there have been efforts to create central, multi-tenant reference data utilities. Efforts to create a ‘golden source’ have met with varying levels of success, but with ever-escalating costs for implementation and for ongoing change as the environment shifts, complicated by regulatory mandates and new requirements for differentiating client and product types. Industry utilities in this space remain a hopeful end game, but continue to be stymied by the perception of potentially sharing information with competitors, as well as by antiquated licensing and distribution contracts with data vendors.

The reality is that firms have multiple, but mostly siloed, databases for client/entity and product/security reference data. Built for specific purposes, or as tactical one-offs that became institutionalized, these tables and databases often exist as islands. While there is some level of manual or time-based synchronization, in many cases they are left to proliferate independently.

Going Back to the Source

What needs to be considered is how financial institutions arrived at this situation. Firms did not set out to create siloed and fragmented data architectures. Many times, business operations were, and continue to be, split across:

  • Traditional lines along product types – equity vs fixed income vs derivatives, and so on.
  • Client organizations along client type – institutional vs retail vs ‘middle market’.
  • Functional areas of front, middle, and back offices where independent data and IT organizations exist.

Existing data tables may have been deemed insufficient for new purposes, or their legacy nature led to decisions to extract key data and create a ‘new, more flexible database’ to satisfy the new needs. The key fact is that these existing, albeit fragmented, data sources were created to fit a specific purpose. There is key data that is common to an enterprise, but in many cases extensions of that data end up being specific to a business function and the application, or set of applications, supporting it. There is data attached to entities and products that may have no value to the majority of the enterprise; thus, there is no need to burden the enterprise with its impact and management.

The traditional approach, as mentioned previously, has been to view centralization of all data as the ultimate goal. With advances in metadata – the “data about data” – and in governance methodologies, the need is to view data as a business concern rather than a technology problem. There are options available that could prove quicker and less expensive, and that create a more nimble enterprise, better able to access and take advantage of its data.

A Data Road Revisited

First, a firm needs to take inventory of its key data properties and identify the specific business needs and functions each property supports. The net should be cast wide. Some of the initial questions to answer may be:

  • Is a client database supporting a KYC, onboarding, regulatory reporting or accounting process?
  • Do separate entity tables exist for front office trade attribution and back office settlement that are disconnected from each other, yet relate to the same client firms?
  • Does a siloed product table exist for classifying derivatives that receives a feed from a central product master for underlying security information?

Firms tend to bypass a thorough inventory and jump to the solution they are already planning to implement. Such shortcuts commonly miss business processes and skip a true current-state analysis of the end-to-end lifecycle of the data. Most commonly forgotten are the offline, business-built data stores driven by spreadsheets or Access databases, and the ‘mini’ ETL processes around them. BCBS 239 explicitly calls for a proper inventory of data properties and the functions they support, and such an inventory is a key element of DCAM (the Data Management Capability Assessment Model).
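
To make the idea concrete, a minimal sketch in Python of what a single inventory record might capture is shown below. The property names, domains, and business functions are hypothetical, chosen only to mirror the questions above; they are not drawn from any particular framework or vendor model.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DataProperty:
        """One entry in a firm-wide inventory of data properties (illustrative only)."""
        name: str                      # physical table, database, or offline store
        domain: str                    # e.g. 'entity/client' or 'product/security'
        business_functions: List[str]  # the processes this property actually supports
        upstream_sources: List[str] = field(default_factory=list)
        downstream_consumers: List[str] = field(default_factory=list)
        offline: bool = False          # spreadsheet / Access-style 'mini' stores count too

    # Hypothetical entries mirroring the questions above
    inventory = [
        DataProperty("client_master_kyc", "entity/client",
                     ["KYC", "onboarding", "regulatory reporting"]),
        DataProperty("fo_trade_attribution_entities", "entity/client",
                     ["front office attribution"],
                     downstream_consumers=["performance reporting"]),
        DataProperty("derivatives_product_table", "product/security",
                     ["derivative classification"],
                     upstream_sources=["central_product_master"]),
    ]

    for prop in inventory:
        print(f"{prop.name}: {', '.join(prop.business_functions)}")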

These data sets then need to be classified according to their fit for purpose, their overlapping functional interactions, and the interrelatedness of their data. Based on these properties, it can be determined whether the underlying tables are duplicative, interdependent, dependent, or independent.
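
A simple way to picture that classification step is a pairwise check over the inventory. The sketch below assumes, purely for illustration, that shared business functions, a common data domain, and feed relationships are sufficient signals to distinguish the four categories; a real assessment would weigh far more evidence.

    def classify_pair(a: dict, b: dict) -> str:
        """Illustrative pairwise classification of two inventoried data properties."""
        a_feeds_b = a["name"] in b.get("feeds", [])
        b_feeds_a = b["name"] in a.get("feeds", [])
        same_domain = a["domain"] == b["domain"]
        shared_functions = set(a["functions"]) & set(b["functions"])

        if a_feeds_b and b_feeds_a:
            return "interdependent"   # each relies on the other
        if a_feeds_b or b_feeds_a:
            return "dependent"        # one is sourced from the other
        if same_domain and shared_functions:
            return "duplicative"      # same data, same purpose, no link between them
        return "independent"          # no overlap worth governing jointly

    a = {"name": "client_master_kyc", "domain": "entity", "functions": ["KYC", "onboarding"]}
    b = {"name": "fo_entity_table", "domain": "entity", "functions": ["KYC"], "feeds": []}
    print(classify_pair(a, b))        # -> duplicative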

In the quest to support a single golden source of centralized data, wholesale elimination of databases that support specific applications is disruptive and potentially costly – not just from an implementation perspective, but from the potential for errors, system downtime, and process impact. Established working systems are vulnerable to errors in migrated data, operational processes are potentially broken, and even with a diligent current-state analysis there are unknown risks to downstream systems and processes.

By weighing this impact, analyzing existing databases for fitness for purpose, and understanding the relationships between existing data properties, a firm can take the first step of creating a data governance and management plan. The key piece of the governance plan is to recognize the existence and necessity of multiple databases, which will continue to exist based upon their fitness for specific purposes. By utilizing metadata and governance principles, data can be ‘virtually’ centralized without implementing a physical central golden source.

Virtual centralization – utilizing existing key data properties, retiring costly, inefficient, and antiquated ones, and creating an accessible data layer through the use of shared metadata – allows an enterprise to quickly realize better data synchronization and data quality, lower implementation costs, and faster time to market for new efforts.
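
One way to picture such a shared metadata layer is a thin registry that maps each logical business attribute, per business function, to the physical property that is fit for that purpose, so consumers can ask “where does the authoritative LEI for settlement live?” without the data itself ever being moved. The attribute and source names below are hypothetical, used only to illustrate the mechanism.

    # A minimal sketch of a 'virtual centralization' registry: shared metadata maps each
    # logical attribute, per business function, to the existing physical property that is
    # fit for that purpose. Nothing is copied or moved; the registry only answers "where".
    REGISTRY = {
        ("counterparty_lei", "settlement"):        "bo_settlement_entities.lei",
        ("counterparty_lei", "regulatory_report"): "client_master_kyc.lei",
        ("product_classification", "risk"):        "derivatives_product_table.class_code",
    }

    def resolve(attribute: str, business_function: str) -> str:
        """Return the governed physical location for this attribute and function."""
        try:
            return REGISTRY[(attribute, business_function)]
        except KeyError:
            raise LookupError(
                f"No governed source registered for {attribute!r} in {business_function!r}"
            )

    print(resolve("counterparty_lei", "settlement"))   # -> bo_settlement_entities.lei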

Traditional approaches to data storage ignore a couple of important business facts about shared data: ‘immutable’ data can be semantically different depending on the business function or purpose, and there are real, significant differences in how data is used and transformed from one business function to another.

A critical success factor is a proper data architecture supporting the metadata – the semantic view as well as the engineering of common definitions and values. By pairing a semantic view of data within a business function with its metadata, the necessary associations, impacts, and interdependencies can be mapped, a governance model implemented, and a rational road map to data sanity, synchronization, and management laid out.
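
The sketch below illustrates that pairing, again with hypothetical names: a business term can carry a different, explicitly recorded definition per business function, while the metadata also records which functions consume each physical source, so impacts can be traced before anything changes.

    # Illustrative only: the same business term carries a different semantic definition per
    # business function, and the metadata records which functions consume each physical
    # source, so upstream and downstream impacts can be reasoned about before a change.
    SEMANTICS = {
        ("settlement_date", "back_office"):  "contractual settle date per market convention",
        ("settlement_date", "front_office"): "expected settle date used for funding and P&L",
    }

    IMPACTS = {  # physical source -> business functions that consume it
        "client_master_kyc.lei":      ["regulatory_report", "onboarding"],
        "bo_settlement_entities.lei": ["settlement", "reconciliation"],
    }

    def impacted_functions(changed_source: str) -> list:
        """Which business functions must be consulted before changing this source?"""
        return IMPACTS.get(changed_source, [])

    print(SEMANTICS[("settlement_date", "front_office")])
    print(impacted_functions("client_master_kyc.lei"))   # -> ['regulatory_report', 'onboarding']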

EDMCouncil has worked over the past several years to create a Data Management Maturity Model (DMMM) that is rational and focused on the business issues. Coupled with the organization’s other efforts, such as FIBO, the tools that are part of the DMMM provide a strong foundation for beginning the investigative, solutioning, and governance processes. One of the greatest lessons learnt from EDMCouncil’s efforts is that business architecture, not technical architecture, should drive implementation.

Centralization of data and elimination of duplicative data stores have been a core tenet of most of the industry’s efforts and methodologies over the past 20 years. Perhaps this is the single most significant area that needs to be rethought and challenged.

Indeed, ‘duplicated’ data is viewed negatively in every context – from a necessary evil to the ultimate sin of data management. ‘Federated’ models of data management are potentially fraught with peril. But what if we shift the point of view to see duplication of specific data, based on business function and process, as an enabler of proper architecture, management, and governance?

Government and Data Governance

Let us view data management through the lens of government. Dictatorships are run centrally, with hard and fast rules that apply to everyone, every state, every city, and so they appear to operate cleanly to the casual observer. However, blanket rule management results in inflexible systems that cannot satisfy users whose needs vary. At the other extreme, completely federated (Jeffersonian) governments, where each state creates its own rules, remain flexible for the purposes of each state and can be highly efficient and nimble. However, communication between states can come to a complete halt, impeding information flow, and multi-state interactions become more complex, even near impossible. There is some sense of a central authority, but in the typical Jeffersonian federated government, the central authority is little more than a traffic cop or an inventory taker of the often contradictory rules that have been created.

A true Federalist data model – one that supports a strong central governance structure but allows for specific state/regional needs – can be the most effective way to manage the data enterprise of today. While there is duplication, there is also a strong sense of coordination, governance, and rules about what is allowed. There can be hard and fast laws that apply to everything, where it makes sense, as well as governing principles where it is recognized that some level of flexibility is required across the enterprise.
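
As a rough illustration of that balance – with entirely made-up rule names – a small set of centrally owned ‘laws’ can be enforced against every data store, while each business function layers its own local rules on top:

    # A sketch of the 'Federalist' idea, with made-up rule names: a small set of hard,
    # centrally owned rules applies to every data store, while each business function may
    # add its own local rules on top - flexibility inside a strong central frame.
    CENTRAL_RULES = {
        "lei_present":     lambda rec: bool(rec.get("lei")),
        "lei_is_20_chars": lambda rec: len(rec.get("lei", "")) == 20,
    }

    LOCAL_RULES = {
        "settlement": {"ssi_on_file": lambda rec: bool(rec.get("ssi"))},
        "kyc":        {"risk_rated":  lambda rec: rec.get("risk_rating") in {"low", "medium", "high"}},
    }

    def validate(record: dict, business_function: str) -> list:
        """Return the names of failed rules: central rules always apply, local rules vary."""
        rules = {**CENTRAL_RULES, **LOCAL_RULES.get(business_function, {})}
        return [name for name, check in rules.items() if not check(record)]

    record = {"lei": "5493001KJTIIGC8Y1R12", "ssi": "ABA-026009593"}
    print(validate(record, "settlement"))   # -> [] when every rule passes
    print(validate(record, "kyc"))          # -> ['risk_rated'] (no risk rating supplied)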

Current trends for “federated” systems rely on a weak central command-and-control structure that simply provides inventory and transformation, with only general guidance and highly independent ‘data stewards’. Too often, these structures go too far in the spirit of flexibility and lose sight of the value of central coordination, as opposed to central control. Reliance is placed on titles like “Data Steward” and “Data Custodian” without thought for what they mean operationally, process-wise, and structurally when integrated into the day-to-day business function. What has been missing is the centralized semantic business context, and the intelligent use of metadata to create a tightly coupled, yet still independent and flexible, data architecture.

The Federated State

In summary, financial institutions should not eliminate all those ‘duplicative’ client reference databases just yet. They need to take another look at their ‘centralized’ security master, take stock of what they have, and appreciate its value before replacing it, consolidating it, or simply ignoring its existence. Yes, there are certainly properties within the data enterprise that firms should get rid of – ones that are ugly, unwieldy, and dangerous. But too often, new data management efforts meant to solve perceived problems end up more ugly, more unwieldy, and more dangerous to a firm because firms rush into them. Too often, firms assume they already have the answers and run ahead without first taking stock of, and appreciating, what they actually have. It is time to step back and take a different look, and perhaps a different approach.

The key to this kind of implementation is that the rules must be business-driven, not technology-driven. The management of data does not sit solely at the end-user or spoke level, nor always at the central hub. Data should come into existence where it matters, die where it is no longer needed in the business process, and transform where a business process changes it from one version of truth to another. But we also need to know how that data impacts the enterprise – upstream and downstream.
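
As a closing illustration, and again using hypothetical element and process names, a simple lineage log of that lifecycle might look like this:

    # A minimal, hypothetical lineage log capturing the lifecycle described above: where a
    # data element comes into existence, where it is transformed from one 'version of
    # truth' to another, and where it is retired - keeping upstream and downstream impact visible.
    from datetime import date

    lineage = []

    def record_event(element: str, event: str, business_process: str, detail: str = "") -> None:
        lineage.append({"element": element, "event": event,
                        "process": business_process, "detail": detail, "on": date.today()})

    record_event("counterparty_lei", "created",     "onboarding")
    record_event("counterparty_lei", "transformed", "settlement", "mapped to internal account id")
    record_event("counterparty_lei", "retired",     "offboarding")

    for e in lineage:
        print(f'{e["on"]} {e["element"]:<16} {e["event"]:<12} in {e["process"]}')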

Richard Robinson

A senior business executive with more than 25 years of experience in the financial industry, whose rare perspective spans operations and technology positions at global custodial banks, international brokers, investment managers, and core industry utilities.