What is Data Lineage?

Written by Neil Sandle - Director, Product Management | 28-Oct-2024 23:16:18

Data lineage is the history or flow of data, including its origins, transformations, and usage. Organizations use it to trace where data comes from and how it is used, and to demonstrate that data is handled in a transparent and responsible manner.

Why is Data Lineage Required?

Data lineage maps your data from source to destination, including the transformations that data attributes undergo across business processes. These transformations can be carried out by, for example, ETL tools, spreadsheets, business applications, or SQL and Python scripts. For adequate financial data management, you want to know where data comes from and what happened to it along the way. You also want to know the source of the data in reports, financial models and algorithms. For example, if you see ‘revenue’ or ‘price’ in a dashboard, you want to know how that attribute was created: how is it calculated, and where is it recorded? Data lineage makes all of this clear.
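
As a rough illustration of tracing an attribute back to its origins, here is a minimal Python sketch. Every attribute name, system name and transformation description in it is hypothetical and invented for this example. It assumes that lineage has already been captured as a simple mapping from each derived attribute to its inputs, and that the mapping contains no cycles.

```python
# Hypothetical lineage records: each derived attribute maps to
# (the inputs it is computed from, a description of the transformation).
# Attributes that do not appear as keys are treated as original sources.
LINEAGE = {
    "dashboard.revenue": (
        ["warehouse.price", "warehouse.quantity"],
        "price * quantity (SQL view)",
    ),
    "warehouse.price": (
        ["vendor_feed.close_price"],
        "currency conversion (Python script)",
    ),
    "warehouse.quantity": (
        ["orders_db.quantity"],
        "nightly ETL load",
    ),
}


def trace_upstream(attribute, lineage=LINEAGE, depth=0):
    """Return (depth, attribute, transformation) rows, walking back to the sources.

    Assumes the lineage mapping is acyclic.
    """
    inputs, transformation = lineage.get(
        attribute, ([], "source (no recorded upstream)")
    )
    rows = [(depth, attribute, transformation)]
    for parent in inputs:
        rows.extend(trace_upstream(parent, lineage, depth + 1))
    return rows


def original_sources(attribute, lineage=LINEAGE):
    """Return the attributes in the upstream trace that have no recorded inputs."""
    return sorted(
        name for _, name, _ in trace_upstream(attribute, lineage)
        if name not in lineage
    )


# Print an indented view of how 'dashboard.revenue' was created.
for depth, name, transformation in trace_upstream("dashboard.revenue"):
    print("  " * depth + f"{name} <- {transformation}")
```

In this sketch, `original_sources("dashboard.revenue")` answers the question of where the figure ultimately comes from, while the indented trace shows how it was calculated at each step.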

To sum up, data lineage is a critical capability for:

  • Explaining how you arrived at a certain data point
  • Maintaining a clear overview of how data is used in your firm
  • Knowing the impact of any change in the landscape of data flows, for example discontinuing a data source, upgrading an existing system, or deciding how best to supply a new application with input data
  • Being prepared for audits from data providers
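
The impact question in the list above, such as what happens if a data source is discontinued, can be sketched as a graph search over recorded data flows. The Python below is an assumption-laden sketch, not a reference implementation: the system names and the `FLOWS` edge list are hypothetical, and a real lineage tool would hold this information in its own metadata store.

```python
from collections import deque

# Hypothetical data flows, recorded as (upstream, downstream) pairs.
FLOWS = [
    ("vendor_feed_a", "price_master"),
    ("vendor_feed_b", "price_master"),
    ("price_master", "risk_engine"),
    ("price_master", "valuation_report"),
    ("risk_engine", "regulatory_report"),
    ("trade_db", "risk_engine"),
]


def downstream_impact(node, flows=FLOWS):
    """Return every system that directly or indirectly consumes data from node."""
    # Index the flows by upstream system for quick lookup.
    consumers = {}
    for upstream, downstream in flows:
        consumers.setdefault(upstream, []).append(downstream)

    # Breadth-first search; the 'seen' set avoids visiting a system twice.
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


# Which systems are affected if the hypothetical 'vendor_feed_a' is discontinued?
print(downstream_impact("vendor_feed_a"))
```

A breadth-first search is used here only because it is simple to follow; any graph traversal would give the same set of affected systems.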

Regulatory Requirements on Data Management

With increasing automation and adoption of AI, having a clear understanding of data flows and how analytical models are being provided with data is more important than ever.

Regulators have increasingly taken an interest in sound data management, in financial data quality and, more recently, in data lineage capabilities for market and reference data management as well as for pricing models. Regulatory attention to data management received a major boost in 2013, when the Basel Committee on Banking Supervision published its Principles for effective risk data aggregation and risk reporting.

More recently, the ECB ran its Targeted Review of Internal Models (TRIM). The goal was to review the internal models for credit, market and counterparty credit risk at the significant institutions (SIs) within the supervisory scope of the ECB. The opacity of many models had made it increasingly difficult for supervisors to assess whether risks had been captured correctly and consistently. The TRIM programme also reviewed the data that was fed into these models.

A Necessary Precondition for Effective Change Management

Data lineage can be seen as a subset of data governance, which is the set of rules and procedures that organizations use to maintain and control data. Data lineage has become an essential part of that governance because it documents what happens to data on its way from source to destination.

With increasingly data-intensive business operations, more detail required in regulatory reporting, and growing use of analytics and machine learning in day-to-day processes, data lineage becomes a prerequisite for effective operations, risk management and change management.

Change is a given in financial services. If you have no insight into data flows, you do not know which processes you might break when you make changes. For example, you may distribute data without knowing who picks it up downstream, in which context they use it, or whether they even refresh it periodically. In that case you also expose yourself to potential breaches due to inappropriate usage of data.