What Are the Aspects of Data Quality?
Financial data services firms spend significant amounts of money on market and financial reference data. They then spend even larger amounts to process it, cross-reference it, quality-proof it, store it, and integrate it into their workflows.
In doing so, one of the key determinants of the effort and cost, as well as of whether the data is fit for purpose in the first place, is data quality. Although there are cases where data quality is a simple black-and-white question of whether a data point is correct, generally speaking it can be decomposed into three dimensions: timeliness (the data is current and available when it is needed), completeness (records and fields are not missing), and accuracy (the values are correct).
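As a minimal illustration, the sketch below computes simple timeliness, completeness, and accuracy scores for a hypothetical table of price records; the field names, thresholds, and the trusted reference source are assumptions for the example, not a standard.

```python
from datetime import datetime, timezone, timedelta

# Hypothetical price records; field names are illustrative only.
records = [
    {"isin": "US0378331005", "price": 189.95, "asof": datetime(2024, 5, 3, 21, 0, tzinfo=timezone.utc)},
    {"isin": "US5949181045", "price": None,   "asof": datetime(2024, 5, 3, 21, 0, tzinfo=timezone.utc)},
    {"isin": "US88160R1014", "price": 181.19, "asof": datetime(2024, 5, 1, 21, 0, tzinfo=timezone.utc)},
]
reference = {"US0378331005": 190.00, "US88160R1014": 181.19}  # assumed trusted source

now = datetime(2024, 5, 4, tzinfo=timezone.utc)
max_age = timedelta(days=1)   # data older than this counts as stale
tolerance = 0.005             # 0.5% deviation allowed vs. the reference

# Timeliness: share of records fresh enough to use.
timeliness = sum(now - r["asof"] <= max_age for r in records) / len(records)
# Completeness: share of records with a price at all.
completeness = sum(r["price"] is not None for r in records) / len(records)
# Accuracy: share of checkable records within tolerance of the reference.
checked = [r for r in records if r["price"] is not None and r["isin"] in reference]
accuracy = sum(abs(r["price"] - reference[r["isin"]]) / reference[r["isin"]] <= tolerance
               for r in checked) / len(checked)

print(f"timeliness={timeliness:.0%} completeness={completeness:.0%} accuracy={accuracy:.0%}")
```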
Data quality can include metadata aspects such as the age of the dataset, the source and collection time, but also usage permissions. Lastly, it also makes sense to ensure the data is relevant, so that you are not flooded with data and unable to see the wood for the trees. Quality is often in the eye of the beholder, and trade-offs sometimes have to be made between different data quality aspects.
How Do I Measure Data Quality?
Data curation means ascertaining the validity of a data set before it is used. What this entails can vary widely depending on the context. In the front office, you want real-time data. When doing a daily P&L or striking a NAV, you have a (financial data) quality management process to curate the data prior to use. This can, for instance, include comparing different sources, taking an average price, screening for outliers, or proxying missing data points to complete records. When preparing data for regulatory filings or disclosures to investors, there is a much more stringent quality management process.
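A minimal sketch of such a curation step, assuming two hypothetical vendor feeds keyed by identifier: it averages where the sources agree, flags an exception where they diverge, and carries forward the last known price to proxy a missing point.

```python
# Two hypothetical vendor feeds; identifiers and prices are illustrative.
vendor_a = {"BOND1": 101.2, "BOND2": 98.7, "BOND3": None}
vendor_b = {"BOND1": 101.3, "BOND2": 99.9, "BOND3": None}
last_known = {"BOND3": 97.5}   # yesterday's curated price, used as a proxy

divergence_limit = 0.005  # flag if the sources differ by more than 0.5%

curated, exceptions = {}, []
for key in vendor_a:
    a, b = vendor_a[key], vendor_b[key]
    if a is None and b is None:
        # Both sources missing: proxy with the last known value if available.
        if key in last_known:
            curated[key] = last_known[key]
        else:
            exceptions.append((key, "missing in both sources"))
    elif a is None or b is None:
        curated[key] = a if a is not None else b
    elif abs(a - b) / ((a + b) / 2) > divergence_limit:
        # Sources disagree beyond tolerance: route to an operator instead of guessing.
        exceptions.append((key, f"sources diverge: {a} vs {b}"))
    else:
        curated[key] = (a + b) / 2  # sources agree: take the average

print(curated)     # {'BOND1': 101.25, 'BOND3': 97.5}
print(exceptions)  # [('BOND2', 'sources diverge: 98.7 vs 99.9')]
```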
Generally, in a data management function such as investment operations, there is a rigorous quality management process in which incoming data is validated against business rules and suspect data points are flagged as exceptions.
These exceptions are then resolved by operations staff, sometimes assisted by automated business rules. Data lineage is needed to do root-cause analysis and to answer questions from regulators or auditors about the provenance of specific data or how certain valuations or risk numbers were arrived at.
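As an illustration of the kind of lineage record that supports root-cause analysis, the sketch below attaches provenance to each curated value; the structure and field names are assumptions for the example, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """Provenance for one curated data point (illustrative structure)."""
    identifier: str
    value: float
    sources: list[str]       # feeds that contributed to the value
    rule_applied: str        # e.g. "average of agreeing sources"
    curated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    notes: str = ""

# When an auditor asks how a price was arrived at, the record answers directly.
rec = LineageRecord(
    identifier="BOND1",
    value=101.25,
    sources=["vendor_a", "vendor_b"],
    rule_applied="average of agreeing sources (tolerance 0.5%)",
    notes="both sources within tolerance; no operator intervention",
)
print(f"{rec.identifier}: {rec.value} via {rec.rule_applied} from {rec.sources}")
```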
It is important to keep track of data quality issues and to analyze them periodically, broken down for example by data source, asset class, and issue type. This analysis can be the basis for optimally configuring the business rules and the quality management workflow, as well as for selecting the best data sources.
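A minimal sketch of such an analysis, assuming a hypothetical issue log: it counts issues per source and per issue type so that the noisiest sources and rules stand out.

```python
from collections import Counter

# Hypothetical issue log; fields and values are illustrative.
issues = [
    {"source": "vendor_a", "type": "stale price"},
    {"source": "vendor_b", "type": "missing field"},
    {"source": "vendor_b", "type": "outlier"},
    {"source": "vendor_b", "type": "stale price"},
    {"source": "vendor_a", "type": "outlier"},
]

by_source = Counter(i["source"] for i in issues)
by_type = Counter(i["type"] for i in issues)

print("issues by source:", by_source.most_common())
print("issues by type:  ", by_type.most_common())
# A source that dominates the counts is a candidate for replacement;
# a rule that dominates is a candidate for tuning or automation.
```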
How Do I Improve My Data Quality?
Firms need to define all the Critical Data Elements (CDEs) and decide which data quality aspects they want to measure. This also includes capturing type I errors (false positives, i.e. incorrectly flagging something as suspect) and type II errors (false negatives, i.e. not catching a mistake). For this, it is important to capture feedback both from the data quality team and from downstream users, since the latter are often the ones who surface type II errors.
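As a minimal sketch, assuming a hypothetical review log in which each item has been confirmed as a genuine error or not, the type I and type II error rates can be computed directly:

```python
# Hypothetical review outcomes; the field names are illustrative.
# flagged: the quality process marked the item as suspect.
# is_error: a reviewer (or a downstream user) confirmed a genuine error.
reviewed = [
    {"flagged": True,  "is_error": True},
    {"flagged": True,  "is_error": False},  # type I: flagged but fine
    {"flagged": False, "is_error": True},   # type II: missed, reported downstream
    {"flagged": False, "is_error": False},
    {"flagged": True,  "is_error": True},
]

flagged = [r for r in reviewed if r["flagged"]]
passed = [r for r in reviewed if not r["flagged"]]

type_1_rate = sum(not r["is_error"] for r in flagged) / len(flagged)
type_2_rate = sum(r["is_error"] for r in passed) / len(passed)

print(f"type I (false positive) rate:  {type_1_rate:.0%}")   # 33%
print(f"type II (false negative) rate: {type_2_rate:.0%}")   # 50%
```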
On top of this, firms need to track various metadata aspects such as permissions and permitted use cases. This helps to set the right expectations as to whether it is even appropriate to use the data in a given context.
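A small sketch of such a metadata check, where the permission model and use-case names are assumptions for the example:

```python
# Hypothetical dataset metadata; the permitted-use vocabulary is illustrative.
dataset_meta = {
    "source": "vendor_a",
    "collected_at": "2024-05-03T21:00:00Z",
    "permitted_uses": {"internal valuation", "risk reporting"},
}

def check_permitted(meta: dict, use_case: str) -> bool:
    """Return True if the intended use case is covered by the data licence."""
    return use_case in meta["permitted_uses"]

print(check_permitted(dataset_meta, "risk reporting"))         # True
print(check_permitted(dataset_meta, "client redistribution"))  # False: not licensed
```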
Having in place a dashboard that provides a bird's-eye overview and data quality intelligence is a precondition for improving data quality. Repeat issues should be captured in a business rule or even automated away. Data sources with persistently low quality could be swapped for alternatives to improve the service the data team provides to the business. Managed Data Services or Data-as-a-Service solutions can help here as well.