7 Data Sins Series: Can’t See the Wood for the GREEN Trees!

Written by Neil Sandle - Director, Product Management | 28-Oct-2024 13:58:52

"There is a lot of data, but no real information!"

Indeed, 90% of all the data in the world has been produced in the last three years alone. One might naively suspect, then, that we have all the information we need. Alas, it isn’t as simple as that: there is a subtle difference between data and information, and one can imagine a whole spectrum between the two extremes.

The Importance of a Single Source of Truth

To start, let’s recognise that data can come in many forms and be provided by many different (vendor) sources. But only once you have ‘a single consolidated source of truth’, modelled consistently through time, will decisions be rightly informed.

Yes, it is certainly self-evident that each time one is confronted with ‘multiple versions of the truth’, it becomes increasingly hard to make a decision with conviction. Nevertheless, even if there is such a thing as ‘a single version of the truth’ and it is available in-house, having just one data strain is still most often not enough to make an informed investment decision. This is because the information is actually created by combining two or more strains of clean, structured, and normalized data.

For example, to determine whether a stock is relatively cheap, one would need clean reference data such as Price to Earnings (P/E) or Price to Book Value (P/B) ratios. That data on its own, however clean it might be, only turns into value when it’s compared to a proper peer group of similar companies in the same business. Such a peer group hardly ever comes ‘out of the box’ with any system and would typically have to be sourced from in-house experts, whereas the rest of the data might very well come from several different vendors.
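As a rough illustration of the peer comparison described above, here is a minimal Python sketch. The tickers, P/E values, and peer group are entirely hypothetical, and dividing by the peer median is just one simple way to express ‘relatively cheap’; it is not a prescribed method.

```python
from statistics import median

# Hypothetical vendor reference data: ticker -> P/E ratio (illustrative values only).
pe_ratios = {"AAA": 12.0, "BBB": 18.0, "CCC": 15.0, "DDD": 21.0, "EEE": 9.0}

# Hypothetical peer group curated by in-house experts.
peer_groups = {"AAA": ["BBB", "CCC", "DDD"]}


def relative_pe(ticker, pe_ratios, peer_groups):
    """Return the ticker's P/E divided by the median P/E of its peer group."""
    peers = [p for p in peer_groups[ticker] if p in pe_ratios]
    if not peers:
        raise ValueError(f"no peer data available for {ticker}")
    return pe_ratios[ticker] / median(pe_ratios[p] for p in peers)


# A ratio below 1.0 suggests the stock trades at a discount to its peers.
ratio = relative_pe("AAA", pe_ratios, peer_groups)
print(f"AAA P/E relative to peer median: {ratio:.2f}")
```

Note that the vendor figures and the in-house peer list live in separate structures here, mirroring the point that the two typically come from different sources and only become information once combined.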

Seeing Through the ESG Data Forest

Deriving information from combined, consolidated, clean data from several different vendors and in-house sources is the lifeblood of every financial institution and is essential to making any proper investment decision. The overall aim of businesses in finance is then to normalize that data and squeeze as much information out of it as one possibly can. With that in mind, it’s not unusual to have a system in place that can collect data from different vendors and sources, then clean, manage, prepare, and govern it, all to improve the overall quality before it’s distributed throughout the rest of the organization. A system like that can subsequently improve the overall data usage (and hence cost) of your entire range of investment processes. It prevents wasting precious time and money on reconciling (data and analytical) differences and eliminates redundancies further downstream.

And whilst most fund managers are starting to figure out how to combine and clean the essential time-series and reference data from their different sources, they are now also confronted with the burden of consolidating and normalizing ESG-data from the many additional vendors and (in-house) sources.

But with so much ESG data coming in from all directions, how can one see the wood through all these GREEN trees? How can one squeeze true asset allocation and investment decision information out of all of that, rather than merely accumulating data?

Well, to answer that question, one has to bear in mind that the main idea behind gathering ESG data is to analyse and discover whether there are any potential market externalities to be found behind all the investments, where an externality is a consequence of an industrial or commercial activity which affects other parties without this being reflected in the market price. This simple notion helps reduce the entire tsunami of available ESG vendor data to little more than our modern-day attempt to capitalize on any positive or negative externalities around the investment. And so, yes, the search for sustainable positive externalities, or continuous ‘spillover benefits’, is surely just the latest craze in the alpha hunting that buy-side participants have pursued for decades.

Finding that real alpha has always required one to sift through (and clean up) a lot of data from different vendors and sources first; nothing new there. For example, carbon data for major equity allocations is nowadays relatively easy to get from several providers. But is it always correctly dated historically? If not, you might want to make some corrections here and there, align the data with other sources, and keep track of those changes while you are at it, before they become part of a final asset allocation decision.
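To make the idea of ‘correcting and keeping track’ concrete, the following is a minimal Python sketch of a store that records every correction in an audit log. The class names, fields, company identifier, and figures are all hypothetical; a real data management system would add persistence, permissions, and approval workflows.

```python
from dataclasses import dataclass, replace
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class CarbonRecord:
    company_id: str
    as_of: date          # reporting period the figure refers to
    tonnes_co2e: float


@dataclass(frozen=True)
class Correction:
    before: CarbonRecord
    after: CarbonRecord
    reason: str
    made_at: datetime


class CarbonStore:
    """Holds one record per company and logs every correction made to it."""

    def __init__(self):
        self.records = {}
        self.audit_log = []

    def load(self, record):
        self.records[record.company_id] = record

    def correct(self, company_id, reason, **changes):
        before = self.records[company_id]
        after = replace(before, **changes)  # copy with the changed fields
        self.audit_log.append(
            Correction(before, after, reason, datetime.now(timezone.utc))
        )
        self.records[company_id] = after
        return after


store = CarbonStore()
store.load(CarbonRecord("ACME", date(2023, 12, 31), 1250.0))
# Assumed scenario: the vendor stamped the publication year, not the reporting year.
store.correct("ACME", "vendor date was publication date", as_of=date(2022, 12, 31))
```

Because the records are immutable and each correction stores both the ‘before’ and ‘after’ state, the original vendor value is never lost and any downstream figure can be traced back to the change that produced it.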

And how does one readily map a company’s carbon information to the exposures that its bonds are giving you? Does that carbon allocation apply there too? All such reference data would need to be systematically mapped and cross-referenced before you could make sense of your overall fixed-income investments.
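The cross-referencing described above could be sketched in Python as follows. This assumes, purely for illustration, that an issuer-level carbon intensity applies equally to each of that issuer’s bonds, which is exactly the kind of modelling choice the questions above call into doubt. The bond identifiers, issuers, weights, and intensity values are invented.

```python
# Hypothetical reference data: bond identifier -> issuer.
bond_to_issuer = {"XS0001": "ACME", "XS0002": "ACME", "XS0003": "GLOBEX"}

# Hypothetical issuer-level carbon intensity; GLOBEX has no vendor coverage.
issuer_carbon_intensity = {"ACME": 120.0}

# Hypothetical portfolio weights per bond.
holdings = {"XS0001": 0.5, "XS0002": 0.3, "XS0003": 0.2}


def portfolio_carbon_intensity(holdings, bond_to_issuer, issuer_intensity):
    """Weighted average intensity over covered bonds, plus coverage details."""
    covered_weight = 0.0
    weighted_sum = 0.0
    unmapped = []
    for bond_id, weight in holdings.items():
        issuer = bond_to_issuer.get(bond_id)
        intensity = issuer_intensity.get(issuer)
        if intensity is None:
            # Either the bond has no issuer mapping or the issuer has no data.
            unmapped.append(bond_id)
            continue
        covered_weight += weight
        weighted_sum += weight * intensity
    average = weighted_sum / covered_weight if covered_weight else None
    return average, covered_weight, unmapped


average, coverage, unmapped = portfolio_carbon_intensity(
    holdings, bond_to_issuer, issuer_carbon_intensity
)
print(f"intensity={average}, coverage={coverage:.0%}, unmapped={unmapped}")
```

Reporting the covered weight and the unmapped bonds alongside the average makes the gaps in the mapping visible rather than silently treating missing data as zero.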

A final GREEN example of creating ESG information before you can actually see the wood for the trees is normalising ESG ratings sourced from several different providers. To do this properly and create your own in-house overarching combined rating, cleaned by sector or industry class, you would indeed require an algorithmic approach. By extension, that approach would need a proper system to manage all the information and track changes to its data (including overrides).
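One possible algorithmic approach, shown here only as a minimal Python sketch, is to standardise each provider’s scores within each sector (a z-score) and then average across providers. The vendor names, scales, companies, and scores are hypothetical, and the z-score-then-average scheme is an assumption chosen for simplicity, not an established industry method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical sector classification.
sector = {"ACME": "Energy", "GLOBEX": "Energy", "INITECH": "Tech", "HOOLI": "Tech"}

# Hypothetical ratings on two different numeric scales (higher is better).
ratings = {
    "vendor_a": {"ACME": 40.0, "GLOBEX": 60.0, "INITECH": 70.0, "HOOLI": 90.0},
    "vendor_b": {"ACME": 2.0, "GLOBEX": 4.0, "INITECH": 5.0, "HOOLI": 7.0},
}


def sector_zscores(scores, sector):
    """Standardise one vendor's scores against the same-sector mean and spread."""
    by_sector = defaultdict(list)
    for company, score in scores.items():
        by_sector[sector[company]].append(score)
    z = {}
    for company, score in scores.items():
        group = by_sector[sector[company]]
        mu, sigma = mean(group), pstdev(group)
        z[company] = (score - mu) / sigma if sigma > 0 else 0.0
    return z


def combined_rating(ratings, sector):
    """Average the sector-relative z-scores across all vendors covering a company."""
    per_vendor = [sector_zscores(scores, sector) for scores in ratings.values()]
    companies = set().union(*(z.keys() for z in per_vendor))
    return {c: mean(z[c] for z in per_vendor if c in z) for c in sorted(companies)}


combined = combined_rating(ratings, sector)
for company, score in combined.items():
    print(f"{company}: {score:+.2f}")
```

Standardising within sector removes both the difference in vendor scales and the sector-level bias, so a positive combined score reads as ‘better than sector peers according to the vendors that cover this company’.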

The Key to Decoding and Normalizing Data

So, there it is again: having your own system is key! This is especially so when you want this data to be centrally administered and disseminated to every discipline in the buy-side house, so that one can manage the funds and the assets and reflect them properly against the applicable benchmarks.

An enterprise data management system empowers you to decode, normalize, and operationalize the combined reference and ESG data from many different sources and create genuine investment decision information. The cheaper alternative, of course, would be to just use data from a single source. But wouldn’t you lose out on a lot of combined information that way?
