What is “alt data”, who is using it and why?

Radar explores the emergence of alternative data on the buyside and how vendors are approaching fund managers in a data “gold rush,” and details the latent operational risks.

Raw data can now be collected from a wide variety of disparate sources, such as satellite images, customer interactions, GPS data, sentiment and social media. Some alternative asset managers have been using this “big data” for a considerable time, and more fundamental investors are now joining the party to extract value and insight not previously available. Vendors have entered the space in large numbers, offering a more commoditized product to those seeking “alpha”.

IBM defines big data as “a term applied to datasets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process the data with low latency. It has one or more of the following characteristics: high volume, high velocity or high variety.” Automated systems can now quickly identify correlations and patterns across combinations of different, unlinked datasets. Alternative (‘alt’) data can supplement traditional research with new and unbiased data.

One example of a vendor is Dataminr, which runs analytics on Twitter data and is known to have surfaced preliminary reports of Volkswagen’s emissions scandal three days before the market reacted. Another is SpaceKnow, which uses 2.2 billion satellite observations over 500,000 square kilometers to track 6,000 industrial facilities across China and generate an index of manufacturing activity, a welcome alternative to the official PMI numbers published by the Chinese state.

Machine learning can synthesize much of the public data posted on the internet. Natural language processing (NLP) lets the machine analyse text to assess the sentiment and context of the language used. Models and scores can then rank this content, and those rankings can help guide investment decisions.
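To make the scoring-and-ranking step concrete, here is a minimal Python sketch. It swaps in the simplest possible technique, a hand-written word lexicon, for the trained NLP models described above, so treat it as an illustration of the idea rather than how any vendor actually works. The lexicon weights, the `Post` type, the helper names and the tickers `AAA` and `BBB` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: word -> sentiment weight in [-1, 1].
# Real systems learn weights from data or use curated financial lexicons.
LEXICON = {
    "beat": 1.0, "growth": 0.8, "strong": 0.6, "record": 0.5,
    "miss": -1.0, "recall": -0.9, "scandal": -1.0, "weak": -0.6,
}


@dataclass
class Post:
    """A hypothetical piece of scraped text tagged with a ticker."""
    ticker: str
    text: str


def sentiment_score(text: str) -> float:
    """Average lexicon weight of the matched words; 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def rank_tickers(posts: list[Post]) -> list[tuple[str, float]]:
    """Average post scores per ticker, ranked from most to least positive."""
    scores: dict[str, list[float]] = {}
    for post in posts:
        scores.setdefault(post.ticker, []).append(sentiment_score(post.text))
    averages = {t: sum(s) / len(s) for t, s in scores.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


posts = [
    Post("AAA", "Record quarter, strong growth, earnings beat"),
    Post("BBB", "Emissions scandal widens; recall expected"),
    Post("AAA", "Analysts see weak guidance"),
]

for ticker, score in rank_tickers(posts):
    print(f"{ticker}: {score:+.2f}")
```

A lexicon keeps the sketch readable, but it ignores negation ("not strong") and context, which is exactly what trained NLP models are meant to capture; the ranking is only one signal among many in an investment process.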

Internet users generate “data exhaust”: data created as a byproduct of their online activity. Single sign-on frameworks deliver even more detailed usage profiles, particularly when combined with cookies.

The acquisition and use of alt data by the fund management sector has been increasing steadily. A compliance head at a prominent global long/short hedge fund says, “it is very much mainstream; banks offer consulting around it but are not as embedded as the buyside. 300 vendors turned up to the recent AI & Data Science Trading Conference in New York.”

Alt data practice is far from standardized, and the data can be tricky to work with when it arrives in raw form. Cleaning, preparing and analysing it is a sophisticated process, and the appeal of specialist vendors generally outweighs the temptation to build an internal team and framework to do this in-house. Data can come in various states of cleanliness: raw, processed and curated. Some vendors offer generic data to all clients, while others sell higher-priced data on a more exclusive basis. Executives can be required to devote considerable time and labor to processing and scrubbing purchased data, dealing with incompatible formats and separating useful from non-useful data.

Peter Greene, Vice Chair and Partner of the Investment Management Group at Lowenstein Sandler, agrees that the buyside is heavily invested in the wholesale acquisition and use of alt data: “whether my clients run a fund with $30bn or $200m, almost all are starting to dip a toe in the data waters. Some have a team of data scientists to comb through and curate the data, while others are just starting to buy and explore it. The key question all are asking is what the long term return on investment is here.”

Deloitte has reported that firms are adopting alternative data primarily to gain an information advantage: “any edge, even a narrow timing advantage, may yield a more effective trading signal, algorithm, or investment model.” Greenwich Associates surveyed 23 hedge funds in the US and Europe to identify where alt data plays a role in the investment process. Of those funds, 61 percent used it to predict future market or sector movements, 48 percent for idea generation, 44 percent to research specific names and 39 percent to find market mispricing and arbitrage opportunities (many funds cited more than one use).

Our compliance source adds color on why alt data is becoming so pervasive among fund managers: “the premise is simple. It’s the generation of alpha and better returns for investors. Either the PM feels they want to take a new position or want to evaluate their current position. It is another signal in the investment process. On occasion it can be the lead signal, confirming a hypothesis, or leading to one being discarded and a new one emerging.”

Process and best practice

Experienced acquirers of alt data have well-established, documented policies and procedures. They describe using a centralised data hub and repository, and watertight agreements with vendors. The exact approach is specific to each fund and depends heavily on the firm’s risk appetite. There has to be an internal partnership between traders, technology staff and compliance personnel, and everyone needs to understand the source of the data and the risks associated with it.

The risks inherent in “alt data”

Peter Greene is unequivocal about the biggest risk in using alt data: “it is insider trading, which brings existential risk. Then comes privacy, personally identifiable information in the US, and GDPR in the EU. The latter are easier to control by exclusion via your contracts with data vendors.”

So far the government, represented here chiefly by the Securities and Exchange Commission and the Department of Justice, has never brought an alt data insider trading case against a hedge fund. To succeed, it would need to prove materiality, that the data was non-public, and that it was obtained in breach of duty or misappropriated. The considerable amounts being paid for the data suggest it is material. It is also highly likely to be non-public unless it has been “scraped” (taken from websites and platforms and imported into local systems and repositories). The leg on which the government’s case should fall down is data provenance. Greene says, “the key is ensuring that all those involved in the supply chain from the origination of the data to the hedge fund have the requisite consents to gather and use it.”

With scraped data, materiality is assumed, and in most cases the data has been obtained without permission, since most websites’ terms of use expressly prohibit scraping. Where a case for material non-public information (MNPI) fails is that the data is publicly available, which protects a firm in the US as well as in the EU. That comfort disappears when a password is required to access the data.

If enforcement is ahead, where will regulators most likely strike?

When Eric Schneiderman was New York State Attorney General, he used the Martin Act to stop Thomson Reuters from giving a select group of clients privileged access to market-moving information from the University of Michigan’s Survey of Consumers two seconds before the rest of its subscribers saw it. Could a similar approach be used against innovative buyside firms that are getting an edge?

Greene suggests that the first enforcement case is most likely to be prompted by public and Congressional interest in the tracking of individuals and the use of geolocation data. He says, “the government likes to bring cases it has a good chance of winning. It will look for an egregious case. Interestingly, the City of Los Angeles recently brought a case against the Weather Channel for unfair business practices in securing consent to share such consumer location data with vendors.”

But he also stresses how far the sector has come in its governance and compliance. Greene states, “the key is the understanding and diligence of the data vendors, and they understand much more about why we need what we request. Five years ago, we had to reject perhaps 20 percent of the data vendors but that is now probably approximately 10 percent. The hedge fund sector is very conservative around this now and has thought about it carefully.”