Just as white SUVs are flooding our streets, “Big Data” seems to be the hot topic of the moment. Although at first it doesn’t seem hard to understand, in this article I will try to analyze what Big Data means for a trader.
Generally, “Big Data” is defined as the process of capturing, storing, processing, and transforming data into information or decisions when the data meet at least one of the three “Vs”: they come in large Volumes, require Speed (velocity), or involve a Variety of types and/or sources.
I prefer to define Big Data as the widespread use of a very powerful set of statistical and computational tools that gives us independence in our analysis. My knowledge of my own field lets me build the hypotheses I want to test, and with “Big Data” I also have the tools to test them instead of leaving those ideas stuck in my head.
According to the Deutsche Börse Group:
“Technological innovations have contributed significantly to greater efficiency in the derivatives market. Thanks to innovations in trading technology, trading on Germany’s Eurex now runs much faster than a decade ago, despite the sharp increase in the volume of transactions and the number of quotes. These major improvements have only been possible due to continued investments in IT by derivatives markets and clearinghouses.”
The acceleration of the trading process has also reached traders. Michael Lewis, author of the book “Moneyball” (which became known through the movie of the same name starring Brad Pitt), has written a non-fiction book, “Flash Boys”, in which he tells the story of how a group of “kids” manage to trade at the fastest possible speed, close to the speed of light. Investing little money and using technology and Big Data concepts, these guys go to war with the High-Frequency Trading (HFT) firms and the big American banks. I’m looking forward to the movie!
A major HFT-related event took place on May 6, 2010, in the US stock markets, in what became known as the 2010 Flash Crash. At 14:32 New York time, the S&P 500 index began an extraordinary drop and rebound. Algorithmic trading programs triggered one another in a chain: first they fired sell orders as their stop-loss criteria were hit, and then they chained back into buy orders, something never seen before. It is estimated that, in a span of about 36 minutes, the swing between minimum and maximum prices amounted to roughly a trillion US dollars in market value, a historical record.
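The chain reaction described above can be illustrated with a deliberately simple toy model. It is not a reconstruction of what happened on May 6, 2010. A small initial drop breaches one stop-loss, the resulting forced sell pushes the price through the next stop, and so on. The stop levels, the shock, and the fixed price impact per order are all hypothetical numbers. The model also ignores the rebound, the withdrawal of liquidity, and everything else that happened that day.

```python
# Toy model of a stop-loss cascade; all numbers are hypothetical.

def simulate_cascade(start_price, stop_levels, shock, impact_per_order):
    """Apply an initial price shock, then let every stop-loss whose level is
    breached fire a market sell that pushes the price down further.

    Returns the final price and the stop levels triggered, in firing order.
    """
    price = start_price - shock
    pending = sorted(stop_levels, reverse=True)  # the highest stops are hit first
    triggered = []
    while pending and price <= pending[0]:
        level = pending.pop(0)
        triggered.append(level)
        price -= impact_per_order  # each forced sell moves the price down
    return price, triggered


final_price, fired = simulate_cascade(
    start_price=100.0,
    stop_levels=[99.0, 98.5, 98.0, 97.0, 90.0],
    shock=1.25,
    impact_per_order=0.75,
)
print(final_price, fired)  # 95.75 [99.0, 98.5, 98.0, 97.0]
```

On its own, the 1.25-point shock only breaches the 99.0 stop. Once four stops have fired, it has become a 4.25-point fall. The 90.0 stop survives only because the cascade runs out of orders before reaching it.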
I’ve heard all sorts of statements about Big Data, some accurate and some wrong. Many that were once true have ceased to be so with the evolution of the technologies and of the very concept of Big Data. The main known barriers to implementing Big Data have always been the company’s existing infrastructure, cost, implementation time, and the need for know-how. With the exception of the first, these barriers are falling thanks to new Big Data technologies.
Big Data is very expensive – False – thanks to the large supply of tools and the adoption of free, open-source technologies, prices that were once sky-high have dropped to levels accessible to companies of all sizes.
It’s a computer thing – False – programming environments and Big Data and Analytics tools have undergone a major transformation that makes them more accessible to non-technical people who want to use them in their own field of expertise.
Takes a long time to implement – False – new agile project-management techniques and the reuse of existing work lead to attractive implementation times.
The piece that was really missing from this puzzle was training, but for a couple of years now there has been a substantial offer of online Big Data courses and master’s degrees on the subject.
People always ask me for examples of companies using Big Data. Making a list of companies always scares me, because the world of Big Data gains followers daily: if I wrote a list, it would very likely be obsolete by the time you read it. In the trading world, many banks, such as Bank of America and JP Morgan, are involved in HFT. As for Spanish banks, there is very little public information on their use of HFT. However, with all the technology currently available, one would be naive to think that they have not stepped up their use of HFT in currency arbitrage and in the futures markets for the commodities in which they hold hedging positions.
In addition to HFT, some funds state that they use alternative information for their buy and sell decisions. This information varies, but many say they apply comment and sentiment analysis to online newspapers, public CSS and RSS feeds, blogs, Twitter, and other social networks. It is unclear exactly what they do, but AQR Capital Management and Two Sigma Investments claim to use Big Data in their investment decisions.
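It is unclear what these funds actually run. Still, the basic idea of sentiment analysis can be sketched with a simple word-counting approach. The tiny POSITIVE and NEGATIVE word lists below are hypothetical, and so are the headlines. Production systems use large curated lexicons or trained language models. Treat this only as an illustration of turning text into a number that a strategy can consume.

```python
import re

# Hypothetical, tiny word lists; real systems use large curated lexicons or trained models.
POSITIVE = {"beat", "beats", "growth", "upgrade", "record", "strong"}
NEGATIVE = {"miss", "misses", "downgrade", "loss", "weak", "lawsuit"}


def sentiment_score(text):
    """Return (#positive - #negative) / #matched words, or 0.0 if nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def average_sentiment(texts):
    """Average score over a batch of headlines, tweets, or RSS items."""
    scores = [sentiment_score(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0


headlines = [
    "ACME beats estimates on strong cloud growth",
    "Analyst downgrade after ACME misses guidance",
    "ACME posts record sales but faces lawsuit",
]
print([sentiment_score(h) for h in headlines])  # [1.0, -1.0, 0.0]
```

Normalizing by the number of matched words keeps long and short texts on the same scale from -1 to 1. A daily average of these scores could then be fed to a trading rule like any other time series.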
What must be clear is that, whatever your strategy, today we are competing with these specialized algorithms AT ALL TIMES.
In practice, in the world of trading, Big Data is providing:
Volume: Big Data clearly shows us how to scale our strategy, either by adding more companies or portfolios to an existing strategy or by running many strategies in parallel, competing with one another.
Variety: Big Data algorithms make it possible to mix price history from private APIs (Bloomberg or your broker) and public APIs (Yahoo Finance, Google Finance) with alternative information from CSS and RSS readers, web scrapers, Twitter, and other social networks.
Speed: The combined use of several computing paradigms is making it possible, for the first time, for independent traders or small investment firms to compete with large banks in the HFT war, as described in the book Flash Boys. Examples include in-memory databases, vectorized data formats, and calculations run in parallel or distributed across several computers.
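For the Volume point, here is a minimal sketch of running one rule over many instruments at once with the standard library's concurrent.futures. The rule (hold an asset only on days after it rose), the function names, and the data layout are hypothetical. The point is that scaling from one instrument to hundreds is mostly a matter of handing more jobs to the pool.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def momentum_pnl(closes):
    """Hypothetical rule: hold the asset on day t only if it rose on day t-1.

    Returns the sum of the simple daily returns captured by the rule.
    """
    pnl = 0.0
    for t in range(2, len(closes)):
        if closes[t - 1] > closes[t - 2]:  # yesterday was an up day
            pnl += closes[t] / closes[t - 1] - 1.0
    return pnl


def run_universe(universe, use_processes=True, max_workers=4):
    """Evaluate momentum_pnl on every symbol concurrently; returns {symbol: pnl}.

    universe: {symbol: list of closing prices}
    use_processes: True for CPU-bound backtests (separate processes avoid the GIL),
                   False to use threads, e.g. when the work is mostly I/O.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    symbols = list(universe)
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(momentum_pnl, [universe[s] for s in symbols]))
    return dict(zip(symbols, results))
```

For example, `run_universe({"AAA": [10, 11, 12, 12, 13], "BBB": [20, 19, 18, 19, 20]})` returns about 0.0909 for AAA and 0.0526 for BBB. When using processes, call it from inside an `if __name__ == "__main__":` guard, which the spawn start method used on some platforms requires.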
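For the Variety point, mixing sources often comes down to aligning them on a common key, usually the date. This is a minimal sketch under two assumptions. Prices arrive as a {date: close} mapping from any price API. Alternative data has already been reduced to one sentiment score per day. Both mappings, their values, and the `join_on_date` name are hypothetical.

```python
from datetime import date


def join_on_date(prices, sentiment, default=0.0):
    """Left-join daily closes with a daily sentiment score.

    prices:    {date: close} from a private or public price API
    sentiment: {date: score} built from RSS items, scraped pages, tweets...
    Trading days without alternative data get `default` (treated as neutral).
    Returns a list of (date, close, score) rows sorted by date.
    """
    return [(d, prices[d], sentiment.get(d, default)) for d in sorted(prices)]


prices = {date(2024, 3, 4): 101.5, date(2024, 3, 1): 100.0, date(2024, 3, 5): 99.8}
sentiment = {
    date(2024, 3, 1): 0.4,
    date(2024, 3, 2): 0.9,  # a Saturday: no trading that day
    date(2024, 3, 5): -0.7,
}
for row in join_on_date(prices, sentiment):
    print(row)
```

The left join keeps only trading days, so the Saturday score is dropped. A real pipeline might instead carry weekend news into Monday's row. With pandas, the same idea is a `DataFrame.merge(..., how="left")` on the date column.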
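For the "vector format" part of the Speed point, here is a simple moving average computed two ways. The first is a plain Python loop that re-sums every window. The second is a vectorized NumPy version that uses a cumulative sum, so each window costs one subtraction. Both return the same values. The sketch assumes the series fits in memory as a single array, and the function names are illustrative.

```python
import numpy as np


def sma_loop(prices, window):
    """Simple moving average, re-summing each window in plain Python."""
    return [sum(prices[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(prices))]


def sma_vectorized(prices, window):
    """Same result from one cumulative sum over a NumPy array.

    csum[k] is the sum of the first k prices, so each window sum is
    csum[j + window] - csum[j], computed for every j at once.
    """
    x = np.asarray(prices, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    return (csum[window:] - csum[:-window]) / window


print(sma_vectorized([1, 2, 3, 4, 5], 3))  # [2. 3. 4.]
```

On long series the vectorized version is typically much faster because the work happens inside NumPy rather than in a Python loop. The cumulative-sum trick can accumulate a little floating-point error; `np.convolve(x, np.ones(window) / window, mode="valid")` is an alternative.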
And that translates into:
Within a large bank, if you follow the full development flow of a strategy through to its implementation, many tasks are done before and after the trader’s intervention. The Technology Area captures, collects, and stores the data. The methodology department usually performs fundamental analysis or simulations with this data. Finally, the data reaches the trader, who creates the strategies. Once a strategy has been designed, it is up to Technology to prepare a prototype for backtesting it, and the strategy is then put into production as an automated algorithm.
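A first backtesting prototype does not need a technology team. This is a minimal sketch built around a hypothetical long/flat moving-average crossover rule on daily closes. The signal is computed at day t's close and applied only to the return from t to t+1, which avoids look-ahead bias. It ignores costs, slippage, and position sizing, and the function names are illustrative.

```python
def sma(values, window, t):
    """Average of values[t - window + 1 .. t]; None until enough history exists."""
    if t + 1 < window:
        return None
    return sum(values[t - window + 1 : t + 1]) / window


def backtest_crossover(closes, fast=2, slow=3):
    """Hypothetical rule: be long from day t to t+1 if fast SMA > slow SMA at day t's close.

    Returns the equity curve, starting at 1.0, with one point per close.
    """
    equity = [1.0]
    for t in range(len(closes) - 1):
        f, s = sma(closes, fast, t), sma(closes, slow, t)
        in_market = f is not None and s is not None and f > s
        daily_return = closes[t + 1] / closes[t] - 1.0
        growth = (1.0 + daily_return) if in_market else 1.0
        equity.append(equity[-1] * growth)
    return equity


curve = backtest_crossover([10, 10, 11, 12, 11, 13])
print(round(curve[-1], 4))  # 1.1818
```

Keeping the signal and the return one bar apart is the design choice that matters most here. Using day t+1's close to decide whether to hold from t to t+1 would make almost any rule look profitable.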
Using a general-purpose programming language such as Python, a trader can now gradually acquire the knowledge needed to do the work of these other teams.
To give a practical example of what we are talking about: in Python, the problem of collecting historical price data becomes a simple call to the Yahoo, Google, or another provider’s API through a data-reading function such as DataReader.
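The DataReader mentioned above is the function of that name in the pandas-datareader package (`pandas_datareader.data.DataReader`). Provider support has changed over time, and some sources, such as Google Finance, are no longer available through it. This sketch therefore assumes the "stooq" source. The helper names and the ticker in the example comment are illustrative, and the download needs network access.

```python
def daily_returns(closes):
    """Simple daily returns from a sequence of closing prices."""
    return [b / a - 1.0 for a, b in zip(closes[:-1], closes[1:])]


def load_closes(symbol, start, end, source="stooq"):
    """Download daily closes with pandas-datareader (needs the package and network).

    The import is lazy so that daily_returns() works without the package installed.
    """
    from pandas_datareader import data as pdr

    frame = pdr.DataReader(symbol, source, start, end)
    # Some providers return the newest rows first, so sort by date before use.
    return frame.sort_index()["Close"].tolist()


# Example (needs network; ticker format depends on the provider):
# closes = load_closes("^DJI", "2023-01-01", "2023-12-31")
# print(daily_returns(closes)[:5])
```

Once the closes are in a plain list or a pandas Series, everything downstream (returns, signals, backtests) is ordinary Python and independent of the data provider.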
The range of Big Data training on offer keeps growing. The war between paid and free tools has forced traditional players such as SAS and SPSS to release free versions of their software or free courses on their analytics tools.
To conclude: on the one hand, Big Data tools are increasingly accessible and integrated. In the future, you will be able to use the most appropriate database for each problem, whether structured or unstructured, on disk, in memory, or distributed, almost without realizing it. Python is a new player in which this homogenization is happening very quickly, and it already lets you access several Big Data tools from within the same language. My prediction is that languages that do not homogenize run the risk of extinction.
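As a small taste of the "almost without realizing it" idea, Python's built-in sqlite3 module lets the same code run against a database held in memory or stored in a file, just by changing the connection string. SQLite is only a stand-in here. It covers the in-memory versus on-disk case, not unstructured or distributed stores. The table, the rows, and the `store_and_average` name are hypothetical.

```python
import sqlite3


def store_and_average(conn, rows):
    """Store (symbol, day, close) rows and return the average close per symbol.

    The function does not care whether `conn` points to memory or to a file.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, day TEXT, close REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()
    query = "SELECT symbol, AVG(close) FROM prices GROUP BY symbol ORDER BY symbol"
    return dict(conn.execute(query).fetchall())


rows = [("AAA", "2024-03-01", 10.0), ("AAA", "2024-03-04", 12.0), ("BBB", "2024-03-01", 5.0)]

in_memory = sqlite3.connect(":memory:")      # lives only in RAM
print(store_and_average(in_memory, rows))    # {'AAA': 11.0, 'BBB': 5.0}
in_memory.close()
# on_disk = sqlite3.connect("prices.db")     # same code, persisted to a file
```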
On the other hand, everyone is looking at the world of Analytics. Giants of the software industry are snapping up Analytics companies like hotcakes. That is clear evidence that statistical knowledge will become more and more relevant and, combined with data tools, will be an indispensable weapon in the future. This prediction is also confirmed by all the master’s degrees in Big Data and Analytics that have already been established and by the new ones that emerge every year.
But the big change is that all of the above can only mean one thing: Big Data, today, is made for the business expert, and in our case, for the trader. Big Data was born among computer scientists and has attracted many statisticians. But what is clear to me is that you get much better results when the business expert knows how to drive and fix the car than the other way around.