Forex Market

Trading, HFT and Big Data – Exposing Fake News!

Just as white SUVs are flooding our streets, “Big Data” seems to be the hot topic of the moment. Although the concept does not seem hard to understand at first, in this article I will try to analyze what Big Data means for a trader.

Generally, “Big Data” is defined as the process of capturing, storing, processing, and transforming data into information or decisions when the data meet at least one of the three “Vs”: they come in large volumes, require speed, or involve a variety of types and/or sources.

I prefer to define Big Data as the widespread use of a very powerful set of statistical and computational tools that gives us independence in our analysis. My knowledge of my own field lets me build the hypotheses I want to test; with “Big Data”, I also have the tools to test them instead of leaving those ideas stuck in my head.

According to the Deutsche Börse Group:

“Technological innovations have contributed significantly to greater efficiency in the derivatives market. Thanks to innovations in trading technology, trading on the German Eurex exchange now runs much faster than a decade ago, despite the sharp increase in trading volume and in the number of quotes. These major improvements have only been possible thanks to continued IT investment by derivatives exchanges and clearinghouses.”

The acceleration of the trading process has also reached traders themselves. Michael Lewis, author of the book “Moneyball” (which became widely known through the movie of the same name starring Brad Pitt), has written a non-fiction book, “Flash Boys”, in which he tells the story of how a group of “kids” learn to trade at the fastest possible speed, close to the speed of light. With little money but plenty of technology and Big Data concepts, they go to war with the High-Frequency Trading (HFT) firms and the big American banks. I’m looking forward to the movie!

A major HFT-related event took place on May 6, 2010, in the U.S. stock markets, in what became known as the 2010 Flash Crash. At 14:32 New York time, the S&P 500 index began an extraordinary drop and rebound. Algorithmic trading programs were triggered in a chain, first placing sell orders as their stop-loss criteria were hit and then placing buy orders as prices rebounded, something never seen before. In roughly 36 minutes, an estimated trillion dollars of market value was wiped out and then largely recovered.

I have heard all kinds of statements about Big Data, some accurate and some wrong. Many of them, however, have ceased to be true as the technologies and the very concept of Big Data have evolved. The main known barriers to implementing Big Data have always been the company’s existing infrastructure, cost, implementation time, and the need for expertise. With the exception of the first, these barriers are falling with the new Big Data technologies.

Big Data is very expensive – False – driven by the large supply of tools and the adoption of free, open-source technologies, prices that were once sky-high have dropped to levels accessible to companies of all sizes.

It’s a computer thing – False – there has been a major transformation in programming environments and in Big Data and Analytics tools, making them more accessible to non-technical people who want to use them in their own area of expertise.

Takes a long time to implement – False – agile project-management techniques and the reuse of existing work lead to attractive implementation times.

The piece that was really missing from this puzzle was training, but for a couple of years now there has been a substantial offering of online Big Data courses and master’s degrees on the subject.

People always ask me for examples of companies using Big Data. Making a list of companies always scares me, because the world of Big Data gains followers daily; any list I wrote would very likely be obsolete by the time you read it. In the trading world, many banks, such as Bank of America and JP Morgan, are involved in HFT. As for Spanish banks, there is very little public information on their use of HFT. However, with all the technology currently available, one would be naive to think that they have not stepped up their use of HFT in currency arbitrage and in the futures markets for some commodities where they hold hedging positions.

In addition to HFT, some funds state that they use alternative information for their buy and sell decisions. This information varies, but many say they use comments and sentiment analysis from online newspapers, public CSS and RSS feeds, blogs, Twitter, and other social networks. It is unclear exactly what they do, but AQR Capital Management and Two Sigma Investments claim to use Big Data in their investment decisions.
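How these funds actually score news and social media is not public. Purely as an illustration of the general idea, the simplest form of sentiment analysis is lexicon-based scoring: count positive and negative words in each text. The word lists and the "ACME" headlines below are hypothetical and far too small for real use; more serious approaches rely on much larger curated lexicons or trained language models.

```python
import re

# Hypothetical, deliberately tiny word lists; real lexicons contain thousands of terms.
POSITIVE = {"beat", "beats", "growth", "upgrade", "record", "strong", "rally"}
NEGATIVE = {"miss", "misses", "loss", "downgrade", "weak", "lawsuit", "plunge"}


def sentiment_score(text: str) -> float:
    """Score in [-1, 1]: (positive - negative) / matched words, or 0.0 if no word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def average_sentiment(headlines: list[str]) -> float:
    """Average the per-headline scores, e.g. for all headlines about one ticker on one day."""
    if not headlines:
        return 0.0
    return sum(sentiment_score(h) for h in headlines) / len(headlines)


headlines = [
    "ACME beats estimates on strong cloud growth",
    "Analyst downgrade hits ACME after weak guidance",
    "ACME schedules annual meeting",
]
for h in headlines:
    print(f"{sentiment_score(h):+.2f}  {h}")
print("average:", average_sentiment(headlines))
```

A daily average like this is the kind of number that could then be lined up against price data as an extra input to a strategy.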

What must be clear is that, whatever your strategy, today you are competing with these specialized algorithms AT ALL TIMES.

In practice, in the world of trading, Big Data is providing:

Volume: Big Data clearly shows us how to scale our strategy, either by adding more companies or portfolios to an existing strategy or by running many strategies that compete in parallel.

Variety: Big Data algorithms let us combine price history from private APIs (Bloomberg or your broker) and public APIs (Yahoo Finance, Google Finance) with alternative information from sources such as CSS and RSS readers, web scrapers, Twitter, and other social networks.

Speed: The combined use of various computing paradigms is making it possible, for the first time, for independent traders and small investment firms to compete with large banks in the HFT war, as described in the book Flash Boys. Examples include in-memory databases, data stored in vector format, and calculations run in parallel or distributed across several computers.
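For the Volume point above, a minimal sketch of applying one existing strategy to a wider universe of instruments at once. The momentum rule, the lookback, and the ticker names are hypothetical; the point is only the pattern of mapping the same function over many price series concurrently with Python's standard `concurrent.futures` module.

```python
from concurrent.futures import ThreadPoolExecutor


def momentum_signal(prices: list[float], lookback: int = 3) -> int:
    """Hypothetical rule: +1 (long) if the price rose over the last `lookback` periods,
    -1 (short) if it fell, 0 if unchanged or if there is not enough history."""
    if len(prices) <= lookback:
        return 0
    change = prices[-1] - prices[-1 - lookback]
    return (change > 0) - (change < 0)


def run_universe(universe: dict[str, list[float]], lookback: int = 3, workers: int = 4) -> dict[str, int]:
    """Run the same strategy on every instrument concurrently; one signal per ticker."""
    tickers = list(universe)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        signals = list(pool.map(lambda t: momentum_signal(universe[t], lookback), tickers))
    return dict(zip(tickers, signals))


# Hypothetical price histories; scaling up only means adding entries here.
universe = {
    "AAA": [10.0, 11.0, 12.0, 13.0],
    "BBB": [20.0, 19.0, 18.0, 17.0],
    "CCC": [5.0, 5.0, 5.0, 5.0],
    "DDD": [1.0, 2.0],
}
print(run_universe(universe))
```

Threads mainly help when each task waits on I/O, such as downloading data for each ticker; for CPU-heavy pure-Python calculations, `ProcessPoolExecutor` is usually the better fit because of the global interpreter lock.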
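For the Variety point, once prices and alternative data (for example, a daily sentiment score) have been collected, the algorithmic step is mostly aligning them on a common key such as the date. The dates and values below are made up for illustration; with pandas, the same join is usually written with `DataFrame.merge` or `DataFrame.join`.

```python
from datetime import date


def join_by_date(closes: dict[date, float], scores: dict[date, float], missing: float = 0.0):
    """Left join on the price dates: every trading day is kept, days without a
    sentiment score get `missing`, and scores on non-trading days are dropped."""
    return [(d, closes[d], scores.get(d, missing)) for d in sorted(closes)]


# Hypothetical inputs: closes from a price API, scores from a news/social-media pipeline.
closes = {date(2024, 1, 2): 100.0, date(2024, 1, 3): 101.5, date(2024, 1, 4): 99.8}
scores = {date(2024, 1, 2): 0.4, date(2024, 1, 4): -0.6, date(2024, 1, 6): 0.1}  # Jan 6 is a Saturday

for row in join_by_date(closes, scores):
    print(row)
```

Filling missing scores with a neutral 0.0 is one assumption among several; dropping those days or carrying the last score forward are equally common choices.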
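For the Speed point, Python's standard library already includes an in-memory database: SQLite opened with the special name `":memory:"`. The sketch below loads hypothetical ticks into it and aggregates them into one-minute bars with plain SQL. SQLite is row-oriented, so this only illustrates the in-memory idea, not the vectorized or distributed engines mentioned above.

```python
import sqlite3

# Hypothetical ticks: (seconds since the session opened, price, size).
TICKS = [
    (0, 100.0, 10),
    (15, 100.5, 5),
    (59, 99.5, 20),
    (60, 99.8, 7),
    (90, 100.2, 3),
]


def minute_bars(ticks):
    """Aggregate ticks into 1-minute bars: (minute, high, low, volume, VWAP)."""
    con = sqlite3.connect(":memory:")  # the database lives entirely in RAM
    try:
        con.execute("CREATE TABLE ticks (ts INTEGER, price REAL, size INTEGER)")
        con.executemany("INSERT INTO ticks VALUES (?, ?, ?)", ticks)
        return con.execute(
            """
            SELECT ts / 60 AS minute,                       -- integer division buckets by minute
                   MAX(price), MIN(price), SUM(size),
                   ROUND(SUM(price * size) / SUM(size), 4)  -- volume-weighted average price
            FROM ticks
            GROUP BY minute
            ORDER BY minute
            """
        ).fetchall()
    finally:
        con.close()


for bar in minute_bars(TICKS):
    print(bar)
```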

And that translates into:

Within a large bank, if you follow the full development flow of a strategy through to its implementation, many tasks are performed before and after the trader’s involvement. The Technology area captures, collects, and stores the data; the methodology department usually runs fundamental analysis or simulations on that data; and finally the data reaches the trader, who creates the strategies. Once a strategy has been designed, Technology prepares a prototype for backtesting it, and finally the strategy is put into production as an automated algorithm.

Using a general-purpose programming language such as Python, a trader can now gradually acquire the knowledge needed to do the work of these other teams.
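As a rough illustration of one person covering several of these stages in a single language, the sketch below reads price data (an inline CSV standing in for a download or a database), generates signals with a simple moving-average crossover, and backtests them. The strategy, the window lengths, and the data are hypothetical, and the backtest ignores trading costs and slippage, so it is a teaching sketch rather than a production tool.

```python
import csv
import io

# Stand-in for captured and stored data: ten hypothetical daily closes.
CSV_DATA = """day,close
1,100
2,101
3,102
4,103
5,104
6,103
7,102
8,101
9,100
10,101
"""


def load_closes(text: str) -> list[float]:
    """Parse the close column (stands in for reading a file or calling a data API)."""
    return [float(row["close"]) for row in csv.DictReader(io.StringIO(text))]


def sma(values: list[float], window: int, end: int) -> float:
    """Simple moving average of the `window` values ending at index `end` (inclusive)."""
    return sum(values[end - window + 1 : end + 1]) / window


def backtest_sma_crossover(closes: list[float], fast: int = 2, slow: int = 4) -> dict:
    """Long when the fast SMA is above the slow SMA, flat otherwise.

    The position is decided with data up to day i and held from day i to day i+1,
    so no future prices are used. Costs, slippage and position sizing are ignored.
    """
    equity = 1.0
    days_in_market = 0
    for i in range(slow - 1, len(closes) - 1):
        if sma(closes, fast, i) > sma(closes, slow, i):
            equity *= closes[i + 1] / closes[i]
            days_in_market += 1
    return {
        "strategy_return": equity - 1.0,
        "buy_and_hold_return": closes[-1] / closes[0] - 1.0,
        "days_in_market": days_in_market,
    }


print(backtest_sma_crossover(load_closes(CSV_DATA)))
```

Deciding the position with data up to day i and applying it only to the next day's return is the main guard against look-ahead bias in a loop like this; the final stage, deploying the rule as an automated algorithm, is not shown.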

To give a practical example, in Python the problem of collecting historical price data becomes a simple call to the Yahoo, Google, or another provider’s API, using a data-reading function such as DataReader.

Big Data training is on the rise. The war between paid and free tools has forced traditional vendors such as SAS and SPSS to release free versions of their software or free courses on their analytics tools.

To conclude: on the one hand, Big Data tools are increasingly accessible and integrated. In the future, you will be able to use the most appropriate database for each problem, whether it is structured or unstructured, on disk, in memory, or distributed, almost without noticing. Python is a new player in which this homogenization is happening very quickly, and it already gives access to several Big Data tools within the same language. My prediction is that languages that do not homogenize run the risk of extinction.

On the other hand, everyone is looking at the world of Analytics. Software giants are buying Analytics companies like hotcakes. That is clear evidence that statistical knowledge will become more and more relevant and, combined with data tools, will be an indispensable weapon in the future. This prediction is also confirmed by the many master’s degrees in Big Data and Analytics that have already been established, with new ones emerging every year.

But the big change is what all of the above means: Big Data today is made for the business expert and, in our case, for the trader. Big Data was born among computer scientists and has attracted many statisticians. But what is clear to me is that you get much better results when the business expert knows how to drive and fix the car than the other way around.

Crypto Daily Topic

Blockchain and Big Data: A Match Made in Heaven? 

The rise of the technological revolution has given birth to data-driven businesses. Organizations now collect large volumes of consumer data, which they analyze to make strategic business decisions that help drive profitability. The collection of massive consumer datasets, commonly known as Big Data, has become an established industry in its own right, with revenue projected to grow to $103 billion by 2027.

As Big Data becomes more prevalent in modern businesses, it presents a slew of analytical problems for businesses looking to derive valuable insights from the data. Additionally, with the advent of the web of connected devices, consumers are also at risk of privacy violations due to the increased probability of security breaches.

But blockchain, a relatively new technology focused on data integrity and management, has the potential to transform the Big Data industry. And although the two technologies, blockchain and Big Data, may seem unrelated on the surface, they complement each other to create powerful solutions for tech-driven enterprises.

Where Can Blockchain Help Big Data?

Some of the biggest challenges facing the Big Data industry stem from poor data management, despite numerous efforts by data scientists to come up with different data management systems. Even with rapid technological advances, it is becoming quite clear that even the most modern tech infrastructure cannot keep up with the growing volume of data.

As a result, poor data management breeds other problems, such as data insecurity and inaccurate or incomplete records, also known as dirty data. Analysts and organizations have therefore been forced to spend a great deal of time and resources on data management that, ideally, would go to the organization’s core activities.

But with the advent of blockchain technology, data management is about to get a lot easier for both the data collectors and its consumers.  

By leveraging the fundamental properties of this novel technology, traditional data-processing infrastructure could be upgraded to manage data adequately. Below are some of the possibilities that the integration of Big Data and blockchain offers:

I) Enhance Data Security

The Big Data industry lacks adequate security to keep malicious hackers and their advanced tools at bay. Current data management infrastructures cannot, therefore, be relied upon to keep consumers’ data secure.

As a distributed ledger system, blockchain technology can be integrated into these data management infrastructures to improve their security. Because it uses cryptographic principles to record data in the network, data that has been recorded is very difficult to tamper with without detection.
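To make the cryptographic idea concrete, here is a minimal hash chain in Python: each record stores a SHA-256 hash of its own contents plus the previous record's hash, so editing any past record breaks verification. This is only the hash-linking part of a blockchain, with no digital signatures, network, or consensus; it makes tampering detectable rather than impossible, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first record


def block_hash(index: int, data: dict, prev_hash: str) -> str:
    """SHA-256 over the record's position, its contents and the previous record's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_record(chain: list[dict], data: dict) -> None:
    """Append a record linked to the hash of the record before it."""
    prev = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev, "hash": block_hash(index, data, prev)})


def verify(chain: list[dict]) -> bool:
    """Recompute every hash and link; any edit to a past record makes this return False."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev:
            return False
        if block["hash"] != block_hash(i, block["data"], prev):
            return False
        prev = block["hash"]
    return True


chain: list[dict] = []
add_record(chain, {"customer": "c-001", "amount": 50})
add_record(chain, {"customer": "c-002", "amount": 75})
print(verify(chain))               # True
chain[0]["data"]["amount"] = 5000  # tamper with a past record
print(verify(chain))               # False
```

An attacker who rewrites a record would also have to recompute every later hash and get the rest of the network to accept the new version, which is where the distributed design and consensus mechanisms come in.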

In addition to these high security standards, blockchain solutions for Big Data eliminate the need for a central infrastructure where data is stored. Instead, data is stored in a distributed network, making it practically impossible for a single party to gather enough computational power to alter the data in any way.
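A toy way to see why distribution helps: if every node keeps a full copy and the network accepts the version held by a strict majority, a single party that alters its own copy is simply outvoted. The majority vote below is a deliberate simplification, not a real consensus protocol such as proof of work or proof of stake, and it also shows the limit of the guarantee: a party controlling most of the copies could impose its own version.

```python
import hashlib
import json
from collections import Counter
from typing import Optional


def ledger_digest(ledger: list[dict]) -> str:
    """Fingerprint an entire ledger copy so copies can be compared cheaply."""
    return hashlib.sha256(json.dumps(ledger, sort_keys=True).encode("utf-8")).hexdigest()


def agreed_ledger(replicas: list[list[dict]]) -> Optional[list[dict]]:
    """Return the ledger version held by a strict majority of replicas, or None if there is none."""
    if not replicas:
        return None
    digest, votes = Counter(ledger_digest(r) for r in replicas).most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None
    return next(r for r in replicas if ledger_digest(r) == digest)


honest = [{"id": 1, "amount": 50}, {"id": 2, "amount": 75}]
tampered = [{"id": 1, "amount": 5000}, {"id": 2, "amount": 75}]
replicas = [list(honest) for _ in range(4)] + [tampered]  # one node altered its copy
print(agreed_ledger(replicas) == honest)  # True: the altered copy is outvoted
```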

II) Ensuring Data Integrity

Besides drawing insights from data, data scientists spend a great deal of their time verifying the data in their care and ensuring that it is accurate and consistent.

Blockchain can relieve analysts of this tedious task by vetting data before it is recorded in the network. This helps address the persistent problems of inaccurate, duplicated, and incomplete data and makes it easier to draw credible insights from it. While verifying each dataset, blockchain technology also enhances transparency, since any data recorded within the network can be traced.
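Whatever platform stores the data, "vetting before recording" comes down to explicit validation rules applied before a record is accepted; on a blockchain, such rules would typically be enforced by the nodes or by a smart contract. The sketch below uses a hypothetical schema and hypothetical rules to reject incomplete, invalid, and duplicate records, keeping the reason for each rejection so it can be traced later.

```python
import hashlib
import json

REQUIRED_FIELDS = ("customer_id", "amount", "timestamp")  # hypothetical schema


def fingerprint(record: dict) -> str:
    """Content hash used to detect exact duplicates."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def validate(record: dict, seen: set[str]) -> list[str]:
    """Return the problems found; an empty list means the record may be recorded."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if record.get(field) in (None, "")]
    amount = record.get("amount")
    if amount is not None and (not isinstance(amount, (int, float)) or amount < 0):
        problems.append("invalid amount")
    if fingerprint(record) in seen:
        problems.append("duplicate record")
    return problems


def ingest(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Vet each record before accepting it; keep rejected records with their reasons."""
    accepted, rejected, seen = [], [], set()
    for record in records:
        problems = validate(record, seen)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
            seen.add(fingerprint(record))
    return accepted, rejected


records = [
    {"customer_id": "c1", "amount": 10, "timestamp": "2024-01-01T10:00"},
    {"customer_id": "c1", "amount": 10, "timestamp": "2024-01-01T10:00"},  # exact duplicate
    {"customer_id": "", "amount": 5, "timestamp": "2024-01-01T11:00"},     # incomplete
    {"customer_id": "c2", "amount": -3, "timestamp": "2024-01-01T12:00"},  # invalid amount
]
accepted, rejected = ingest(records)
print(len(accepted), [reasons for _, reasons in rejected])
```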

III) Allow Individuals to Monetize their Data

In today’s information age, data is the single most valuable commodity traded by giant tech companies as well as small enterprises. However, the owners of the data rarely benefit from this trade. They are reduced to mere data sources, while enterprises pocket all the profit from selling their data.  

This practice is about to change with the introduction of blockchain to Big Data. The technology is set to democratize data ownership, allowing consumers to regain full control of their data. Data monetization can be supported through a token-based economy or through product discounts offered in exchange for personal data.

Eventually, blockchain will create marketplaces where individuals can trade data directly with businesses. Unlike the current data market, blockchain marketplaces will be more transparent, allowing individuals to see how their data is being used even after the transaction has taken place. 

IV) Manage Data Sharing

As a decentralized ledger system, blockchain allows parties within a network to share data with reduced security risk. As such, it would be easier for, say, banks and hospitals to share an individual’s data effectively, improving service delivery. Additionally, coordinated data sharing could eliminate cumbersome, repeated Know Your Customer (KYC) processes, saving institutions money and time.

Even within an organization, data sharing will be seamless with the use of blockchain solutions that eliminate data silos. As a result, departments within an organization will collaborate efficiently to improve productivity. 

V) Real-Time Data Analysis

Blockchain in payment systems is used to facilitate real-time transactions. Today, there are several fintech innovations that use blockchain to process fast and real-time settlements of huge sums, irrespective of geographical barriers.

In the same way, organizations that require real-time analysis of large-scale data could use blockchain-enabled systems to improve their services. For instance, banks using such systems could observe changes in data in real time and make quick decisions, such as blocking fraudulent transaction attempts or tracking irregular activity.
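The analysis side of real-time monitoring is ordinary stream processing, wherever the data is stored: statistics are updated one event at a time, and unusual events are flagged as they arrive. A minimal sketch, assuming a single stream of transaction amounts and a hypothetical rule that flags anything more than three standard deviations from the running mean, computed incrementally with Welford's online algorithm; real fraud systems combine many features and models.

```python
import math


class StreamingZScore:
    """Flag values far from the running mean, updating mean and variance incrementally (Welford)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold      # hypothetical cut-off in standard deviations
        self.warmup = max(warmup, 2)    # events to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x looks anomalous; only non-flagged values update the baseline."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            flagged = x != self.mean if std == 0 else abs(x - self.mean) / std > self.threshold
            if flagged:
                return True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False


detector = StreamingZScore()
amounts = [20, 22, 19, 21, 20, 23, 18, 20, 22, 21, 20, 500, 21]  # hypothetical amounts
flags = [detector.update(a) for a in amounts]
print([a for a, f in zip(amounts, flags) if f])  # [500]
```

Keeping flagged values out of the baseline stops a burst of fraudulent amounts from gradually making itself look normal; whether that is desirable depends on how flagged events are reviewed.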

VI) Predictive Analysis

Data stored on a blockchain network can be analyzed to give valuable insights, much like any other form of data. Considering the accuracy and security of blockchain data, the analyses derived from this type of data are more accurate than those from traditional data management systems. 

Additionally, owing to the distributed nature of blockchain and the huge computational power it offers, data analysts, even those in small organizations, can engage in extensive data analysis tasks. By leveraging the accuracy of the data stored therein, the computational power of the blockchain, and its resourcefulness, data analysts can predict and forecast different aspects of the business with utmost accuracy. 


Blockchain and Big Data technologies are set to radically transform the way businesses process and manage large volumes of data. As such, the integration of the two technologies to form a single solution will not only help businesses step up their data infrastructures, but also solve some of the inherent problems that come with managing large databases.

Note, however, that blockchain solutions for the Big Data industry may not be realized anytime soon, due to the growing concern that applying blockchain to Big Data is too expensive. Most tech companies believe it is cheaper to store data on traditional infrastructure than on a blockchain network, because blocks can only store and process a limited amount of data, which is small compared with the volumes collected every second by current Big Data systems. But blockchain is an ever-evolving technology, and hopefully it will mature fast enough to address these concerns, allowing it to be fully implemented in Big Data management.