202601121716
Status: #reference
Tags: Financial Machine Learning, Financial Data Structures
State: #nascient

Information-Driven Bars

One of the types of bars (data) outlined in Advances in Financial Machine Learning by López de Prado. They are sampled more frequently when we detect that new information has come into the market. The idea is to detect the presence of well-informed traders through some metric (López de Prado uses the cumulative sum of Signed Ticks or Signed Volume) and to use that metric to decide when to sample a new bar. If the data doesn't specify whether each trade was a buy or a sell, one must fall back on the Tick Rule to sign it.
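The Tick Rule and the cumulative sum mentioned above can be sketched roughly as follows. This is a minimal illustration, not the book's code: the function names `tick_rule` and `cumulative_signed` are made up for this note, and treating the very first trade as a buy is an arbitrary assumption.

```python
def tick_rule(prices):
    """Sign each trade as +1 (buy) or -1 (sell) from price changes alone.

    A trade at an unchanged price inherits the previous trade's sign.
    Assumption: the first trade is treated as a buy (+1).
    """
    signs = []
    prev_sign = 1
    prev_price = None
    for p in prices:
        if prev_price is not None and p != prev_price:
            prev_sign = 1 if p > prev_price else -1
        signs.append(prev_sign)
        prev_price = p
    return signs


def cumulative_signed(prices, volumes=None):
    """Running sum of signed ticks, or of signed volume if volumes are given."""
    signs = tick_rule(prices)
    if volumes is None:
        volumes = [1] * len(prices)  # one unit per trade gives signed ticks
    total = 0
    running = []
    for b, v in zip(signs, volumes):
        total += b * v
        running.append(total)
    return running


prices = [100, 101, 101, 100, 100, 102]
print(tick_rule(prices))                                  # [1, 1, 1, -1, -1, 1]
print(cumulative_signed(prices))                          # [1, 2, 3, 2, 1, 2]
print(cumulative_signed(prices, [10, 5, 5, 20, 10, 5]))   # [10, 15, 20, 0, -10, -5]
```

A running sum that drifts far from zero suggests one-sided (possibly informed) trading; that drift is the quantity the bars react to.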

I think the rationale is that most modern markets are traded not by humans but by algorithms and bots, which do not follow human cycles. This is one of the reasons López de Prado argues against Time Bars: they oversample periods where not much happens and undersample the actual areas of interest. Also, returns computed from Time Bars are highly leptokurtic (see Kurtosis), which hurts the many ML models that explicitly or implicitly expect normality.

Information-Driven Bars (such as Imbalance Bars) aim to tackle that directly by sampling so that each bar contains roughly the same amount of information. This means that in regions of high activity attributable to informed traders (those we are trying to shadow) we sample more, and elsewhere we sample less, if at all.
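A deliberately simplified sketch of that sampling idea: close a bar whenever the absolute cumulative imbalance since the last bar reaches a threshold. The function name `imbalance_bar_ends` and the fixed `threshold` are assumptions made for illustration; in the book the threshold is not fixed but derived dynamically from the expected bar size and the expected imbalance.

```python
def imbalance_bar_ends(signed_flow, threshold):
    """Return the indices at which a bar closes.

    signed_flow: per-trade signed ticks (+1/-1) or signed volume.
    threshold:   fixed imbalance level that triggers a new bar
                 (a simplification; hypothetical parameter).
    """
    ends = []
    theta = 0  # cumulative imbalance since the last bar closed
    for i, x in enumerate(signed_flow):
        theta += x
        if abs(theta) >= threshold:
            ends.append(i)
            theta = 0  # start accumulating for the next bar
    return ends


balanced = [1, -1, 1, -1, 1, -1]   # buys and sells cancel out
one_sided = [1, 1, 1, 1, 1, 1]     # persistent buying pressure
print(imbalance_bar_ends(balanced + one_sided, threshold=3))  # [8, 11]
```

In this toy stream the six balanced trades produce no bar at all, while the six one-sided trades produce two, which is the "sample more where the imbalance is" behaviour described above.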

The reward for going through the hassle of finding all those imbalances is that the sampled data ends up much closer to a Normal Distribution, which means standard statistical models can be applied to it. It does so by adapting the sampling frequency to market activity, which brings the bars closer to homoscedasticity.

Two main types exist: Imbalance Bars and Runs Bars.