In the previous lectures we saw how computable measures such as Shannon entropy, and even lossless compression, have some limitations. Here we will introduce a method that complements Shannon entropy and extends the power of the CTM method in the search for mechanistic generative models. We will also see how measuring algorithmic complexity can help in analysing real-world data and producing candidate models, and therefore how it contributes to the discussion of causation.

In the last lecture we saw how we were able to produce an empirical distribution based on algorithmic probability and related to the Universal Distribution. But that method could only take us halfway, because despite the computational power its calculation requires, and that we put into it, we will never be able to generate all possible models by running all possible computer programs so as to cover real-world data. To go as far as to cover models such as Newton's gravitation would probably require the computational power of the entire universe if done by brute force. In practice, while we produced values for strings of up to 30 or 40 bits, the longer the strings the fewer of them are generated, so we had to find ways to carry this algorithmic advantage over to larger pieces of data and to the generation of longer models.

The Block Decomposition Method (BDM) allows us to extend the power of the Coding Theorem Method by using a combination of Shannon entropy and algorithmic probability. The idea is to divide a piece of data into smaller segments for which we have algorithmic probability approximations, and then count the number of times that a local regularity occurs in the data. BDM consists in discovering what we call 'causal patches': regions for which a computational model is found and to which low algorithmic complexity is therefore assigned. In a Rube Goldberg machine, for example, even when there is a deliberate purpose to convolute a process, the chain of cause and effect is spread all over the place, and by focusing on those places one can find the mechanistic nature of something as obfuscated as the so-called automatic napkin in this famous drawing.

BDM shows that the best way to approximate generative models is to make the most of both worlds, using each where relevant and practical. By combining local estimations of algorithmic complexity in a clever way, helped by classical information theory, each covers the other's back where it underperforms: entropy cannot find causal patches, and the estimations of algorithmic complexity are limited to small, local pieces of data. Together they give us a much more powerful method that can deal with longer data and larger models. The idea is that we can cover any system or object, such as the one on the screen, with many local observations to feed our CTM method, which is able to deal with small objects. The aggregation of the information from all those pieces then gives us a good idea of the nature of the dynamical system, and even a landscape of which pieces are of lower or higher algorithmic complexity, telling us, for example, which regions may be subject to external sources of information such as noise.

So this is what the formula for BDM looks like. BDM decomposes a piece of data X into k smaller pieces for which estimations of their algorithmic content have been pre-calculated by CTM. But if some of those pieces are exactly the same, we don't just add their values twice.
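Since the formula itself is only shown on the slide, here is roughly how it reads, reconstructed from the description in this lecture (the notation on the slide may differ slightly). Adj(X) denotes the set of pairs (x_i, n_i) of distinct blocks of size l, obtained by sliding a window with offset m over X, together with their multiplicities; the role of the logarithmic term is explained just below:

$$\text{BDM}(X) \;=\; \sum_{(x_i,\,n_i)\,\in\,\text{Adj}(X)} \Big(\text{CTM}(x_i) + \log_2 n_i\Big)$$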
The reason we don't simply add identical pieces twice is that two copies of the same object should not contribute twice the complexity of one. For example, a tighter upper bound on the algorithmic complexity of the string 10111 repeated twice is not its complexity added to itself, but the complexity of the program that repeats the same string twice. So the estimations are added up with the help of classical information theory, because we know how many bits are needed to produce a repetition: the logarithm of the number of times that the block with index i is repeated. There are other, more sophisticated ways to do this, and we have explored many of them, for example not adding the full complexity of two pieces that look statistically similar. The more similar two pieces are to each other, even if only statistically, the more a computer program can take advantage of that similarity, so one could include the complexity of one of the pieces and add only a fraction for a similar one. In practice, however, the performance of these alternative ways of adding up the pieces was similar, so in general we will use this simple version unless we specify otherwise.

There are some other parameters involved in the formula: l specifies the size of the blocks into which the object is broken, and m allows those blocks to overlap for a smoother coverage, like the one we saw in one of the Rube Goldberg diagrams. CTM, in turn, requires n, the number of states of the rule space from which the estimation of algorithmic complexity is made, and k is the number of pieces. (A short code sketch below makes this bookkeeping concrete.)

You may remember that entropy produces a Bernoulli distribution when plotting the entropy values of all strings up to some length sorted by number of 1s. Lossless compression also produces Bernoulli distributions typical of statistical measures, suggesting that compression does not identify any non-statistical regularity among all strings that entropy could not already identify. However, when using BDM a very different distribution is produced. Compared to the other ones after normalising between 0 and 1, the resulting distribution conforms to the theoretical expectation, identifying some strings that should be assigned less randomness than entropy and compression assign them. This is the power we gain in connection with causation by following this approach, and we believe it is fundamental because it speaks directly to the kind of approach able to contribute to the challenge of causation; for this reason we call these gaps between the distributions 'causal gaps'. As an example of strings with high entropy at any granularity and low compressibility but low algorithmic complexity, there are these two strings, which turn out to be produced by many small Turing machines.

So what would these CTM and BDM methods do with, for example, the binary expansion of a mathematical constant such as π, which we know is of low algorithmic complexity but is also random-looking? In some tests we were able to show that our methods can (A), even if weakly but in a statistically significant way, tell cases such as π and the Thue-Morse sequence apart from truly algorithmically random sequences in the same base and of the same length. CTM assigns significantly lower randomness (B, D–F) to known low-algorithmic-complexity objects.
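Before turning to those panels, and as promised above, here is a minimal Python sketch of the block decomposition bookkeeping just described. It is only an illustration under simplifying assumptions: the CTM lookup table and its values below are made-up placeholders (the real tables are produced by running very large sets of small Turing machines), and boundary handling for leftover symbols is omitted.

```python
import math
from collections import Counter

def bdm(x, ctm, l=12, m=12):
    """Approximate BDM of a binary string x from a pre-computed CTM table.

    x   : string of '0'/'1' characters
    ctm : dict mapping every block of length l that occurs in x to a
          CTM complexity estimate in bits
    l   : block size into which x is decomposed
    m   : offset between consecutive blocks (m = l gives non-overlapping
          blocks; m < l gives overlapping blocks for smoother coverage)
    """
    # Slice x into blocks of length l, moving the window m symbols at a time.
    blocks = [x[i:i + l] for i in range(0, len(x) - l + 1, m)]
    # Count the multiplicity of each distinct block.
    counts = Counter(blocks)
    # Each distinct block contributes its CTM value once, plus log2 of its
    # multiplicity: a repetition only costs the bits needed to say
    # "repeat this block n_i times", not n_i copies of its full complexity.
    return sum(ctm[b] + math.log2(n) for b, n in counts.items())

# Purely illustrative toy table for blocks of length 2 (hypothetical values).
toy_ctm = {'00': 2.5, '01': 3.3, '10': 3.3, '11': 2.5}
print(bdm('0101010101', toy_ctm, l=2, m=2))   # 3.3 + log2(5), about 5.62
```

With l = m the blocks do not overlap; choosing m smaller than l gives the smoother, overlapping coverage mentioned earlier, at the cost of counting some symbols more than once.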
(B) If π is absolutely Borel normal, as is strongly suspected and statistically demonstrated to any degree of confidence, then π's entropy and block entropy asymptotically approach 1 while, by the invariance theorem of algorithmic complexity, CTM would asymptotically approach 0, and this is the behaviour we found when applying our numerical methods. (A short sketch at the end of this segment shows the kind of block entropy computation we mean.) Smooth transitions between CTM and BDM are also shown as a function of string complexity, indicating that combining entropy with an algorithmic measure able to capture local algorithmic complexity is sound. An interesting property of BDM is that, because we can keep updating the values of CTM, BDM can always be improved, and it is optimal in the sense that the worst BDM can do is behave like Shannon entropy, as we proved in this paper. By taking these local estimations of algorithmic complexity we combine the best of both worlds: we keep the statistical power of Shannon entropy and bring in local improvements to detect small causal patches along the way, giving us key insights into the nature of the objects of interest.

Remember that we can freely talk about the entropy of π only because, as observers, we may not know that it is π; otherwise we would not need to apply entropy to the sequence at all, and would simply say that, as a deterministic process, π is of the lowest entropy because no digit in its expansion comes as a true surprise. For an observer the situation is very different: we never truly have access to the generating source or, put differently, to the associated distribution, in which an object such as π would belong to a distribution with π as its only element. As observers, just as when we throw a die, we have no access to all the information about the object, and we are forced to apply methods such as entropy, compression or ours, which then compete at detecting features other than pure randomness where we know no randomness exists.

Many tests can be performed to see whether the measures based on CTM and BDM conform to the theoretical expectation and are robust and well behaved. We know, for example, that sorting strings by complexity and measuring their logical depth should give a concave-down curve, because logical depth assigns low values to both simple and random strings, with only those in between having high logical depth. And this is exactly what we get when sorting strings by CTM and BDM, but not when sorting by compression or entropy, even though CTM and BDM remain in alignment with compression and entropy, because CTM and BDM are generalisations of and improvements over both. In the next unit we will present more tests providing further evidence of the stability and robustness of CTM and BDM.

To illustrate how CTM works, here is one example of a model found by our method. The rule of the Turing machine shown on top produces the string 0011110001011 and represents a short model; only very few short computer programs were able to generate this string, so it is of relatively high algorithmic complexity, yet the string is produced in relatively short time and thus has low logical depth. So let S be an observation, for example of the evolution of a dynamical system in space and time t. Then estimating the complexity of S over space or time, denoted by C(S_t), entails finding algorithmic models reproducing S_t, or versions close to S_t, that can be aggregated to produce a set of candidate generative models producing, and thus mechanistically explaining, S_t.
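As a brief aside before moving on, this is the kind of block entropy computation referred to at the beginning of this segment, in a minimal Python sketch: Shannon entropy taken over blocks of length k and normalised by k, so that binary sequences score between 0 and 1. The exact normalisation and block scheme used in our papers may differ; this is only meant to make the idea concrete.

```python
import math
from collections import Counter

def block_entropy(x, k=1):
    """Shannon entropy of x over non-overlapping blocks of length k,
    divided by k so that binary sequences score between 0 and 1."""
    blocks = [x[i:i + k] for i in range(0, len(x) - k + 1, k)]
    counts = Counter(blocks)
    total = len(blocks)
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h / k

# A periodic string looks maximal at block size 1 but trivial at block size 2;
# a Borel-normal sequence such as the binary digits of pi stays close to 1.
print(block_entropy('01' * 10, k=1))  # 1.0
print(block_entropy('01' * 10, k=2))  # 0.0
```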
Coming back to the observation S_t: we call the aggregation of smaller models into a larger model explaining an observation an algorithmic sequence model. This process will be the basis of what we call Algorithmic Information Dynamics. Estimations of CTM and BDM, and even of logical depth, can be obtained through an online complexity calculator that we have developed, which is under permanent development and is currently quite a sophisticated piece of software showcasing many of our methods. Of course, for heavy-duty analysis you should rather use our standalone programs, available in several programming languages, including the Wolfram Language but also many other languages and packages. The Online Algorithmic Complexity Calculator has continued to evolve over time as we incorporate more features. The current version is not only able to retrieve CTM and BDM values but can also perform interventions on data such as graphs, in the way we will see is necessary in Algorithmic Information Dynamics. And as we speak we are working on even more sophisticated features that we think will be of value to areas such as machine learning.

In conclusion, by introducing CTM and BDM in contrast and in complement to other measures such as Shannon entropy, we have seen how important algorithmic information theory and computability theory are to the challenge of causality in science, as part of a fourth revolution towards model-driven causal discovery. In the next short animated video you will find a summary.