In the previous lectures we saw how computable measures such as Shannon entropy, and even lossless compression, have some limitations. Here we will introduce a method that complements Shannon entropy and extends the power of the CTM method in the search for mechanistic generative models. We will also see how measuring algorithmic complexity can help in analysing real-world data and producing candidate models, and therefore how it contributes to the discussion of causation.

In the last lecture we saw how we were able to produce an empirical distribution based on algorithmic probability and related to the Universal Distribution. But that method could only take us halfway, because despite the computational power its calculation requires, and that we put into it, we will never be able to generate all possible models by running all possible computer programs so as to cover real-world data. To go as far as to cover models such as Newton's gravitation would probably require the computational power of the entire universe if done by brute force. In practice, while we produced values for strings of up to 30 or 40 bits, the longer the strings the fewer of them are generated, so we had to find ways to carry this algorithmic advantage over to larger pieces of data and to the generation of longer models.

The Block Decomposition Method (BDM) allows us to extend the power of the Coding Theorem Method by using a combination of Shannon entropy and algorithmic probability. The idea is to divide a piece of data into smaller segments for which we have algorithmic probability approximations, and then count the number of times that a local regularity occurs in the data. BDM consists in discovering what we call 'causal patches': regions for which a computational model is found and to which low algorithmic complexity is therefore assigned. In a Rube Goldberg machine, for example, even when there is a deliberate purpose to convolute a process, the chain of cause and effect is spread all over the place, and by focusing on those places one can find the mechanistic nature of something as obfuscated as the so-called automatic napkin in this famous drawing.

BDM shows that the best way to approximate generative models is to make the most of both worlds, using each where relevant and practical. By combining local estimations of algorithmic complexity in a clever way, helped by classical information theory, each covers the other's back where it underperforms: entropy cannot find causal patches, and the estimations of algorithmic complexity are limited to small, local pieces of data. Together they give us a much more powerful method that can deal with longer data and larger models. The idea is that we can cover any system or object, such as the one on the screen, with many local observations to feed our CTM method, which is able to deal with small objects. The aggregation of the information from all those pieces then gives us a good idea of the nature of the dynamical system, and even a landscape of which pieces are of lower or higher algorithmic complexity, telling us, for example, which regions may be subject to external sources of information such as noise.

So this is what the formula for BDM looks like. BDM decomposes a piece of data X into k smaller pieces for which estimations of their algorithmic content have been pre-calculated by CTM. But if some of those pieces are exactly the same, we don't just add their values twice.
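Since the formula itself is only shown on the slide, here is roughly how it reads, reconstructed from the description in this lecture (the notation on the slide may differ slightly). Adj(X) denotes the set of pairs (x_i, n_i) of distinct blocks of size l, obtained by sliding a window with offset m over X, together with their multiplicities; the role of the logarithmic term is explained just below:

$$\text{BDM}(X) \;=\; \sum_{(x_i,\,n_i)\,\in\,\text{Adj}(X)} \Big(\text{CTM}(x_i) + \log_2 n_i\Big)$$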
The reason we don't simply add identical pieces twice is that two copies of the same object should not contribute twice the complexity of one. For example, a tighter upper bound on the algorithmic complexity of the string 10111 repeated twice is not its complexity added to itself, but the complexity of the program that repeats the same string twice. So the estimations are added up with the help of classical information theory, because we know how many bits are needed to produce a repetition: the logarithm of the number of times that the block with index i is repeated. There are other, more sophisticated ways to do this, and we have explored many of them, for example not adding the full complexity of two pieces that look statistically similar. The more similar two pieces are to each other, even if only statistically, the more a computer program can take advantage of that similarity, so one could include the complexity of one of the pieces and add only a fraction for a similar one. In practice, however, the performance of these alternative ways of adding up the pieces was similar, so in general we will use this simple version unless we specify otherwise.

There are some other parameters involved in the formula: l specifies the size of the blocks into which the object is broken, and m allows those blocks to overlap for a smoother coverage, like the one we saw in one of the Rube Goldberg diagrams. CTM, in turn, requires n, the number of states of the rule space from which the estimation of algorithmic complexity is made, and k is the number of pieces. (A short code sketch below makes this bookkeeping concrete.)

You may remember that entropy produces a Bernoulli distribution when plotting the entropy values of all strings up to some length sorted by number of 1s. Lossless compression also produces Bernoulli distributions typical of statistical measures, suggesting that compression does not identify any non-statistical regularity among all strings that entropy could not already identify. However, when using BDM a very different distribution is produced. Compared to the other ones after normalising between 0 and 1, the resulting distribution conforms to the theoretical expectation, identifying some strings that should be assigned less randomness than entropy and compression assign them. This is the power we gain in connection with causation by following this approach, and we believe it is fundamental because it speaks directly to the kind of approach able to contribute to the challenge of causation; for this reason we call these gaps between the distributions 'causal gaps'. As an example of strings with high entropy at any granularity and low compressibility but low algorithmic complexity, there are these two strings, which turn out to be produced by many small Turing machines.

So what would these CTM and BDM methods do with, for example, the binary expansion of a mathematical constant such as π, which we know is of low algorithmic complexity but is also random-looking? In some tests we were able to show that our methods can (A), even if weakly but in a statistically significant way, tell cases such as π and the Thue-Morse sequence apart from truly algorithmically random sequences in the same base and of the same length. CTM assigns significantly lower randomness (B, D–F) to known low-algorithmic-complexity objects.
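Before turning to those panels, and as promised above, here is a minimal Python sketch of the block decomposition bookkeeping just described. It is only an illustration under simplifying assumptions: the CTM lookup table and its values below are made-up placeholders (the real tables are produced by running very large sets of small Turing machines), and boundary handling for leftover symbols is omitted.

```python
import math
from collections import Counter

def bdm(x, ctm, l=12, m=12):
    """Approximate BDM of a binary string x from a pre-computed CTM table.

    x   : string of '0'/'1' characters
    ctm : dict mapping every block of length l that occurs in x to a
          CTM complexity estimate in bits
    l   : block size into which x is decomposed
    m   : offset between consecutive blocks (m = l gives non-overlapping
          blocks; m < l gives overlapping blocks for smoother coverage)
    """
    # Slice x into blocks of length l, moving the window m symbols at a time.
    blocks = [x[i:i + l] for i in range(0, len(x) - l + 1, m)]
    # Count the multiplicity of each distinct block.
    counts = Counter(blocks)
    # Each distinct block contributes its CTM value once, plus log2 of its
    # multiplicity: a repetition only costs the bits needed to say
    # "repeat this block n_i times", not n_i copies of its full complexity.
    return sum(ctm[b] + math.log2(n) for b, n in counts.items())

# Purely illustrative toy table for blocks of length 2 (hypothetical values).
toy_ctm = {'00': 2.5, '01': 3.3, '10': 3.3, '11': 2.5}
print(bdm('0101010101', toy_ctm, l=2, m=2))   # 3.3 + log2(5), about 5.62
```

With l = m the blocks do not overlap; choosing m smaller than l gives the smoother, overlapping coverage mentioned earlier, at the cost of counting some symbols more than once.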
(B) If π is absolutely Borel normal, as is strongly suspected and statistically demonstrated to any degree of confidence, then π's entropy and block entropy asymptotically approach 1 while, by the invariance theorem of algorithmic complexity, CTM would asymptotically approach 0, and this is the behaviour we found when applying our numerical methods. (A short sketch at the end of this segment shows the kind of block entropy computation we mean.) Smooth transitions between CTM and BDM are also shown as a function of string complexity, indicating that combining entropy with an algorithmic measure able to capture local algorithmic complexity is sound. An interesting property of BDM is that, because we can keep updating the values of CTM, BDM can always be improved, and it is optimal in the sense that the worst BDM can do is behave like Shannon entropy, as we proved in this paper. By taking these local estimations of algorithmic complexity we combine the best of both worlds: we keep the statistical power of Shannon entropy and bring in local improvements to detect small causal patches along the way, giving us key insights into the nature of the objects of interest.

Remember that we can freely talk about the entropy of π only because, as observers, we may not know that it is π; otherwise we would not need to apply entropy to the sequence at all, and would simply say that, as a deterministic process, π is of the lowest entropy because no digit in its expansion comes as a true surprise. For an observer the situation is very different: we never truly have access to the generating source or, put differently, to the associated distribution, in which an object such as π would belong to a distribution with π as its only element. As observers, just as when we throw a die, we have no access to all the information about the object, and we are forced to apply methods such as entropy, compression or ours, which then compete at detecting features other than pure randomness where we know no randomness exists.

Many tests can be performed to see whether the measures based on CTM and BDM conform to the theoretical expectation and are robust and well behaved. We know, for example, that sorting strings by complexity and measuring their logical depth should give a concave-down curve, because logical depth assigns low values to both simple and random strings, with only those in between having high logical depth. And this is exactly what we get when sorting strings by CTM and BDM, but not when sorting by compression or entropy, even though CTM and BDM remain in alignment with compression and entropy, because CTM and BDM are generalisations of and improvements over both. In the next unit we will present more tests providing further evidence of the stability and robustness of CTM and BDM.

To illustrate how CTM works, here is one example of a model found by our method. The rule of the Turing machine shown on top produces the string 0011110001011 and represents a short model; only very few short computer programs were able to generate this string, so it is of relatively high algorithmic complexity, yet the string is produced in relatively short time and thus has low logical depth. So let S be an observation, for example of the evolution of a dynamical system in space and time t. Then estimating the complexity of S over space or time, denoted by C(S_t), entails finding algorithmic models reproducing S_t, or versions close to S_t, that can be aggregated to produce a set of candidate generative models producing, and thus mechanistically explaining, S_t.
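As a brief aside before moving on, this is the kind of block entropy computation referred to at the beginning of this segment, in a minimal Python sketch: Shannon entropy taken over blocks of length k and normalised by k, so that binary sequences score between 0 and 1. The exact normalisation and block scheme used in our papers may differ; this is only meant to make the idea concrete.

```python
import math
from collections import Counter

def block_entropy(x, k=1):
    """Shannon entropy of x over non-overlapping blocks of length k,
    divided by k so that binary sequences score between 0 and 1."""
    blocks = [x[i:i + k] for i in range(0, len(x) - k + 1, k)]
    counts = Counter(blocks)
    total = len(blocks)
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h / k

# A periodic string looks maximal at block size 1 but trivial at block size 2;
# a Borel-normal sequence such as the binary digits of pi stays close to 1.
print(block_entropy('01' * 10, k=1))  # 1.0
print(block_entropy('01' * 10, k=2))  # 0.0
```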
Coming back to the observation S_t: we call the aggregation of smaller models into a larger model explaining an observation an algorithmic sequence model. This process will be the basis of what we call Algorithmic Information Dynamics. Estimations of CTM and BDM, and even of logical depth, can be obtained through an online complexity calculator that we have developed, which is under permanent development and is currently quite a sophisticated piece of software showcasing many of our methods. Of course, for heavy-duty analysis you should rather use our standalone programs, available in several programming languages, including the Wolfram Language but also many other languages and packages. The Online Algorithmic Complexity Calculator has continued to evolve over time as we incorporate more features. The current version is not only able to retrieve CTM and BDM values but can also perform interventions on data such as graphs, in the way we will see is necessary in Algorithmic Information Dynamics. And as we speak we are working on even more sophisticated features that we think will be of value to areas such as machine learning.

In conclusion, by introducing CTM and BDM in contrast and in complement to other measures such as Shannon entropy, we have seen how important algorithmic information theory and computability theory are to the challenge of causality in science, as part of a fourth revolution towards model-driven causal discovery. In the next short animated video you will find a summary.