So in the previous lectures we saw how computable measures such as Shannon entropy and even lossless compression have limitations, just as algorithmic complexity has the limitation of not being computable, yet putting them together is stronger than using any of them in isolation. In this lecture we will see how we can define native measures of complexity for graphs and networks. We saw how algorithmic complexity is, in principle, more robust than the application of entropy in practice. This is because algorithmic complexity is invariant to the object's description up to a constant, whereas entropy requires a feature of interest, and when the observer changes point of view, say from a degree sequence to an adjacency matrix, the values may diverge. In a very popular book, Gell-Mann, the famous physicist who won the Nobel Prize in 1969, argued that the challenge of graph complexity was fundamental to the new science of complex systems, and claimed that any reasonable measure of complexity should assign minimal complexity to both completely disconnected and completely connected graphs. Here we will not only show that our methods based on algorithmic complexity conform with this basic requirement, but that we can say and do a lot more.

First, there is one obvious realisation to make: we have two extremes, or lower and upper bounds, for the algorithmic complexity of a graph of fixed size. On the one hand, it is not difficult to see why a fully connected or a fully disconnected graph should have low algorithmic complexity, because their descriptions, and the computer programs implementing them, are completely trivial. In the case of a complete graph it suffices to connect every given node, and in a fully disconnected graph it suffices to keep everything disconnected; in both cases the shortest description of the graph is basically just the number of nodes, and so the algorithmic complexity of these graphs is a function of the logarithm of the number of nodes. That is, if the graph has N nodes, then the logarithm of N is a good approximation of their algorithmic complexity. In contrast, for a truly algorithmically random graph, one in which the way the edges are connected cannot be compressed, the algorithmic complexity grows as a function of the number of edges, because the only way to specify them is one by one.

Also notice that when we talk about the complexity of a graph, what we will actually do is estimate the complexity of the adjacency matrix of that graph. This is robust because the adjacency matrix is an invariant of a labelled graph and a lossless description of it, meaning that from the adjacency matrix one can uniquely and fully reconstruct the graph without any loss of information. We also know that, because of the invariance theorem, describing the graph in any other way, as long as it is without loss of information, will give a complexity value that converges up to a constant; in other words, the complexity of any lossless description of the graph should be about the same. By now we have mastered the concept of the algorithmic complexity of a string. But if we are talking about the algorithmic complexity of an adjacency matrix, how exactly do we proceed? What we did was a clever hack to the original model: instead of using a Turing machine with a one-dimensional tape, we replaced the tape with a two-dimensional grid and allowed the Turing machine's transition rules to also move in the up and down directions.
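To make the two extremes concrete, here is a minimal sketch in Python that uses zlib compressed length as a crude stand-in for algorithmic complexity (it is not the CTM/BDM method discussed in this lecture, just the kind of lossless-compression proxy mentioned later). The graph sizes and the helper names are illustrative choices, not part of the lecture's experiments.

```python
# Sketch: zlib compressed length as a rough upper-bound proxy for the
# algorithmic complexity of an adjacency matrix (illustration only).
import zlib
import random

def adjacency_bits(n, edge_rule):
    """Flatten the upper triangle of an n-node adjacency matrix into a bit string."""
    return ''.join('1' if edge_rule(i, j) else '0'
                   for i in range(n) for j in range(i + 1, n))

def compressed_size(bits):
    """Length in bytes of the zlib-compressed bit string (a rough complexity proxy)."""
    return len(zlib.compress(bits.encode()))

n = 64
random.seed(0)
complete = adjacency_bits(n, lambda i, j: True)                   # fully connected
empty    = adjacency_bits(n, lambda i, j: False)                  # fully disconnected
rand     = adjacency_bits(n, lambda i, j: random.random() < 0.5)  # incompressible-looking

for name, bits in [('complete', complete), ('empty', empty), ('random', rand)]:
    print(name, compressed_size(bits))
# The complete and empty graphs compress to a handful of bytes (their description
# is essentially just n), while the random graph's compressed size grows with the
# number of possible edges, matching the two extremes discussed above.
```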
So the question of the complexity of a graph in our context, using the Coding Theorem Method, turns out to be: what is the probability that a 2-dimensional Turing machine produces the adjacency matrix of the graph? Notice that this kind of Turing machine had been introduced before, for other purposes, by Chris Langton under the name 'turmite', a combination of Turing and termite. Here is an example of the rules of a 2-dimensional Turing machine capable of generating a graph by producing a pattern on a 2-dimensional grid. You can see that the only change to the original formulation is that the tape is now a grid, and the head can also move up and down. Just as in the case of 1-dimensional tape Turing machines, very simple squares, such as those consisting of a single cell, are the most frequently produced and thus of lower algorithmic complexity. The chief advantage of having extended the model of the Turing machine is that, because 2-dimensional Turing machines are simply another kind of computer program, everything in the theory still holds, including all the theorems that we have explored before, and so we can still apply the Coding theorem and rely on the invariance theorem.

This is, for example, the reduced set of all 3-by-3 matrices produced by all 2-dimensional-tape binary Turing machines with up to 4 states. Reduced means that we are not showing all the symmetries; the 3-by-3 matrix with only black cells, for example, is not shown because it is a simple transformation of the 3-by-3 matrix with all white cells. This is just a way to see all cases in a more compact form. The first thing to notice is that, when sorted by frequency of production, these patterns appear to go from simplicity to randomness, conforming with the expectation of the Coding theorem. It also provides evidence in favour of Ockham's razor, because according to algorithmic complexity and the Coding theorem the most likely explanation is also the simplest. Even if 'simplest' here is only an intuitive notion, for all practical purposes our work can be considered a formal vindication of Ockham's razor and of Epicurus' Principle of Multiple Explanations. Here, in A, is one of the simplest patterns, generated by most 5-state Turing machines and second only to the most popular patterns, the all-white and all-black ones; it is a single cell turned on. It is followed, in subfigure B, by the most complex object generated in the same rulespace. At the top, labelled C, are what we call jumpers, which, as expected, are simple matrices that appear high in the output frequency distribution. The example shown is not actually the most random case, because we could not generate all the objects of that size, so it is in fact among the lowest-complexity squares of that size. Interestingly, it does look like a tree.

We can of course generalise this complexity measure to any dimension by replacing the 2-dimensional Turing machine tape with an n-dimensional tape. In 3D I like to think of the way 3D printers work; I would call it the 3D-printer complexity, and combined with the concept of Logical Depth one could get an idea of how sophisticated an object is from the time one would need to wait for the 3D object to be printed.
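As a rough illustration of the machine model just described, here is a minimal 2-dimensional ("turmite"-style) Turing machine simulator in Python. The rule table is made up for illustration and is not one of the enumerated machines from the lecture's experiments; also, the real experiments consider halting machines, whereas here we simply run for a fixed number of steps.

```python
# Minimal sketch of a 2-D Turing machine on an unbounded grid (illustrative rule).
from collections import defaultdict

# rule: (state, symbol_read) -> (symbol_to_write, move, next_state)
# moves: 'U', 'D', 'L', 'R' on the 2-D grid (the only change vs. a 1-D tape).
rule = {
    (0, 0): (1, 'R', 1),
    (0, 1): (0, 'U', 0),
    (1, 0): (1, 'D', 0),
    (1, 1): (1, 'L', 1),
}

def run_2d_tm(rule, steps=50):
    grid = defaultdict(int)          # all-white (0) unbounded grid
    x = y = state = 0                # head at the origin, initial state 0
    for _ in range(steps):
        sym = grid[(x, y)]
        write, move, state = rule[(state, sym)]
        grid[(x, y)] = write
        if move == 'U':   y += 1
        elif move == 'D': y -= 1
        elif move == 'L': x -= 1
        else:             x += 1
    return grid

grid = run_2d_tm(rule)
print(sum(grid.values()), "black cells produced")
# In the Coding Theorem Method, one enumerates all machines in a rulespace, tallies
# how often each small output pattern appears, and estimates a pattern's complexity
# as -log2 of its output frequency: frequently produced patterns are the simple ones.
```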
One restriction is that, as we have seen, the method cannot keep producing matrices of arbitrary size, so we devised a divide-and-conquer algorithm that partitions a larger matrix into smaller matrices for which we do have estimations of their algorithmic complexity, and which, put together, provide an estimation for the whole object. So we have come all the way from a graph to its adjacency matrix to 2-dimensional Turing machine tapes. Something additional that we have proven is that the complexity of the adjacency matrix of a graph is a good approximation of the complexity of the adjacency matrix of any isomorphic graph. There are all sorts of technicalities when making this mapping. For example, in this case the graph has 9 nodes, which is a multiple of 3, so its adjacency matrix can be partitioned into square 3-by-3 matrices of the same size; but this may not always be the case. Perhaps a graph has 10 nodes, and we would then have to divide the large matrix into smaller matrices of different sizes, or we could stick to matrices of size 3 but allow overlapping. We call these options for dealing with the borders the challenge of boundary conditions. What we do is simply divide the matrix into the largest possible number of such blocks and then combine their complexities in a clever way with BDM, the Block Decomposition Method. BDM can be generalised to any dimension, but one always has to take care of the way in which the decomposition of the original object may produce leftovers at the boundaries. However, we have proven that such errors due to uneven partitions are under control: they converge and can even be corrected.

One way to deal with the decomposition of n-dimensional tensors is to embed them in an n-dimensional torus (n = 2 in the case of the one depicted here), making the borders cyclic or periodic by joining them. Depicted here are three examples of canonical graph adjacency matrices embedded in a 2-dimensional torus, which preserves the object's complexity on the surface: a complete graph, a cycle graph and an Erdős–Rényi graph with edge density 0.5, all of size 20 nodes and free of self-loops. Avoiding borders has the desired effect of producing no residual matrices after the block decomposition with overlapping.

So in this paper we provided evidence that our methods deliver the expected numerical results, showing not only that unlabelled graphs with different adjacency matrices have about the same complexity, but also that graphs belonging to larger automorphism groups are less complex than graphs belonging to smaller automorphism groups. This is also something that could have been expected, because the size of the group reflects the number of symmetries the graph has, and the more symmetry, the more patterns one can find to compress the graph. An automorphism group can be seen as the group of all labelled graphs structurally equivalent to some relabelling of the same graph. We tested this on more than 250 well-known graphs with different symmetries and automorphism group sizes, and we found it to be the case. Simple graphs such as complete graphs are algorithmically simple and belong to large automorphism groups. Other cases, such as lattices, are algorithmically simple but have a low automorphism count, while more complex graphs, such as certain Cayley graphs, sit in automorphism groups essentially by themselves and turn out to be algorithmically complex.
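Here is a minimal sketch of the block decomposition just described, assuming the matrix side is a multiple of 3 (as in the 9-node example) and that a precomputed CTM table for 3-by-3 blocks is available. The `ctm_3x3` dictionary is a placeholder: in practice its values come from the exhaustive enumeration of 2-dimensional Turing machines discussed earlier, and the toy value used below is invented for illustration.

```python
# Sketch of BDM on an adjacency matrix: BDM(X) = sum over distinct blocks of
# CTM(block) + log2(multiplicity of that block).
from collections import Counter
from math import log2

def blocks_3x3(matrix):
    """Yield the non-overlapping 3x3 blocks of a 0/1 matrix as hashable tuples."""
    n = len(matrix)
    for i in range(0, n, 3):
        for j in range(0, n, 3):
            yield tuple(tuple(matrix[i + a][j + b] for b in range(3)) for a in range(3))

def bdm(matrix, ctm_3x3):
    """Combine block CTM values: each distinct block contributes CTM + log2(count)."""
    counts = Counter(blocks_3x3(matrix))
    return sum(ctm_3x3[block] + log2(mult) for block, mult in counts.items())

# Toy usage with a made-up CTM table assigning a low value to the all-zero block:
all_zeros = tuple((0, 0, 0) for _ in range(3))
fake_ctm = {all_zeros: 1.0}               # hypothetical value, illustration only
empty_graph = [[0] * 9 for _ in range(9)]  # 9-node fully disconnected graph
print(bdm(empty_graph, fake_ctm))          # 1.0 + log2(9): one distinct block, 9 copies
```

The log2 term is what keeps BDM from simply multiplying a block's complexity by its number of occurrences: repeating a block is cheap to describe, so only its multiplicity, not another full copy of its complexity, is added.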
Moreover, in this other paper we proved that the algorithmic complexity of one instance of an automorphism group is equal, up to a small constant, to the complexity of any other graph in the same automorphism group, because there exists an algorithm, implementable as a short computer program and thus of almost fixed size, that produces all of them, and it suffices to enumerate them in order to pull out a specific one. In other words, the algorithmic complexity of a labelled graph is about the same as that of its unlabelled version, and vice versa.

We have applied many tests to our numerical methods. Two of them involve dual and cospectral graphs. A dual is a graph in which vertices are exchanged for edges and edges for vertices. The algorithm converting one into the other is very simple and can be implemented as a short computer program, so we know that dual graphs should have about the same algorithmic complexity. Cospectral graphs are graphs that have the same graph spectrum, that is, the same eigenvalues sorted from largest to smallest, and it is not trivial to find cospectral graphs. Yet in both cases we found that our method performed very well, and consistently better than other ways of approximating algorithmic complexity, such as popular lossless compression algorithms like LZW. We also tested against BZIP and Shannon entropy. One concern is what happens if we start changing the computing model rather than only exploring larger rulespaces. Here we have the correlation between the algorithmic complexity values assigned by the 1D and 2D models to the strings generated by both. The rank correlation was perfect.

A simple application is sorting the Elementary Cellular Automata by BDM. Here the results match intuition, and when compared to compression we obtained basically the same ranking, which is good, since we performed equal to or better than the best other available approach to approximating algorithmic complexity. Notice also how we did this, as an illustration of the way the BDM method works: we took the evolution of a CA and partitioned it into smaller matrices, for which we can better approximate the algorithmic complexity, and then, putting everything together, we provide a value for the larger object.

The question is, of course, what we gain with BDM compared to other methods. One way to see it is the following: let's say that on this axis we have entropy and on this other axis some other computable measure, which could be a graph-theoretic one. What an algorithmic measure does is tell apart cases that only look random from truly random ones, which is one of the most important things we want to know about data or a network. A non-random network is likely generated by some mechanism and is thus a piece in a chain of cause and effect, yet other measures will collapse all those cases into a single one, blindly assigning properties that may mislead someone who is not interested in one particular feature but in the algorithmic content and causal nature of the object. Another way to see it is in this other diagram. You may remember the typical Bernoulli distribution one gets by plotting the entropy values of the full set of strings of a fixed size. What we are doing is adding an extra dimension to the study of objects such as networks, a dimension related to the algorithmic, mechanistic or causal nature of the object.
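For the cospectrality test mentioned above, here is a quick sketch of how one can check that two graphs share the same spectrum, using the classic smallest example, the star K_{1,4} versus the 4-cycle plus an isolated vertex (this example pair is my illustration, not necessarily one of the pairs used in the lecture's experiments).

```python
# Two graphs are cospectral when the sorted eigenvalues of their adjacency
# matrices coincide, even if the graphs are not isomorphic.
import numpy as np

def spectrum(adj):
    """Sorted (descending) eigenvalues of a symmetric 0/1 adjacency matrix."""
    return np.sort(np.linalg.eigvalsh(np.array(adj, dtype=float)))[::-1]

# Star K_{1,4}: node 0 connected to nodes 1..4.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]

# 4-cycle on nodes 0..3 plus an isolated node 4.
c4_plus_k1 = [[0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]]

print(np.allclose(spectrum(star), spectrum(c4_plus_k1)))  # True: cospectral,
# even though the two graphs are clearly non-isomorphic.
```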
With this extra dimension we can separate Erdős–Rényi graphs that are generated pseudo-randomly from E-R graphs that are truly random, that is, graphs that cannot be generated by any computer program much shorter than the graph itself. This is the difference between a recursive graph, such as the graph we introduced before with a near-maximal-entropy degree sequence, and a graph that is algorithmically random. In this diagram we can also place other types of networks, such as scale-free networks, which may cover a large area: we know that an element of randomness enters through the preferential attachment algorithm, and such a process can either be emulated deterministically or be algorithmically random. The same goes for small-world networks, whose rewiring probability determines their place in the diagram, but which would usually be low in algorithmic randomness. On the other hand, there are two points where statistical randomness and algorithmic randomness agree, namely the fully disconnected and fully connected graphs, which sit at the two extremes of the plot, while entropy reaches its maximum when the edge density is about 0.5. In the next lectures we will see how we can exploit all these regions in which a network or an object can lie.
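As a small numerical check of the entropy coordinate of this diagram, here is a sketch that treats the adjacency matrix as Bernoulli bits with edge density p and computes H(p) = -p log2 p - (1-p) log2(1-p); the second, algorithmic coordinate would come from a BDM-style estimate as sketched earlier. The function name is my own choice for illustration.

```python
# Entropy coordinate of the entropy-vs-algorithmic-complexity diagram.
from math import log2

def edge_density_entropy(p):
    """Shannon entropy (bits per potential edge) for a graph with edge density p."""
    if p in (0.0, 1.0):
        return 0.0               # fully disconnected / fully connected graphs
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(edge_density_entropy(p), 3))
# Entropy is 0 at both extremes and maximal (1 bit) at density 0.5, matching the
# plot described above; but entropy alone cannot tell a pseudo-random E-R graph
# from a truly (algorithmically) random one, which is what the extra dimension adds.
```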