So in the previous lectures we saw how computable measures such as Shannon entropy and even lossless compression have limitations, just as algorithmic complexity has the limitation of not being computable, yet putting them together is stronger than using any of them in isolation. In this lecture we will see how we can define native measures of complexity for graphs and networks. We saw how algorithmic complexity is, in principle, more robust than the application of entropy in practice. This is because algorithmic complexity is invariant to the object's description up to a constant, whereas entropy requires a feature of interest, and when the observer changes point of view, say from a degree sequence to an adjacency matrix, the values may diverge. In a very popular book, Gell-Mann, the famous physicist who won the Nobel Prize in 1969, argued that the challenge of graph complexity was fundamental to the new science of complex systems, and claimed that any reasonable measure of complexity should assign minimal complexity to both completely disconnected and completely connected graphs. Here we will not only show that our methods based on algorithmic complexity conform with this basic requirement, but that we can say and do a lot more.

First, there is one obvious realisation to make: we have two extremes, or lower and upper bounds, for the algorithmic complexity of a graph of fixed size. On the one hand, it is not difficult to see why a fully connected or a fully disconnected graph should have low algorithmic complexity, because their descriptions, and the computer programs implementing them, are completely trivial. In the case of a complete graph it suffices to connect every given node, and in a fully disconnected graph it suffices to keep everything disconnected; in both cases the shortest description of the graph is basically just the number of nodes, and so the algorithmic complexity of these graphs is a function of the logarithm of the number of nodes. That is, if the graph has N nodes, then the logarithm of N is a good approximation of their algorithmic complexity. In contrast, for a truly algorithmically random graph, one in which the way the edges are connected cannot be compressed, the algorithmic complexity grows as a function of the number of edges, because the only way to specify them is one by one.

Also notice that when we talk about the complexity of a graph, what we will actually do is estimate the complexity of the adjacency matrix of that graph. This is robust because the adjacency matrix is an invariant of a labelled graph and a lossless description of it, meaning that from the adjacency matrix one can uniquely and fully reconstruct the graph without any loss of information. We also know that, because of the invariance theorem, describing the graph in any other way, as long as it is without loss of information, will give a complexity value that converges up to a constant; in other words, the complexity of any lossless description of the graph should be about the same. By now we have mastered the concept of the algorithmic complexity of a string. But if we are talking about the algorithmic complexity of an adjacency matrix, how exactly do we proceed? What we did was a clever hack to the original model: instead of using a Turing machine with a one-dimensional tape, we replaced the tape with a two-dimensional grid and allowed the Turing machine's transition rules to also move in the up and down directions.
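To make the two extremes concrete, here is a minimal sketch in Python that uses zlib compressed length as a crude stand-in for algorithmic complexity (it is not the CTM/BDM method discussed in this lecture, just the kind of lossless-compression proxy mentioned later). The graph sizes and the helper names are illustrative choices, not part of the lecture's experiments.

```python
# Sketch: zlib compressed length as a rough upper-bound proxy for the
# algorithmic complexity of an adjacency matrix (illustration only).
import zlib
import random

def adjacency_bits(n, edge_rule):
    """Flatten the upper triangle of an n-node adjacency matrix into a bit string."""
    return ''.join('1' if edge_rule(i, j) else '0'
                   for i in range(n) for j in range(i + 1, n))

def compressed_size(bits):
    """Length in bytes of the zlib-compressed bit string (a rough complexity proxy)."""
    return len(zlib.compress(bits.encode()))

n = 64
random.seed(0)
complete = adjacency_bits(n, lambda i, j: True)                   # fully connected
empty    = adjacency_bits(n, lambda i, j: False)                  # fully disconnected
rand     = adjacency_bits(n, lambda i, j: random.random() < 0.5)  # incompressible-looking

for name, bits in [('complete', complete), ('empty', empty), ('random', rand)]:
    print(name, compressed_size(bits))
# The complete and empty graphs compress to a handful of bytes (their description
# is essentially just n), while the random graph's compressed size grows with the
# number of possible edges, matching the two extremes discussed above.
```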
So the question of the complexity of a graph in our context, using the Coding Theorem Method, turns out to be: what is the probability that a 2-dimensional Turing machine produces the adjacency matrix of the graph? Notice that this kind of Turing machine had been introduced before, for other purposes, by Chris Langton under the name 'turmite', a combination of Turing and termite. Here is an example of the rules of a 2-dimensional Turing machine capable of generating a graph by producing a pattern on a 2-dimensional grid. You can see that the only change to the original formulation is that the tape is now a grid, and the head can also move up and down. Just as in the case of 1-dimensional tape Turing machines, very simple squares, such as those consisting of a single cell, are the most frequently produced and thus of lower algorithmic complexity. The chief advantage of having extended the model of the Turing machine is that, because 2-dimensional Turing machines are simply another kind of computer program, everything in the theory still holds, including all the theorems that we have explored before, and so we can still apply the Coding theorem and rely on the invariance theorem.

This is, for example, the reduced set of all 3-by-3 matrices produced by all 2-dimensional-tape binary Turing machines with up to 4 states. Reduced means that we are not showing all the symmetries; the 3-by-3 matrix with only black cells, for example, is not shown because it is a simple transformation of the 3-by-3 matrix with all white cells. This is just a way to see all cases in a more compact form. The first thing to notice is that, when sorted by frequency of production, these patterns appear to go from simplicity to randomness, conforming with the expectation of the Coding theorem. It also provides evidence in favour of Ockham's razor, because according to algorithmic complexity and the Coding theorem the most likely explanation is also the simplest. Even if 'simplest' here is only an intuitive notion, for all practical purposes our work can be considered a formal vindication of Ockham's razor and of Epicurus' Principle of Multiple Explanations. Here, in A, is one of the simplest patterns, generated by most 5-state Turing machines and second only to the most popular patterns, the all-white and all-black ones; it is a single cell turned on. It is followed, in subfigure B, by the most complex object generated in the same rulespace. At the top, labelled C, are what we call jumpers, which, as expected, are simple matrices that appear high in the output frequency distribution. The example shown is not actually the most random case, because we could not generate all the objects of that size, so it is in fact among the lowest-complexity squares of that size. Interestingly, it does look like a tree.

We can of course generalise this complexity measure to any dimension by replacing the 2-dimensional Turing machine tape with an n-dimensional tape. In 3D I like to think of the way 3D printers work; I would call it the 3D-printer complexity, and combined with the concept of Logical Depth one could get an idea of how sophisticated an object is from the time one would need to wait for the 3D object to be printed.
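As a rough illustration of the machine model just described, here is a minimal 2-dimensional ("turmite"-style) Turing machine simulator in Python. The rule table is made up for illustration and is not one of the enumerated machines from the lecture's experiments; also, the real experiments consider halting machines, whereas here we simply run for a fixed number of steps.

```python
# Minimal sketch of a 2-D Turing machine on an unbounded grid (illustrative rule).
from collections import defaultdict

# rule: (state, symbol_read) -> (symbol_to_write, move, next_state)
# moves: 'U', 'D', 'L', 'R' on the 2-D grid (the only change vs. a 1-D tape).
rule = {
    (0, 0): (1, 'R', 1),
    (0, 1): (0, 'U', 0),
    (1, 0): (1, 'D', 0),
    (1, 1): (1, 'L', 1),
}

def run_2d_tm(rule, steps=50):
    grid = defaultdict(int)          # all-white (0) unbounded grid
    x = y = state = 0                # head at the origin, initial state 0
    for _ in range(steps):
        sym = grid[(x, y)]
        write, move, state = rule[(state, sym)]
        grid[(x, y)] = write
        if move == 'U':   y += 1
        elif move == 'D': y -= 1
        elif move == 'L': x -= 1
        else:             x += 1
    return grid

grid = run_2d_tm(rule)
print(sum(grid.values()), "black cells produced")
# In the Coding Theorem Method, one enumerates all machines in a rulespace, tallies
# how often each small output pattern appears, and estimates a pattern's complexity
# as -log2 of its output frequency: frequently produced patterns are the simple ones.
```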
One restriction is that, as we have seen, the method cannot keep producing matrices of arbitrary size, so we devised a divide-and-conquer algorithm that partitions a larger matrix into smaller matrices for which we do have estimations of their algorithmic complexity, and which, put together, provide an estimation for the whole object. So we have come all the way from a graph to its adjacency matrix to 2-dimensional Turing machine tapes. Something additional that we have proven is that the complexity of the adjacency matrix of a graph is a good approximation of the complexity of the adjacency matrix of any isomorphic graph. There are all sorts of technicalities when making this mapping. For example, in this case the graph has 9 nodes, which is a multiple of 3, so its adjacency matrix can be partitioned into square 3-by-3 matrices of the same size; but this may not always be the case. Perhaps a graph has 10 nodes, and we would then have to divide the large matrix into smaller matrices of different sizes, or we could stick to matrices of size 3 but allow overlapping. We call these options for dealing with the borders the challenge of boundary conditions. What we do is simply divide the matrix into the largest possible number of such blocks and then combine their complexities in a clever way with BDM, the Block Decomposition Method. BDM can be generalised to any dimension, but one always has to take care of the way in which the decomposition of the original object may produce leftovers at the boundaries. However, we have proven that such errors due to uneven partitions are under control: they converge and can even be corrected.

One way to deal with the decomposition of n-dimensional tensors is to embed them in an n-dimensional torus (n = 2 in the case of the one depicted here), making the borders cyclic or periodic by joining them. Depicted here are three examples of canonical graph adjacency matrices embedded in a 2-dimensional torus, which preserves the object's complexity on the surface: a complete graph, a cycle graph and an Erdős–Rényi graph with edge density 0.5, all of size 20 nodes and free of self-loops. Avoiding borders has the desired effect of producing no residual matrices after the block decomposition with overlapping.

So in this paper we provided evidence that our methods deliver the expected numerical results, showing not only that unlabelled graphs with different adjacency matrices have about the same complexity, but also that graphs belonging to larger automorphism groups are less complex than graphs belonging to smaller automorphism groups. This is also something that could have been expected, because the size of the group reflects the number of symmetries the graph has, and the more symmetry, the more patterns one can find to compress the graph. An automorphism group can be seen as the group of all labelled graphs structurally equivalent to some relabelling of the same graph. We tested this on more than 250 well-known graphs with different symmetries and automorphism group sizes, and we found it to be the case. Simple graphs such as complete graphs are algorithmically simple and belong to large automorphism groups. Other cases, such as lattices, are algorithmically simple but have a low automorphism count, while more complex graphs, such as certain Cayley graphs, sit in automorphism groups essentially by themselves and turn out to be algorithmically complex.
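Here is a minimal sketch of the block decomposition just described, assuming the matrix side is a multiple of 3 (as in the 9-node example) and that a precomputed CTM table for 3-by-3 blocks is available. The `ctm_3x3` dictionary is a placeholder: in practice its values come from the exhaustive enumeration of 2-dimensional Turing machines discussed earlier, and the toy value used below is invented for illustration.

```python
# Sketch of BDM on an adjacency matrix: BDM(X) = sum over distinct blocks of
# CTM(block) + log2(multiplicity of that block).
from collections import Counter
from math import log2

def blocks_3x3(matrix):
    """Yield the non-overlapping 3x3 blocks of a 0/1 matrix as hashable tuples."""
    n = len(matrix)
    for i in range(0, n, 3):
        for j in range(0, n, 3):
            yield tuple(tuple(matrix[i + a][j + b] for b in range(3)) for a in range(3))

def bdm(matrix, ctm_3x3):
    """Combine block CTM values: each distinct block contributes CTM + log2(count)."""
    counts = Counter(blocks_3x3(matrix))
    return sum(ctm_3x3[block] + log2(mult) for block, mult in counts.items())

# Toy usage with a made-up CTM table assigning a low value to the all-zero block:
all_zeros = tuple((0, 0, 0) for _ in range(3))
fake_ctm = {all_zeros: 1.0}               # hypothetical value, illustration only
empty_graph = [[0] * 9 for _ in range(9)]  # 9-node fully disconnected graph
print(bdm(empty_graph, fake_ctm))          # 1.0 + log2(9): one distinct block, 9 copies
```

The log2 term is what keeps BDM from simply multiplying a block's complexity by its number of occurrences: repeating a block is cheap to describe, so only its multiplicity, not another full copy of its complexity, is added.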
Moreover, in this other paper we proved that the algorithmic complexity of one instance of an automorphism group is equal, up to a small constant, to the complexity of any other graph in the same automorphism group, because there exists an algorithm, implementable as a short computer program and thus of almost fixed size, that produces all of them, and it suffices to enumerate them in order to pull out a specific one. In other words, the algorithmic complexity of a labelled graph is about the same as that of its unlabelled version, and vice versa.

We have applied many tests to our numerical methods. Two of them involve dual and cospectral graphs. A dual is a graph in which vertices are exchanged for edges and edges for vertices. The algorithm converting one into the other is very simple and can be implemented as a short computer program, so we know that dual graphs should have about the same algorithmic complexity. Cospectral graphs are graphs that have the same graph spectrum, that is, the same eigenvalues sorted from largest to smallest, and it is not trivial to find cospectral graphs. Yet in both cases we found that our method performed very well, and consistently better than other ways of approximating algorithmic complexity, such as popular lossless compression algorithms like LZW. We also tested against BZIP and Shannon entropy. One concern is what happens if we start changing the computing model rather than only exploring larger rulespaces. Here we have the correlation between the algorithmic complexity values assigned by the 1D and 2D models to the strings generated by both. The rank correlation was perfect.

A simple application is sorting the Elementary Cellular Automata by BDM. Here the results match intuition, and when compared to compression we obtained basically the same ranking, which is good, since we performed equal to or better than the best other available approach to approximating algorithmic complexity. Notice also how we did this, as an illustration of the way the BDM method works: we took the evolution of a CA and partitioned it into smaller matrices, for which we can better approximate the algorithmic complexity, and then, putting everything together, we provide a value for the larger object.

The question is, of course, what we gain with BDM compared to other methods. One way to see it is the following: let's say that on this axis we have entropy and on this other axis some other computable measure, which could be a graph-theoretic one. What an algorithmic measure does is tell apart cases that only look random from truly random ones, which is one of the most important things we want to know about data or a network. A non-random network is likely generated by some mechanism and is thus a piece in a chain of cause and effect, yet other measures will collapse all those cases into a single one, blindly assigning properties that may mislead someone who is not interested in one particular feature but in the algorithmic content and causal nature of the object. Another way to see it is in this other diagram. You may remember the typical Bernoulli distribution one gets by plotting the entropy values of the full set of strings of a fixed size. What we are doing is adding an extra dimension to the study of objects such as networks, a dimension related to the algorithmic, mechanistic or causal nature of the object.
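For the cospectrality test mentioned above, here is a quick sketch of how one can check that two graphs share the same spectrum, using the classic smallest example, the star K_{1,4} versus the 4-cycle plus an isolated vertex (this example pair is my illustration, not necessarily one of the pairs used in the lecture's experiments).

```python
# Two graphs are cospectral when the sorted eigenvalues of their adjacency
# matrices coincide, even if the graphs are not isomorphic.
import numpy as np

def spectrum(adj):
    """Sorted (descending) eigenvalues of a symmetric 0/1 adjacency matrix."""
    return np.sort(np.linalg.eigvalsh(np.array(adj, dtype=float)))[::-1]

# Star K_{1,4}: node 0 connected to nodes 1..4.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]

# 4-cycle on nodes 0..3 plus an isolated node 4.
c4_plus_k1 = [[0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]]

print(np.allclose(spectrum(star), spectrum(c4_plus_k1)))  # True: cospectral,
# even though the two graphs are clearly non-isomorphic.
```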
With this extra dimension we can separate Erdős–Rényi graphs that are generated pseudo-randomly from E-R graphs that are truly random, that is, graphs that cannot be generated by any computer program much shorter than the graph itself. This is the difference between a recursive graph, such as the graph we introduced before with a near-maximal-entropy degree sequence, and a graph that is algorithmically random. In this diagram we can also place other types of networks, such as scale-free networks, which may cover a large area: we know that an element of randomness enters through the preferential attachment algorithm, and such a process can either be emulated deterministically or be algorithmically random. The same goes for small-world networks, whose rewiring probability determines their place in the diagram, but which would usually be low in algorithmic randomness. On the other hand, there are two points where statistical randomness and algorithmic randomness agree, namely the fully disconnected and fully connected graphs, which sit at the two extremes of the plot, while entropy reaches its maximum when the edge density is about 0.5. In the next lectures we will see how we can exploit all these regions in which a network or an object can lie.
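As a small numerical check of the entropy coordinate of this diagram, here is a sketch that treats the adjacency matrix as Bernoulli bits with edge density p and computes H(p) = -p log2 p - (1-p) log2(1-p); the second, algorithmic coordinate would come from a BDM-style estimate as sketched earlier. The function name is my own choice for illustration.

```python
# Entropy coordinate of the entropy-vs-algorithmic-complexity diagram.
from math import log2

def edge_density_entropy(p):
    """Shannon entropy (bits per potential edge) for a graph with edge density p."""
    if p in (0.0, 1.0):
        return 0.0               # fully disconnected / fully connected graphs
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(edge_density_entropy(p), 3))
# Entropy is 0 at both extremes and maximal (1 bit) at density 0.5, matching the
# plot described above; but entropy alone cannot tell a pseudo-random E-R graph
# from a truly (algorithmically) random one, which is what the extra dimension adds.
```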