In this unit we will explore the concept of randomness, both intuitively and mathematically. We will see how algorithmic randomness provides the accepted, and very robust, mathematical definition of randomness. Among the most common properties associated with randomness are the following: a random object or event traditionally means one that happened for no particular reason, almost by accident; it has no causal explanation, it just happens. A random object or event traditionally has no patterns; otherwise it is hard to believe it is random. And a random object or event is not predictable.

So, how are these concepts connected to each other? Is there a mathematical definition characterising each, or all, of these notions? For example, can something be predictable but not causal, or be unpredictable but display patterns? Are any of these notions more fundamental than the others? Can one of these notions, or a combination of them, be derived from another? We will see how a theory of Algorithmic Information can answer these questions.

One common and useful strategy for studying randomness is to study random and non-random sequences. For example, the results of tossing a coin can be encoded as a binary sequence, each bit encoding one outcome. Taking tails as 0 and heads as 1, a typical random run may look like this: 0110100110010111... And even the behaviour of more sophisticated systems, such as forecasting the weather, can be encoded with N symbols, each corresponding to a possible forecast, with, for example, rainy encoded by 1, cold by 2, warm by 3, sunny by 4 and so on, giving sequences such as 3 1 4 4 2 1...

As we have seen, a limitation of classical probability theory is its inability to make an objective distinction among the elements of a distribution. For example, under a uniform distribution, the sequence of four zeroes, 0000, has exactly the same probability of occurring as any other sequence of length four, such as 0110, because according to probability theory each has a probability of 1/n, with n being the number of possible sequences, in this case 2^4 = 16, so 1/16.

However, there is a feeling that a string such as 0000, a sequence of only 0s, should be less random than another sequence of the same length, such as 0110, when produced by a random process, because the sequence of zeroes somehow looks atypical, predictable and patterned. This is because one can describe it with only a few words, while the latter seems to require more effort and a longer description. It can also be said that the latter sequence, the more random-looking one, seems to contain more information, as if it were encoding something meaningful; but it may also just happen to be random, and we will see how algorithmic complexity can help us decide.

As we have seen, and will continue to see, classical information theory inherits some limitations from probability theory. For example, classical information theory can capture only some of the aspects associated with randomness, a limitation it shares with every other computable measure. Computable means, as we saw in the last unit, that one can compute it with a computer program, such as a Turing machine. So we will see that to characterise randomness we need to introduce more powerful measures, which will also turn out not to be entirely computable, meaning that in general it is not possible to encode them and run them as a computer program to get a definite answer. Let me show you how we will start introducing the concept of computation when talking about randomness and sequences.
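As a minimal sketch of this point (in Python, not part of the original lecture), the following enumerates all binary sequences of length four and confirms that the uniform distribution assigns every one of them the same probability, 1/16, no matter how patterned or random-looking it is:

```python
from itertools import product

# Under a uniform distribution over binary sequences of length n,
# every sequence has the same probability: 1 / 2**n.
n = 4
sequences = ["".join(bits) for bits in product("01", repeat=n)]
print(len(sequences))   # 16 possible sequences of length 4
print(1 / 2**n)         # 0.0625, i.e. 1/16, for each of them

# Probability alone cannot tell the patterned from the random-looking:
for s in ("0000", "0110"):
    print(s, "has probability", 1 / len(sequences))
```

The point of the sketch is exactly the limitation described above: the probability calculation treats 0000 and 0110 identically, so the felt difference between them must come from somewhere other than probability theory.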
Let me start with a simple statistical example, a type of property that can be captured by a computer program. For example, a finite sequence of bits with a repeating pattern can be generated by a short computer program implementing a loop routine. In plain English, one may say that a sequence of alternating 0s and 1s, such as 01010101... and so on, can be described by the sentence "the repetition of zero and one n times". This description corresponds to a computer program implementing a loop that prints 0 and 1 n times. In contrast, a more random-looking sequence would require a longer description, perhaps requiring every bit to be spelled out one by one. This is something that Shannon entropy in classical information theory can also capture with the right parameters: in the case of this sequence of alternating 0s and 1s, by taking blocks of 2 bits at a time, for example, Shannon entropy will tell us that the sequence is indeed very simple. But we will see examples in which Shannon entropy fails, unable to recover the underlying regularity. This is because Shannon entropy can only deal with a rather small subset of the properties associated with randomness: the subset of statistical properties. For example, patterns of period 1, patterns of period 2, or regularities across blocks of n bits, and so on. But not all possible properties are of this type. Thus, for an atypical sequence to pass a randomness test based on Shannon entropy, it would suffice to avoid any statistical regularity.
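To make the block argument concrete, here is a minimal sketch (again in Python; the function block_entropy and the choice of non-overlapping blocks are my assumptions, not from the lecture) that computes the Shannon entropy of the distribution of k-bit blocks in a sequence. Read bit by bit, the alternating sequence looks maximally entropic, but read 2 bits at a time its single repeating block gives entropy zero:

```python
from collections import Counter
from math import log2

def block_entropy(seq, k):
    """Shannon entropy, in bits per block, of the distribution of
    non-overlapping k-bit blocks in seq."""
    blocks = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())

alternating = "01" * 16                # 01010101... (32 bits)
print(block_entropy(alternating, 1))   # 1.0: half 0s, half 1s looks maximal
print(block_entropy(alternating, 2))   # 0.0: every 2-bit block is "01"
```

Choosing the block size is exactly the "right parameters" caveat above: entropy only detects the statistical regularities that the chosen parametrisation happens to expose, which is why it can be fooled by sequences whose structure is not statistical.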