A first and popular application of Shannon Entropy is the study of the statistical properties of human languages. We owe this very first application to Shannon himself, who explored it right when he was introducing his new ideas on communication theory. In linguistics, Entropy is closely related to language redundancy and to the frequencies of occurrence of certain combinations of letters and words, also called collocations and technically known as n-grams. An n-gram can be a letter, a pair of letters, a triplet, a word or even a sentence, so counting n-grams is a kind of coarse-graining approach to studying languages, and it is very useful for understanding Entropy and classical information theory. Here we can see the frequencies of n-grams calculated from the United Nations' Universal Declaration of Human Rights in 20 languages, together with the Entropy rate calculated from these n-gram frequency distributions (a short code sketch of this kind of n-gram Entropy calculation is given below). The Entropy of a language is an estimate of the probabilistic information content of each letter in that language, and so it is also a measure of the language's predictability and redundancy.

One can crack codes simply by plotting the distribution of words as used in written or spoken language. One can think of a human language as a kind of encryption code for human reality, used to describe objects and events. If you wanted to crack such a code or language from scratch, one strategy would be to plot the distribution of words, because any two languages describing the same human reality will tend to use words with about the same frequencies. So, for example, if both languages have articles and prepositions, you might easily identify and match them even from a very short text, because they are likely to be the most frequently used words, even if you do not know which words are the articles or prepositions in the other language. Here, for example, we are plotting two word clouds from the distribution of words in the United Nations' Universal Declaration of Human Rights in both English and Spanish. The size and colour of each word correspond to its frequency in the corresponding text. Running through the parallel texts, one would find a strong correspondence between word meaning and word size, and this would even work across different texts, not only between a text and its translation, because the most frequent words will, on average, always appear on top. In this particular case, the top 20 most frequent words in these short texts reveal a lot of the structure of both languages, since they reveal each other's meanings just by word frequency.

This frequency analysis is actually one of the most basic strategies for cracking codes, because any minimal information about an encrypted message will give up some of its secrets through its distributions. This is also the basis of some theorems in classical information theory that, for example, guarantee perfect secrecy by replacing high-frequency words with low-frequency words when choosing a code, so as to conceal the correspondence between distributions. Perfect secrecy, however, is very difficult to achieve in practice, and usually any human mistake leads to the eventual cracking of the code.

As we said before, languages can be studied through their distributions of letters or of words, so the choice of basic unit is very important, and it is natural to ask how the Entropy of a string behaves when we take blocks of different numbers of letters: one letter, two letters, and so on.
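To make the n-gram picture concrete, here is a minimal sketch in Python of how n-gram frequencies turn into an Entropy estimate. The sample string is only Article 1 of the Universal Declaration of Human Rights in English, not the full multilingual corpus behind the figures, and the normalisation choices (lower-casing, stripping whitespace, counting overlapping n-grams) are assumptions of this sketch rather than the exact procedure used above.

```python
from collections import Counter
from math import log2

def ngram_entropy(text, n=1):
    """Shannon Entropy (bits per n-gram) of the empirical n-gram distribution."""
    text = "".join(text.lower().split())                       # ignore case and whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]  # overlapping n-grams
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Article 1 of the Universal Declaration of Human Rights (English)
sample = ("All human beings are born free and equal in dignity and rights. "
          "They are endowed with reason and conscience and should act towards "
          "one another in a spirit of brotherhood.")

for n in (1, 2, 3):
    print(f"{n}-gram Entropy: {ngram_entropy(sample, n):.3f} bits per n-gram")

# The same counting gives the word-frequency ranking behind the word clouds.
words = Counter(sample.lower().replace(".", "").split())
print(words.most_common(5))
```

The word counts at the end are exactly what a word cloud visualises: frequent words such as "and" or "in" dominate, which is why articles and prepositions are the easiest words to match across languages.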
To illustrate this, let's take the periodic sequence of alternating 0s and 1s. As we saw before, assuming the uniform distribution, this sequence would have maximal Entropy, suggesting that it is random, if we take single bits as the unit or microstate, because there are as many 0s as 1s in the sequence. One bit is the finest granularity possible. However, this sequence is clearly not random, and taking blocks of two bits at a time would capture its regularity. So it looks as if there are granularities or block sizes at which the application of Entropy will either miss or capture the statistical regularities of a sequence.

How, then, do we deal with this strong dependency on granularity or block size? One way to overcome it is always to seek the granularity that minimizes the Entropy of the sequence. We will call the process of studying the Entropy of an object as a function of increasing granularity the Entropy rate, and each of the Entropy values for a given grain or block length the Block Entropy. Notice, however, that variations of the same concept go by different names but are mostly the same or based on similar ideas. If one looks at these plots, for example, one can see that for the string of alternating ones and zeros there are coarse-graining lengths that capture the periodicity of the string and others that break it, so the minima of Entropy across all possible block sizes up to half the size of the sequence capture its statistical regularity. The best version of Shannon Entropy is therefore a function of variable block size whose minimum value best captures any possible periodicity of a string. This is illustrated here with three strings of length 12 bits each: one regular, one periodic and one more random-looking (a code sketch of the same computation appears at the end of this section). Whenever there is some regularity, it is captured by some block size. Notice, however, that one should never take the minimum when the block size is equal to or larger than half the length of the string, because that leaves only one block and thus automatically an Entropy value of zero; the Entropy of a single block has no meaning and should not be taken into account.

One other useful application of Entropy in our context is to multi-dimensional objects. For example, we will sometimes apply Entropy to two-dimensional objects, where the granularity analysis consists in breaking a large array into smaller arrays or blocks. Here is an example of an array of 24 by 24 bits decomposed into smaller arrays or blocks of size 6. We will do this because some of the methods we will introduce may only deal with small blocks, and we want to be able to compare Shannon Entropy with these other new methods for approximating algorithmic complexity. The application of Entropy to objects other than strings and sequences is exactly the same. So, for example, just as with strings and sequences, applying Entropy to a block of single-colour cells like this one gives an Entropy equal to zero. You can test it yourself: no matter how large the block, it will always have Entropy equal to zero, and this will, of course, be the case for any block granularity. But as soon as the object displays some diversity, Shannon Entropy will diverge from zero, such as in this example.
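To make the Block Entropy idea concrete, here is a minimal sketch in Python that slices a string into non-overlapping blocks of a given size and computes the Shannon Entropy of the resulting block distribution, scanning block sizes strictly below half the string length as discussed above. The three 12-bit strings are stand-ins chosen for illustration; they are not necessarily the exact strings shown in the figures.

```python
from collections import Counter
from math import log2

def block_entropy(s, size):
    """Shannon Entropy (bits) of the distribution of non-overlapping blocks of a given size."""
    blocks = [s[i:i + size] for i in range(0, len(s) - size + 1, size)]
    counts = Counter(blocks)
    if len(counts) == 1:        # all blocks identical: Entropy is exactly zero
        return 0.0
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

strings = {
    "regular":        "000000000000",
    "periodic":       "010101010101",
    "random-looking": "011010001101",
}

for name, s in strings.items():
    # block sizes strictly below half the string length, as discussed above
    values = {size: round(block_entropy(s, size), 3) for size in range(1, len(s) // 2)}
    print(f"{name:15s} {values}  min = {min(values.values())}")
```

For the periodic string the minimum (zero) is reached at block size 2, which is exactly the regularity that single-bit granularity misses, while the random-looking string never drops below one bit at any block size.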
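The same computation extends to the two-dimensional case. Below is a minimal sketch, again in Python, that partitions a 24-by-24 binary array into 6-by-6 blocks and computes the Entropy of each block from the distribution of its cell values. The single-colour array and the randomly filled array are illustrative stand-ins for the examples referred to above, not the actual arrays from the figures.

```python
from collections import Counter
from math import log2
import random

def entropy(cells):
    """Shannon Entropy (bits) of a list of cell values."""
    counts = Counter(cells)
    if len(counts) == 1:        # a single colour means zero Entropy
        return 0.0
    total = len(cells)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def block_entropies(array, block=6):
    """Entropy of every non-overlapping block-by-block sub-array of a square array."""
    size = len(array)
    rows = []
    for r in range(0, size, block):
        rows.append([
            round(entropy([array[i][j]
                           for i in range(r, r + block)
                           for j in range(c, c + block)]), 3)
            for c in range(0, size, block)
        ])
    return rows

# A single-colour 24x24 array: every 6x6 block has Entropy zero.
uniform = [[0] * 24 for _ in range(24)]
print(block_entropies(uniform))

# An array with some diversity: blocks containing both colours have Entropy above zero.
random.seed(0)
mixed = [[random.randint(0, 1) for _ in range(24)] for _ in range(24)]
print(block_entropies(mixed))
```

No matter how large the single-colour block or which block granularity we choose, its Entropy stays at zero, while any diversity among the cells pushes the value above zero, exactly as described above.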