A first and popular application of Shannon Entropy is the study of the statistical properties of human languages. We owe this very first application to Shannon himself, who explored it right when he was introducing his new ideas on communication theory. In linguistics, Entropy is closely related to language redundancy and to the frequencies of occurrence of certain combinations of letters and words, also called collocations and technically known as n-grams. An n-gram can be a letter, a pair of letters, a triplet, a word or even a sentence, so counting n-grams is a kind of coarse-graining approach to studying languages, and it is very useful for understanding Entropy and classical information theory. Here we can see the frequencies of n-grams calculated from the United Nations' Universal Declaration of Human Rights in 20 languages, together with the Entropy rate calculated from these n-gram frequency distributions (a short code sketch of this kind of n-gram Entropy calculation is given below). The Entropy of a language is an estimate of the probabilistic information content of each letter in that language, and so it is also a measure of the language's predictability and redundancy.

One can crack codes simply by plotting the distribution of words as used in written or spoken language. One can think of a human language as a kind of encryption code for human reality, used to describe objects and events. If you wanted to crack such a code or language from scratch, one strategy would be to plot the distribution of words, because any two languages describing the same human reality will tend to use words with about the same frequencies. So, for example, if both languages have articles and prepositions, you might easily identify and match them even from a very short text, because they are likely to be the most frequently used words, even if you do not know which words are the articles or prepositions in the other language. Here, for example, we are plotting two word clouds from the distribution of words in the United Nations' Universal Declaration of Human Rights in both English and Spanish. The size and colour of each word correspond to its frequency in the corresponding text. Running through the parallel texts, one would find a strong correspondence between word meaning and word size, and this would even work across different texts, not only between a text and its translation, because the most frequent words will, on average, always appear on top. In this particular case, the top 20 most frequent words in these short texts reveal a lot of the structure of both languages, since they reveal each other's meanings just by word frequency.

This frequency analysis is actually one of the most basic strategies for cracking codes, because any minimal information about an encrypted message will give up some of its secrets through its distributions. This is also the basis of some theorems in classical information theory that, for example, guarantee perfect secrecy by replacing high-frequency words with low-frequency words when choosing a code, so as to conceal the correspondence between distributions. Perfect secrecy, however, is very difficult to achieve in practice, and usually any human mistake leads to the eventual cracking of the code.

As we said before, languages can be studied through their distributions of letters or of words, so the choice of basic unit is very important, and it is natural to ask how the Entropy of a string behaves when we take blocks of different numbers of letters: one letter, two letters, and so on.
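To make the n-gram picture concrete, here is a minimal sketch in Python of how n-gram frequencies turn into an Entropy estimate. The sample string is only Article 1 of the Universal Declaration of Human Rights in English, not the full multilingual corpus behind the figures, and the normalisation choices (lower-casing, stripping whitespace, counting overlapping n-grams) are assumptions of this sketch rather than the exact procedure used above.

```python
from collections import Counter
from math import log2

def ngram_entropy(text, n=1):
    """Shannon Entropy (bits per n-gram) of the empirical n-gram distribution."""
    text = "".join(text.lower().split())                       # ignore case and whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]  # overlapping n-grams
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Article 1 of the Universal Declaration of Human Rights (English)
sample = ("All human beings are born free and equal in dignity and rights. "
          "They are endowed with reason and conscience and should act towards "
          "one another in a spirit of brotherhood.")

for n in (1, 2, 3):
    print(f"{n}-gram Entropy: {ngram_entropy(sample, n):.3f} bits per n-gram")

# The same counting gives the word-frequency ranking behind the word clouds.
words = Counter(sample.lower().replace(".", "").split())
print(words.most_common(5))
```

The word counts at the end are exactly what a word cloud visualises: frequent words such as "and" or "in" dominate, which is why articles and prepositions are the easiest words to match across languages.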
To illustrate this, let's take the periodic sequence of alternating 0s and 1s. As we saw before, assuming the uniform distribution, this sequence would have maximal Entropy, suggesting that it is random, if we take single bits as the unit or microstate, because there are as many 0s as 1s in the sequence. One bit is the finest granularity possible. However, this sequence is clearly not random, and taking blocks of two bits at a time would capture its regularity. So it looks as if there are granularities or block sizes at which the application of Entropy will either miss or capture the statistical regularities of a sequence.

How, then, do we deal with this strong dependency on granularity or block size? One way to overcome it is always to seek the granularity that minimizes the Entropy of the sequence. We will call the process of studying the Entropy of an object as a function of increasing granularity the Entropy rate, and each of the Entropy values for a given grain or block length the Block Entropy. Notice, however, that variations of the same concept go by different names but are mostly the same or based on similar ideas. If one looks at these plots, for example, one can see that for the string of alternating ones and zeros there are coarse-graining lengths that capture the periodicity of the string and others that break it, so the minima of Entropy across all possible block sizes up to half the size of the sequence capture its statistical regularity. The best version of Shannon Entropy is therefore a function of variable block size whose minimum value best captures any possible periodicity of a string. This is illustrated here with three strings of length 12 bits each: one regular, one periodic and one more random-looking (a code sketch of the same computation appears at the end of this section). Whenever there is some regularity, it is captured by some block size. Notice, however, that one should never take the minimum when the block size is equal to or larger than half the length of the string, because that leaves only one block and thus automatically an Entropy value of zero; the Entropy of a single block has no meaning and should not be taken into account.

One other useful application of Entropy in our context is to multi-dimensional objects. For example, we will sometimes apply Entropy to two-dimensional objects, where the granularity analysis consists in breaking a large array into smaller arrays or blocks. Here is an example of an array of 24 by 24 bits decomposed into smaller arrays or blocks of size 6. We will do this because some of the methods we will introduce may only deal with small blocks, and we want to be able to compare Shannon Entropy with these other new methods for approximating algorithmic complexity. The application of Entropy to objects other than strings and sequences is exactly the same. So, for example, just as with strings and sequences, applying Entropy to a block of single-colour cells like this one gives an Entropy equal to zero. You can test it yourself: no matter how large the block, it will always have Entropy equal to zero, and this will, of course, be the case for any block granularity. But as soon as the object displays some diversity, Shannon Entropy will diverge from zero, such as in this example.
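To make the Block Entropy idea concrete, here is a minimal sketch in Python that slices a string into non-overlapping blocks of a given size and computes the Shannon Entropy of the resulting block distribution, scanning block sizes strictly below half the string length as discussed above. The three 12-bit strings are stand-ins chosen for illustration; they are not necessarily the exact strings shown in the figures.

```python
from collections import Counter
from math import log2

def block_entropy(s, size):
    """Shannon Entropy (bits) of the distribution of non-overlapping blocks of a given size."""
    blocks = [s[i:i + size] for i in range(0, len(s) - size + 1, size)]
    counts = Counter(blocks)
    if len(counts) == 1:        # all blocks identical: Entropy is exactly zero
        return 0.0
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

strings = {
    "regular":        "000000000000",
    "periodic":       "010101010101",
    "random-looking": "011010001101",
}

for name, s in strings.items():
    # block sizes strictly below half the string length, as discussed above
    values = {size: round(block_entropy(s, size), 3) for size in range(1, len(s) // 2)}
    print(f"{name:15s} {values}  min = {min(values.values())}")
```

For the periodic string the minimum (zero) is reached at block size 2, which is exactly the regularity that single-bit granularity misses, while the random-looking string never drops below one bit at any block size.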
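The same computation extends to the two-dimensional case. Below is a minimal sketch, again in Python, that partitions a 24-by-24 binary array into 6-by-6 blocks and computes the Entropy of each block from the distribution of its cell values. The single-colour array and the randomly filled array are illustrative stand-ins for the examples referred to above, not the actual arrays from the figures.

```python
from collections import Counter
from math import log2
import random

def entropy(cells):
    """Shannon Entropy (bits) of a list of cell values."""
    counts = Counter(cells)
    if len(counts) == 1:        # a single colour means zero Entropy
        return 0.0
    total = len(cells)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def block_entropies(array, block=6):
    """Entropy of every non-overlapping block-by-block sub-array of a square array."""
    size = len(array)
    rows = []
    for r in range(0, size, block):
        rows.append([
            round(entropy([array[i][j]
                           for i in range(r, r + block)
                           for j in range(c, c + block)]), 3)
            for c in range(0, size, block)
        ])
    return rows

# A single-colour 24x24 array: every 6x6 block has Entropy zero.
uniform = [[0] * 24 for _ in range(24)]
print(block_entropies(uniform))

# An array with some diversity: blocks containing both colours have Entropy above zero.
random.seed(0)
mixed = [[random.randint(0, 1) for _ in range(24)] for _ in range(24)]
print(block_entropies(mixed))
```

No matter how large the single-colour block or which block granularity we choose, its Entropy stays at zero, while any diversity among the cells pushes the value above zero, exactly as described above.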