Now that we've learned about entropy, thermodynamics, and statistical mechanics,
we're ready to talk about Claude Shannon's formulation of information, which is now
called Shannon Information Theory. It turns out to be an important
concept in many areas of complex systems research. It's definitely something
you should know about. Claude Shannon was a mathematician working
in the 1940s at Bell Labs, which was part of AT&T. His focus was on a major
question for telephone communication: how to transmit signals most efficiently
and effectively across telephone wires. In developing a mathematical solution
to this problem, Shannon adapted Boltzmann's statistical mechanics ideas
to the field of communication, and used these ideas to define a particular,
somewhat-narrow, but extremely useful notion of information.
In Shannon's formulation of communication, we have a message source.
Here, it's a black box that emits messages, but you can think of it as someone you're
listening to over the phone. The message source emits messages, for example, words,
to a message receiver---that might be you, listening to your mother talking on the phone.
A message source, more formally, is the set of all possible messages this
source can send, each with its own probability of being sent next. And
a message can be a symbol, a number, or a word, depending on the context.
In our examples, they'll mostly be words. The information content, in Shannon's
formulation, is a measure of the message source: it is a function of the number
of possible messages and their probabilities. Informally, the information content H is the amount
of "surprise" that the receiver has, upon receipt of each message.
So let me explain that by giving a couple of examples.
In my book, Complexity: a Guided Tour, I used some examples of my two
children, when they were much younger. As you can imagine, they're now
teenagers and they aren't so happy with being examples in my books, so
I won't name them here. But you can imagine a particular 1-year-old
who's the message source: this particular 1-year-old is talking to his
grandmother on the phone, but all he can say is one word: da da da da da,
he says it over and over again. So, his messages consist of one word, da,
with probability 1---that's all he ever says. There's no surprise, therefore,
from grandma's point of view: she always knows what the next word is going
to be, and therefore, if there's no surprise, there's no information content.
So, the information content of this message source, that is, the 1-year-old,
is equal to 0 bits---information content in Shannon's formulation is measured
in "bits". In contrast, think about the 1-year-old's older brother, who's a
3-year-old. The 3-year-old is able to talk and say a lot of things, like, "hi grandma,
I'm playing Superman!" So here, the message source is a 3-year-old, and
the 3-year-old knows about 500 words, in English. And so, we can label
those, word 1, word 2, through word 500, and each of those has its own
probability of being said by the 3-year-old. We don't know those probabilities,
but let's just say we've measured his speech, we've recorded lots of hours
of his speech, and we can assign these words different probabilities.
Well, grandma doesn't know exactly what word is going to come next out of his mouth,
so she has more surprise this time than with his younger brother; therefore, this
message source has more information content---it's greater than 0 bits.
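To make grandma's "surprise" concrete: for a message source whose messages occur with probabilities p_1, ..., p_n, Shannon's information content is H = -sum of p_i log2(p_i), measured in bits. Here's a minimal Python sketch of both examples; the uniform distribution over 500 words is an assumption for illustration only, since the 3-year-old's actual measured probabilities aren't given:

```python
import math

def information_content(probs):
    """Shannon information content H, in bits, of a message source
    whose messages are sent with the given probabilities."""
    # Messages with probability 0 contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# The 1-year-old: one message ("da") with probability 1 -- no surprise.
print(information_content([1.0]))            # → 0.0 bits

# The 3-year-old: 500 words, assumed equally likely for illustration.
print(information_content([1 / 500] * 500))  # ≈ 8.966 bits
```

With a probability-1 source there is no surprise, so H is 0 bits; spreading probability across more possible messages raises H, matching the intuition that the 3-year-old's speech carries more information. Real, unequal word frequencies would give an H somewhere between 0 and log2(500) bits.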
Now, just to test your intuitive understanding of this idea, let's have a very short quiz!
Our quiz has two questions, and each asks the same thing: of the two possible
message sources being compared, which has the higher Shannon information content?