Now that we've learned about entropy, thermodynamics, and statistical mechanics, we're ready to talk about Claude Shannon's formulation of information, which is now called Shannon information theory. It turns out to be an important concept in many areas of complex systems research, and it's definitely something you should know about. Claude Shannon was a mathematician working in the 1940s at Bell Labs, which was part of AT&T. His focus was a major question for telephone communication: how to transmit signals most efficiently and effectively across telephone wires. In developing a mathematical solution to this problem, Shannon adapted Boltzmann's ideas from statistical mechanics to the field of communication, and used them to define a particular, somewhat narrow, but extremely useful notion of information.

In Shannon's formulation of communication, we have a message source. Here it's shown as a black box that emits messages, but you can think of it as someone you're listening to over the phone. The message source emits messages, for example words, to a message receiver: that might be you, listening to your mother talking on the phone. More formally, a message source is the set of all possible messages the source can send, each with its own probability of being sent next. A message can be a symbol, a number, or a word, depending on the context; in our examples, messages will mostly be words. The information content, in Shannon's formulation, is a measure of the message source: it's a function of the number of possible messages and their probabilities. Informally, the information content H is the amount of "surprise" the receiver experiences upon receipt of each message.

Let me explain that by giving a couple of examples. In my book, Complexity: A Guided Tour, I used my two children as examples, from when they were much younger. As you can imagine, they're now teenagers and they aren't so happy about being examples in my books, so I won't name them here, but you can imagine any particular 1-year-old. This 1-year-old is the message source, and he's talking to his grandmother on the phone, but all he can say is one word: "da da da da da," over and over again. So his messages consist of one word, "da," with probability 1: that's all he ever says. There's no surprise, therefore, from Grandma's point of view: she always knows what the next word is going to be, and if there's no surprise, there's no information content. So the information content of this message source, that is, the 1-year-old, is equal to 0 bits. (Information content in Shannon's formulation is measured in "bits.")

In contrast, think about the 1-year-old's older brother, who's a 3-year-old. The 3-year-old is able to talk and say a lot of things, like "Hi Grandma, I'm playing Superman!" Here the message source is the 3-year-old, who knows about 500 words in English. We can label those word 1, word 2, through word 500, and each of those has its own probability of being said by the 3-year-old. We don't know those probabilities, but let's say we've recorded lots of hours of his speech and can assign these words different probabilities. Grandma doesn't know exactly what word is going to come out of his mouth next, so she has more surprise this time than with his brother, and therefore this message source has more information content: it's greater than 0 bits.
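To make the "surprise" idea concrete, here is a minimal sketch in Python of the quantity Shannon defined, H = -Σᵢ pᵢ log₂(pᵢ), where pᵢ is the probability of message i. The 1-year-old case gives exactly 0 bits, as described above; the word probabilities used for the 3-year-old below are invented purely for illustration, since the transcript doesn't specify them.

```python
import math

def shannon_information(probabilities):
    """Shannon information content H of a message source, in bits.

    H = -sum_i p_i * log2(p_i), where p_i is the probability that the
    source emits message i next. Messages with probability 0 contribute
    nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# The 1-year-old: one message ("da") with probability 1 -> no surprise.
print(shannon_information([1.0]))            # 0.0 bits

# The 3-year-old: 500 words. If all 500 were equally likely (an invented,
# illustrative assumption), the information content would be log2(500).
print(shannon_information([1 / 500] * 500))  # about 8.97 bits

# A more realistic (but still invented) source: a few favorite words are
# much more probable than the rest, which lowers the average surprise.
skewed = [0.5, 0.2, 0.1] + [0.2 / 497] * 497
print(shannon_information(skewed))           # fewer bits than the uniform case
```

The point of the sketch is just the intuition from the lecture: the more evenly the probability is spread over many possible messages, the more surprise per message and the higher the information content in bits.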
Now, just to test your intuitive understanding of this idea, let's have a very short quiz! The quiz has two questions, and each one asks you to compare two possible message sources and decide which has the higher Shannon information content.