So the one last thing we have to do is show that our general formula agrees with our special-case formula, where the special case was when all the messages had equal probability. Let me give the general formula again. If x is a message source, then the information content of that message source is H(x) = - sum from i = 1 to m of p-sub-i times log base 2 of p-sub-i; that is, we sum up, over all the messages, the probability of message i times the log base 2 of the probability of message i, and take the negative. That was our original, general formula.

Now, what if all the messages have equal probability? Then each probability is just 1/m: if there are two messages, each one has probability 1/2; if there are three, each has probability 1/3; and so on. In that case, H(x) = - sum from i = 1 to m of (1/m) times log base 2 of (1/m). Since every one of the m terms is identical, summing them m times just multiplies a single term by m, which cancels the 1/m. So H(x) = - log base 2 of (1/m), and by our log rules - log base 2 of (1/m) is just log base 2 of m, which is the same thing we had in our special case, when all the probabilities were equal.

So, stepping back, the reason Shannon wanted to do all this measuring of information content was for the purpose of coding and, in particular, the optimal coding of signals that go over telephone wires. He showed that the information content, as defined earlier, gives the average number of bits it takes to encode a message from a given message source (say, a signal over a telephone line) under an optimal coding, and Shannon's original paper, which was later turned into a book, proves that rigorously. The idea is that you have a particular set of messages with particular probabilities that you've measured, and you can find the optimal number of bits it would take, on average, to encode each message. This basically tells you how compressible the text is: the higher the information content, the less compressible. If you're interested in going further with this, you can google the notion of Huffman coding, which shows how to do optimal compression. I'm not going to talk more about this during this course, but it's a really interesting and important area that has affected our ability, in general, to use technologies such as telephone communication, internet communication, and so on, so it's extremely important.

Finally, I want to say a few words about the notion of meaning with respect to Shannon information content. You've probably noticed that Shannon information content, while it's about the probabilities and numbers of messages produced by a message source, has nothing to say about the meaning of the messages, the meaning of the information, or what function it might have for the sender or the receiver. Really, the meaning of information comes from information processing, that is, from what the sender or receiver does upon sending or receiving a message. We're going to talk about this in detail in unit 7, when we talk about models of self-organization and how self-organizing systems process information in order to extract meaning.
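To make the formula and the equal-probability special case concrete, here is a minimal Python sketch (not part of the lecture itself); the function name shannon_entropy and the example probabilities are my own choices for illustration. It computes H from a list of probabilities and checks that, when all m messages are equally likely, H comes out to log base 2 of m.

```python
import math

def shannon_entropy(probabilities):
    """Information content H = -sum_i p_i * log2(p_i), in bits per message."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Special case: m equally likely messages, each with probability 1/m.
for m in (2, 3, 8, 26):
    uniform = [1.0 / m] * m
    print(m, shannon_entropy(uniform), math.log2(m))  # the two values agree: H = log2(m)

# General case with unequal probabilities: H is lower than log2(m).
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits, versus log2(4) = 2
```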
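And for the coding side, here is a small sketch of Huffman coding, the optimal-compression method mentioned above; it is only an illustration under my own naming, not material from the lecture. The hypothetical function huffman_code_lengths builds the code length assigned to each message using Python's heapq, and the example then compares the average number of bits per message to the information content H.

```python
import heapq
import math

def huffman_code_lengths(probabilities):
    """Return the number of bits Huffman's algorithm assigns to each message.

    probabilities: dict mapping message -> probability.
    """
    # Each heap entry: (total probability, tie-breaker, {message: code length so far}).
    heap = [(p, i, {msg: 0}) for i, (msg, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # two least probable groups
        p2, _, group2 = heapq.heappop(heap)
        # Merging two groups adds one bit to every code length inside them.
        merged = {msg: length + 1 for msg, length in {**group1, **group2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[m] * lengths[m] for m in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(lengths)             # e.g. {'A': 1, 'B': 2, 'C': 3, 'D': 3}
print(avg_bits, entropy)   # 1.75 and 1.75: the optimal code reaches H here
```

In this example the probabilities are all powers of 1/2, so the Huffman code's average length exactly equals H; for other probability distributions the average length is slightly above H, but H remains the lower bound on bits per message, which is the compressibility claim made in the lecture.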