So the one last thing we have to do is show that our general case formula
agrees with our special case formula, where our special case formula
was when all the messages had equal probability.
So let me give our general case formula, again.
If x is a message source, then the information content of that message
source is equal to minus the sum from i = 1 to m---that is, we're going to sum
up all these terms---of the probability of message i times the log base 2
of the probability of message i. That was our original, general formula.
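To make the formula concrete, here is a minimal sketch of that calculation in Python (the function name `entropy` is my own; the transcript just calls this quantity H(x)):

```python
import math

def entropy(probs):
    """Shannon information content: H = -sum of p_i * log2(p_i),
    taken over messages with nonzero probability."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A biased two-message source carries less than 1 bit per message:
print(entropy([0.9, 0.1]))  # about 0.469 bits
```

Note the minus sign out front: since each probability is at most 1, each log term is zero or negative, and the minus sign makes the total come out nonnegative.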
Ok, now, what if all the messages have equal probability? That means
that each probability is just equal to 1/m! So if there are two messages,
each one has probability 1/2; if there are three, each has probability 1/3,
and so on. So, in that case, H(x) = - the sum from i = 1 to m of, well, p-sub-i
is 1/m, times log base 2 of 1/m. Summing the same term m times just
multiplies it by m, and that's going to cancel the 1/m factor---so that's
going to be equal to - log base 2 of 1/m---and - log base 2 of 1/m is just
equal to log base 2 of m, using our log rules. That's the same formula we
had in our special case, when all the probabilities were equal.
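We can check this agreement numerically: for a uniform source of m messages, the general formula should give exactly log base 2 of m. A quick sketch (the helper `entropy` is my own name for the general formula):

```python
import math

def entropy(probs):
    """General formula: H = -sum of p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# For m equally likely messages, H(x) should equal log2(m):
for m in (2, 4, 8, 100):
    uniform = [1 / m] * m
    print(m, entropy(uniform), math.log2(m))
```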
So, stepping back, the reason Shannon wanted to do all this measuring
of information content was for the purpose of coding---in general,
for optimal coding of signals that go over telephone wires. He showed
that the information content, as defined earlier, gives the average number
of bits it takes to encode a message, say, a signal over a telephone
line, from a given message source, given an optimal coding---and Shannon's
original paper, which was later turned into a book, proves this in a mathematically
rigorous way.
So the idea is that you have a particular set of messages, that have particular
probabilities that you've measured. You can find the optimal number
of bits that it would take to encode each message on average, and this
basically gives the compressibility of the text---the higher the
information content, the less compressible. So if you're interested in
going further with this, you can google the notion of Huffman coding, which
shows how to do optimal compression. I'm not going to talk more about
this during this course, but it's a really interesting and important area that's
affected our ability, in general, to use technologies such as telephone communication,
internet communication, and so on, so it's extremely important.
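For the curious, here is a minimal sketch of the Huffman construction mentioned above: repeatedly merge the two least probable messages into one node, prepending a 0 or 1 to their codes. The function name and data layout are my own; this is an illustration, not Shannon's or Huffman's original presentation. For the dyadic probabilities used below, the average code length comes out exactly equal to the information content:

```python
import heapq
import math

def huffman_code(probs):
    """Build a Huffman code for messages 0..m-1 with the given
    probabilities; returns a dict mapping message -> bit string."""
    # Heap entries: (probability, unique tiebreaker, {message: code}).
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Greedily merge the two least probable nodes.
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {m: "0" + code for m, code in c1.items()}
        merged.update({m: "1" + code for m, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = [0.5, 0.25, 0.125, 0.125]
code = huffman_code(probs)
avg_bits = sum(p * len(code[i]) for i, p in enumerate(probs))
h = -sum(p * math.log2(p) for p in probs)
print(code)
print(avg_bits, h)  # both 1.75: optimal coding meets the entropy bound
```

This is exactly the sense in which information content measures compressibility: no prefix code can beat an average of H bits per message, and Huffman coding gets as close as whole-bit codewords allow.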
Finally, I want to say a few words about the notion of meaning with
respect to Shannon information content.
You probably noticed that Shannon information content, while it's about
probabilities and numbers of messages produced by a message source,
has nothing to say about the meaning of the messages, the meaning
of the information, or what function it might have for the sender or the
receiver. Really, the meaning of information comes from information processing,
that is what the sender or receiver *does* upon sending or receiving
a message. We're going to talk about this in detail in unit 7, when we talk
about models of self-organization, and how self-organizing systems
process information, in order to extract meaning.