We've talked about information as bits. On measuring information, we've talked about counting: we can use bits to count 00, 01, 10, 11, counting from zero up to three in base two. We've talked about bits as labeling, that we can use barcodes, which are just bits, to label things. And finally we've talked about how bits are physical: all the bits that we have in computers, all the bits of information that I'm conveying via the vibrations of my vocal cords and the vibrations of the air, are actually physical systems, physical manifestations of information. And then we also talked about a discovery which is 150 years old, that all physical systems carry information, and that the amount of information can be quantified. So the number of bits is the logarithm to the base two of the number of possibilities, a result which ironically is inscribed on Boltzmann's grave.

So now I'd like to give you another aspect of bits, and this is a very 20th-century aspect of bits of information: the relationship between information and probability. Probability is something that we're all familiar with and all confused by, and I'm always confused by probability. Human beings are known to have a very bad intuitive sense of probability: we overestimate the probability of truly awful events, and we underestimate the probability of fine, nice, normal events. Of course, from an evolutionary standpoint, overestimating the probability of some event like a sabretooth tiger dropping out of this tree and sinking his teeth into your neck is probably a good thing, which might be why.

But there's a simple idea of probability, and let me try to demonstrate it right here. Let's take the example of heads and tails. I have here a nice, shiny new nickel that's been given to me by a member of the Santa Fe Institute. She didn't ask me to give it back either, so I'm five cents ahead. So it can either be heads or tails. What do you think? What's the probability that it's heads or that it's tails?
Well, I claim it's fifty-fifty. But why? Why is the probability that it's heads or tails one half? It was tails, I swear. So there are two notions of probability for heads and tails. One notion, and I claim that this is the nicest, most intuitive notion, is this: when I just flip it like this, I wasn't watching it in the air, I didn't know how hard I flipped it, I didn't see it before I put it down there. I have no reason for preferring heads over tails. A priori, heads and tails just have equal weight. Heads. It was heads, by the way; now the probability is one that it was heads. That's the funny thing about probabilities: first you don't know, and the probabilities you have then are called prior, or a priori, probabilities. So the probability of heads is equal to the probability of tails, which is one-half, because there is no reason to prefer heads over tails. This is a good argument. So these are the prior probabilities of heads and tails: 50 percent each.

But there's another argument about why the probability of heads and tails would be 50 percent. So let me just try it like this, let me just flip this coin a bunch of times. Tails. Heads. Heads. Heads. Tails. Heads. Tails. Heads. Heads. So I actually got seven heads and three tails out of ten tosses. That was kind of dull; this is the problem with probability. It's dull and confusing, and to figure out what's going on you have to do it many times. Because I don't think that you're going to agree that this shiny new United States nickel really has a probability of seven out of ten of coming up heads and three out of ten of coming up tails. It was just the luck of the draw, or the luck of the toss. It just so happens that there were seven heads and three tails, which, if you're flipping a coin ten times, is pretty reasonable. So if I were to flip this coin a whole bunch more times, which I'm not going to do because I know it will be dull, you would be very bored by this.
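The many flips that would be too dull to do by hand can be left to a computer. Here is a minimal sketch in Python, assuming a pseudo-random number generator is an acceptable stand-in for a fair nickel; the function name `heads_frequency` and the `seed` parameter are illustrative and not part of the lecture.

```python
import random


def heads_frequency(m, seed=None):
    """Flip a simulated fair coin m times and return the fraction of heads."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    # Each flip comes up heads with probability one-half.
    heads = sum(rng.random() < 0.5 for _ in range(m))
    return heads / m


# Watch how the frequency of heads behaves as the number of flips grows.
for m in (10, 100, 10_000, 100_000):
    print(m, heads_frequency(m, seed=m))
```

With only ten flips, a frequency like 0.7 is unremarkable; with many more flips, the printed frequencies should sit much closer to 0.5.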
So if I flip a coin, and I should say a fair coin. I should note that in my classes at MIT, the students all start out seeming to believe what I say, but after a few lectures they become very distrustful. I don't know why this is; I seem like a trustworthy person. Anyway, I flip a fair coin m times, and we look at the number of heads and the number of tails; the sum of the number of heads plus the number of tails is equal to m. I just flipped it ten times. And the frequency of heads is just equal to the number of heads divided by m. So I flipped it ten times, I got seven heads, the frequency is 0.7. The frequency of tails, as you may very well guess, is the number of tails over m, and that's equal to one minus the number of heads divided by m.

Now what we expect, just from personal experience, is that if we just keep flipping the coin many, many, many times, the frequency of heads gets closer to one-half. Well, if I flip it 100 times, I certainly don't expect to get exactly 50 heads, which would be a frequency of exactly 0.5, matching the probability. But I would expect to get something a little closer to 0.5 than 0.7, seven-tenths. It seems, you know, very unlikely that if I flip it a hundred times I'm going to get 70 heads. It's perfectly possible, why not. So I will just give you the formula for this. The expected number of heads, which is also the expected number of tails, because there's nothing to choose between them, is equal to m over two, 50 percent of the flips. I flip it 100 times, for example, so m is equal to 100. Then m over two is equal to 50. So I'd expect to get roughly 50, and then I'm going to use this notation, plus or minus, I'll explain what this is in a moment, plus or minus one-half times the square root of m. So actually what "you would expect" means is, well, it's roughly in this interval. I flip it 100 times, the square root of 100 is 10.
I expect it to be roughly within five of 50. It might be a few more, might be seven or eight more, but I'd be really kind of surprised if there were seventy heads and thirty tails. I would think it'd be more likely, you know, 60 heads, 40 tails, but probably more like 55 and 45. And that's actually what you can do.

So let's actually ask why this is so. If I look at all the different possible sequences, H H T T H H H T H H H T (you may notice that the first ten of these are pretty much what I got when I was flipping the coin), dot dot dot, which is a way of meaning et cetera. It just keeps on going, and then we're going to have m of these, and we're going to count the number of possible sequences with exactly m_h heads and m_t tails. Of course, because it's got to be heads or tails, at least unless it lands on its side, which I don't think it's going to do, these have got to add up to m. So I'm going to count the number of possible sequences with exactly m_h heads and m_t tails, where the two have to add up to m. And what we're going to find out is, well, there are not so many sequences which are heads heads heads... tails. So there's going to be a very small number of sequences that have almost all heads and a few tails. There's similarly going to be a very small number of sequences that have almost all tails and a few heads, and there's going to be a humongous number of sequences that have roughly the same number of heads and tails.

So you can see, to relate this to information theory, each sequence is like a sequence of zeros and ones. You can call heads zero and tails one; this is just a long, long, long bit string. And so we can relate ideas of information, numbers of possible sequences with a particular pattern, in this case a particular number of heads and tails, to probability.
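The counting argument can be checked directly for small cases. The sketch below, in Python, counts sequences with exactly m_h heads using the binomial coefficient, which is the standard way to count them, although the lecture has not written that formula down at this point. The function names `count_sequences` and `fraction_within` are illustrative. One assumption worth stating: the half-square-root-of-m spread is taken here as a half-width around m over two, which matches the standard deviation of the number of heads for a fair coin.

```python
from math import ceil, comb, floor, sqrt


def count_sequences(m, m_h):
    """Number of length-m heads/tails sequences with exactly m_h heads."""
    return comb(m, m_h)


def fraction_within(m, half_width):
    """Fraction of all 2**m sequences whose heads count lies
    within half_width of m / 2 (endpoints included)."""
    lo = max(0, ceil(m / 2 - half_width))
    hi = min(m, floor(m / 2 + half_width))
    inside = sum(comb(m, k) for k in range(lo, hi + 1))
    return inside / 2 ** m


m = 100
spread = 0.5 * sqrt(m)  # one-half times the square root of m

# Very few sequences are almost all heads; a huge number are near half and half.
for m_h in (100, 99, 70, 55, 50):
    print(m_h, count_sequences(m, m_h))

# Share of all sequences with between 45 and 55 heads for m = 100.
print(fraction_within(m, spread))
```

For m equal to 100, a bit over 70 percent of all sequences have between 45 and 55 heads, while sequences with 70 heads are a tiny share of the total. Since every sequence of a fair coin is equally likely, these shares of sequences are also probabilities.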