So last time we talked about joint probabilities, the probability of X and Y, and we talked about joint information. If there are different labels x_i and y_j for the different possibilities, then the joint information I(XY) is just the ordinary information defined over this joint quantity, X and Y. And I defined something I called mutual information. Mutual information is the sum of the individual informations of X and Y on their own, minus the joint information: I(X:Y) = I(X) + I(Y) - I(XY). We designate it with a little colon. (The information in X on its own is the amount of information you get if you look at that variable; if it's a binary variable, like a coin flip, it could be anywhere between 0 and 1 bit.) Now I argued, though not very convincingly I believe, that this quantity measures the amount of information that I get about X if I look at Y; it also measures the amount of information that I get about Y if I look at X. And it also measures the information that X and Y in some sense hold in common.

So let's explore that notion a little further. We're going to do it in the following fashion. We're going to define a conditional probability: the probability of Y given X. This is, say, the probability that it's raining given that it's sunny. Small in most places, but actually not that small in Santa Fe. So reasonably small. Now, what is the probability that it's raining given that it's sunny? A nice way of looking at that is with our little diagram from before. Here is the whole set of events where it's sunny, here's the whole set of events where it's raining, and this is the set of events where it's both raining and sunny.
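The mutual-information formula above, I(X:Y) = I(X) + I(Y) - I(XY), can be checked numerically. Here is a minimal sketch in Python; the joint distribution over sunniness and raininess is made up for illustration, and the function names are my own.

```python
import math

# Hypothetical joint distribution over X (sunny or not) and Y (raining or not).
# p_xy[j][i] = P(X = x_j, Y = y_i); the numbers are made up for illustration.
p_xy = [[0.50, 0.10],   # X = not sunny: (not raining, raining)
        [0.35, 0.05]]   # X = sunny:    (not raining, raining)

def information(probs):
    """Ordinary (Shannon) information in bits: -sum p log2 p, skipping zeros."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginals P(X) and P(Y), obtained by summing out the other variable.
p_x = [sum(row) for row in p_xy]
p_y = [sum(row[i] for row in p_xy) for i in range(2)]

# Joint information I(XY): the ordinary information over the joint quantity.
joint = [p for row in p_xy for p in row]

# Mutual information I(X:Y) = I(X) + I(Y) - I(XY).
mi = information(p_x) + information(p_y) - information(joint)
```

With these made-up numbers the mutual information comes out small but positive, as you would expect when sunniness tells you only a little about rain.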
So if we want to actually get the conditional probability that it's raining given that it's sunny, what we have to do is look at the probability that it is raining and sunny, divided by the probability that it's sunny: P(Y|X) = P(X,Y) / P(X). This is just an identity; it says that the probability that it's raining given that it's sunny is the probability that it's raining and it's sunny, divided by the probability that it's sunny. This is conditional probability.

And we can also define conditional information. We can look at the information in Y given that we know that the value of X is x_j. Remember that we added this extra index: X is now about sunniness or not sunniness, j = 0 means not sunny, j = 1 means that it is sunny. And what we can do is we can just take our ordinary formula for information, we sum over our value j for the possible values
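The conditional-probability identity P(Y|X) = P(X,Y)/P(X) can also be sketched numerically, along with the conditional information it leads to. This assumes the standard move of applying the ordinary information formula to the conditional distribution P(Y | X = x_j); the joint table below is made up for illustration.

```python
import math

# Hypothetical joint table, same made-up numbers as before.
# p_xy[j][i] = P(X = x_j, Y = y_i): j = 0 not sunny, j = 1 sunny;
# i = 0 not raining, i = 1 raining.
p_xy = [[0.50, 0.10],
        [0.35, 0.05]]

# P(sunny): marginal over Y for the row j = 1.
p_sunny = sum(p_xy[1])

# P(raining | sunny) = P(raining and sunny) / P(sunny).
p_rain_given_sunny = p_xy[1][1] / p_sunny

# Conditional information in Y given X = sunny: the ordinary information
# formula applied to the conditional distribution P(Y | X = sunny).
p_y_given_sunny = [p / p_sunny for p in p_xy[1]]
info_y_given_sunny = -sum(p * math.log2(p) for p in p_y_given_sunny if p > 0)
```

Note that the conditional distribution sums to 1 by construction, so the conditional information behaves like any ordinary information: between 0 and 1 bit for a binary Y.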