So last time we talked about joint probabilities.
The probability of X and Y,
and we talked about joint information.
If there are different labels X_i and Y_j
for these different possibilities,
we have the joint information I(X,Y),
which is just the ordinary information defined over
this joint quantity, X and Y.
And I defined something I call
mutual information.
So first, recall the information
in X on its own,
the amount of information you get if you
look at this variable X,
if it's a binary variable, like a coin flip,
it could be up to a bit,
but could be anywhere between 0 and 1 bit
of information.
So the mutual information is the sum of the individual
informations of X and Y on their own, minus the joint
information: I(X:Y) = I(X) + I(Y) - I(X,Y).
We designate it by putting a little colon between X and Y.
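That definition can be sketched numerically. The joint distribution below is a hypothetical example chosen for illustration, and the function and variable names are my own, not anything from the lecture:

```python
import math

def information(probs):
    """Shannon information (entropy) in bits of a distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over two binary variables,
# laid out as joint[i][j] = P(X = x_i, Y = y_j).
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_x = [sum(row) for row in joint]         # marginal P(X)
p_y = [sum(col) for col in zip(*joint)]   # marginal P(Y)
p_xy = [p for row in joint for p in row]  # flattened joint P(X, Y)

# I(X:Y) = I(X) + I(Y) - I(X, Y)
mutual_info = information(p_x) + information(p_y) - information(p_xy)
```

Here each marginal is a fair coin, so I(X) = I(Y) = 1 bit, while the joint information is less than 2 bits because the variables are correlated; the difference is the mutual information.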
Now I argued, but not very convincingly I believe,
that this quantity measures the amount of
information that I get about X if I look at Y,
it also measures the amount of information
that I get about Y if I look at X.
And it also measures the information that
X and Y in some sense hold in common.
So let's explore that notion a little further.
And we're going to do it in the following fashion.
So we're going to define a conditional probability.
The probability of Y given X.
So this is the probability that it's raining
given that it's sunny.
Small, in most places, but actually not that
small in Santa Fe.
So reasonably small.
Now, what is the probability that it's
raining given that it's sunny?
A nice way of looking at that is to look at
our little diagram from before.
So here is the whole set of
events where it's sunny, here's the whole set
of events where it's raining.
And this is the set of events where it's
raining and sunny.
So if we want to actually get the conditional
probability that it's raining given that it's sunny,
then what we have to do is we have to look
at the probability that it is raining and sunny,
divided by the probability that it's sunny.
So this is just an identity, it says that the
probability that it's raining given that it's sunny
is the probability that it's raining and it's
sunny divided by the probability that it's sunny.
This is conditional probability.
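That identity can be checked with a quick numeric sketch; the probabilities here are invented purely for illustration, not real weather statistics:

```python
# Hypothetical event probabilities (illustrative only):
p_sunny = 0.70            # P(sunny)
p_rain_and_sunny = 0.035  # P(raining AND sunny)

# P(raining | sunny) = P(raining AND sunny) / P(sunny)
p_rain_given_sunny = p_rain_and_sunny / p_sunny
```

With these numbers the conditional probability comes out to about 0.05, reasonably small, as in the example.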
And we can also define conditional information.
So we can look at the information in Y given
that we know that the value of X is X_j.
Remember that we added this extra index to
say X is now about sunniness or not sunniness,
j = 0 means not sunny, j = 1 means that it
is sunny.
And what we can do is we can just take our
ordinary formula for information,
we sum over our value j for the possible values