So last time we talked about joint probabilities, the probability of X and Y, and we talked about joint information. If there are different labels X_i and Y_j for these different possibilities, the joint information I(X,Y) is just the ordinary information defined over this joint quantity, X and Y.
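As a minimal sketch of that definition, here is the joint information computed for a made-up 2x2 joint distribution over X and Y; the numbers in `p_xy` are purely illustrative, not from the lecture.

```python
import math

# Hypothetical joint distribution P(X = x_i, Y = y_j).
# Rows index x_i, columns index y_j; entries must sum to 1.
p_xy = [[0.4, 0.1],
        [0.2, 0.3]]

def joint_information(p):
    """I(X,Y) = -sum_ij p(x_i, y_j) * log2 p(x_i, y_j):
    the ordinary information formula applied to the joint quantity."""
    return -sum(pij * math.log2(pij) for row in p for pij in row if pij > 0)

print(joint_information(p_xy))
```

It is the same formula as for a single variable; the only change is that the sum runs over pairs (x_i, y_j) instead of single outcomes.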
And I defined something I call mutual information. Mutual information starts from the information in X on its own, the amount of information you get if you look at this variable X: if it's a binary variable, like a coin flip, it could be up to a bit, but could be anywhere between 0 and 1 bit of information. Mutual information is the sum of the individual informations of X and Y on their own, minus the joint information: I(X:Y) = I(X) + I(Y) - I(X,Y). We designate it by the little colon.
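That sum-minus-joint formula can be sketched directly in code. The joint distribution below is the same kind of made-up 2x2 example as before, assumed only for illustration.

```python
import math

def info(probs):
    """Ordinary (Shannon) information of a distribution: -sum p * log2 p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X = x_i, Y = y_j).
p_xy = [[0.4, 0.1],
        [0.2, 0.3]]

p_x = [sum(row) for row in p_xy]        # marginal of X (sum out Y)
p_y = [sum(col) for col in zip(*p_xy)]  # marginal of Y (sum out X)
joint = info([pij for row in p_xy for pij in row])  # I(X,Y)

# I(X:Y) = I(X) + I(Y) - I(X,Y)
mutual = info(p_x) + info(p_y) - joint
print(mutual)
```

When X and Y are independent, the joint information equals I(X) + I(Y) exactly, so the mutual information comes out to zero; any dependence between them makes it positive.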
Now I argued, but not very convincingly I believe, that this quantity measures the amount of information that I get about X if I look at Y; it also measures the amount of information that I get about Y if I look at X. And it also measures the information that X and Y in some sense hold in common. So let's explore that notion a little further.
And we're going to do it in the following fashion. We're going to define a conditional probability: the probability of Y given X. So this is the probability that it's raining given that it's sunny. Small, in most places, but actually not that small in Santa Fe. So reasonably small.
Now, what is the probability that it's raining given that it's sunny? A nice way of looking at that is to look at our little diagram from before. Here is the whole set of events where it's sunny, here's the whole set of events where it's raining, and this is the set of events where it's raining and sunny. So if we want to actually get the conditional probability that it's raining given that it's sunny, what we have to do is look at the probability that it is raining and sunny, divided by the probability that it's sunny. So this is just an identity: P(raining | sunny) = P(raining and sunny) / P(sunny). This is conditional probability.
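The identity is a one-liner in code. The weather probabilities below are assumed numbers for illustration, not figures from the lecture.

```python
# Toy numbers, assumed for illustration:
p_sunny = 0.7            # P(sunny)
p_rain_and_sunny = 0.05  # P(raining and sunny)

# Conditional probability identity:
# P(raining | sunny) = P(raining and sunny) / P(sunny)
p_rain_given_sunny = p_rain_and_sunny / p_sunny
print(p_rain_given_sunny)
```

Note that the joint event can never be more probable than the conditioning event, so this ratio is always between 0 and 1, as a probability should be.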
And we can also define conditional information. So we can look at the information in Y given that we know that the value of X is X_j. Remember that we added this extra index to say X is now about sunniness or not sunniness: j = 0 means not sunny, j = 1 means that it is sunny. And what we can do is just take our ordinary formula for information; we sum over our value j for the possible values
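The transcript cuts off here, but the construction described so far, the ordinary information formula applied to the conditional distribution P(Y | X = x_j), can be sketched as follows, again with an assumed 2x2 joint distribution.

```python
import math

# Hypothetical joint distribution; rows index the condition x_j,
# columns index the possible values y_i.
p_xy = [[0.4, 0.1],
        [0.2, 0.3]]

def conditional_information(p, j):
    """I(Y | X = x_j): the ordinary information formula applied to the
    conditional distribution p(y_i | x_j) = p(x_j, y_i) / p(x_j)."""
    p_xj = sum(p[j])  # marginal probability of the condition, P(X = x_j)
    cond = [pji / p_xj for pji in p[j]]
    return -sum(c * math.log2(c) for c in cond if c > 0)

print(conditional_information(p_xy, 0))  # information in Y given X = x_0
print(conditional_information(p_xy, 1))  # information in Y given X = x_1
```

Averaging these values over j, weighted by P(X = x_j), gives the joint information minus the information in X on its own, which ties this back to the mutual-information formula above.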