1
00:00:09,276 --> 00:00:13,710
Let's keep on looking at the relationship
between information and probability.
2
00:00:13,710 --> 00:00:24,570
Probability is just flipping coins: heads,
heads, tails. Thank goodness.
3
00:00:24,570 --> 00:00:32,405
Actually, by the way, I had a legal two-
headed quarter issued by the US Mint.
4
00:00:32,405 --> 00:00:33,986
Can you guess what it is?
5
00:00:33,986 --> 00:00:38,420
It's actually the New Hampshire quarter
that has George Washington on one side
6
00:00:38,420 --> 00:00:41,490
and it's got The Old Man of the Mountain
on the other.
7
00:00:41,490 --> 00:00:44,951
The Old Man of the Mountain is a rock formation
that looks like a head,
8
00:00:44,951 --> 00:00:50,669
unfortunately, it fell down about ten years ago,
so now they're putting out a new New Hampshire quarter.
9
00:00:51,000 --> 00:01:00,091
I also have a legal US quarter that has five heads,
and I urge you to consider what quarter that might be.
10
00:01:00,943 --> 00:01:06,632
So let's look at this relationship between
information and probability.
11
00:01:06,632 --> 00:01:14,236
Probability, we can either think of it as something
that's a prior probability, so the probability
12
00:01:14,236 --> 00:01:20,382
of heads equals the probability of tails equals one half,
because there is no particular reason to
13
00:01:20,382 --> 00:01:24,705
choose between heads or tails when I flip
this coin.
14
00:01:25,775 --> 00:01:30,015
But we can also think of it in terms of frequencies.
15
00:01:30,015 --> 00:01:37,097
If we have a long, long sequence of heads
and tails, I'm trying to make this random but
16
00:01:37,097 --> 00:01:41,757
I have a funny feeling it's exactly the same
sequence of heads and tails that I put there before.
17
00:01:41,757 --> 00:01:48,924
My students complain that whenever I write
a random string of zeros and ones on the board,
18
00:01:48,924 --> 00:01:54,125
it's always the same sequence, which can't
really be quite random, can it?
19
00:01:54,125 --> 00:01:59,747
Anyway, what we can do is we can say, let's
take a very very long sequence.
20
00:02:00,847 --> 00:02:05,617
Let's count the number of heads and tails,
21
00:02:06,097 --> 00:02:12,126
frequency of heads is equal to the number of heads
22
00:02:12,126 --> 00:02:15,556
divided by the total length of the sequence.
23
00:02:15,556 --> 00:02:18,134
The frequency of tails is the same.
24
00:02:18,134 --> 00:02:22,947
And then let's ask, we have this intuition that
if the coin is fair,
25
00:02:22,947 --> 00:02:26,777
that almost all such sequences that we would
generate
26
00:02:26,777 --> 00:02:30,356
would have roughly fifty percent heads and tails.
27
00:02:30,356 --> 00:02:34,307
Let me now make that mathematically more precise.
28
00:02:36,528 --> 00:02:44,766
So let's look at the total number of sequences
of coin tosses of length m,
29
00:02:44,766 --> 00:02:55,138
which have exactly m_h heads and m_t tails.
30
00:02:55,138 --> 00:03:02,638
The sum of these two things is equal to m,
the total length of the sequence.
31
00:03:02,638 --> 00:03:11,458
Now the total number of such sequences is
a number that's called m choose m_h.
32
00:03:11,458 --> 00:03:21,128
It's the number of ways: if I have a sequence
that's of length m, and I call heads zero and
33
00:03:21,128 --> 00:03:32,118
tails one, it's the number of binary sequences
of length m that have exactly m_h zeros
34
00:03:32,118 --> 00:03:35,178
and m_t ones.
35
00:03:35,442 --> 00:03:41,825
These are the same thing, because the number
of sequences that have exactly m_h
36
00:03:41,825 --> 00:03:47,142
heads is exactly the number of sequences
that have m_t tails.
37
00:03:47,142 --> 00:03:48,760
So what is this number?
38
00:03:48,760 --> 00:03:52,711
This is equal to the following:
39
00:03:52,711 --> 00:04:06,263
number of ways of choosing, this is called
m choose m_h,
40
00:04:06,263 --> 00:04:10,514
it's the mathematical notation, written as this
funny thing with parentheses
41
00:04:10,514 --> 00:04:15,643
with m on top and m_h at the bottom, it's
called m choose m_h because it's the number
42
00:04:15,643 --> 00:04:30,914
of ways of choosing m_h spots or locations
out of m possible spots.
43
00:04:33,242 --> 00:04:37,283
So we have zero zero zero one one zero,
44
00:04:37,283 --> 00:04:42,475
I'm again falling into the same pattern.
45
00:04:42,475 --> 00:04:47,774
We have a whole bunch of different spots,
and we want to count the number of places
46
00:04:47,774 --> 00:04:50,936
where we can put zeros here.
47
00:04:50,936 --> 00:04:55,425
So we have a certain number of zeros and
we want to put them in a particular way.
48
00:04:55,425 --> 00:05:00,915
Now this total number is the total number of
sequences that have exactly m_h
49
00:05:00,915 --> 00:05:04,306
heads in them.
50
00:05:04,306 --> 00:05:09,135
So this number, I'll just write down what it is,
51
00:05:09,135 --> 00:05:24,700
m choose m_h is equal to m! / (m_h! * (m-m_h)!),
52
00:05:24,908 --> 00:05:31,650
which is equal to m! / (m_h! * m_t!).
53
00:05:32,122 --> 00:05:37,230
I use this funny notation with an exclamation mark.
54
00:05:37,230 --> 00:05:45,593
The exclamation mark means that for a number m,
m! is equal to one times two times three
55
00:05:45,593 --> 00:05:51,994
times four times ... times m-1 times m.
56
00:05:51,994 --> 00:05:54,653
It's the product of all numbers up to m.
57
00:05:54,653 --> 00:05:59,673
So, for example, 2! is equal to one times two
equals two.
58
00:05:59,673 --> 00:06:07,813
3! is equal to one times two times three is
equal to six, et cetera.
59
00:06:07,813 --> 00:06:11,563
They get big really really fast, these numbers.
60
00:06:11,563 --> 00:06:16,125
So this number m choose m_h,
61
00:06:16,125 --> 00:06:21,363
m! is the total number of ways of rearranging
m objects.
62
00:06:22,076 --> 00:06:27,174
So if we have a certain number of zeros and
ones, we can rearrange these m objects in
63
00:06:27,410 --> 00:06:31,595
a particular way, but then if we have a zero
in the first place and a zero in the second place
64
00:06:31,595 --> 00:06:36,497
we don't care if this is this zero or if this
is the other zero. They're both zero.
65
00:06:36,497 --> 00:06:42,126
So we have to divide by the number of ways of
rearranging the zeroes among each other.
66
00:06:42,560 --> 00:06:47,062
That's this number m_h! - that's the number
of ways of rearranging the zeroes,
67
00:06:47,062 --> 00:06:50,555
m_t! is the number of ways of rearranging
the ones.
68
00:06:50,555 --> 00:06:57,474
Total number of sequences that have exactly
m_h zeroes and m_t ones is the total number
69
00:06:57,474 --> 00:07:01,533
of ways of rearranging theses sequences
divided by the number of ways of rearranging
70
00:07:01,533 --> 00:07:05,392
the zeroes divided by the number of ways
of rearranging the ones.
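The counting rule above, m choose m_h = m! / (m_h! * m_t!), can be checked with a minimal Python sketch (the function name sequences_with_heads is my own, not from the lecture; the variables m, m_h, m_t follow the lecture's notation):

```python
import math

# Number of binary sequences of length m with exactly m_h heads (zeros):
# rearrangements of all m symbols, divided by rearrangements of the zeroes
# among themselves and of the ones among themselves.
def sequences_with_heads(m, m_h):
    m_t = m - m_h  # number of tails (ones)
    return math.factorial(m) // (math.factorial(m_h) * math.factorial(m_t))

print(sequences_with_heads(4, 2))  # 6: 0011, 0101, 0110, 1001, 1010, 1100
# Agrees with Python's built-in binomial coefficient:
print(sequences_with_heads(4, 2) == math.comb(4, 2))
```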
71
00:07:05,704 --> 00:07:09,963
This is a fact, if it doesn't make sense,
just suck it up.
72
00:07:10,564 --> 00:07:15,403
Now we have a nice mathematical formula
for this number.
73
00:07:16,042 --> 00:07:19,383
Here is another fun fact.
74
00:07:19,383 --> 00:07:24,144
So I'm now going to define a probability.
75
00:07:24,144 --> 00:07:30,115
The probability of heads, I'll call it q.
76
00:07:30,115 --> 00:07:34,903
Because we agreed before that the probability
of heads was one half, probability of tails was one half.
77
00:07:34,903 --> 00:07:42,702
So we'll define this to be the number of heads
divided by m.
78
00:07:42,702 --> 00:07:53,625
This is the frequency of heads, so it's
what we observe to be the fraction of heads.
79
00:07:53,625 --> 00:07:56,434
The observed probability, as it were, of heads.
80
00:07:56,434 --> 00:08:00,334
But it's just a frequency, we don't know if
it's the probability.
81
00:08:00,334 --> 00:08:09,205
And 1 - q(h), which equals q(t), is the
frequency of tails.
82
00:08:15,335 --> 00:08:23,225
Here is the fundamental formula of information
theory.
83
00:08:23,225 --> 00:08:27,435
I'll say it again, the fundamental formula
of information theory.
84
00:08:31,696 --> 00:08:39,237
S for entropy, or I for information, it gets
written both ways because remember,
85
00:08:39,237 --> 00:08:45,047
entropy, the quantity that people discovered
in the 19th century is just information.
86
00:08:45,047 --> 00:08:51,867
Information required to label the different
possible configurations of atoms and molecules.
87
00:08:51,867 --> 00:08:58,658
In this case we're trying to label the total
number of possibilities of heads and tails,
88
00:08:58,658 --> 00:09:17,768
so this is equal to
-q(h) log base 2 q(h) - q(t) log base 2 q(t).
89
00:09:18,937 --> 00:09:21,137
This is a funny formula.
90
00:09:21,761 --> 00:09:27,769
There's these minus signs, there's these logarithms,
there's these probabilities, et cetera.
91
00:09:28,193 --> 00:09:34,350
So let me just actually write it out in a
particular case.
92
00:09:34,350 --> 00:09:42,976
Suppose, just to give you a feeling for what it is,
q(h) turns out to be exactly one half.
93
00:09:42,976 --> 00:09:48,230
So exactly half of the sequences are heads.
94
00:09:48,230 --> 00:09:53,841
And suppose q(t) is the same thing,
because it's one minus this, this is q(t).
95
00:09:53,841 --> 00:10:05,063
Then this quantity S or I: information and entropy,
two sides of the same coin.
96
00:10:05,063 --> 00:10:14,442
So I is equal to -0.5 log base 2 of 0.5, because
this is 0.5, this is 0.5.
97
00:10:14,442 --> 00:10:22,499
And then we have -0.5, this is q(t), log base 2
of 0.5.
98
00:10:25,932 --> 00:10:28,432
There are two of these, each multiplied by 0.5.
99
00:10:28,432 --> 00:10:36,282
This is equal to -log base 2 of 0.5, because I
added them up.
100
00:10:36,282 --> 00:10:44,483
But log base 2 of 0.5, this is the power to
which two must be raised to equal 0.5,
101
00:10:44,483 --> 00:10:46,143
and that's -1.
102
00:10:46,143 --> 00:10:57,255
This is equal to -(-1), because 2^(-1) is 0.5,
and that's equal to one, and it actually has a unit,
103
00:10:57,255 --> 00:10:59,648
it's one bit.
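The worked example above, S = -q(h) log2 q(h) - q(t) log2 q(t) for a fair coin, can be reproduced in a short Python sketch (the function name coin_entropy is my own):

```python
import math

# Per-flip information of a coin with frequency of heads q_h, in bits:
# S = -q(h) log2 q(h) - q(t) log2 q(t), with q(t) = 1 - q(h).
def coin_entropy(q_h):
    q_t = 1.0 - q_h
    s = 0.0
    for q in (q_h, q_t):
        if q > 0:  # the q log2 q term is taken to be 0 when q = 0
            s -= q * math.log2(q)
    return s

print(coin_entropy(0.5))  # fair coin: 1.0 bit
print(coin_entropy(0.9))  # a biased coin generates less than one bit per flip
```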
104
00:11:00,655 --> 00:11:06,143
So this formula, I won't call it magical,
it's just mathematical,
105
00:11:06,143 --> 00:11:08,614
but it's extremely useful.
106
00:11:08,614 --> 00:11:18,344
This formula says, if I have a coin, and I
flip it, and its probability of heads is 0.5,
107
00:11:18,344 --> 00:11:27,446
its probability of tails is 0.5, then the
amount of information that's encompassed
108
00:11:27,446 --> 00:11:31,956
in a single coin flip, if you'd like, the
amount of information that is generated
109
00:11:31,956 --> 00:11:35,128
by the coin flip, is one bit.
110
00:11:36,114 --> 00:11:38,635
It happens to be tails this time.
111
00:11:44,046 --> 00:11:58,600
So, flipping a coin where the frequency of
heads and the frequency of tails are both 0.5
112
00:11:58,950 --> 00:12:06,311
gives you one bit of information.
113
00:12:09,949 --> 00:12:16,559
Now, if we go back to our counting argument
about probability, what we find
114
00:12:16,559 --> 00:12:26,771
is that the number of bits required to describe
the ways of arranging exactly a certain number
115
00:12:26,771 --> 00:12:32,593
m_h of heads, remember this is a number,
m choose m_h, the number of
116
00:12:32,593 --> 00:13:02,810
bits required to describe this is just equal
to m times (-q(h) log base 2 of q(h) - q(t) log base 2 of q(t)),
117
00:13:03,102 --> 00:13:09,952
and so this is equal to m times the amount
of information.
118
00:13:10,333 --> 00:13:14,732
I keep getting tempted to call it magical,
but I don't believe in magic.
119
00:13:14,732 --> 00:13:22,352
So this mathematical formula for the information
is just really a way of counting number of possibilities.
120
00:13:22,352 --> 00:13:28,301
We look at the number of possible ways of
getting exactly m_h heads.
121
00:13:28,301 --> 00:13:32,502
We take this logarithm, that's the number
of bits of information that are in this particular
122
00:13:32,760 --> 00:13:40,431
sequence, and we find that it's equal to m,
the total number of flips, times this quantity,
123
00:13:40,431 --> 00:13:42,211
the information.
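The counting argument just described, that log2 of (m choose m_h) is approximately m times the per-flip information, can be checked numerically with a minimal sketch (function names bits_exact and bits_estimate are my own):

```python
import math

# Exact number of bits needed to label all sequences with m_h heads.
def bits_exact(m, m_h):
    return math.log2(math.comb(m, m_h))

# The lecture's approximation: m times the per-flip information,
# with q(h) = m_h / m and q(t) = 1 - q(h).
def bits_estimate(m, m_h):
    q = m_h / m
    return m * (-q * math.log2(q) - (1 - q) * math.log2(1 - q))

# The two agree better and better as m grows.
for m in (10, 100, 1000):
    m_h = m // 2
    print(m, bits_exact(m, m_h), bits_estimate(m, m_h))
```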
124
00:13:42,972 --> 00:13:51,840
So just to summarize, fundamental formula
of information theory
125
00:14:05,663 --> 00:14:12,452
is that the amount of information, also the
amount of entropy, is equal to, if I have
126
00:14:12,701 --> 00:14:25,482
two possibilities, is just -q(h) log base 2 of q(h) - q(t) log base 2 of q(t).
127
00:14:25,882 --> 00:14:35,603
If I have more possibilities, let's call
them k possible outcomes,
128
00:14:39,020 --> 00:14:48,064
then the amount of information is equal to
minus the sum over possible outcomes, from i = 1 to k, of
129
00:14:48,064 --> 00:14:54,533
q(i) log base 2 of q(i).
130
00:14:54,533 --> 00:15:01,732
And we just get this from counting the
number of possibilities that go into constructing
131
00:15:01,732 --> 00:15:04,023
these frequencies.
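The general k-outcome formula, S = -sum over i of q(i) log2 q(i), takes only a few lines of Python (the function name entropy is my own):

```python
import math

# Information/entropy in bits for k outcomes with frequencies q(1)..q(k).
def entropy(qs):
    assert abs(sum(qs) - 1.0) < 1e-9  # frequencies must sum to one
    return -sum(q * math.log2(q) for q in qs if q > 0)

print(entropy([0.5, 0.5]))    # the fair coin again: 1.0 bit
print(entropy([0.25] * 4))    # four equally likely outcomes: 2.0 bits
```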
132
00:15:04,023 --> 00:15:10,422
And this extremely useful formula is the basis
for all information theory, including the mathematical
133
00:15:10,422 --> 00:15:14,213
theory of communication and the theory
of computation.