1
00:00:09,550 --> 00:00:18,793
We've talked about information as bits,
measuring information
2
00:00:19,663 --> 00:00:26,745
we've talked about counting, so we can use
bits to count: 00, 01, 10, 11,
3
00:00:27,135 --> 00:00:30,125
counting from zero up to three, in base two.
4
00:00:31,304 --> 00:00:34,071
We've talked about bits as labeling,
5
00:00:35,269 --> 00:00:43,779
that we can use barcodes which are just
bits to label things.
6
00:00:44,740 --> 00:00:49,389
And finally we've talked about how bits are
physical,
7
00:00:50,603 --> 00:00:57,352
that all bits that we have in computers, all
the bits of information that I'm conveying
8
00:00:57,352 --> 00:01:02,676
via the vibrations of my vocal cords and
the vibrations of the air
9
00:01:02,676 --> 00:01:09,061
are actually physical systems, physical
manifestations of information.
10
00:01:09,061 --> 00:01:13,434
And then we also talked about a discovery
which is 150 years old,
11
00:01:13,921 --> 00:01:17,587
that all physical systems carry information,
12
00:01:17,937 --> 00:01:21,555
and that amount of information can be quantified.
13
00:01:21,555 --> 00:01:30,083
So the number of bits is the logarithm to the base two
of the number of possibilities,
14
00:01:30,185 --> 00:01:35,111
a result which ironically is inscribed on Boltzmann's
grave.
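As a small illustration of the formula just stated, here is a minimal Python sketch. The function name `bits_for` is hypothetical and not from the lecture; it simply takes the base-two logarithm of the number of equally likely possibilities.

```python
import math

def bits_for(possibilities: int) -> float:
    """Number of bits = logarithm to the base two of the number of possibilities."""
    if possibilities < 1:
        raise ValueError("need at least one possibility")
    return math.log2(possibilities)

# Two possibilities (heads or tails) correspond to one bit;
# four possibilities (00, 01, 10, 11) correspond to two bits.
coin_bits = bits_for(2)
two_bit_counter = bits_for(4)
print(coin_bits, two_bit_counter)
```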
15
00:01:36,351 --> 00:01:42,614
So now I'd like to give you another aspect
of bits, and this is a very 20th-century aspect
16
00:01:42,614 --> 00:01:45,363
of bits of information.
17
00:01:45,363 --> 00:01:52,123
And that is the relationship between information
and probability.
18
00:01:56,914 --> 00:02:04,303
So probability is something that we're all
familiar with and all confused by,
19
00:02:04,303 --> 00:02:06,865
and I'm always confused by probability.
20
00:02:06,865 --> 00:02:10,949
Human beings are known to have a very bad
intuitive sense of probability,
21
00:02:11,537 --> 00:02:16,066
we overestimate the probability of truly
awful events,
22
00:02:16,066 --> 00:02:20,491
we underestimate the probability of fine,
nice, normal events.
23
00:02:21,000 --> 00:02:25,472
Of course, from an evolutionary standpoint,
overestimating the probability of some event
24
00:02:25,472 --> 00:02:31,167
like a sabretooth tiger dropping out of this
tree and sinking his teeth into your neck,
25
00:02:31,167 --> 00:02:34,751
this is probably a good thing, which might
be why.
26
00:02:34,751 --> 00:02:40,491
But there's a simple idea of probability,
and let me try to demonstrate it right here.
27
00:02:40,491 --> 00:02:44,322
So let's take the example of heads and tails.
28
00:02:44,901 --> 00:02:49,701
I have here a nice, shiny new nickel that's
been given to me by a member of the Santa Fe
29
00:02:49,701 --> 00:02:54,991
Institute, she didn't ask me to give it back
either so I'm five cents ahead.
30
00:02:54,991 --> 00:03:00,492
So it can either be heads or tails.
31
00:03:00,993 --> 00:03:05,830
What do you think? What's the probability that
it's heads or that it's tails?
32
00:03:06,432 --> 00:03:09,572
Well I claim it's fifty-fifty. But why?
33
00:03:10,083 --> 00:03:15,074
Why is it one half? The probability that it's
heads or tails.
34
00:03:16,065 --> 00:03:18,384
It was tails, I swear.
35
00:03:21,032 --> 00:03:27,423
So there are two notions of probability for
heads and tails.
36
00:03:27,423 --> 00:03:33,193
So one notion is - and I claim that this is the kind
of nicest, most intuitive notion - when I just
37
00:03:33,193 --> 00:03:37,613
flip it like this, I wasn't watching it in the
air, I didn't know how hard I flipped it,
38
00:03:37,880 --> 00:03:39,910
I didn't see it before I put it down there.
39
00:03:39,910 --> 00:03:43,252
I have no reason for preferring heads over
tails.
40
00:03:43,252 --> 00:03:48,831
Heads and tails, just a priori, have
equal weight.
41
00:03:50,738 --> 00:03:55,719
Heads. It was heads by the way, now the
probability is one that it was heads,
42
00:03:55,728 --> 00:03:57,408
that's the funny thing about probabilities.
43
00:03:57,408 --> 00:04:01,206
First you don't know, and what you have
are probabilities.
44
00:04:01,206 --> 00:04:06,668
These are called prior or a priori probabilities.
45
00:04:09,789 --> 00:04:15,601
So probability of heads is equal to the
probability of tails is one-half,
46
00:04:15,601 --> 00:04:20,951
because there is no reason to prefer heads
over tails. This is a good argument.
47
00:04:21,798 --> 00:04:27,330
So these are the prior probabilities of heads
and tails: 50 percent each.
48
00:04:27,900 --> 00:04:35,574
But there's another argument about why
the probability of heads and tails would be 50 percent.
49
00:04:35,574 --> 00:04:40,251
So let me just try it like this, let me just
flip this coin a bunch of times.
50
00:04:40,251 --> 00:04:42,631
Tails.
51
00:04:42,631 --> 00:04:45,571
Heads.
52
00:04:45,571 --> 00:04:48,809
Heads.
53
00:04:48,809 --> 00:04:51,660
Heads.
54
00:04:51,660 --> 00:04:54,743
Tails.
55
00:04:54,743 --> 00:04:57,962
Heads.
56
00:04:57,962 --> 00:05:01,253
Tails.
57
00:05:01,253 --> 00:05:04,499
Heads.
58
00:05:04,499 --> 00:05:06,606
Heads.
59
00:05:07,241 --> 00:05:16,831
So I actually got seven heads and three tails
out of ten tosses.
60
00:05:16,831 --> 00:05:19,382
That was kind of dull. This is the problem
61
00:05:19,382 --> 00:05:23,665
with probability: it's dull and confusing,
and to figure out what's going on,
62
00:05:23,665 --> 00:05:25,745
you have to do it many times.
63
00:05:25,745 --> 00:05:34,274
Because I don't think that you're going to
agree that this shiny new United States nickel
64
00:05:34,274 --> 00:05:42,813
really has a probability of seven out of ten
of coming up heads and three out of ten of
65
00:05:42,813 --> 00:05:44,714
coming up tails.
66
00:05:44,714 --> 00:05:46,654
It was just the luck of the draw,
67
00:05:46,654 --> 00:05:47,994
or the luck of the toss.
68
00:05:47,994 --> 00:05:51,160
It just so happens that there were seven
heads and three tails, which, if you're flipping
69
00:05:51,160 --> 00:05:54,976
a coin ten times, is pretty reasonable.
70
00:05:57,645 --> 00:06:01,435
So if I were to flip this coin a whole bunch
more times,
71
00:06:01,435 --> 00:06:06,366
which I'm not going to do because I know it
will be dull, you would be very bored by this.
72
00:06:06,366 --> 00:06:14,725
So if I flip a coin, and I should say a fair coin,
73
00:06:14,725 --> 00:06:19,336
I should note that in my classes at MIT,
the students all start out seeming to
74
00:06:19,336 --> 00:06:23,774
believe what I say, but after a few lectures,
they become very distrustful.
75
00:06:23,774 --> 00:06:27,006
I don't know why this is, I seem like a
trustworthy person.
76
00:06:27,006 --> 00:06:46,592
Anyway, I flip a fair coin m times and we
look at the number of heads and the number of tails
77
00:06:46,592 --> 00:06:49,413
and the sum of the number of heads plus the
number of tails is equal to m.
78
00:06:49,673 --> 00:06:52,423
I just flipped it ten times.
79
00:06:52,863 --> 00:06:55,293
And we're going to call the frequency,
80
00:07:00,452 --> 00:07:03,521
or the frequency of heads
81
00:07:03,533 --> 00:07:08,446
is just equal to the number of heads divided
by m.
82
00:07:08,446 --> 00:07:14,621
So I flipped it ten times, I got seven heads,
frequency is 0.7.
83
00:07:14,621 --> 00:07:22,024
Frequency of tails, as you may very well guess,
is the number of tails over m,
84
00:07:22,024 --> 00:07:29,143
and that's equal to one minus the number of heads
divided by m.
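The frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequencies` and the example string are assumptions, chosen so that the string has seven heads and three tails like the ten tosses in the lecture.

```python
def frequencies(tosses: str) -> tuple[float, float]:
    """Return (frequency of heads, frequency of tails) for a string of 'H' and 'T'."""
    m = len(tosses)
    if m == 0:
        raise ValueError("need at least one toss")
    m_h = tosses.count("H")
    m_t = tosses.count("T")
    if m_h + m_t != m:
        raise ValueError("tosses must contain only 'H' and 'T'")
    # Frequency of heads is m_h / m; frequency of tails is m_t / m,
    # which is the same as one minus the frequency of heads.
    return m_h / m, m_t / m

# Illustrative sequence with seven heads and three tails (m = 10).
f_heads, f_tails = frequencies("HHTTHHHTHH")
print(f_heads, f_tails)
```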
85
00:07:31,158 --> 00:07:36,211
Now what we expect, just from personal
experience, is that if we just keep flipping
86
00:07:36,720 --> 00:07:40,301
the coin many many many times.
87
00:07:40,301 --> 00:07:46,962
Well, if I flip it 100 times, I certainly don't
expect to get exactly 50 heads,
88
00:07:46,962 --> 00:07:52,022
which would be a frequency of exactly 0.5,
matching the probability.
89
00:07:52,022 --> 00:07:57,821
But I would expect to get something a little
better than 0.7, seven-tenths.
90
00:07:57,821 --> 00:08:02,771
That seems, you know, very unlikely, that
if I flip it a hundred times I'm going to get 70 heads.
91
00:08:02,771 --> 00:08:06,863
It's perfectly possible, why not.
92
00:08:10,383 --> 00:08:13,451
So I will just give you the formula for this.
93
00:08:14,354 --> 00:08:24,013
So the expected number of heads, which is
also the expected number of tails, because
94
00:08:24,013 --> 00:08:26,163
there's nothing to choose between them,
95
00:08:26,163 --> 00:08:29,653
is equal to m over two, 50 percent of the tosses.
96
00:08:29,653 --> 00:08:39,946
I flip it 100 times, for example, m is equal to 100.
Then m over two is equal to 50.
97
00:08:40,524 --> 00:08:47,965
So I'd expect to get roughly 50, and then
I'm going to use this notation, plus or minus,
98
00:08:47,965 --> 00:08:56,526
I'll explain what this is in a moment, plus or minus
one-half times the square root of m.
99
00:08:57,497 --> 00:09:04,194
So what "expect" actually means is,
well, it's roughly in this interval.
100
00:09:04,194 --> 00:09:08,646
I flip it 100 times, the square root of 100
is 10.
101
00:09:09,636 --> 00:09:16,225
I expect it to be roughly within five, might
be a few more, might be seven or eight more
102
00:09:16,225 --> 00:09:21,527
but I'd be really kind of surprised if there
were seventy heads and thirty tails.
103
00:09:21,527 --> 00:09:29,167
I would think it'd be more likely, you know
60 heads, 40 tails, but probably more like
104
00:09:29,167 --> 00:09:31,966
55 and 45.
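To make the "m over two, plus or minus one-half times the square root of m" rule concrete, here is a hedged Python sketch. The function names are hypothetical. The exact probabilities assume a fair coin with independent tosses and are obtained by counting sequences with binomial coefficients (`math.comb`), which is a standard technique brought in here for illustration rather than something the lecture has written down at this point.

```python
import math

def expected_heads_interval(m: int) -> tuple[float, float]:
    """Interval m/2 plus or minus (1/2) * sqrt(m) for the number of heads."""
    center = m / 2
    spread = 0.5 * math.sqrt(m)
    return center - spread, center + spread

def prob_heads_between(m: int, lo: int, hi: int) -> float:
    """Exact probability of getting lo..hi heads (inclusive) in m fair tosses."""
    favorable = sum(math.comb(m, k) for k in range(lo, hi + 1))
    return favorable / 2 ** m

low, high = expected_heads_interval(100)          # the 45-to-55 range from the lecture
p_inside = prob_heads_between(100, 45, 55)        # chance of landing in that range
p_70_or_more = prob_heads_between(100, 70, 100)   # chance of 70 or more heads
print(low, high, p_inside, p_70_or_more)
```

Running this shows that landing in the 45-to-55 range is the typical outcome, while 70 or more heads out of 100 is possible but very rare.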
105
00:09:32,488 --> 00:09:34,447
And that's actually what you can do.
106
00:09:35,967 --> 00:09:38,409
So let's actually ask why is this so.
107
00:09:47,336 --> 00:09:57,690
So if I look at all different possible sequences
108
00:10:00,527 --> 00:10:08,575
H H T T H H H T H H H T
109
00:10:08,575 --> 00:10:12,508
you may notice that the first ten of these
are pretty much what I got when
110
00:10:12,508 --> 00:10:14,708
I was flipping the coin.
111
00:10:14,708 --> 00:10:17,975
Dot dot dot, which is a way of meaning
et cetera.
112
00:10:18,840 --> 00:10:23,329
Just keeps on going, and then we're going
to have m of these,
113
00:10:23,329 --> 00:10:40,239
and we're going to count the number of
possible sequences
114
00:10:40,239 --> 00:10:54,852
with exactly m_h heads and m_t tails.
115
00:10:54,852 --> 00:10:58,499
Of course, because it's got to be heads or
tails,
116
00:10:58,499 --> 00:11:02,812
at least unless it lands on its side, which
I don't think it's going to do,
117
00:11:02,812 --> 00:11:05,770
this has got to add up to m.
118
00:11:06,491 --> 00:11:12,033
So I'm going to count the number of possible
sequences with exactly m_h heads, m_t tails,
119
00:11:12,033 --> 00:11:14,412
the two have to add up to m.
120
00:11:14,412 --> 00:11:23,152
And what we're going to find out, well there's not so
many sequences which are heads heads heads...
121
00:11:23,445 --> 00:11:25,747
tails.
122
00:11:27,048 --> 00:11:31,815
So there's going to be a very small number
of sequences that have almost all heads and
123
00:11:31,855 --> 00:11:33,405
a few tails.
124
00:11:33,467 --> 00:11:37,297
There's similarly going to be a very small
number of sequences that have almost all
125
00:11:37,596 --> 00:11:41,686
tails and a few heads, and there's going to
be a humongous number of sequences that have
126
00:11:41,686 --> 00:11:46,297
roughly the same number of heads and of tails.
127
00:11:46,297 --> 00:11:49,638
So you can see, to relate this to information
theory,
128
00:11:49,638 --> 00:11:54,579
each sequence is like a sequence of zeros
and ones.
129
00:11:55,567 --> 00:11:58,891
You can call heads zero and tails one,
130
00:11:59,159 --> 00:12:01,999
this is just a long long long bit string.
131
00:12:02,430 --> 00:12:05,961
And so we can relate ideas of information,
132
00:12:05,961 --> 00:12:09,841
numbers of possible sequences with a
particular pattern,
133
00:12:09,841 --> 00:12:12,791
in this case a particular number of heads and
of tails
134
00:12:12,791 --> 00:12:14,681
to probability.
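The counting argument above can be checked directly for a small case. The sketch below is an illustration under stated assumptions: it uses the binomial coefficient (Python's `math.comb`) to count the bit strings of length m with exactly m_h heads, and the name `sequences_with` is hypothetical.

```python
import math

def sequences_with(m: int, m_h: int) -> int:
    """Number of length-m heads/tails sequences with exactly m_h heads."""
    return math.comb(m, m_h)

m = 10
# Count the sequences for every possible number of heads, 0 through m.
counts = {m_h: sequences_with(m, m_h) for m_h in range(m + 1)}
total = sum(counts.values())  # every sequence of m tosses is counted exactly once

# If all sequences are equally likely a priori, the probability of a
# given number of heads is its share of the total number of sequences.
probability_of_7_heads = counts[7] / total

for m_h, count in counts.items():
    print(m_h, count)
```

For ten tosses there is only one all-heads sequence and one all-tails sequence, while the counts peak at five heads and five tails, which is the pattern described in the lecture.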