1
00:00:01,664 --> 00:00:05,874
So there's some subtleties or tricky points
that we're going to have to think through
2
00:00:05,874 --> 00:00:09,706
regarding this idea of randomness as incompressibility.
3
00:00:10,300 --> 00:00:13,778
So, here is the first one that isn't too serious an issue.
4
00:00:14,154 --> 00:00:19,289
And that is, we need to choose some standard way
of representing these compressions.
5
00:00:19,600 --> 00:00:22,235
So let's say that the task we give
6
00:00:22,547 --> 00:00:27,034
give ourselves is to compress a certain string,
a sequence of symbols,
7
00:00:27,629 --> 00:00:30,519
and we're going to write a computer program to do that
8
00:00:30,519 --> 00:00:33,501
and we want to write the shortest computer program possible.
9
00:00:34,131 --> 00:00:36,888
So, um, I like programming in Python
10
00:00:36,888 --> 00:00:39,959
so, um, I might try to write a Python program
11
00:00:39,959 --> 00:00:43,008
and somebody else might write a program in C
12
00:00:43,048 --> 00:00:46,040
and somebody else might do C++
13
00:00:46,040 --> 00:00:47,205
and somebody might do Lisp
14
00:00:47,205 --> 00:00:49,662
and who knows what else everybody would do.
15
00:00:49,846 --> 00:00:52,764
But we would all be working in different languages,
16
00:00:52,989 --> 00:00:57,661
and the algorithm would have different representations
in these different languages.
17
00:00:58,223 --> 00:01:04,182
So we need to agree on some sort of standard
or reference, um, device
18
00:01:04,785 --> 00:01:07,950
or language that we can use
when we're comparing these algorithms.
19
00:01:08,325 --> 00:01:10,232
And so the standard thing that's done here
20
00:01:10,232 --> 00:01:14,378
in this sort of formal, um, development of these ideas,
21
00:01:14,673 --> 00:01:17,016
is to use something called a universal Turing machine.
22
00:01:17,652 --> 00:01:20,182
So a universal Turing machine is
23
00:01:20,384 --> 00:01:23,759
a device that, um, can do anything
24
00:01:24,140 --> 00:01:25,736
can sort of perform any computation,
25
00:01:25,937 --> 00:01:27,794
any discrete time, discrete space computation,
26
00:01:29,224 --> 00:01:31,793
or discrete state computation, excuse me.
27
00:01:31,793 --> 00:01:36,038
Um, and is capable of emulating any other computer
28
00:01:36,038 --> 00:01:38,932
or device that can do that computation.
29
00:01:39,178 --> 00:01:41,808
So it's sort of an all-encompassing computing device
30
00:01:41,808 --> 00:01:43,505
and it's a standard reference device,
31
00:01:43,505 --> 00:01:46,763
a model of computation
that's as powerful as anything else.
32
00:01:46,763 --> 00:01:50,939
So in the formal development of these ideas
which is not part of this course,
33
00:01:50,939 --> 00:01:55,301
um, one works in a framework
where you're comparing things
34
00:01:55,301 --> 00:01:57,148
in terms of different universal Turing machines.
35
00:01:58,428 --> 00:02:01,433
So that's one subtlety that we have to address.
36
00:02:01,433 --> 00:02:03,518
And, OK, we've addressed that.
37
00:02:03,731 --> 00:02:05,423
We'll use universal Turing machines.
38
00:02:06,686 --> 00:02:09,349
The second subtlety is more subtle,
39
00:02:09,574 --> 00:02:10,980
and, uh, more serious.
40
00:02:12,112 --> 00:02:17,036
And that is, how do we know if we've found
41
00:02:17,814 --> 00:02:20,050
the shortest algorithm that can produce
42
00:02:20,050 --> 00:02:22,035
that can reproduce a sequence.
43
00:02:22,035 --> 00:02:24,783
Maybe a sequence really is compressible
44
00:02:25,128 --> 00:02:27,132
but I just haven't been able to figure it out.
45
00:02:27,989 --> 00:02:29,051
Or I'm not smart enough,
46
00:02:29,051 --> 00:02:31,150
or I haven't given myself enough time.
47
00:02:31,150 --> 00:02:32,744
Like that could be the case for pi,
48
00:02:32,744 --> 00:02:34,447
that certainly looked random.
49
00:02:34,447 --> 00:02:37,822
And by the way, in some view,
in some ways of viewing it,
50
00:02:37,822 --> 00:02:39,471
pi really is random.
51
00:02:39,471 --> 00:02:42,319
All the digits,
if you do a binary expansion,
52
00:02:42,319 --> 00:02:46,557
um, it would look, uh, the same as
a fair coin.
53
00:02:47,105 --> 00:02:52,868
All possible sequences of zeros and ones
occur equally often in the binary expansion of pi.
54
00:02:53,172 --> 00:02:53,860
So that's weird.
55
00:02:53,860 --> 00:02:55,326
So in a certain sense,
pi is random.
56
00:02:55,933 --> 00:02:57,014
But in this algorithmic sense,
57
00:02:57,014 --> 00:02:59,440
or if we're using universal Turing machines,
58
00:02:59,440 --> 00:03:00,927
that type of algorithm makes sense.
59
00:03:01,273 --> 00:03:04,031
Pi isn't random,
because there's a short algorithm to produce it.
60
00:03:06,215 --> 00:03:11,165
OK, so, um, maybe every possible sequence
61
00:03:11,165 --> 00:03:13,685
actually has some almost impossible to figure out
62
00:03:13,685 --> 00:03:16,254
short algorithm that can reproduce it,
63
00:03:16,254 --> 00:03:18,234
that we just haven't been smart enough to figure out yet.
64
00:03:18,863 --> 00:03:21,994
So it turns out we can eliminate that possibility.
65
00:03:22,878 --> 00:03:25,912
So let me present that argument as follows.
66
00:03:28,369 --> 00:03:31,573
So we're wondering if we can figure out compression algorithms,
67
00:03:31,573 --> 00:03:37,414
ways to compress sequences that it seems not obvious
how you might be able to compress them.
68
00:03:37,710 --> 00:03:39,361
So one idea is,
69
00:03:39,361 --> 00:03:43,306
well, maybe we could come up with an algorithm
that determines the shortest algorithm.
70
00:03:43,306 --> 00:03:46,957
An algorithm for compression
that's optimal all the time.
71
00:03:46,957 --> 00:03:50,436
So this would be an algorithm
that writes an algorithm.
72
00:03:50,681 --> 00:03:52,995
Or it's like a program that writes a program.
73
00:03:52,995 --> 00:03:56,473
And moreover, this program would automatically
write the best program
74
00:03:56,473 --> 00:03:58,745
in the sense that it was the shortest program.
75
00:03:58,745 --> 00:04:02,491
The one that could compress our symbol sequence
as much as possible.
76
00:04:04,069 --> 00:04:06,069
So, this would be an amazing thing, but
77
00:04:06,069 --> 00:04:09,565
one can show that such a program
doesn't exist.
78
00:04:09,565 --> 00:04:12,522
You can't write a program that automatically
writes the shortest program.
79
00:04:13,991 --> 00:04:17,732
So, um, so that idea, so we can't
80
00:04:17,732 --> 00:04:19,539
we can't sort of automatically determine
81
00:04:19,539 --> 00:04:21,540
if there is a shortest algorithm.
82
00:04:21,540 --> 00:04:25,350
But maybe, um, there is a shortest algorithm
out htere,
83
00:04:25,350 --> 00:04:26,605
even if we can't find it.
84
00:04:27,151 --> 00:04:30,039
And it turns out we can eliminate that possibility as well.
85
00:04:30,732 --> 00:04:32,678
And here is how one would argue that.
86
00:04:32,678 --> 00:04:36,811
So first let's think about the set, or collection,
of all possible algorithms.
87
00:04:37,339 --> 00:04:42,300
All possible schemes for compressing
a sequence of zeros and ones.
88
00:04:43,183 --> 00:04:44,554
So that would be an infinite set.
89
00:04:44,554 --> 00:04:46,719
There's an infinite number of possible algorithms
90
00:04:46,719 --> 00:04:48,685
and we could let our program get infinitely long.
91
00:04:48,685 --> 00:04:53,342
So there we have this infinite set,
this collection of all of these algorithms.
92
00:04:54,627 --> 00:04:59,878
We also might think about the collection
of all possible sequences that we want to compress.
93
00:05:00,428 --> 00:05:02,417
This also is an infinite set.
94
00:05:02,417 --> 00:05:04,579
We're thinking about infinite sequences here,
95
00:05:04,579 --> 00:05:07,773
all possible infinite sequences of zeros and ones.
96
00:05:07,773 --> 00:05:08,374
That's a lot.
97
00:05:08,374 --> 00:05:09,908
That's clearly infinite as well.
98
00:05:09,908 --> 00:05:11,938
So we've got these two infinite sets:
99
00:05:11,938 --> 00:05:13,032
all the algorithms,
100
00:05:13,887 --> 00:05:15,823
and all of the sequences.
101
00:05:17,024 --> 00:05:19,683
And if these two sets are the same size,
102
00:05:20,091 --> 00:05:22,482
then it might be that every algorithm,
103
00:05:22,482 --> 00:05:25,032
excuse me, that every sequence
104
00:05:25,032 --> 00:05:26,626
has an algorithm that goes for it.
105
00:05:26,626 --> 00:05:30,865
So maybe we can't find what algorithm
goes with what sequence,
106
00:05:31,708 --> 00:05:35,327
but, um, we know that,
that algorithm might be out there.
107
00:05:36,986 --> 00:05:42,582
However, it's not the case that there are
the same number of algorithms
108
00:05:42,582 --> 00:05:45,216
as there are sequences.
109
00:05:45,216 --> 00:05:49,067
In fact, there're infinitely many more sequences
than algorithms.
110
00:05:49,393 --> 00:05:51,052
Thus, there are
111
00:05:51,052 --> 00:05:56,480
there are for sure some sequences for which
there's no algorithm that can compress it.
112
00:05:56,817 --> 00:05:58,697
And we'll call those sequences random.
113
00:05:58,697 --> 00:06:01,062
So randomness exists in this argument.
114
00:06:02,644 --> 00:06:07,175
Furthermore, there's so many more, um,
115
00:06:08,642 --> 00:06:10,907
sequences than there are algorithms,
116
00:06:10,907 --> 00:06:13,115
that if I choose a sequence, um,
117
00:06:13,861 --> 00:06:14,991
I don't want to say at random.
118
00:06:14,991 --> 00:06:16,466
If I choose an arbitrary sequence,
119
00:06:16,466 --> 00:06:19,733
I just dip my hand into this infinite bag of sequences,
120
00:06:20,463 --> 00:06:24,254
um, I'm almost certain in the sense that
with probability one
121
00:06:24,554 --> 00:06:25,848
that sequence will be random.
122
00:06:25,848 --> 00:06:26,977
It will be incompressible.
123
00:06:26,977 --> 00:06:28,450
So the technical idea
124
00:06:28,450 --> 00:06:31,353
maybe, um, some of you are familiar with this,
125
00:06:31,353 --> 00:06:34,889
is that the set of all algorithms
is a countable infinity
126
00:06:34,889 --> 00:06:38,891
and the set of all, um, infinite sequences
of zeros and ones
127
00:06:38,891 --> 00:06:40,431
is uncountable.
128
00:06:40,683 --> 00:06:44,279
So they're very different infinite, uh, infinite sets.
129
00:06:44,412 --> 00:06:48,086
So again, if I choose a symbol sequence
at random,
130
00:06:48,823 --> 00:06:50,254
an arbitrary symbol sequence,
131
00:06:51,445 --> 00:06:53,400
I can be certain with probability one
132
00:06:54,406 --> 00:06:57,461
that, um, the
133
00:06:57,461 --> 00:06:59,094
that sequence is random.
134
00:07:00,314 --> 00:07:02,509
OK, now let's go back to the logistic equation finally
135
00:07:02,509 --> 00:07:04,137
go back to dynamical systems.
136
00:07:04,827 --> 00:07:06,940
So if I choose an initial condition,
137
00:07:06,940 --> 00:07:08,137
a random initial condition,
138
00:07:08,137 --> 00:07:09,876
an arbitrary initial condition,
139
00:07:10,481 --> 00:07:11,915
for the logistic equation,
140
00:07:12,844 --> 00:07:15,225
that will produce a sequence of zeros and ones.
141
00:07:15,225 --> 00:07:19,086
And I know, we saw earlier,
or I argued earlier,
142
00:07:19,086 --> 00:07:23,353
that the logistic equation can
produce every possible sequence of zeros and ones,
143
00:07:23,353 --> 00:07:24,713
and all subsequences,
144
00:07:24,713 --> 00:07:27,377
and they occur with equal probability.
145
00:07:27,377 --> 00:07:28,896
So if I choose a,
146
00:07:28,896 --> 00:07:31,309
a, um, an initial condition,
147
00:07:31,309 --> 00:07:35,096
arbitrary initial condition for the logistic equation,
with R = 4,
148
00:07:36,069 --> 00:07:39,128
I can be almost certain that
the outcome will be random.
149
00:07:41,249 --> 00:07:43,519
Um, so, in this view,
150
00:07:44,742 --> 00:07:46,745
this way of thinking about randomness,
151
00:07:46,745 --> 00:07:50,819
the process producing the symbols
is definitely deterministic.
152
00:07:51,583 --> 00:07:53,986
It's an unchanging deterministic rule
153
00:07:53,986 --> 00:07:55,813
but the outcome is random.
154
00:07:55,813 --> 00:08:00,157
That with probability one
I'm absolutely certain
155
00:08:00,157 --> 00:08:01,893
that there's no algorithm that can
156
00:08:01,893 --> 00:08:07,608
that can compress the sequence that
that logistic equation produced.
157
00:08:07,829 --> 00:08:10,934
So the logistic equation is producing randomness.
158
00:08:13,590 --> 00:08:14,600
Or is it?
159
00:08:14,600 --> 00:08:15,946
Wait a minuite.
160
00:08:16,902 --> 00:08:22,113
The logistic equation is itself an algorithm
that's producing this sequence.
161
00:08:22,113 --> 00:08:26,822
So how can I say that the logistic equation
is producing random outcome
162
00:08:26,822 --> 00:08:27,858
because there's a
163
00:08:27,858 --> 00:08:28,766
a random outcome
164
00:08:28,766 --> 00:08:31,897
because there's no algorithm to, um,
165
00:08:33,007 --> 00:08:34,756
to compress that sequence.
166
00:08:34,756 --> 00:08:38,201
Isn't the logistic equation,
iterating logistic equation itself,
167
00:08:38,799 --> 00:08:40,337
just such an algorithm?
168
00:08:41,439 --> 00:08:43,949
So, um, this is a good objection,
169
00:08:44,828 --> 00:08:48,761
but, um, we have to think carefully about
the logistic equation in chaos,
170
00:08:48,761 --> 00:08:51,556
and in particular sensitive dependence
on initial conditions.
171
00:08:52,728 --> 00:08:55,566
So suppose you hand me a
172
00:08:55,566 --> 00:08:57,514
a random sequence,
173
00:08:57,514 --> 00:08:58,834
no obvious pattern to it,
174
00:08:58,834 --> 00:09:00,562
and you say, OK,
175
00:09:00,562 --> 00:09:06,098
my task is to figure out if there is a way
176
00:09:06,098 --> 00:09:11,282
to, um, use the logistic equation to generate
this sequence of zeros and ones.
177
00:09:11,282 --> 00:09:12,572
And at first I would say,
178
00:09:12,572 --> 00:09:14,561
Sure, of course, there has to be.
179
00:09:14,561 --> 00:09:18,289
The logistic equation produces all possible
sequences of zeros and ones.
180
00:09:18,422 --> 00:09:20,298
So this sequence that you just gave me,
181
00:09:20,658 --> 00:09:22,647
no problem, I can figure it out.
182
00:09:22,647 --> 00:09:24,100
Or even if I can't figure it out,
183
00:09:24,100 --> 00:09:27,046
maybe it's, there're too many different things to try,
184
00:09:27,046 --> 00:09:28,442
initial conditions to try.
185
00:09:28,442 --> 00:09:31,237
I know that an answer is out there,
186
00:09:31,570 --> 00:09:33,375
and then there would be a short algorithm.
187
00:09:33,375 --> 00:09:36,429
So I just compressed the sequence that
you thought was incompressible.
188
00:09:36,843 --> 00:09:38,045
But wait a minute.
189
00:09:38,045 --> 00:09:44,455
The key is, that in order to produce
that particular symbol sequence,
190
00:09:45,211 --> 00:09:49,514
I need to specify the initial condition exactly.
191
00:09:51,059 --> 00:09:54,313
So with full precision,
I need to specify the initial condition.
192
00:09:56,449 --> 00:09:58,453
So I have to,
in my algorithm,
193
00:09:59,197 --> 00:10:02,401
the, sort of mechanics of it,
are very simple.
194
00:10:02,401 --> 00:10:04,783
Iterate the logistic equation with R = 4.0
195
00:10:04,783 --> 00:10:06,308
that's a very simple program.
196
00:10:06,308 --> 00:10:08,731
Many of you have written
your own versions of those programs.
197
00:10:08,731 --> 00:10:09,864
Spreadsheets can do it, too.
198
00:10:10,554 --> 00:10:13,115
But then, in order to make that particular outcome,
199
00:10:13,115 --> 00:10:14,161
the one that you gave me,
200
00:10:14,807 --> 00:10:16,309
so I can, say I've compressed it,
201
00:10:16,309 --> 00:10:18,191
I also need the initial condition.
202
00:10:18,979 --> 00:10:20,593
And almost all initial conditions
203
00:10:20,593 --> 00:10:23,955
are going to be, um, irrational numbers.
204
00:10:24,463 --> 00:10:26,116
Numbers that continue on forever.
205
00:10:26,687 --> 00:10:31,443
And so, um, in order for this algorithm to work,
206
00:10:31,466 --> 00:10:33,879
to reproduce this symbol sequence,
207
00:10:33,879 --> 00:10:35,968
I need to specify the initial condition,
208
00:10:36,220 --> 00:10:38,707
and I need to specify the initial condition exactly.
209
00:10:38,934 --> 00:10:40,776
And it turns out,
by a similar argument
210
00:10:40,776 --> 00:10:42,888
that almost all numbers between zero and one
211
00:10:42,888 --> 00:10:44,596
are themselves random.
212
00:10:44,596 --> 00:10:46,434
It's a random sequence of, of uh
213
00:10:46,434 --> 00:10:49,247
decimal digits following a decimal point.
214
00:10:50,193 --> 00:10:52,037
And so I need to know that number exactly,
215
00:10:52,037 --> 00:10:53,353
that initial condition exactly,
216
00:10:53,353 --> 00:10:55,334
to an infinite number of decimal points,
217
00:10:55,540 --> 00:10:59,356
and the vast majority in the sense of all
but a set of measure zero
218
00:10:59,356 --> 00:11:00,256
with probability one
219
00:11:00,884 --> 00:11:02,770
initial conditions are incompressible.
220
00:11:03,175 --> 00:11:06,370
So, aha, I've sort of found
a short mechanism for generating it
221
00:11:06,768 --> 00:11:12,006
but I need, um, to remember an incompressible
infinite sequence of decimal digits
222
00:11:12,006 --> 00:11:13,622
in order to run that algorithm
223
00:11:13,622 --> 00:11:17,482
and reproduce exactly the, um, binary sequence
that's desired.
224
00:11:17,996 --> 00:11:23,644
So yes, the logistic equation really is producing
a random output.
225
00:11:26,568 --> 00:11:27,956
So to summarize:
226
00:11:27,956 --> 00:11:31,459
the logistic equation,
a deterministic dynamical system
227
00:11:31,459 --> 00:11:34,156
produces random output.
228
00:11:34,251 --> 00:11:36,510
The sequence of zeros and ones,
the symbols from
229
00:11:36,510 --> 00:11:39,370
the symbolic dynamics of
the logistic equation
230
00:11:39,805 --> 00:11:41,214
are incompressible.
231
00:11:41,897 --> 00:11:43,819
It's a rand--
and that's what it means to be random.
232
00:11:43,819 --> 00:11:44,853
It's incompressible.
233
00:11:44,853 --> 00:11:46,082
And you might think,
234
00:11:46,082 --> 00:11:47,783
that the logistic equation itself
235
00:11:47,783 --> 00:11:51,374
is a compressed description of the sequence
it just generated.
236
00:11:51,374 --> 00:11:54,635
However, in order to reproduce that sequence,
237
00:11:54,635 --> 00:11:58,190
we need to specify the initial condition
to an infinite precision.
238
00:11:58,276 --> 00:12:01,383
So that requires remembering
an infinitely long number.
239
00:12:01,808 --> 00:12:03,844
So you, it's a short program,
240
00:12:03,844 --> 00:12:05,924
but you need to give it an infinitely long number
241
00:12:05,924 --> 00:12:10,118
that infinitely, uh, precise description,
of the initial condition.
242
00:12:10,400 --> 00:12:17,006
So the logistic equation actually is not
a compressed version of the sequence that it produces.
243
00:12:17,929 --> 00:12:20,629
So I hope this all hasn't been too confusing.
244
00:12:20,672 --> 00:12:23,213
But, to be honest, I hope
it's at least a little bit confusing.
245
00:12:23,213 --> 00:12:24,328
But, in a good way,
246
00:12:24,328 --> 00:12:28,615
in that I think what these examples do is, um
247
00:12:28,615 --> 00:12:32,682
enrich and complexify our notion
of randomness.
248
00:12:33,196 --> 00:12:35,112
So, a chaotic dynamical system,
249
00:12:35,296 --> 00:12:39,523
like the Lorenz equations,
or like the logistic equation with R = 4,
250
00:12:40,210 --> 00:12:43,764
it's deterministic,
but it's unpredictable because of the butterfly effect.
251
00:12:44,060 --> 00:12:47,330
And it's deterministic,
but it actually produces randomness
252
00:12:47,395 --> 00:12:49,834
in this algorithmic sense of randomness.
253
00:12:50,034 --> 00:12:53,365
So randomness doesn't have to be
a product of chance.
254
00:12:53,365 --> 00:12:56,223
Randomness can be a product
of determinism.
255
00:12:56,223 --> 00:13:02,326
And that, I think, is one of the big lessons
of, uh, the study of chaos and dynamical systems.
256
00:13:02,559 --> 00:13:05,559
And I'll talk a little bit more about that
in the summary for this unit,
257
00:13:05,559 --> 00:13:10,307
and these are themes that we'll continue
to revisit throughout the course.