So there are some subtleties, or tricky points, that we're going to have to think through regarding this idea of randomness as incompressibility. Here is the first one, which isn't too serious an issue.
And that is, we need to choose some standard way
of representing these compressions.
So let's say the task we give ourselves is to compress a certain string, a sequence of symbols, and we're going to write a computer program to do that, and we want to write the shortest computer program possible. Now, I like programming in Python, so I might try to write a Python program, and somebody else might write a program in C, and somebody else might do C++, and somebody might do Lisp, and who knows what else everybody would do.
But we would all be working in different languages,
and the algorithm would have different representations
in these different languages.
So we need to agree on some sort of standard or reference device, or language, that we can use when we're comparing these algorithms. And the standard thing that's done in the formal development of these ideas is to use something called a universal Turing machine.
A universal Turing machine is a device that can perform any computation, any discrete-time, discrete-state computation, and is capable of emulating any other computer or device that can do that computation. So it's an all-encompassing computing device, a standard reference model of computation that's as powerful as anything else.
So in the formal development of these ideas, which is not part of this course, one works in a framework where algorithms are compared in terms of universal Turing machines.
So that's one subtlety that we have to address.
And, OK, we've addressed that.
We'll use universal Turing machines.
The second subtlety is more subtle, and more serious. And that is: how do we know if we've found the shortest algorithm that can reproduce a sequence? Maybe a sequence really is compressible, but I just haven't been able to figure out how. Or I'm not smart enough, or I haven't given myself enough time.
That could be the case for pi, which certainly looks random. And by the way, in some ways of viewing it, pi really is random. It is believed that if you do a binary expansion, the digits look the same as tosses of a fair coin: all possible sequences of zeros and ones would occur equally often in the binary expansion of pi. So that's weird; in a certain sense, pi is random. But what about this algorithmic sense, the one where we're using universal Turing machines?
Pi isn't random,
because there's a short algorithm to produce it.
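As an aside, the digit-statistics claim can be checked empirically. Here's a minimal sketch (my illustration, not from the lecture) that counts how often each decimal digit appears among the first 2,000 digits of pi, using Gibbons's unbounded spigot algorithm. The lecture talks about the binary expansion, but decimal digits illustrate the same point; note that this equidistribution property of pi (normality) is conjectured, not proven.

```python
from collections import Counter

def pi_digits(n):
    """Yield the first n decimal digits of pi: 3, 1, 4, 1, 5, 9, ...
    (Gibbons's unbounded spigot algorithm, exact integer arithmetic)."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    produced = 0
    while produced < n:
        if 4 * q + r - t < m * t:
            yield m            # the next digit is settled
            produced += 1
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:                  # not enough precision yet; fold in another term
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)

counts = Counter(pi_digits(2000))
print(counts)  # each digit 0-9 shows up roughly 200 times
```

Each digit appears close to 10% of the time, which is what "looks like a fair die (or coin, in binary)" means statistically.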
OK, so maybe every possible sequence actually has some short algorithm that can reproduce it, one that's almost impossible to figure out, and we just haven't been smart enough to find it yet. It turns out we can eliminate that possibility.
So let me present that argument as follows.
So we're wondering if we can figure out compression algorithms, ways to compress sequences where it's not obvious how to compress them. One idea is: well, maybe we could come up with an algorithm that determines the shortest algorithm, an algorithm for compression that's optimal all the time. This would be an algorithm that writes an algorithm, a program that writes a program. And moreover, this program would automatically write the best program, in the sense that it's the shortest one, the one that compresses our symbol sequence as much as possible.
This would be an amazing thing, but one can show that such a program doesn't exist. You can't write a program that automatically writes the shortest program. So we can't automatically determine the shortest algorithm. But maybe there is a shortest algorithm out there, even if we can't find it.
And it turns out we can eliminate that possibility as well.
And here is how one would argue that.
So first let's think about the set, or collection,
of all possible algorithms.
All possible schemes for compressing
a sequence of zeros and ones.
That's an infinite set: there's an infinite number of possible algorithms, since we can let our programs get arbitrarily long.
So there we have this infinite set,
this collection of all of these algorithms.
We also might think about the collection
of all possible sequences that we want to compress.
This also is an infinite set.
We're thinking about infinite sequences here,
all possible infinite sequences of zeros and ones.
That's a lot.
That's clearly infinite as well.
So we've got these two infinite sets:
all the algorithms,
and all of the sequences.
And if these two sets were the same size, then it might be that every sequence has an algorithm that goes with it. So maybe we can't find which algorithm goes with which sequence, but we'd know that the algorithm might be out there.
However, it's not the case that there are the same number of algorithms as there are sequences. In fact, there are infinitely many more sequences than algorithms. Thus, there are for sure some sequences for which there's no algorithm that can compress them.
And we'll call those sequences random.
So randomness exists in this argument.
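There's also a finite analogue of this counting argument that's easy to make concrete. The sketch below (my illustration, not from the lecture) just counts descriptions: there are 2^n binary strings of length n, but only 2^n - 1 binary programs of length strictly less than n, so for every n at least one length-n string has no shorter description.

```python
# Pigeonhole version of the counting argument, for finite strings:
# there are 2**n binary strings of length n, but only
# 1 + 2 + 4 + ... + 2**(n-1) = 2**n - 1 binary programs shorter
# than n, so some string of every length is incompressible.
def num_strings(n):
    """Number of binary strings of length exactly n."""
    return 2 ** n

def num_shorter_programs(n):
    """Number of binary programs of lengths 0 through n - 1."""
    return sum(2 ** k for k in range(n))

for n in (1, 8, 64):
    # Strictly fewer candidate compressors than strings to compress.
    assert num_shorter_programs(n) < num_strings(n)
```

The infinite-sequence version of the argument, which the lecture is about, sharpens this from "at least one" to "almost all".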
Furthermore, there are so many more sequences than there are algorithms that if I choose a sequence (I don't want to say at random), an arbitrary sequence, if I just dip my hand into this infinite bag of sequences, then with probability one that sequence will be random. It will be incompressible.
So the technical idea, which some of you may be familiar with, is that the set of all algorithms is a countable infinity, while the set of all infinite sequences of zeros and ones
is uncountable.
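The standard way to see the mismatch is Cantor's diagonal argument: given any attempted list of all the sequences, flip the diagonal entries to build a sequence the list necessarily misses. A tiny sketch, with a made-up four-row table (illustrative values only):

```python
def diagonal_flip(table):
    """Return a prefix of a new binary sequence that differs from
    row i of the table in position i, so it can't equal any row."""
    return [1 - table[i][i] for i in range(len(table))]

# First four entries of four hypothetical sequences in some
# attempted enumeration.
table = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
new_seq = diagonal_flip(table)
print(new_seq)  # differs from every listed row somewhere
```

Since this works against any proposed enumeration, no countable list of algorithms can account for all the sequences.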
So they're very different infinite sets.
So again, if I choose an arbitrary symbol sequence, I can be certain, with probability one, that that sequence is random.
OK, now let's finally go back to the logistic equation, back to dynamical systems. If I choose a random initial condition, an arbitrary initial condition, for the logistic equation, that will produce a sequence of zeros and ones. And I argued earlier that the logistic equation can produce every possible sequence of zeros and ones, and all subsequences occur with equal probability.
So if I choose an arbitrary initial condition for the logistic equation with R = 4, I can be almost certain that
the outcome will be random.
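To see what such a symbol sequence looks like, here's a minimal sketch, assuming the usual symbolic-dynamics partition: record 0 when the orbit is in the left half of the interval and 1 when it's in the right half.

```python
def logistic_symbols(x0, n, r=4.0):
    """Iterate x -> r*x*(1-x) and record the symbolic dynamics:
    0 when x < 0.5, 1 otherwise."""
    x, bits = x0, []
    for _ in range(n):
        bits.append(0 if x < 0.5 else 1)
        x = r * x * (1.0 - x)
    return bits

bits = logistic_symbols(0.3, 100)
print("".join(map(str, bits)))  # statistically resembles coin flips
```

One caveat: ordinary floating-point arithmetic loses track of the true orbit after a few dozen iterations, which is exactly the sensitive dependence on initial conditions at work.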
So in this view, this way of thinking about randomness, the process producing the symbols is definitely deterministic, an unchanging deterministic rule, but the outcome is random. With probability one, I'm absolutely certain that there's no algorithm that can compress the sequence that the logistic equation produced.
So the logistic equation is producing randomness.
Or is it?
Wait a minute.
The logistic equation is itself an algorithm that's producing this sequence. So how can I say that the logistic equation is producing a random outcome, that there's no algorithm that can compress the sequence? Isn't iterating the logistic equation itself just such an algorithm?
So this is a good objection, but we have to think carefully about the logistic equation and chaos, and in particular about sensitive dependence on initial conditions.
So suppose you hand me a random sequence, with no obvious pattern to it, and my task is to figure out if there is a way to use the logistic equation to generate this sequence of zeros and ones. At first I would say: sure, of course there has to be. The logistic equation produces all possible sequences of zeros and ones, so this sequence that you just gave me, no problem, I can figure it out. Or even if I can't figure it out, because there are too many initial conditions to try, I know that an answer is out there, and then there would be a short algorithm. So I would have compressed the sequence that you thought was incompressible.
But wait a minute. The key is that in order to produce that particular symbol sequence, I need to specify the initial condition exactly, with full precision. The mechanics of my algorithm are very simple: iterate the logistic equation with R = 4.0. That's a very simple program; many of you have written your own versions of it, and spreadsheets can do it, too.
But then, in order to produce that particular outcome, the one that you gave me, so that I can say I've compressed it, I also need the initial condition. And almost all initial conditions are going to be irrational numbers, numbers whose digits continue on forever. So in order for this algorithm to work, to reproduce the symbol sequence, I need to specify the initial condition exactly.
And it turns out, by a similar argument, that almost all numbers between zero and one are themselves random: the decimal digits following the decimal point form a random sequence. So I need to know that initial condition exactly, to an infinite number of decimal places, and the vast majority of initial conditions, all but a set of measure zero, so with probability one, are incompressible.
So, aha, I've found a short mechanism for generating the sequence, but I need to remember an incompressible, infinite sequence of decimal digits in order to run that algorithm and reproduce exactly the binary sequence that's desired.
So yes, the logistic equation really is producing
a random output.
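A quick numerical sketch of why the precision matters: perturb the initial condition in the twelfth decimal place, and the symbol sequence soon disagrees with the original (illustrative values, assuming the same 0/1 partition at 0.5).

```python
def symbols(x0, n, r=4.0):
    """Binary symbol sequence of the logistic map: 0 if x < 0.5, else 1."""
    x, bits = x0, []
    for _ in range(n):
        bits.append(0 if x < 0.5 else 1)
        x = r * x * (1.0 - x)
    return bits

exact = symbols(0.3, 100)
nearby = symbols(0.3 + 1e-12, 100)  # initial condition off in the 12th place

# The tiny error roughly doubles each iteration, so the two symbol
# sequences agree at first and then disagree within a few dozen steps.
diffs = [i for i in range(100) if exact[i] != nearby[i]]
print("first disagreement at iteration:", diffs[0])
```

So truncating the initial condition to any finite precision produces the wrong symbol sequence after finitely many steps; only the full infinite expansion reproduces the target sequence.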
So to summarize: the logistic equation, a deterministic dynamical system, produces random output. The sequence of zeros and ones, the symbols from the symbolic dynamics of the logistic equation, is incompressible. And that's what it means to be random: it's incompressible.
And you might think that the logistic equation itself is a compressed description of the sequence it just generated. However, in order to reproduce that sequence, we need to specify the initial condition to infinite precision, and that requires remembering an infinitely long number. So it's a short program, but you need to give it an infinitely long number, an infinitely precise description of the initial condition. So the logistic equation actually is not a compressed version of the sequence it produces.
So I hope this all hasn't been too confusing. But, to be honest, I hope it's at least a little bit confusing, in a good way, in that I think what these examples do is enrich and complexify our notion of randomness.
So, a chaotic dynamical system,
like the Lorenz equations,
or like the logistic equation with R = 4,
it's deterministic,
but it's unpredictable because of the butterfly effect.
And it's deterministic,
but it actually produces randomness
in this algorithmic sense of randomness.
So randomness doesn't have to be
a product of chance.
Randomness can be a product
of determinism.
And that, I think, is one of the big lessons of the study of chaos and dynamical systems.
And I'll talk a little bit more about that
in the summary for this unit,
and these are themes that we'll continue
to revisit throughout the course.