My goal in the first 9 units of this course was to
lay the groundwork - to give you a basic understanding
of nonlinear dynamics with an emphasis on the
computational angle of this field. That's important because, as you
now know, chaotic equations can't be solved
in closed form. I've emphasized practice over theory,
demonstrating concepts with examples rather than
proofs, and doing a lot of ranting about what happens
to the theory under the constraints of real-world
applications where there is noise, and you can't sample
everything, and you can't sample as fast as you want,
and so on and so forth. In this last unit of the course,
I'm going to take advantage of that groundwork and dig
into a couple of examples a bit more deeply.
Now, because nonlinear dynamics is everywhere,
choosing a set of examples for a unit like this is
a real challenge. There's no way to be exhaustive,
so I've cherry-picked 4 examples that are a nice
combination of: interesting, important, and fun.
I could spend another 10 weeks on applications and
not cover the field exhaustively, but you now have
the background to go digging around on your own
in order to find and understand all the other cool
applications of this stuff.
Today's topic is prediction. Given a time series,
can you say what that time series is going to do next?
Of course, if we could do this, we'd all be rich.
The stock market, for example, is a time series.
So are currency exchange rates, and we could make
a lot of money if we could predict them effectively.
We'll come back to that. But, I want to start with a
different prediction task, the one that really got
this field going. This was the work of the Chaos Cabal
at the University of California at Santa Cruz,
which included a lot of names that you know:
Jim Crutchfield, one of the authors of the seminal
work on delay-coordinate embedding, and Doyne Farmer,
whose noise-reduction scheme we talked about
(the one with the stable/unstable manifolds
squeezing the noise balls). What they wanted to do:
predict where a ball on a roulette wheel would go -
that is, the black box in the previous picture is a
roulette wheel. If you could do this effectively,
you'd be rich if you didn't get tossed out of the casino
first, of course. They don't like people who have systems
that actually work. This slide is a sketch of one of the
strategies they tried. They gathered data, then they used
delay-coordinate embedding to reconstruct the full
dynamics from that data (that's that purple arrow down
there and the picture in the bottom-middle).
Then what they did was model that trajectory - that is,
they took little patches of that trajectory and examined
which direction the flow was in that little patch.
The simplest way to do this would be to fit a line to
the flow in each of those boxes and that's what they
started with. Then, in order to use that model, you drop
a new point some place in the midst of all your patches
and whichever patch you're in, you look for the arrow for
that patch and you move your point along that arrow
until it exits that patch. Then you look for the next patch,
and its arrow, and follow that arrow, and so on and
so forth, around the dynamics. And that gives you a
prediction of where the ball will be. Then you can use
that information to place a bet, and maybe win some
money. All of this is chronicled in this wonderful book
that I have listed at the bottom right of the slide,
and that book talks about the people, the technology,
and the challenges, and all that kind of stuff, including
chases by casino goons, short-circuits of battery
packs wrapped around people's bellies, and some
pretty seriously challenging engineering. This was
in the late 70s/early 80s when computer equipment
was a lot bigger and obviously, they had to hide that
equipment, and that was a huge part of the challenge.
You couldn't actually waltz in with your data acquisition
system and say "excuse me, I want to hook this up to
your roulette wheel". That wouldn't work. You had to
be surreptitious. In terms of data acquisition,
the challenge was to gather the data without being
caught and the solution that they used was a
pin switch in the sole of a shoe under the big toe.
The data gathering was a person standing by the wheel,
tapping his toe on that switch every time the ball
went by a particular place on the wheel - say, where
the dealer was standing (where that blue line is).
So, really this picture is not quite right. Instead of
evenly-sampled data about the angle of the ball on the
wheel, what we have is a series of spikes - times when
the ball went by a particular place. And as you recall,
Tim Sauer showed that it's mathematically okay
to embed data like that - that was that interspike
interval embedding stuff that we talked about a week
or two ago. I wanted to circle back around to another
important part of this picture - the model. That green
arrow in the upper-right patch is a pretty simple way
to approximate the flow of the dynamics. You could
do many other things - for example, you could fit a
parabola or some other more complex function to the
observed trajectory paths in that patch.
You could also make the patches smaller to get a
finer-grained model, but then you run the risk of
not having enough data to fit the model.
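To make that patch-model idea concrete, here's a minimal sketch - my own illustration, not the Cabal's actual code. It partitions a two-dimensional embedded trajectory into grid patches, averages the one-step flow vector in each patch, and then pushes a new point from patch to patch along those arrows. The function names, the grid resolution, and the choice to average rather than fit a line are all illustrative assumptions.

```python
import numpy as np

def build_patch_model(traj, n_bins=10):
    """Sketch of a local patch model: for each grid patch, average the
    one-step displacement vectors of the trajectory points that fall in it.
    traj is an (N, 2) array of embedded points in time order."""
    lo, hi = traj.min(axis=0), traj.max(axis=0)
    step = traj[1:] - traj[:-1]                 # one-step flow vectors
    # which patch each point (except the last) falls in
    idx = np.floor((traj[:-1] - lo) / (hi - lo) * n_bins).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)
    model = {}
    for key, v in zip(map(tuple, idx), step):
        model.setdefault(key, []).append(v)
    return {k: np.mean(v, axis=0) for k, v in model.items()}, lo, hi

def predict(model, lo, hi, point, n_steps=50, n_bins=10):
    """Follow the per-patch flow arrows from a new point, patch to patch."""
    path = [point]
    for _ in range(n_steps):
        ij = tuple(np.clip(np.floor((path[-1] - lo) / (hi - lo) * n_bins)
                           .astype(int), 0, n_bins - 1))
        if ij not in model:                     # wandered off the data
            break
        path.append(path[-1] + model[ij])
    return np.array(path)
```

Run on a circular trajectory, a point dropped near the circle orbits along with the flow; where the data thins out, the model simply has no arrow and the prediction stops, which is exactly the not-enough-data failure mode mentioned above.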
It's a good idea to overlap the patches. So there's
a bunch of issues that are covered in that book if
you are interested. The field of machine learning
offers a ton of interesting and useful solutions to these
kinds of problems. But you can't simply grab one off the
shelf and apply it to a nonlinear dynamical system and
have that work. We'll come back to that.
Another important point that I wanted to circle
back around to: the first step in the procedure followed by the
Santa Cruz Chaos Cabal was to embed the data.
That is, they didn't try to predict the time series
directly using techniques like autoregressive
moving average or something like that - the traditional
ways. You can think of embedding as a pre-processing
step that takes the temporal patterns and brings them
out spatially on those axes of the reconstruction space.
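That pre-processing step is simple to sketch. Here's a minimal illustration of delay-coordinate embedding: turn a scalar time series into m-dimensional vectors of time-delayed values. The function name is my own, and the particular delay and dimension below are illustrative assumptions - choosing them well is its own problem, as we discussed in the earlier units.

```python
import numpy as np

def delay_embed(x, m=3, tau=5):
    """Embed a scalar time series x into m-dimensional delay vectors
    [x(t), x(t + tau), ..., x(t + (m-1)*tau)], one row per time t."""
    n = len(x) - (m - 1) * tau          # number of complete delay vectors
    return np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
```

For example, embedding a 200-point series with m = 3 and tau = 5 yields a (190, 3) array whose first row is [x[0], x[5], x[10]] - the temporal pattern laid out along spatial axes.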
And in that manner, embedding gives prediction
methods more traction on the problem of capturing
and exploiting temporal patterns. Much more so
than if you just fit a function to the scalar time series
data. But all of this only works if the system that you're
working with is a deterministic, dynamical system
with fairly low dimension, as you know because of all
the noise and dimension caveats that we've talked about
in the previous 2 units. Anyway, the roulette project
worked well enough to make some money - not a large
amount, but some money - for its creators.
It also catalyzed a law in the state of Nevada against the
use of computers in casinos. Most importantly though,
from my standpoint, it catalyzed a huge amount of
mathematics and a number of algorithms too.
Papers are still being written about this. After this work,
Doyne Farmer, Norman Packard and others moved
on to form something called the Prediction Company.
That endeavor is chronicled in another fun book
if you're interested. It's called The Predictors.
The Prediction Company did well enough to get
bought by a big Swiss bank and I haven't heard much
about what they're doing since then. A few years later,
Andreas Weigend and Neil Gershenfeld decided to
run a time series prediction competition and they
took a bunch of data sets, put them up on an
FTP server - actually what they did was put the first
chunk of each of them on the FTP server and reserved
the last chunk for evaluation of all the predictions
that everybody sent in. Then they got the word out
and invited everybody to send in their predictions.
This too is chronicled in a good book, far more
technical than the last ones that I mentioned.
Here's the data again - what were they? The top one
was a laser. Look back at it, that's the one that
bursts and then collapses. Then there's a sleep apnea
data set, the second one down.
The third one is currency exchange rates - you can
see the day-of-the-week and weekend pattern here -
there is no trading on the weekend.
The fourth one is fourth-order Runge-Kutta on some
ODE. The fifth one is some astrophysical data set.
And the last one is an interesting one. Both Neil and
Andreas are musicians and they were intrigued to
find out whether one could predict the future course
of a Bach fugue. So that's what that bottom data set is.
Here's an entry that they featured in the introduction
to their book as one of the best. It was, again,
Tim Sauer - same guy who did the inter-spike interval
embedding - and what you're seeing here is the
true continuation - that is, where the series should've
gone, which they know because they held back the
last little chunk. C is where Tim Sauer's method
predicted it would go and this was a strategy like the
one I just told you about where he embedded the data
and then did some local patch models - fair, but
more sophisticated than the roulette stuff and you can
look at his paper in that book if you're interested.
Here's another entry. This is a neural net - standard
machine learning method. And if you compare these
two pictures, it looks like the neural net did better -
and indeed it did for a while.
This is looking further out than the previous
two pictures - that is, the previous two pictures are
out to 110 and this one goes out to 1600.
And you can see that the dynamic space method
on the top captures the kind of bursty nature and
collapse of the laser, whereas the neural net
did really really well for a while and then completely
lost it. So one moral here: traditional machine
learning methods may not be the right thing when
you're working with deterministic, nonlinear
dynamical systems.
If you have the full state space trajectory, you actually
don't need very high-powered methods in order to get
traction on prediction tasks. Even a simple nearest
neighbor prediction works surprisingly well if you
have the full trajectory. So what do I mean by nearest
neighbor prediction? Imagine I have a historical
record of the weather: say, the pressure,
the temperature, and the relative humidity for every
day for the past 30 years. I want to know what it's
going to do tomorrow. I look for the temperature,
pressure, and relative humidity for today, I look back
through the historical record for the day that's the best
match to that - that's the nearest neighbor - and then
whatever it did the day after that, I say it's going to do
tomorrow. That's called nearest neighbor prediction.
It was Edward Lorenz, the same Lorenz from the
Lorenz equations, who had the idea to apply that
to dynamical systems. And it's called Lorenz's
method of analogues. Say we have the blue
trajectory - that's the historical record of the weather -
and we want to know where this little green point will
go. Right in there, there's a green point.
We look for that green point's nearest neighbor, here's
the green point and here's the nearest neighbor in the
observed trajectory, and then we look where that point
went next - that is, its forward image - and that's our
prediction. This works remarkably well, but it is not
very robust to noise. Remember that nearest neighbor
relationships are very sensitive to noise, so the point
that looks like the nearest neighbor could really not
be the nearest neighbor and therefore could go
somewhere else. Okay, here's a useful
modification that I bet many of you have thought of
already. Look for the k-nearest neighbors, look where
they go - that is, their forward images - and average
those forward images to get your prediction.
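Here's a minimal sketch of that k-nearest-neighbor version of Lorenz's method of analogues. The function name and the choice of k are my own illustrative assumptions: find the query point's k nearest neighbors on the observed trajectory and average where those neighbors went next.

```python
import numpy as np

def knn_predict(traj, query, k=3):
    """Lorenz's method of analogues, k-nearest-neighbor variant.
    traj: (N, d) observed trajectory in time order; query: (d,) point.
    Returns the average of the forward images of query's k nearest
    neighbors. The last trajectory point is excluded because it has
    no forward image."""
    dists = np.linalg.norm(traj[:-1] - query, axis=1)
    nn = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    return traj[nn + 1].mean(axis=0)    # average their forward images
```

With k = 1 this is Lorenz's original method; averaging over several neighbors is what buys you some of the noise robustness discussed above.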
This also works remarkably well. Here's one example.
As in the Santa Fe Institute competition, we built the
model from the first chunk of the data - the first
90% here, used the model to predict the next 10%,
and then compared that against the 10% we held back;
the true continuation. And these results are pretty good.
If you're reading all the words on this slide, by the way,
you'll see that what we were predicting is actually
the cache miss rate of a computer, an Intel
Core 2-based machine, as it runs a particular program.
That program repeatedly initializes a matrix in
column-major order. Now, look closely at that
trace at the top of this slide. It's almost periodic,
but not completely. That, at this point, should
make your ears perk up. An almost-pattern that doesn't
quite repeat - that should smell like chaos.
So, we use computers to study dynamical systems,
but computers themselves are dynamical systems.
And their dynamics are often chaotic. This is hard
to wrap your head around. Feel free to post your
thoughts about this on the forum and start
a discussion.