My goal in the first 9 units of this course was to lay the groundwork - to give you a basic understanding of nonlinear dynamics, with an emphasis on the computational angle of this field. That's important, as you now know, because chaotic equations can't be solved in closed form. I've emphasized practice over theory, demonstrating concepts with examples rather than proofs, and doing a lot of ranting about what happens to the theory under the constraints of real-world applications, where there is noise, you can't sample everything, you can't sample as fast as you want, and so on and so forth. In this last unit of the course, I'm going to take advantage of that groundwork and dig into a couple of examples a bit more deeply. Now, because nonlinear dynamics is everywhere, choosing a set of examples for a unit like this is a real challenge. There's no way to be exhaustive, so I've cherry-picked 4 examples that are a nice combination of interesting, important, and fun. I could spend another 10 weeks on applications and not cover the field exhaustively, but you now have the background to go digging around on your own to find and understand all the other cool applications of this stuff.

Today's topic is prediction. Given a time series, can you say what that time series is going to do next? Of course, if we could do this, we'd all be rich. The stock market, for example, is a time series. So is the currency exchange, and we could make a lot of money if we could predict them effectively. We'll come back to that. But I want to start with a different prediction task, the one that really got this field going. This was the work of the Chaos Cabal at the University of California at Santa Cruz, which included a lot of names that you know: Jim Crutchfield, one of the authors of the seminal work on delay-coordinate embedding, and Doyne Farmer, whose noise-reduction scheme we talked about (the one that uses the stable and unstable manifolds to squeeze the noise balls). What they wanted to do: predict where a ball on a roulette wheel would go - that is, the black box in the previous picture is a roulette wheel. If you could do this effectively, you'd be rich - if you didn't get tossed out of the casino first, of course. They don't like people who have systems that actually work.

This slide is a sketch of one of the strategies they tried. They gathered data, then they used delay-coordinate embedding to reconstruct the full dynamics from that data (that's the purple arrow down there and the picture in the bottom middle). Then they modeled that trajectory - that is, they took little patches of that trajectory and examined which direction the flow was going in each little patch. The simplest way to do this would be to fit a line to the flow in each of those boxes, and that's what they started with. Then, to use that model, you drop a new point somewhere in the midst of all your patches; whichever patch you land in, you look up the arrow for that patch and move your point along that arrow until it exits that patch. Then you look for the next patch and its arrow, follow that arrow, and so on and so forth, around the dynamics. That gives you a prediction of where the ball will be. Then you can use that information to place a bet, and maybe win some money.
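To make that patch-model idea concrete, here is a minimal Python sketch under some simplifying assumptions: a 2-D reconstruction, a regular grid of non-overlapping patches, and one average displacement vector per patch standing in for the fitted line. The function names and parameters are mine, for illustration - this is not the Eudaemons' actual code.

```python
import numpy as np

def build_patch_model(trajectory, n_bins=10):
    """Grid a 2-D reconstructed trajectory into patches and fit a crude
    'arrow' (the average displacement of the flow) to each patch.

    trajectory: (N, 2) array of embedded points, in time order.
    Returns a dict mapping patch index (i, j) -> average displacement,
    plus the grid edges needed to look patches up later.
    """
    lo, hi = trajectory.min(axis=0), trajectory.max(axis=0)
    edges = [np.linspace(lo[k], hi[k], n_bins + 1) for k in range(2)]

    # local flow direction at each point: x_{t+1} - x_t
    displacements = np.diff(trajectory, axis=0)

    sums, counts = {}, {}
    for point, step in zip(trajectory[:-1], displacements):
        key = tuple(
            int(np.clip(np.searchsorted(edges[k], point[k]) - 1, 0, n_bins - 1))
            for k in range(2)
        )
        sums[key] = sums.get(key, 0) + step
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}, edges

def follow_arrows(start, arrows, edges, n_steps=50):
    """Iterate a new point from patch to patch along the fitted arrows."""
    n_bins = len(edges[0]) - 1
    path = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        p = path[-1]
        key = tuple(
            int(np.clip(np.searchsorted(edges[k], p[k]) - 1, 0, n_bins - 1))
            for k in range(2)
        )
        if key not in arrows:      # wandered into a patch with no data
            break
        path.append(p + arrows[key])
    return np.array(path)          # the predicted path through reconstruction space
```

Usage would look like `arrows, edges = build_patch_model(embedded)` followed by `path = follow_arrows(x0, arrows, edges)`, where `embedded` is the delay-coordinate reconstruction and `x0` is the current state.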
All of this is chronicled in the wonderful book that I have listed at the bottom right of the slide. That book talks about the people, the technology, and the challenges, and all that kind of stuff, including chases by casino goons, short circuits of battery packs wrapped around people's bellies, and some pretty seriously challenging engineering. This was in the late 70s/early 80s, when computer equipment was a lot bigger, and obviously they had to hide that equipment; that was a huge part of the challenge. You couldn't actually waltz in with your data-acquisition system and say "excuse me, I want to hook this up to your roulette wheel." That wouldn't work. You had to be surreptitious.

In terms of data acquisition, the challenge was to gather the data without being caught, and the solution they used was a pin switch in the sole of a shoe, under the big toe. The data gathering was a person standing by the wheel, tapping his toe on that switch every time the ball went by a particular place on the wheel - say, where the dealer was standing (where that blue line is). So, really, this picture is not quite right. Instead of evenly sampled data about the angle of the ball on the wheel, what we have is a series of spikes - times when the ball went by a particular place. And as you recall, Tim Sauer showed that it's mathematically okay to embed data like that - that was the interspike interval embedding stuff that we talked about a week or two ago.

I want to circle back to another important part of this picture: the model. That green arrow in the upper-right patch is a pretty simple way to approximate the flow of the dynamics. You could do many other things - for example, you could fit a parabola or some other more complex function to the observed trajectory paths in that patch. You could also make the patches smaller to get a finer-grained model, but then you run the risk of not having enough data to fit the model. It's a good idea to overlap the patches. So there are a bunch of issues here that are covered in that book, if you are interested. The field of machine learning offers a ton of interesting and useful solutions to these kinds of problems, but you can't simply grab one off the shelf, apply it to a nonlinear dynamical system, and have it work. We'll come back to that.

Another important point that I wanted to circle back to: the first step in the procedure followed by the Santa Cruz Chaos Cabal was to embed the data. That is, they didn't try to predict the time series directly using techniques like autoregressive moving average models - the traditional ways. You can think of embedding as a preprocessing step that takes the temporal patterns and brings them out spatially, on the axes of the reconstruction space. In that manner, embedding gives prediction methods more traction on the problem of capturing and exploiting temporal patterns - much more so than if you just fit a function to the scalar time-series data. But all of this only works if the system that you're working with is a deterministic dynamical system with fairly low dimension, as you know from all the noise and dimension caveats that we've talked about in the previous 2 units.

Anyway, the roulette project worked well enough to make some money - not a large amount, but some money - for its creators. It also catalyzed a law in the state of Nevada against the use of computers in casinos.
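Since embedding keeps coming up as the first step, here is a short Python sketch of plain delay-coordinate embedding for an evenly sampled scalar time series (the interspike-interval variant the roulette project needed is different, but the idea of unfolding temporal structure onto spatial axes is the same). The function name and the example values of dim and tau are mine:

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Delay-coordinate embedding of a scalar time series.

    series: 1-D array of measurements x(t), evenly sampled.
    dim:    embedding dimension m.
    tau:    delay, in samples.
    Returns an (N - (m-1)*tau, m) array whose rows are the
    reconstruction-space points [x(t), x(t-tau), ..., x(t-(m-1)*tau)].
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.column_stack(
        [series[(dim - 1 - k) * tau : (dim - 1 - k) * tau + n] for k in range(dim)]
    )

# e.g. embedded = delay_embed(x, dim=7, tau=8), with dim and tau chosen
# using the kinds of heuristics covered earlier in the course rather than
# these placeholder values.
```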
Most importantly though, from my standpoint, it catalyzed a huge amount of mathematics and a number of algorithms too. Papers are still being written about this. After this work, Doyne Farmer, Norman Packard, and others moved on to form something called the Prediction Company. That endeavor is chronicled in another fun book, if you're interested, called The Predictors. The Prediction Company did well enough to get bought by a big Swiss bank, and I haven't heard much about what they're doing since then.

A few years later, Andreas Weigend and Neil Gershenfeld decided to run a time-series prediction competition. They took a bunch of data sets and put them up on an FTP server - actually, what they did was put the first chunk of each of them on the FTP server, reserving the last chunk for evaluation of all the predictions that everybody sent in. Then they got the word out and invited everybody to send in their predictions. This too is chronicled in a good book, far more technical than the last ones that I mentioned.

Here are the data sets again - what were they? The top one was a laser; look back at it, that's the one that bursts and then collapses. Then there's a sleep apnea data set, the second one down. The third one is currency exchange rates - you can see the day-of-the-week and weekend pattern here; there is no trading on the weekend. The fourth one is a fourth-order Runge-Kutta solution of some ODE. The fifth one is some astrophysical data set. And the last one is an interesting one. Both Neil and Andreas are musicians, and they were intrigued to find out whether one could predict the future course of a Bach fugue. So that's what that bottom data set is.

Here's an entry that they featured in the introduction to their book as one of the best. It was, again, Tim Sauer - the same guy who did the interspike interval embedding - and what you're seeing here is the true continuation - that is, where the series should've gone, which they know because they held back the last little chunk. C is where Tim Sauer's method predicted it would go, and this was a strategy like the one I just told you about: he embedded the data and then did some local patch models - similar to, but more sophisticated than, the roulette stuff - and you can look at his paper in that book if you're interested.

Here's another entry. This is a neural net - a standard machine-learning method. If you compare these two pictures, it looks like the neural net did better - and indeed it did, for a while. This is looking further out than the previous two pictures - that is, the previous two pictures go out to 110 and this one goes out to 1600. And you can see that the embedding-based method on the top captures the bursty nature and collapse of the laser, whereas the neural net did really, really well for a while and then completely lost it.

So one moral here: traditional machine-learning methods may not be the right thing when you're working with deterministic, nonlinear dynamical systems. If you have the full state-space trajectory, you actually don't need very high-powered methods in order to get traction on prediction tasks. Even a simple nearest-neighbor prediction works surprisingly well if you have the full trajectory. So what do I mean by nearest-neighbor prediction? Imagine I have a historical record of the weather - say, the pressure, the temperature, and the relative humidity for every day of the past 30 years - and I want to know what it's going to do tomorrow.
I look up the temperature, pressure, and relative humidity for today, I look back through the historical record for the day that's the best match to that - that's the nearest neighbor - and then whatever the weather did the day after that, I say it's going to do tomorrow. That's called nearest-neighbor prediction. It was Edward Lorenz - the same Lorenz from the Lorenz equations - who had the idea to apply that to dynamical systems, and it's called Lorenz's method of analogues. Say we have the blue trajectory - that's the historical record of the weather - and we want to know where this little green point will go. Right in there, there's a green point. We look for that green point's nearest neighbor - here's the green point, and here's its nearest neighbor on the observed trajectory - and then we look at where that neighbor went next - that is, its forward image - and that's our prediction. This works remarkably well, but it is not very robust to noise. Remember that nearest-neighbor relationships are very sensitive to noise, so the point that looks like the nearest neighbor might not really be the nearest neighbor, and therefore could go somewhere else.

Okay, here's a useful modification that I bet many of you have thought of already: look for the k nearest neighbors, look at where they go - that is, their forward images - and average those forward images to get your prediction. This also works remarkably well.

Here's one example. As in the Santa Fe Institute competition, we built the model from the first chunk of the data - the first 90% here - used the model to predict the next 10%, and then compared that against the 10% we held back: the true continuation. And these results are pretty good. If you're reading all the words on this slide, by the way, you'll see that what we were predicting is actually the cache miss rate of a computer - an Intel Core 2-based machine - as it runs a particular program. That program repeatedly initializes a matrix in column-major order.

Now, look closely at that trace at the top of this slide. It's almost periodic, but not completely. That, at this point, should perk up your ears: an almost-pattern that doesn't quite repeat. That should smell like chaos. So, we use computers to study dynamical systems, but computers themselves are dynamical systems - and their dynamics are often chaotic. This is hard to wrap your head around. Feel free to post your thoughts about this on the forum and start a discussion.
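If you want to experiment with this yourself, here is a minimal Python sketch of the k-nearest-neighbor forecast scheme described above, operating on an embedded trajectory; with k=1 it reduces to Lorenz's method of analogues. The function and parameter names are mine, not from the lecture:

```python
import numpy as np

def knn_forecast(trajectory, query, k=4, horizon=1):
    """Forecast by averaging the forward images of the k nearest neighbors.

    trajectory: (N, m) array of reconstruction-space points, in time order
                (the 'historical record').
    query:      (m,) current state whose future we want.
    k:          number of neighbors to average; k=1 is Lorenz's original
                method of analogues.
    horizon:    how many time steps ahead to predict.
    """
    traj = np.asarray(trajectory, dtype=float)
    # the last 'horizon' points have no forward image yet, so exclude them
    candidates = traj[:-horizon]
    dists = np.linalg.norm(candidates - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest states
    return traj[nearest + horizon].mean(axis=0)  # average of their forward images
```

To predict further into the future, you can either raise horizon or feed each forecast back in as the next query; either way, keep in mind that noise in the nearest-neighbor relationships and sensitive dependence on initial conditions will make the errors grow.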