My goal in the first 9 units of this course was to lay the groundwork - to give you a basic understanding of nonlinear dynamics, with an emphasis on the computational angle of this field. That's important, as you now know, because chaotic equations can't be solved in closed form. I've emphasized practice over theory, demonstrating concepts with examples rather than proofs, and doing a lot of ranting about what happens to the theory under the constraints of real-world applications, where there is noise, you can't sample everything, you can't sample as fast as you want, and so on and so forth. In this last unit of the course, I'm going to take advantage of that groundwork and dig into a couple of examples a bit more deeply. Now, because nonlinear dynamics is everywhere, choosing a set of examples for a unit like this is a real challenge. There's no way to be exhaustive, so I've cherry-picked 4 examples that are a nice combination of interesting, important, and fun. I could spend another 10 weeks on applications and not cover the field exhaustively, but you now have the background to go digging around on your own to find and understand all the other cool applications of this stuff.

Today's topic is prediction. Given a time series, can you say what that time series is going to do next? Of course, if we could do this, we'd all be rich. The stock market, for example, is a time series. So is the currency exchange, and we could make a lot of money if we could predict them effectively. We'll come back to that. But I want to start with a different prediction task, the one that really got this field going. This was the work of the Chaos Cabal at the University of California at Santa Cruz, which included a lot of names that you know: Jim Crutchfield, one of the authors of the seminal work on delay-coordinate embedding, and Doyne Farmer, whose noise-reduction scheme we talked about (the one that uses the stable and unstable manifolds to squeeze the noise balls). What they wanted to do: predict where a ball on a roulette wheel would go - that is, the black box in the previous picture is a roulette wheel. If you could do this effectively, you'd be rich - if you didn't get tossed out of the casino first, of course. They don't like people who have systems that actually work.

This slide is a sketch of one of the strategies they tried. They gathered data, then they used delay-coordinate embedding to reconstruct the full dynamics from that data (that's the purple arrow down there and the picture in the bottom middle). Then they modeled that trajectory - that is, they took little patches of that trajectory and examined which direction the flow was going in each little patch. The simplest way to do this would be to fit a line to the flow in each of those boxes, and that's what they started with. Then, to use that model, you drop a new point somewhere in the midst of all your patches; whichever patch you land in, you look up the arrow for that patch and move your point along that arrow until it exits that patch. Then you look for the next patch and its arrow, follow that arrow, and so on and so forth, around the dynamics. That gives you a prediction of where the ball will be. Then you can use that information to place a bet, and maybe win some money.
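To make that patch-model idea concrete, here is a minimal Python sketch under some simplifying assumptions: a 2-D reconstruction, a regular grid of non-overlapping patches, and one average displacement vector per patch standing in for the fitted line. The function names and parameters are mine, for illustration - this is not the Eudaemons' actual code.

```python
import numpy as np

def build_patch_model(trajectory, n_bins=10):
    """Grid a 2-D reconstructed trajectory into patches and fit a crude
    'arrow' (the average displacement of the flow) to each patch.

    trajectory: (N, 2) array of embedded points, in time order.
    Returns a dict mapping patch index (i, j) -> average displacement,
    plus the grid edges needed to look patches up later.
    """
    lo, hi = trajectory.min(axis=0), trajectory.max(axis=0)
    edges = [np.linspace(lo[k], hi[k], n_bins + 1) for k in range(2)]

    # local flow direction at each point: x_{t+1} - x_t
    displacements = np.diff(trajectory, axis=0)

    sums, counts = {}, {}
    for point, step in zip(trajectory[:-1], displacements):
        key = tuple(
            int(np.clip(np.searchsorted(edges[k], point[k]) - 1, 0, n_bins - 1))
            for k in range(2)
        )
        sums[key] = sums.get(key, 0) + step
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}, edges

def follow_arrows(start, arrows, edges, n_steps=50):
    """Iterate a new point from patch to patch along the fitted arrows."""
    n_bins = len(edges[0]) - 1
    path = [np.asarray(start, dtype=float)]
    for _ in range(n_steps):
        p = path[-1]
        key = tuple(
            int(np.clip(np.searchsorted(edges[k], p[k]) - 1, 0, n_bins - 1))
            for k in range(2)
        )
        if key not in arrows:      # wandered into a patch with no data
            break
        path.append(p + arrows[key])
    return np.array(path)          # the predicted path through reconstruction space
```

Usage would look like `arrows, edges = build_patch_model(embedded)` followed by `path = follow_arrows(x0, arrows, edges)`, where `embedded` is the delay-coordinate reconstruction and `x0` is the current state.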
All of this is chronicled in the wonderful book that I have listed at the bottom right of the slide. That book talks about the people, the technology, and the challenges, and all that kind of stuff, including chases by casino goons, short circuits of battery packs wrapped around people's bellies, and some pretty seriously challenging engineering. This was in the late 70s/early 80s, when computer equipment was a lot bigger, and obviously they had to hide that equipment; that was a huge part of the challenge. You couldn't actually waltz in with your data-acquisition system and say "excuse me, I want to hook this up to your roulette wheel." That wouldn't work. You had to be surreptitious.

In terms of data acquisition, the challenge was to gather the data without being caught, and the solution they used was a pin switch in the sole of a shoe, under the big toe. The data gathering was a person standing by the wheel, tapping his toe on that switch every time the ball went by a particular place on the wheel - say, where the dealer was standing (where that blue line is). So, really, this picture is not quite right. Instead of evenly sampled data about the angle of the ball on the wheel, what we have is a series of spikes - times when the ball went by a particular place. And as you recall, Tim Sauer showed that it's mathematically okay to embed data like that - that was the interspike interval embedding stuff that we talked about a week or two ago.

I want to circle back to another important part of this picture: the model. That green arrow in the upper-right patch is a pretty simple way to approximate the flow of the dynamics. You could do many other things - for example, you could fit a parabola or some other more complex function to the observed trajectory paths in that patch. You could also make the patches smaller to get a finer-grained model, but then you run the risk of not having enough data to fit the model. It's a good idea to overlap the patches. So there are a bunch of issues here that are covered in that book, if you are interested. The field of machine learning offers a ton of interesting and useful solutions to these kinds of problems, but you can't simply grab one off the shelf, apply it to a nonlinear dynamical system, and have it work. We'll come back to that.

Another important point that I wanted to circle back to: the first step in the procedure followed by the Santa Cruz Chaos Cabal was to embed the data. That is, they didn't try to predict the time series directly using techniques like autoregressive moving average models - the traditional ways. You can think of embedding as a preprocessing step that takes the temporal patterns and brings them out spatially, on the axes of the reconstruction space. In that manner, embedding gives prediction methods more traction on the problem of capturing and exploiting temporal patterns - much more so than if you just fit a function to the scalar time-series data. But all of this only works if the system that you're working with is a deterministic dynamical system with fairly low dimension, as you know from all the noise and dimension caveats that we've talked about in the previous 2 units.

Anyway, the roulette project worked well enough to make some money - not a large amount, but some money - for its creators. It also catalyzed a law in the state of Nevada against the use of computers in casinos.
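Since embedding keeps coming up as the first step, here is a short Python sketch of plain delay-coordinate embedding for an evenly sampled scalar time series (the interspike-interval variant the roulette project needed is different, but the idea of unfolding temporal structure onto spatial axes is the same). The function name and the example values of dim and tau are mine:

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Delay-coordinate embedding of a scalar time series.

    series: 1-D array of measurements x(t), evenly sampled.
    dim:    embedding dimension m.
    tau:    delay, in samples.
    Returns an (N - (m-1)*tau, m) array whose rows are the
    reconstruction-space points [x(t), x(t-tau), ..., x(t-(m-1)*tau)].
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.column_stack(
        [series[(dim - 1 - k) * tau : (dim - 1 - k) * tau + n] for k in range(dim)]
    )

# e.g. embedded = delay_embed(x, dim=7, tau=8), with dim and tau chosen
# using the kinds of heuristics covered earlier in the course rather than
# these placeholder values.
```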
Most importantly though, from my standpoint, it catalyzed a huge amount of mathematics and a number of algorithms too. Papers are still being written about this. After this work, Doyne Farmer, Norman Packard, and others moved on to form something called the Prediction Company. That endeavor is chronicled in another fun book, if you're interested, called The Predictors. The Prediction Company did well enough to get bought by a big Swiss bank, and I haven't heard much about what they're doing since then.

A few years later, Andreas Weigend and Neil Gershenfeld decided to run a time-series prediction competition. They took a bunch of data sets and put them up on an FTP server - actually, what they did was put the first chunk of each of them on the FTP server, reserving the last chunk for evaluation of all the predictions that everybody sent in. Then they got the word out and invited everybody to send in their predictions. This too is chronicled in a good book, far more technical than the last ones that I mentioned.

Here are the data sets again - what were they? The top one was a laser; look back at it, that's the one that bursts and then collapses. Then there's a sleep apnea data set, the second one down. The third one is currency exchange rates - you can see the day-of-the-week and weekend pattern here; there is no trading on the weekend. The fourth one is a fourth-order Runge-Kutta solution of some ODE. The fifth one is some astrophysical data set. And the last one is an interesting one. Both Neil and Andreas are musicians, and they were intrigued to find out whether one could predict the future course of a Bach fugue. So that's what that bottom data set is.

Here's an entry that they featured in the introduction to their book as one of the best. It was, again, Tim Sauer - the same guy who did the interspike interval embedding - and what you're seeing here is the true continuation - that is, where the series should've gone, which they know because they held back the last little chunk. C is where Tim Sauer's method predicted it would go, and this was a strategy like the one I just told you about: he embedded the data and then did some local patch models - similar to, but more sophisticated than, the roulette stuff - and you can look at his paper in that book if you're interested.

Here's another entry. This is a neural net - a standard machine-learning method. If you compare these two pictures, it looks like the neural net did better - and indeed it did, for a while. This is looking further out than the previous two pictures - that is, the previous two pictures go out to 110 and this one goes out to 1600. And you can see that the embedding-based method on the top captures the bursty nature and collapse of the laser, whereas the neural net did really, really well for a while and then completely lost it.

So one moral here: traditional machine-learning methods may not be the right thing when you're working with deterministic, nonlinear dynamical systems. If you have the full state-space trajectory, you actually don't need very high-powered methods in order to get traction on prediction tasks. Even a simple nearest-neighbor prediction works surprisingly well if you have the full trajectory. So what do I mean by nearest-neighbor prediction? Imagine I have a historical record of the weather - say, the pressure, the temperature, and the relative humidity for every day of the past 30 years - and I want to know what it's going to do tomorrow.
I look up the temperature, pressure, and relative humidity for today, I look back through the historical record for the day that's the best match to that - that's the nearest neighbor - and then whatever the weather did the day after that, I say it's going to do tomorrow. That's called nearest-neighbor prediction. It was Edward Lorenz - the same Lorenz from the Lorenz equations - who had the idea to apply that to dynamical systems, and it's called Lorenz's method of analogues. Say we have the blue trajectory - that's the historical record of the weather - and we want to know where this little green point will go. Right in there, there's a green point. We look for that green point's nearest neighbor - here's the green point, and here's its nearest neighbor on the observed trajectory - and then we look at where that neighbor went next - that is, its forward image - and that's our prediction. This works remarkably well, but it is not very robust to noise. Remember that nearest-neighbor relationships are very sensitive to noise, so the point that looks like the nearest neighbor might not really be the nearest neighbor, and therefore could go somewhere else.

Okay, here's a useful modification that I bet many of you have thought of already: look for the k nearest neighbors, look at where they go - that is, their forward images - and average those forward images to get your prediction. This also works remarkably well.

Here's one example. As in the Santa Fe Institute competition, we built the model from the first chunk of the data - the first 90% here - used the model to predict the next 10%, and then compared that against the 10% we held back: the true continuation. And these results are pretty good. If you're reading all the words on this slide, by the way, you'll see that what we were predicting is actually the cache miss rate of a computer - an Intel Core 2-based machine - as it runs a particular program. That program repeatedly initializes a matrix in column-major order.

Now, look closely at that trace at the top of this slide. It's almost periodic, but not completely. That, at this point, should perk up your ears: an almost-pattern that doesn't quite repeat. That should smell like chaos. So, we use computers to study dynamical systems, but computers themselves are dynamical systems - and their dynamics are often chaotic. This is hard to wrap your head around. Feel free to post your thoughts about this on the forum and start a discussion.
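If you want to experiment with this yourself, here is a minimal Python sketch of the k-nearest-neighbor forecast scheme described above, operating on an embedded trajectory; with k=1 it reduces to Lorenz's method of analogues. The function and parameter names are mine, not from the lecture:

```python
import numpy as np

def knn_forecast(trajectory, query, k=4, horizon=1):
    """Forecast by averaging the forward images of the k nearest neighbors.

    trajectory: (N, m) array of reconstruction-space points, in time order
                (the 'historical record').
    query:      (m,) current state whose future we want.
    k:          number of neighbors to average; k=1 is Lorenz's original
                method of analogues.
    horizon:    how many time steps ahead to predict.
    """
    traj = np.asarray(trajectory, dtype=float)
    # the last 'horizon' points have no forward image yet, so exclude them
    candidates = traj[:-horizon]
    dists = np.linalg.norm(candidates - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest states
    return traj[nearest + horizon].mean(axis=0)  # average of their forward images
```

To predict further into the future, you can either raise horizon or feed each forecast back in as the next query; either way, keep in mind that noise in the nearest-neighbor relationships and sensitive dependence on initial conditions will make the errors grow.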