I'm now going to review all the steps we took, because we went on a long journey, and you've learned a huge number of things in the pursuit of solving what was, in the end, a rather simple problem. The problem you wanted to solve was a parsimonious description of how long it takes to get a cab in New York City, and that parsimonious description you wanted to induce, or learn, from data. A description that is not parsimonious would say, for instance, that the probability of waiting n minutes is exactly the fraction of times you saw yourself waiting n minutes for a cab. Those kinds of descriptions, we decided, were overfitting the data.

So instead, what I said was that we're going to try to reproduce only a limited number of features. We're not going to try to reproduce, for example, the exact fraction of the time we waited six minutes. Instead, we're going to reproduce some of the overall gross characteristics of the data. In particular, I said: the only thing I want to preserve is the average time it takes me to get a cab. That's it. Everything else, forget it. Now the problem is, there are many distributions that preserve that average. So what we decided to do was take the distribution that has maximum entropy subject to that constraint. And the argument we made was that the maximum entropy distribution leaves you maximally uncertain about the waiting time. It has no additional hidden theories; there's no way it implicitly assumes something else about the data that would reduce your uncertainty about what was going to happen. That was our intuitive justification for this step, maximizing the entropy. Once you believe that's a good thing to do, you dive into the mathematics. In particular, I had to show you how the method of Lagrange multipliers works.
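To make the one-constraint, two-dimensional Lagrange setup concrete, here is a toy example of my own choosing (it is not the lecture's problem): maximize f(x, y) = xy subject to x + y = 1. The Lagrange condition grad f = lambda grad g gives y = lambda and x = lambda, so x = y = 1/2 and f = 1/4. A quick brute-force scan along the constraint confirms it:

```python
# Toy Lagrange-multiplier check (hypothetical example, not the lecture's data):
# maximize f(x, y) = x * y subject to g(x, y) = x + y - 1 = 0.
# grad f = lambda * grad g gives (y, x) = (lambda, lambda),
# so x = y = 1/2 and f = 1/4. Verify by scanning along the constraint line.

def f(x, y):
    return x * y

best_x, best_val = None, float("-inf")
n = 100_000
for i in range(1, n):
    x = i / n          # walk along the constraint x + y = 1
    y = 1.0 - x
    if f(x, y) > best_val:
        best_x, best_val = x, f(x, y)

print(best_x, best_val)  # 0.5 0.25
```

The same recipe, with entropy as f and the fixed average as g, is what produces the functional form below.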
This is a great mathematical tool. It's useful not only in the particular case of the MaxEnt problem; you see it all over the place, particularly in a subject like economics, where Lagrange multipliers are in fact called "shadow prices." In a lot of systems you're trying to maximize one quantity, but you're constrained by another set of forces. So I showed you the Lagrange multiplier trick: I gave you the one-constraint, two-dimensional problem, and I told you that the n-constraint problem works out in a very similar fashion. And then I actually worked through the problem of maximizing entropy subject to constraints, and we found a particular functional form. But it was only a functional form, because lambda and Z, the hidden Lagrange multiplier terms, were terms I still had to set. So I knew the functional form right away, but then I had to do the heavy lifting to actually figure out what lambda and Z should be. And so I had to do some infinite sums, played some nice mathematical games (I hope you had fun), and in the end we found that solving for these Lagrange multipliers came down to a single transcendental equation for lambda_1. While you weren't looking, I quickly plugged that into Mathematica and found the numerical value of lambda_1, which is about 0.22.

So, at the end of all of this: if this axis is 0 minutes, 1, 2, 3, 4, 5, 6, 7..., your waiting time in minutes, and this axis is the probability of waiting that long, then in the data, you know, sometimes we waited 6 minutes, sometimes 3 minutes, sometimes 4 minutes, and a couple of times we waited 2. That was the distribution of the data we had measured, and that raw distribution is what we would have decided was the overfitting model.
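As a sketch of the step done off-screen in Mathematica: for waiting times n = 0, 1, 2, ... with p(n) = exp(-lambda * n) / Z, the geometric sums give Z = 1 / (1 - exp(-lambda)) and a mean of 1 / (exp(lambda) - 1), and you solve the mean constraint for lambda numerically. The mean of 4.0 minutes below is my assumption, chosen because it reproduces the lecture's lambda_1 of about 0.22; the lecture doesn't state the measured average explicitly here.

```python
import math

# Solve the constraint equation mean(lambda) = TARGET_MEAN by bisection.
# For p(n) = exp(-lam * n) / Z over n = 0, 1, 2, ...:
#   Z(lam)    = 1 / (1 - exp(-lam))
#   mean(lam) = 1 / (exp(lam) - 1)
# TARGET_MEAN = 4.0 is an assumed value (it yields lambda_1 ~ 0.22).

TARGET_MEAN = 4.0

def mean_wait(lam):
    return 1.0 / (math.exp(lam) - 1.0)

# mean_wait decreases as lam grows, so bisect on a bracketing interval.
lo, hi = 1e-9, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_wait(mid) > TARGET_MEAN:
        lo = mid          # mean too large -> need a larger lambda
    else:
        hi = mid

lam1 = 0.5 * (lo + hi)
Z = 1.0 / (1.0 - math.exp(-lam1))
print(round(lam1, 4))     # 0.2231, i.e. log(1 + 1/4)
```

For this particular constraint the equation happens to have a closed form, lambda_1 = log(1 + 1/mean), which the bisection recovers.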
And in fact, what we found was that the distribution actually looks something like this: it's an exponential distribution in x. So this is, in some sense, the best-fitting model to the data if you are constrained only by fixing the average of these waiting times. That's the only thing we've constrained. And this curve, for this particular choice of Lagrange multipliers, gets the average right and nothing else. It's maximally uncertain. It's not that the distribution doesn't have other properties; it does have, for example, a variance. But those are all dependent: they come along for the ride once you choose the distribution with maximum entropy subject to a constraint only on the average.

So, let's think briefly about this model, which, by the way, is mechanistically agnostic. It has no theory about taxi cabs. Instead of modeling waiting time for taxi cabs, we could have modeled waiting time for your next United flight. We could have modeled the number of earthquakes in Japan of a certain magnitude over a year, or the number of C-pluses you give to your students in a particular year. This method is totally agnostic about the actual underlying physics, or cognitive science, or sociology of the problem. But let's go look and see whether there's any implicit mechanistic model that maximum entropy has quietly handed to us. In particular, let's see if we can construct (and we'll be able to do this quite easily) an underlying mechanistic model for catching a cab in New York that produces the same probability distribution. And what I'm going to do is say that the chance of you getting a cab in New York is constant and independent of time. In particular, the chance of you getting a cab in any one-minute interval is some number p.
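Here is one way to see "gets the average right and nothing else," as a sketch under the same assumed 4-minute mean as before: pit the exponential MaxEnt solution against a rival distribution I've made up that has the same mean (wait 0 minutes half the time, 8 minutes the other half). The MaxEnt solution should have strictly higher entropy.

```python
import math

# Compare the entropy of the exponential MaxEnt solution to that of a rival
# distribution with the same mean. The mean of 4.0 minutes is an assumption
# carried over from the lambda_1 ~ 0.22 value in the lecture.

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

mean = 4.0
lam = math.log(1.0 + 1.0 / mean)      # lambda_1 for this mean
N = 500                               # truncation; tail mass is negligible
maxent = [math.exp(-lam * n) for n in range(N)]
total = sum(maxent)
maxent = [p / total for p in maxent]

# A made-up rival with the same mean: 0 minutes or 8 minutes, 50/50.
rival = [0.0] * N
rival[0], rival[8] = 0.5, 0.5

mean_maxent = sum(n * p for n, p in enumerate(maxent))
mean_rival = sum(n * p for n, p in enumerate(rival))
print(round(mean_maxent, 2), round(mean_rival, 2))   # 4.0 4.0
print(entropy(maxent) > entropy(rival))              # True
```

Any other distribution pinned to the same average loses this comparison; that is exactly what the maximization bought us.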
So that means the chance of you getting a cab between 0 and 1 minutes is p. The chance of you getting a cab between 1 and 2 minutes: well, first there's a factor of 1 minus p, because you didn't get a cab in that first minute (you got unlucky), and then, having not gotten a cab in the first minute, the chance you get one in the second minute is just p. So p(0), the probability of getting a cab between 0 and 1 minutes, is p. p(1) is (1 minus p) times p. And of course p(2) is (1 minus p) squared times p: didn't get it the first minute, didn't get it the second, finally got it the third. So this is a mechanistic model, and it at least has some theory about taxi cabs in New York: it assumes they're sort of like raindrops, they kind of fall from the sky, independent of each other.

And you can map this model, which in general looks like P(x) = (1 - p)^x * p, onto the MaxEnt form exactly: if I define Z as 1/p, and I define lambda_1 as -log(1 - p), then I have an exact correspondence between the two models. So what we've just shown is that the maximum entropy model, where the waiting time is constrained on average to be a particular value but the system is otherwise completely uncertain, is equivalent to this sort of random-raindrop taxi cab arrival model. And what we'll do, on and off for the rest of this lecture, is talk a little about how this mechanism-agnostic story can be translated into some set of assumptions about the underlying scientific principles that might be at work. In particular (it's a bit exalted to call this a scientific principle), the story here is essentially that privately owned transportation services in New York arrive uncorrelated with each other, at a rate that's constant over time.
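The raindrop model is easy to simulate and check against both the closed form (1 - p)^x * p and the MaxEnt correspondence. The per-minute probability p = 0.2 below is my assumption; I picked it because it gives lambda_1 = -log(0.8), about 0.223, matching the lecture's 0.22, and a mean wait of (1 - p)/p = 4 minutes.

```python
import math
import random

# Simulate the "raindrop" cab model: each minute, independently, a cab
# appears with probability p. p = 0.2 is an assumed value (it gives
# lambda_1 = -log(1 - p) ~ 0.223 and a mean wait of 4 minutes).

random.seed(0)
p = 0.2

def wait_for_cab():
    minutes = 0
    while random.random() >= p:   # no cab this minute; keep waiting
        minutes += 1
    return minutes

trials = 200_000
waits = [wait_for_cab() for _ in range(trials)]

# Empirical P(x) versus the closed form (1 - p)^x * p.
for x in range(4):
    empirical = waits.count(x) / trials
    predicted = (1 - p) ** x * p
    print(x, round(empirical, 3), round(predicted, 3))

# Exact correspondence with the MaxEnt form exp(-lam1 * x) / Z:
Z = 1 / p
lam1 = -math.log(1 - p)
assert all(
    abs((1 - p) ** x * p - math.exp(-lam1 * x) / Z) < 1e-12
    for x in range(50)
)
```

The assertion at the end is the whole point: with Z = 1/p and lambda_1 = -log(1 - p), the mechanistic model and the MaxEnt model assign identical probabilities.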
And you can see, of course, that if you wait too long, maybe the time of day changes, maybe some other feature of the system changes, and so this p might change. In that case this model would no longer have the same functional form as the MaxEnt model, and you can see there how additional mechanistic phenomena might drive the system away from the simple constrained MaxEnt model.
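To make that last point concrete, here is a sketch with made-up numbers: suppose the per-minute cab probability drops from 0.4 to 0.1 after minute 5 (say, rush hour ends). For a pure geometric distribution, log P(x) is a straight line in x; with a time-varying p, the slope changes at the switch, so no exponential (and hence no single-mean-constraint MaxEnt model) can fit it.

```python
import math

# Hypothetical time-varying model: cab probability is 0.4 for the first
# 5 minutes and 0.1 afterwards. For a pure geometric, log P(x) is linear
# in x with slope log(1 - p); here the slope changes at the switch point.

def wait_prob(x, p_early=0.4, p_late=0.1, switch=5):
    prob = 1.0
    for minute in range(x):
        p = p_early if minute < switch else p_late
        prob *= (1 - p)            # no cab during this minute
    p = p_early if x < switch else p_late
    return prob * p                # cab finally arrives in minute x

slope_early = math.log(wait_prob(2)) - math.log(wait_prob(1))
slope_late = math.log(wait_prob(9)) - math.log(wait_prob(8))
print(round(slope_early, 3), round(slope_late, 3))  # -0.511 -0.105
assert abs(slope_early - slope_late) > 0.1  # log P(x) is not one straight line
```

The early slope is log(0.6) and the late slope is log(0.9); a single exponential has only one slope, so this distribution falls outside the simple MaxEnt family.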