I'm now going to review all the steps we took, because we went on a long journey, and you've learned a huge number of things in the pursuit of solving what was, in the end, a rather simple problem. The problem you wanted to solve was a parsimonious description of how long it takes to get a cab in New York City, and that parsimonious description you wanted to induce, or learn, from data. A description that is not parsimonious would say, for instance, that the probability of waiting n minutes is exactly the fraction of times you saw yourself waiting n minutes for a cab. Those kinds of descriptions, we decided, were overfitting the data.

So instead, what I said was that we're going to try to reproduce only a limited number of features. We're not going to try to reproduce, for example, the exact fraction of the time we waited six minutes. Instead, we're going to reproduce some of the overall gross characteristics of the data. In particular, I said: the only thing I want to preserve is the average time it takes me to get a cab. That's it. Everything else, forget it. Now the problem is, there are many distributions that preserve that average. So what we decided to do was take the distribution that has maximum entropy subject to that constraint. And the argument we made was that the maximum entropy distribution leaves you maximally uncertain about the waiting time. It has no additional hidden theories; there's no way it implicitly assumes something else about the data that would reduce your uncertainty about what was going to happen. That was our intuitive justification for this step, maximizing the entropy. Once you believe that's a good thing to do, you dive into the mathematics. In particular, I had to show you how the method of Lagrange multipliers works.
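To make the one-constraint, two-dimensional Lagrange setup concrete, here is a toy example of my own choosing (it is not the lecture's problem): maximize f(x, y) = xy subject to x + y = 1. The Lagrange condition grad f = lambda grad g gives y = lambda and x = lambda, so x = y = 1/2 and f = 1/4. A quick brute-force scan along the constraint confirms it:

```python
# Toy Lagrange-multiplier check (hypothetical example, not the lecture's data):
# maximize f(x, y) = x * y subject to g(x, y) = x + y - 1 = 0.
# grad f = lambda * grad g gives (y, x) = (lambda, lambda),
# so x = y = 1/2 and f = 1/4. Verify by scanning along the constraint line.

def f(x, y):
    return x * y

best_x, best_val = None, float("-inf")
n = 100_000
for i in range(1, n):
    x = i / n          # walk along the constraint x + y = 1
    y = 1.0 - x
    if f(x, y) > best_val:
        best_x, best_val = x, f(x, y)

print(best_x, best_val)  # 0.5 0.25
```

The same recipe, with entropy as f and the fixed average as g, is what produces the functional form below.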
This is a great mathematical tool. It's useful not only in the particular case of the MaxEnt problem; you see it all over the place, particularly in a subject like economics, where Lagrange multipliers are in fact called "shadow prices." In a lot of systems you're trying to maximize one quantity, but you're constrained by another set of forces. So I showed you the Lagrange multiplier trick: I gave you the one-constraint, two-dimensional problem, and I told you that the n-constraint problem works out in a very similar fashion. And then I actually worked through the problem of maximizing entropy subject to constraints, and we found a particular functional form. But it was only a functional form, because lambda and Z, the hidden Lagrange multiplier terms, were terms I still had to set. So I knew the functional form right away, but then I had to do the heavy lifting to actually figure out what lambda and Z should be. And so I had to do some infinite sums, played some nice mathematical games (I hope you had fun), and in the end we found that solving for these Lagrange multipliers came down to a single transcendental equation for lambda_1. While you weren't looking, I quickly plugged that into Mathematica and found the numerical value of lambda_1, which is about 0.22.

So, at the end of all of this: if this axis is 0 minutes, 1, 2, 3, 4, 5, 6, 7..., your waiting time in minutes, and this axis is the probability of waiting that long, then in the data, you know, sometimes we waited 6 minutes, sometimes 3 minutes, sometimes 4 minutes, and a couple of times we waited 2. That was the distribution of the data we had measured, and that raw distribution is what we would have decided was the overfitting model.
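As a sketch of the step done off-screen in Mathematica: for waiting times n = 0, 1, 2, ... with p(n) = exp(-lambda * n) / Z, the geometric sums give Z = 1 / (1 - exp(-lambda)) and a mean of 1 / (exp(lambda) - 1), and you solve the mean constraint for lambda numerically. The mean of 4.0 minutes below is my assumption, chosen because it reproduces the lecture's lambda_1 of about 0.22; the lecture doesn't state the measured average explicitly here.

```python
import math

# Solve the constraint equation mean(lambda) = TARGET_MEAN by bisection.
# For p(n) = exp(-lam * n) / Z over n = 0, 1, 2, ...:
#   Z(lam)    = 1 / (1 - exp(-lam))
#   mean(lam) = 1 / (exp(lam) - 1)
# TARGET_MEAN = 4.0 is an assumed value (it yields lambda_1 ~ 0.22).

TARGET_MEAN = 4.0

def mean_wait(lam):
    return 1.0 / (math.exp(lam) - 1.0)

# mean_wait decreases as lam grows, so bisect on a bracketing interval.
lo, hi = 1e-9, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_wait(mid) > TARGET_MEAN:
        lo = mid          # mean too large -> need a larger lambda
    else:
        hi = mid

lam1 = 0.5 * (lo + hi)
Z = 1.0 / (1.0 - math.exp(-lam1))
print(round(lam1, 4))     # 0.2231, i.e. log(1 + 1/4)
```

For this particular constraint the equation happens to have a closed form, lambda_1 = log(1 + 1/mean), which the bisection recovers.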
And in fact, what we found was that the distribution actually looks something like this: it's an exponential distribution in x. So this is, in some sense, the best-fitting model to the data if you are constrained only by fixing the average of these waiting times. That's the only thing we've constrained. And this curve, for this particular choice of Lagrange multipliers, gets the average right and nothing else. It's maximally uncertain. It's not that the distribution doesn't have other properties; it does have, for example, a variance. But those are all dependent: they come along for the ride once you choose the distribution with maximum entropy subject to a constraint only on the average.

So, let's think briefly about this model, which, by the way, is mechanistically agnostic. It has no theory about taxi cabs. Instead of modeling waiting time for taxi cabs, we could have modeled waiting time for your next United flight. We could have modeled the number of earthquakes in Japan of a certain magnitude over a year, or the number of C-pluses you give to your students in a particular year. This method is totally agnostic about the actual underlying physics, or cognitive science, or sociology of the problem. But let's go look and see whether there's any implicit mechanistic model that maximum entropy has quietly handed to us. In particular, let's see if we can construct (and we'll be able to do this quite easily) an underlying mechanistic model for catching a cab in New York that produces the same probability distribution. And what I'm going to do is say that the chance of you getting a cab in New York is constant and independent of time. In particular, the chance of you getting a cab in any one-minute interval is some number p.
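Here is one way to see "gets the average right and nothing else," as a sketch under the same assumed 4-minute mean as before: pit the exponential MaxEnt solution against a rival distribution I've made up that has the same mean (wait 0 minutes half the time, 8 minutes the other half). The MaxEnt solution should have strictly higher entropy.

```python
import math

# Compare the entropy of the exponential MaxEnt solution to that of a rival
# distribution with the same mean. The mean of 4.0 minutes is an assumption
# carried over from the lambda_1 ~ 0.22 value in the lecture.

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

mean = 4.0
lam = math.log(1.0 + 1.0 / mean)      # lambda_1 for this mean
N = 500                               # truncation; tail mass is negligible
maxent = [math.exp(-lam * n) for n in range(N)]
total = sum(maxent)
maxent = [p / total for p in maxent]

# A made-up rival with the same mean: 0 minutes or 8 minutes, 50/50.
rival = [0.0] * N
rival[0], rival[8] = 0.5, 0.5

mean_maxent = sum(n * p for n, p in enumerate(maxent))
mean_rival = sum(n * p for n, p in enumerate(rival))
print(round(mean_maxent, 2), round(mean_rival, 2))   # 4.0 4.0
print(entropy(maxent) > entropy(rival))              # True
```

Any other distribution pinned to the same average loses this comparison; that is exactly what the maximization bought us.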
So that means the chance of you getting a cab between 0 and 1 minutes is p. The chance of you getting a cab between 1 and 2 minutes: well, first there's a factor of 1 minus p, because you didn't get a cab in that first minute (you got unlucky), and then, having not gotten a cab in the first minute, the chance you get one in the second minute is just p. So p(0), the probability of getting a cab between 0 and 1 minutes, is p. p(1) is (1 minus p) times p. And of course p(2) is (1 minus p) squared times p: didn't get it the first minute, didn't get it the second, finally got it the third. So this is a mechanistic model, and it at least has some theory about taxi cabs in New York: it assumes they're sort of like raindrops, they kind of fall from the sky, independent of each other.

And you can map this model, which in general looks like P(x) = (1 - p)^x * p, onto the MaxEnt form exactly: if I define Z as 1/p, and I define lambda_1 as -log(1 - p), then I have an exact correspondence between the two models. So what we've just shown is that the maximum entropy model, where the waiting time is constrained on average to be a particular value but the system is otherwise completely uncertain, is equivalent to this sort of random-raindrop taxi cab arrival model. And what we'll do, on and off for the rest of this lecture, is talk a little about how this mechanism-agnostic story can be translated into some set of assumptions about the underlying scientific principles that might be at work. In particular (it's a bit exalted to call this a scientific principle), the story here is essentially that privately owned transportation services in New York arrive uncorrelated with each other, at a rate that's constant over time.
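The raindrop model is easy to simulate and check against both the closed form (1 - p)^x * p and the MaxEnt correspondence. The per-minute probability p = 0.2 below is my assumption; I picked it because it gives lambda_1 = -log(0.8), about 0.223, matching the lecture's 0.22, and a mean wait of (1 - p)/p = 4 minutes.

```python
import math
import random

# Simulate the "raindrop" cab model: each minute, independently, a cab
# appears with probability p. p = 0.2 is an assumed value (it gives
# lambda_1 = -log(1 - p) ~ 0.223 and a mean wait of 4 minutes).

random.seed(0)
p = 0.2

def wait_for_cab():
    minutes = 0
    while random.random() >= p:   # no cab this minute; keep waiting
        minutes += 1
    return minutes

trials = 200_000
waits = [wait_for_cab() for _ in range(trials)]

# Empirical P(x) versus the closed form (1 - p)^x * p.
for x in range(4):
    empirical = waits.count(x) / trials
    predicted = (1 - p) ** x * p
    print(x, round(empirical, 3), round(predicted, 3))

# Exact correspondence with the MaxEnt form exp(-lam1 * x) / Z:
Z = 1 / p
lam1 = -math.log(1 - p)
assert all(
    abs((1 - p) ** x * p - math.exp(-lam1 * x) / Z) < 1e-12
    for x in range(50)
)
```

The assertion at the end is the whole point: with Z = 1/p and lambda_1 = -log(1 - p), the mechanistic model and the MaxEnt model assign identical probabilities.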
And you can see, of course, that if you wait too long, maybe the time of day changes, maybe some other feature of the system changes, and so this p might change. In that case this model would no longer have the same functional form as the MaxEnt model, and you can see there how additional mechanistic phenomena might drive the system away from the simple constrained MaxEnt model.
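To make that last point concrete, here is a sketch with made-up numbers: suppose the per-minute cab probability drops from 0.4 to 0.1 after minute 5 (say, rush hour ends). For a pure geometric distribution, log P(x) is a straight line in x; with a time-varying p, the slope changes at the switch, so no exponential (and hence no single-mean-constraint MaxEnt model) can fit it.

```python
import math

# Hypothetical time-varying model: cab probability is 0.4 for the first
# 5 minutes and 0.1 afterwards. For a pure geometric, log P(x) is linear
# in x with slope log(1 - p); here the slope changes at the switch point.

def wait_prob(x, p_early=0.4, p_late=0.1, switch=5):
    prob = 1.0
    for minute in range(x):
        p = p_early if minute < switch else p_late
        prob *= (1 - p)            # no cab during this minute
    p = p_early if x < switch else p_late
    return prob * p                # cab finally arrives in minute x

slope_early = math.log(wait_prob(2)) - math.log(wait_prob(1))
slope_late = math.log(wait_prob(9)) - math.log(wait_prob(8))
print(round(slope_early, 3), round(slope_late, 3))  # -0.511 -0.105
assert abs(slope_early - slope_late) > 0.1  # log P(x) is not one straight line
```

The early slope is log(0.6) and the late slope is log(0.9); a single exponential has only one slope, so this distribution falls outside the simple MaxEnt family.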