Okay, thank you for joining me. We are going to talk today about what is called MaxEnt, or the Principle of Maximum Entropy, or sometimes Maximum Entropy Methods. MaxEnt breaks into two parts, and we'll cover those two parts in the first and second halves of this unit. Maximum entropy was invented by a gentleman named E. T. Jaynes. He was the first person to really put this all into a single paper for Physical Review, and Jaynes was after some really deep philosophical, almost epistemological, questions about the nature of reality, about why physical laws took the shape that they did. But in recent years, what we've found is that MaxEnt, the principle of maximum entropy, has found an enormous amount of use in machine learning, in the modelling of real-world processes, as opposed to, let's say, their explanation and the understanding of those processes. So there are some people who are really interested in prediction. For example, they would like to learn what the stock market looks like and predict what it is going to look like tomorrow. Or they would like to learn the nature of, let's say, a particular patient's cancer; they would like to model it in such a way that the model is good enough that tomorrow they can predict what's going to happen next. That's a huge, intellectually incredibly ambitious goal that people in the artificial intelligence and machine learning community have. MaxEnt is huge in that part of the intellectual world, and that is where we will begin. In the second part of the talk, I am going to try to draw connections from what you have learned on the prediction and machine-learning side and apply those to some really exciting problems that we find in the study of biological systems and in the study of social systems. In particular, I will try to get a little bit at some of the deeper philosophical questions that maximum entropy raises for us, particularly when it works as well as it does.
So what I will do is begin with the prediction problem, and in particular with the kind of problem at which maximum entropy excels: the prediction of high-dimensional data. As a working definition, let's say something along the following lines: a system is high-dimensional if the number of possible configurations, which we call N, is much greater than the amount of data that we have, which we call k. In other words, the number of ways the system could look is much greater than the number of times you've actually observed it in the real world. Oftentimes we can talk about the dimensions of a data set. Let's, for example, take a black-and-white image, and let's say the image has 10,000 pixels. Each pixel in your image can take on, let's say, a +1 or -1 value: +1 is black, and -1 is white. So any image here can have any arbitrary combination of pixel values. If there are 10,000 pixels, and each pixel can be +1 or -1, then the total number of images is 2^10,000. Each pixel is a discrete dimension taking the value +1 or -1, and there are 10,000 of them. So suppose you're trying to build a model of, let's say, handwritten words; you would like to model, for example, all the different ways in which I can write the letter 'e'. There is almost no way you are going to acquire enough samples; in fact, it's probably possible to prove that the universe will die a fearsome heat death before you're able to gather enough samples of my handwriting for the amount of data you have, k, to be anywhere comparable to 2^10,000. Just to give you a sense of scale: 2^10,000 = (2^10)^1,000 ≈ 1,000^1,000 = 10^3,000. That's like a googol to the power of 30.
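As a quick sanity check on those numbers, here is a small sketch (not part of the lecture; the variable names are my own) that counts the configurations of a 10,000-pixel binary image and expresses that count in powers of ten:

```python
from math import log10

# Each pixel is a discrete dimension taking one of two values (+1 or -1),
# so an image with n_pixels pixels has 2**n_pixels possible configurations.
n_pixels = 10_000

# log10(2**n_pixels) = n_pixels * log10(2), so we can find the
# power of ten without computing the (enormous) integer itself.
exponent = n_pixels * log10(2)
print(f"2^{n_pixels} is about 10^{exponent:.0f}")

# A googol is 10^100, so this is roughly a googol to the power of ~30.
print(f"that's a googol to the power of about {exponent / 100:.0f}")
```

Running this confirms the rough figures quoted above: 2^10,000 is about 10^3010, i.e. a googol raised to roughly the 30th power.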
So in cases like this, what we would like to do is assign probabilities to particular images, drawn from an observed set whose size is far smaller than the total number of possible images. That's where something like MaxEnt excels...