In the previous section, I showed you an example of coarse-graining from image compression: two different examples of what you might call real-space compression. These were just things that I made up, but they have some parallel to what people do when they come to coarse-grain a physical system. What I did in particular was take things that were nearby each other in space, a group of objects, or a group of properties of the system that were nearby each other in space, and summarize them in some way. In the majority-rule case, for example, all I did was take all the pixels in a certain grid square and say: if the number of black pixels outnumbers the number of white pixels, then the coarse-grained pixel will be black, and otherwise it will be white. What I've done there, of course, is take a whole set of very different grid squares, each of which looks different, and map them all onto a single kind of square that has only one property, zero or one, whereas in the ten-by-ten case there are in fact a hundred bits of information, you might say.

This is the general property of coarse-graining. It is a simple example of how we build a function, f, that maps from some space S to some space S-prime. S is the set of properties that describe the fine-grained system; for example, it could be the entire productivity list, the entire employment history of every single person in the economy. S-prime is a usually much smaller set of quantities that we're going to use to describe the coarse-grained system.

Coarse-graining is something we do all the time. Sometimes we make it up ahead of time. For example, when we came to coarse-grain the image from Alice in Wonderland, we just took a couple of different ways in which we thought that simplifying the image might work (and what it means for it to work is a problem we often don't think about until right at the end). Here, we took the majority vote. We're saying: if this collection of pixels here is mostly black, let's just call it black. The idea is that your eye might not really be sensitive to the difference. In the end, when we choose that rule, we are probably relying on a theory about how the eye works, or perhaps about what we care about when we look at an image. Of course, you'd be crazy to do this to a bar code or QR code, because in that case the device scanning the image is much more sensitive than your eye is, and if you took a photograph of a QR code and then coarse-grained it the way we did, the QR code would probably no longer work. In that case, the information we'd be throwing out, the distinctions between the different elements of the set that the function maps to the same point, would have been lost.

So sometimes we think about the problem we want to solve and invent a coarse-graining algorithm. Other times, and in fact this is becoming increasingly common, we don't know what the coarse-graining is when we begin, and we'll do something really simple, like use a clustering algorithm. A classic example of clustering is k-means. So what does a k-means algorithm do? If you haven't heard this term, don't worry; I'm going to tell you what it means, and you can go look it up and use it to your heart's content.
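Before we get to k-means, here is roughly what that majority-rule map might look like in code. This is just a minimal sketch, assuming the image is stored as a NumPy array of 0s and 1s with 1 meaning black; the function name and the ten-by-ten block size follow the example above, but the details are illustrative rather than anything canonical.

```python
import numpy as np

def majority_rule(image, block=10):
    """Map each block-by-block square to a single 0/1 pixel by majority vote."""
    rows, cols = image.shape
    coarse = np.zeros((rows // block, cols // block), dtype=int)
    for i in range(rows // block):
        for j in range(cols // block):
            square = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # If black pixels (1s) outnumber white pixels (0s), call the coarse pixel black.
            coarse[i, j] = 1 if square.sum() > square.size / 2 else 0
    return coarse

# A 100x100 random binary "image" becomes a 10x10 coarse-grained one:
# each coarse pixel summarizes a hundred bits of fine-grained information.
fine = np.random.randint(0, 2, size=(100, 100))
print(majority_rule(fine).shape)  # (10, 10)
```

The function f here is exactly the map from S (all possible ten-by-ten squares) to S-prime (a single bit), and you can see how many distinctions get thrown away in the process.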
k-means takes a set of high-dimensional data. Because we can't afford a high-dimensional whiteboard, we only have two dimensions, so what I'm going to do is put a lot of data points here, where maybe this axis is height and this axis is weight, and these points refer to individuals of a species that you've captured in the wild. Maybe these are birds. What k-means will do is take that rich description, where every individual is described by a real number on both the x and y axes (each bird you've caught has a certain height and a certain weight, measured laboriously in the field, maybe in Costa Rica or wherever the bird people go), and you say, well, something's going on here, and you hand it to k-means, which says: actually, there are two clusters here. There's this cluster, A, and this cluster, B. So what k-means has done is map from a two-dimensional vector space (maybe not all of R^2, because you can't have negative weights, so maybe just the positive quadrant of R^2) into a binary set, {A, B}.

In general, when you call up a clustering algorithm, what you're really asking it to do is coarse-grain your data. And if you're a biologist, you might then notice something funny: all the A birds are female, say, and all the B birds are male. Now you feel really good, because what you've done, using this rich, high-dimensional data, is discover a simple, low-dimensional explanation for what's happening. Why do birds have different weights? Well, the main thing is whether they're male or female. So clustering is an automated way to have the data suggest to you what a good coarse-graining might be.

You may have noticed, of course, that k-means operates with a notion of closeness, just as the Alice in Wonderland coarse-graining did. In both cases, we were saying that things that are near each other should probably be given a single description: all these points here get the description A; all those points there get the description B. k-means is in fact an example of what we call a hard clustering: each point is uniquely assigned to a single cluster. This kind of hard clustering is the easiest form of coarse-graining to understand, and we'll give an example of how hard clustering plays a central role in information theory. There are other clustering algorithms that give what are called soft clusterings, and the soft clustering algorithm most closely related to k-means is called the GMM, or Gaussian mixture model. In fact, the GMM is very similar to k-means.
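Before getting into the details of the GMM, here's a sketch of what the hard-clustering step looks like in practice, using scikit-learn's KMeans on made-up height/weight numbers. The two synthetic groups, the variable names, and the relabelling to A and B are illustrative assumptions, not real bird data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up "bird" measurements: two groups with different typical height and weight.
group1 = rng.normal(loc=[12.0, 30.0], scale=[1.0, 3.0], size=(50, 2))
group2 = rng.normal(loc=[15.0, 45.0], scale=[1.0, 3.0], size=(50, 2))
X = np.vstack([group1, group2])  # shape (100, 2): columns are height, weight

# Hard clustering: every point is assigned to exactly one cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = np.where(kmeans.labels_ == 0, "A", "B")

# The coarse-graining: from (the positive quadrant of) R^2 to the binary set {A, B}.
for point, label in zip(X[:3], labels[:3]):
    print(point, "->", label)
```

The point of the sketch is just that the output is a single label per bird: the two real numbers describing each individual get collapsed to one of two categories.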
What the GMM does is take high-dimensional data, a set of vectors, each point referring to a particular observation, and say: it looks like this could be described as drawing from two Gaussians. Each Gaussian has a particular center and a particular variance and covariance matrix, so they look like little sausages lying over the data. The GMM produces a Bayesian model of the way your data might have been created, and then, for each point, it tells you the probability that that point was drawn from this Gaussian (call it G_1) or that Gaussian (call it G_2). So in this case it's giving you not a unique assignment but a probabilistic assignment. In particular, it's mapping the two-dimensional vector space (let's say it finds two clusters) into the interval [0, 1], which you can think of as the probability that the point was drawn from the first cluster. A point right in the center of the first cluster would have a very high probability of being a member of G_1 and a much lower probability of being a member of G_2.

So these are subtly different concepts. There's the hard clustering case, which takes each point and says: here is the coarse-grained category you belong to. That's just like the Alice in Wonderland algorithm, where we took that ten-by-ten grid square and said: you're majority zero, so you belong to category zero; you're majority one, so you belong to category one. The soft clustering is a little weaker; it avoids making a definite decision. You'll probably encounter different kinds of clustering algorithms and different kinds of clustering choices in your own research. People tend to prefer hard clustering, because you don't have to think as hard when you work with it.
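To make the hard/soft contrast concrete, here is the same kind of made-up data run through a Gaussian mixture model, again just a sketch using scikit-learn's GaussianMixture. The predict_proba call returns, for each point, a probability of membership in each Gaussian rather than a single label; as before, the data and names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
group1 = rng.normal(loc=[12.0, 30.0], scale=[1.0, 3.0], size=(50, 2))
group2 = rng.normal(loc=[15.0, 45.0], scale=[1.0, 3.0], size=(50, 2))
X = np.vstack([group1, group2])

# Soft clustering: fit two Gaussians, then ask for probabilistic assignments.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # each row: [P(point came from G_1), P(point came from G_2)]

# The coarse-graining here maps each point not to {A, B} but into the interval [0, 1]:
# the probability that the point was drawn from the first Gaussian.
for point, p in zip(X[:3], probs[:3]):
    print(point, "-> P(G_1) =", round(p[0], 3))
```

If you threshold those probabilities at one half, you recover something very much like the hard clustering; the soft version simply keeps the uncertainty around instead of forcing a decision.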