So, in the previous section,
you had an example of coarse-graining
from image compression:
I showed you two different examples
of what you might call
real-space compression.
These were just things that
I made up,
that have some parallel
to what people have done
when they come to coarse-grain
a physical system.
What I did in particular
was take things
that were nearby each other in space
--- I took a group
of objects, or a group of properties
of the system
that were nearby each other in space ---
and I summarized them in some way.
So in the majority-rule case, for example,
all I did was take all the pixels
in a certain grid square and say, OK,
if the black pixels
outnumber the white pixels,
then the corresponding coarse-grained pixel
will be black,
and otherwise, it will be white.
And what I've done there, of course,
is taken a whole set of grid squares,
each of which looks different,
and mapped them all onto a single
kind of square
that only has one property, zero or one,
whereas in the ten-by-ten case,
there are in fact a hundred bits
of information, you might say.
So this is the general structure
of coarse-graining; this is
a simple example of how we build
a function, f, that maps from some space S
to some space S'.
S is the set of properties
that describe the fine-grained system ---
for example, it could be the entire
productivity list,
the entire employment history
of every single person in the economy ---
and over here is a much,
usually much smaller set of quantities
that we're going to use to describe
the coarse-grained system.
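To make that mapping concrete, here is a minimal sketch in Python of the majority-rule f from the image example. The function name, the use of NumPy, and the toy image are my own choices for illustration, not a specific implementation from the lecture.

```python
import numpy as np

def majority_rule(image, block=10):
    """Coarse-grain a binary image: map each block x block grid square
    of the fine-grained system S to a single bit of the coarse-grained
    system S'."""
    h, w = image.shape
    coarse = np.zeros((h // block, w // block), dtype=int)
    for i in range(h // block):
        for j in range(w // block):
            square = image[i*block:(i+1)*block, j*block:(j+1)*block]
            # Majority vote: 1 (black) only if black pixels strictly
            # outnumber white pixels; ties go to white, as stated above.
            coarse[i, j] = 1 if square.sum() > square.size / 2 else 0
    return coarse

# Toy usage: a random 100x100 binary image becomes a 10x10 summary,
# so each hundred-bit grid square collapses to a single bit.
rng = np.random.default_rng(0)
fine = rng.integers(0, 2, size=(100, 100))
print(majority_rule(fine))
```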
Coarse-graining is something we do
all the time.
Sometimes, we make it up ahead of time.
So for example,
when we came to do the coarse-graining
of the image from Alice in Wonderland,
we just took a couple of different ways
in which we thought
that simplifying the image
might work --- and of course,
what it means for it to work
is a question that we often
don't think about until
right at the end, but here,
we took the majority vote.
We're saying, OK, look,
if this collection
of pixels here is mostly black,
let's just call it black.
And the idea is that your eye
might not really be
sensitive to the difference. In the end,
what we're doing when we choose
that rule is probably having a theory
about how the eye works,
or perhaps about what we care about
when we look at an image.
Of course, you'd be crazy to
do this on a bar code or QR code,
all right,
because in that case, the device
that's scanning the image
is much more sensitive
than your eye is,
and so if you took a photograph
of a QR code,
and then coarse-grained it
in the way we did,
the QR code would probably
no longer work.
In that case, the information that
we'd be throwing out, namely
the distinctions between the different
elements of this set here
that the function maps
to the same point,
would have been lost.
So sometimes,
we think about the problem we want
to solve,
and invent a coarse-graining algorithm.
Other times ---
and in fact this is becoming
increasingly common ---
we don't know what
the coarse-graining is when we begin,
and in fact, we'll do something
really simple, like use
a clustering algorithm.
And so a classic example of clustering
might be k-means.
So what does k-means do,
what does a k-means algorithm do?
If you haven't heard this term,
don't worry; I'm going to tell you
what it means and you can go
look it up and do it
to your heart's content.
k-means takes a set of high-dimensional
data, and because we can't afford
a high-dimensional whiteboard,
we only have two dimensions here,
so what I'm going to do is
put a lot of
data points here, and maybe, for example,
this axis here is height,
and this axis here is weight, OK,
and these points here refer
to individuals of a species
that you've captured in the wild,
maybe these are birds, right.
And so what k-means will do is
take that rich description,
where every individual is
described by a real number
on each of the x and y axes here:
each bird you've caught, let's say,
has a certain height and a certain weight,
which you've gone out and
laboriously measured,
maybe in some tropical country
like Costa Rica,
which seems to be where
all the bird people go.
And then you say, well,
something's going on here,
and you hand the data to k-means, which says,
oh, you know what, actually,
there are two clusters here, right?
There's this cluster, A,
and this cluster, B. And so
what k-means has done is
mapped from that two-dimensional
vector space, or really, since
you can't have negative weights,
from the positive quadrant of R^2,
into a binary set, {A, B}.
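If you want to play with this yourself, here's a minimal sketch using scikit-learn's KMeans on made-up height/weight numbers; the data, the cluster count, and the labels A and B are all invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up height/weight measurements for captured birds: two fuzzy
# groups, standing in for the scatter of points on the whiteboard.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([12.0, 30.0], 1.0, size=(50, 2)),
               rng.normal([16.0, 45.0], 1.5, size=(50, 2))])

# Hard clustering: every bird gets exactly one of k = 2 labels, i.e.
# the positive quadrant of R^2 is coarse-grained into the set {A, B}.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = np.where(kmeans.labels_ == 0, "A", "B")
print(labels)
```

Each point is assigned to whichever cluster center it lies closest to, which is exactly the notion of closeness I'll come back to in a moment.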
So in general, when you call up
a clustering algorithm,
really what you're asking it to do
is coarse-grain your data.
And then if you're a biologist,
you might notice
something funny: all the A birds,
let's say, are female,
and all the B birds
are male,
and now you feel really good,
because what you've done is,
using this rich high-dimensional
data, you've discovered a simple
low-dimensional explanation
for what's happening.
Why do birds have different weights?
Well, the main thing is
whether they're male or female.
So, clustering is sort of an automated way
to have the data suggest to you
what a good coarse-graining might be.
You may have noticed, of course, that
in fact, k-means operates with a notion
of closeness just as the
Alice in Wonderland coarse-graining did.
In both cases, we were saying that
things that were near each other, OK,
should probably be described by
a single description, right?
All these points here get
the description A; all these points here
get the description B.
k-means is in fact an example of
what we call a hard clustering,
meaning that
each point is uniquely assigned
to a single cluster. This kind of
hard clustering is the easiest form of
coarse-graining to
understand, and we'll give an example,
in particular, of how hard clustering plays
a central role in information theory. OK.
There are other clustering algorithms
that give what are called soft clusterings,
and the most closely related
soft clustering algorithm to the
k-means one is called the GMM,
or the Gaussian mixture model.
And in fact the GMM is very similar
to k-means. What it does is take
high-dimensional data, a set
of high-dimensional vectors,
each point referring to
a particular observation, and say,
oh, you know, it looks like
this could be described as
draws from two
Gaussians, right, where each Gaussian
has a particular center
and a particular
covariance matrix,
so they look like
these little sausages that lie here.
And so what the GMM does is
produce a Bayesian model
of the way your data might have been
created, and then for each point here,
it tells you the probability that that point
was drawn either from this Gaussian ---
we'll call that G_1 --- or this
Gaussian --- call it G_2.
So in this case, it's giving you
not a unique assignment but a
probabilistic
assignment. In particular, it's mapping
this two-dimensional vector space ---
let's say it finds two clusters ---
into the interval [0, 1], which you can
think of here as the probability
that the point was drawn from G_1.
So a point right near the center
of the first cluster would have a
very high probability of being a member
of G_1 and a much lower probability of
being a member of G_2.
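Here's the matching sketch with scikit-learn's GaussianMixture, again on made-up data; predict_proba is the soft part, returning membership probabilities rather than a single label.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# The same made-up bird measurements as in the k-means sketch above.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([12.0, 30.0], 1.0, size=(50, 2)),
               rng.normal([16.0, 45.0], 1.5, size=(50, 2))])

# Soft clustering: fit two Gaussians, each with its own mean and
# covariance matrix, as a Bayesian model of how the data was created.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# For each point, the probability it was drawn from G_1 versus G_2;
# each row of `probs` sums to one.
probs = gmm.predict_proba(X)
print(probs[:5].round(3))  # points near a cluster center sit near 0 or 1
```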
So these are subtly different concepts
here --- there's the hard clustering case,
where it takes each point and says, OK,
here is the coarse-grained category
you belong to, just as in the case of the
Alice in Wonderland algorithm, where
we took that little
grid square, that ten-by-ten grid,
and said, OK, if you're majority zero,
you belong to
category zero; if you're majority one,
you belong to category one.
The soft clustering is a little weaker,
right; it sort of avoids
making a decision. You'll encounter,
probably in your own research,
different kinds of clustering algorithms,
different kinds of clustering choices.
People tend to prefer hard clustering,
because you don't have to think as hard
when you work with it.