In the previous section, I showed you an example of coarse-graining from image compression: two different examples of what you might call real-space compression. These were just things that I made up, but they have some parallel to what people do when they come to coarse-grain a physical system. What I did in particular was take things that were nearby each other in space, a group of objects, or a group of properties of the system that were nearby each other in space, and summarize them in some way. In the majority-rule case, for example, all I did was take all the pixels in a certain grid square and say: if the number of black pixels outnumbers the number of white pixels, then the coarse-grained pixel will be black, and otherwise it will be white. What I've done there, of course, is take a whole set of very different grid squares, each of which looks different, and map them all onto a single kind of square that has only one property, zero or one, whereas in the ten-by-ten case there are in fact a hundred bits of information, you might say.

This is the general property of coarse-graining. It is a simple example of how we build a function, f, that maps from some space S to some space S-prime. S is the set of properties that describe the fine-grained system; for example, it could be the entire productivity list, the entire employment history of every single person in the economy. S-prime is a usually much smaller set of quantities that we're going to use to describe the coarse-grained system.

Coarse-graining is something we do all the time. Sometimes we make it up ahead of time. For example, when we came to coarse-grain the image from Alice in Wonderland, we just took a couple of different ways in which we thought that simplifying the image might work (and what it means for it to work is a problem we often don't think about until right at the end). Here, we took the majority vote. We're saying: if this collection of pixels here is mostly black, let's just call it black. The idea is that your eye might not really be sensitive to the difference. In the end, when we choose that rule, we are probably relying on a theory about how the eye works, or perhaps about what we care about when we look at an image. Of course, you'd be crazy to do this to a bar code or QR code, because in that case the device scanning the image is much more sensitive than your eye is, and if you took a photograph of a QR code and then coarse-grained it the way we did, the QR code would probably no longer work. In that case, the information we'd be throwing out, the distinctions between the different elements of the set that the function maps to the same point, would have been lost.

So sometimes we think about the problem we want to solve and invent a coarse-graining algorithm. Other times, and in fact this is becoming increasingly common, we don't know what the coarse-graining is when we begin, and we'll do something really simple, like use a clustering algorithm. A classic example of clustering is k-means. So what does a k-means algorithm do? If you haven't heard this term, don't worry; I'm going to tell you what it means, and you can go look it up and use it to your heart's content.
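Before we get to k-means, here is roughly what that majority-rule map might look like in code. This is just a minimal sketch, assuming the image is stored as a NumPy array of 0s and 1s with 1 meaning black; the function name and the ten-by-ten block size follow the example above, but the details are illustrative rather than anything canonical.

```python
import numpy as np

def majority_rule(image, block=10):
    """Map each block-by-block square to a single 0/1 pixel by majority vote."""
    rows, cols = image.shape
    coarse = np.zeros((rows // block, cols // block), dtype=int)
    for i in range(rows // block):
        for j in range(cols // block):
            square = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
            # If black pixels (1s) outnumber white pixels (0s), call the coarse pixel black.
            coarse[i, j] = 1 if square.sum() > square.size / 2 else 0
    return coarse

# A 100x100 random binary "image" becomes a 10x10 coarse-grained one:
# each coarse pixel summarizes a hundred bits of fine-grained information.
fine = np.random.randint(0, 2, size=(100, 100))
print(majority_rule(fine).shape)  # (10, 10)
```

The function f here is exactly the map from S (all possible ten-by-ten squares) to S-prime (a single bit), and you can see how many distinctions get thrown away in the process.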
k-means takes a set of high-dimensional data. Because we can't afford a high-dimensional whiteboard, we only have two dimensions, so what I'm going to do is put a lot of data points here, where maybe this axis is height and this axis is weight, and these points refer to individuals of a species that you've captured in the wild. Maybe these are birds. What k-means will do is take that rich description, where every individual is described by a real number on both the x and y axes (each bird you've caught has a certain height and a certain weight, measured laboriously in the field, maybe in Costa Rica or wherever the bird people go), and you say, well, something's going on here, and you hand it to k-means, which says: actually, there are two clusters here. There's this cluster, A, and this cluster, B. So what k-means has done is map from a two-dimensional vector space (maybe not all of R^2, because you can't have negative weights, so maybe just the positive quadrant of R^2) into a binary set, {A, B}.

In general, when you call up a clustering algorithm, what you're really asking it to do is coarse-grain your data. And if you're a biologist, you might then notice something funny: all the A birds are female, say, and all the B birds are male. Now you feel really good, because what you've done, using this rich, high-dimensional data, is discover a simple, low-dimensional explanation for what's happening. Why do birds have different weights? Well, the main thing is whether they're male or female. So clustering is an automated way to have the data suggest to you what a good coarse-graining might be.

You may have noticed, of course, that k-means operates with a notion of closeness, just as the Alice in Wonderland coarse-graining did. In both cases, we were saying that things that are near each other should probably be given a single description: all these points here get the description A; all those points there get the description B. k-means is in fact an example of what we call a hard clustering: each point is uniquely assigned to a single cluster. This kind of hard clustering is the easiest form of coarse-graining to understand, and we'll give an example of how hard clustering plays a central role in information theory. There are other clustering algorithms that give what are called soft clusterings, and the soft clustering algorithm most closely related to k-means is called the GMM, or Gaussian mixture model. In fact, the GMM is very similar to k-means.
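Before getting into the details of the GMM, here's a sketch of what the hard-clustering step looks like in practice, using scikit-learn's KMeans on made-up height/weight numbers. The two synthetic groups, the variable names, and the relabelling to A and B are illustrative assumptions, not real bird data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up "bird" measurements: two groups with different typical height and weight.
group1 = rng.normal(loc=[12.0, 30.0], scale=[1.0, 3.0], size=(50, 2))
group2 = rng.normal(loc=[15.0, 45.0], scale=[1.0, 3.0], size=(50, 2))
X = np.vstack([group1, group2])  # shape (100, 2): columns are height, weight

# Hard clustering: every point is assigned to exactly one cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = np.where(kmeans.labels_ == 0, "A", "B")

# The coarse-graining: from (the positive quadrant of) R^2 to the binary set {A, B}.
for point, label in zip(X[:3], labels[:3]):
    print(point, "->", label)
```

The point of the sketch is just that the output is a single label per bird: the two real numbers describing each individual get collapsed to one of two categories.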
What the GMM does is take high-dimensional data, a set of vectors, each point referring to a particular observation, and say: it looks like this could be described as drawing from two Gaussians. Each Gaussian has a particular center and a particular variance and covariance matrix, so they look like little sausages lying over the data. The GMM produces a Bayesian model of the way your data might have been created, and then, for each point, it tells you the probability that that point was drawn from this Gaussian (call it G_1) or that Gaussian (call it G_2). So in this case it's giving you not a unique assignment but a probabilistic assignment. In particular, it's mapping the two-dimensional vector space (let's say it finds two clusters) into the interval [0, 1], which you can think of as the probability that the point was drawn from the first cluster. A point right in the center of the first cluster would have a very high probability of being a member of G_1 and a much lower probability of being a member of G_2.

So these are subtly different concepts. There's the hard clustering case, which takes each point and says: here is the coarse-grained category you belong to. That's just like the Alice in Wonderland algorithm, where we took that ten-by-ten grid square and said: you're majority zero, so you belong to category zero; you're majority one, so you belong to category one. The soft clustering is a little weaker; it avoids making a definite decision. You'll probably encounter different kinds of clustering algorithms and different kinds of clustering choices in your own research. People tend to prefer hard clustering, because you don't have to think as hard when you work with it.
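To make the hard/soft contrast concrete, here is the same kind of made-up data run through a Gaussian mixture model, again just a sketch using scikit-learn's GaussianMixture. The predict_proba call returns, for each point, a probability of membership in each Gaussian rather than a single label; as before, the data and names are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
group1 = rng.normal(loc=[12.0, 30.0], scale=[1.0, 3.0], size=(50, 2))
group2 = rng.normal(loc=[15.0, 45.0], scale=[1.0, 3.0], size=(50, 2))
X = np.vstack([group1, group2])

# Soft clustering: fit two Gaussians, then ask for probabilistic assignments.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # each row: [P(point came from G_1), P(point came from G_2)]

# The coarse-graining here maps each point not to {A, B} but into the interval [0, 1]:
# the probability that the point was drawn from the first Gaussian.
for point, p in zip(X[:3], probs[:3]):
    print(point, "-> P(G_1) =", round(p[0], 3))
```

If you threshold those probabilities at one half, you recover something very much like the hard clustering; the soft version simply keeps the uncertainty around instead of forcing a decision.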