So in the first module we gave you a series of examples about how, as scientists or indeed as engineers, we try to simplify our observations. We don't keep records of everything that happens at the finest possible detail. Instead, very often, if we have some data at high resolution, either out there to be gathered or already on our hard drives, what we're interested in doing is simplifying it in some way, and I gave you a generic term for this: coarse-graining. In particular, for the case of the etching of Alice and her kitten Dinah, I showed you different coarse-graining prescriptions: how you could take that image and strategically throw out information, reduce the amount of information you keep about the original image, and still, if the coarse-graining prescription is chosen well, retain some sense of what's happening in the system itself, what's happening in the image. Maybe you're not quite sure what kind of animal Alice is playing with at this point, but at least you know that she's playing with some kind of animal. We gave three coarse-graining prescriptions. The first was majority vote: I take a little square and have all the pixels vote on what color, black or white, the output pixel should be. In the example here the square was 10 by 10, so I took a hundred data points, assigned the output either a 0 or a 1 by majority vote, and in other words made the pixel 10 times larger. You could also do something even simpler: take that 10 by 10 grid and just have the pixel in the upper right-hand corner dictate what the final coarse-grained pixel is going to be. The final example, which we only talked about without really showing the mathematics, was a compression algorithm in actual use in the real world: the JPEG.
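The first two prescriptions are simple enough to sketch in a few lines of code. This is a minimal illustration, assuming a binary image stored as a NumPy array of 0s and 1s; the function names are mine, not from the course, and the corner rule here keeps the first pixel of each block rather than specifically the upper right-hand one.

```python
import numpy as np

def majority_vote_coarse_grain(image, block=10):
    """Coarse-grain a binary image: each block x block square of pixels
    "votes", and the output pixel is 1 only if more than half are 1."""
    h, w = image.shape
    # Trim so the image divides evenly into blocks.
    image = image[: h - h % block, : w - w % block]
    blocks = image.reshape(h // block, block, w // block, block)
    votes = blocks.sum(axis=(1, 3))            # number of 1-pixels per block
    return (votes > block * block / 2).astype(int)

def corner_coarse_grain(image, block=10):
    """Even simpler: one corner pixel of each block dictates the output."""
    return image[::block, ::block]
```

Either way, a 10-by-10 block of a hundred data points collapses to a single output pixel.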
What the JPEG does is throw out information about the very fine scale oscillations in the data, the patterns on the back of the armchair for example, when it thinks those are going to be undetectable to the human eye, while at the same time keeping the longer scale fluctuations in the image, the difference between, let's say, where the light is and where the shadow is. It does that through a Fourier transform, and essentially all it really does is chop off the high-frequency components in a way that engineers have decided is at least not entirely visually unpleasing. So coarse-graining is an essential part of renormalization, but it's not the whole story; in fact it's only half, because as scientists we don't just gather data and simplify it. We build models of that data. So take the highest resolution you can imagine for a particular system, and take the model that you think best predicts or describes or explains the data at that high resolution. Then we can ask the obvious question: what model best describes or predicts or explains the data at the coarse-grained level, and what's the relationship between those two models, the model that describes everything and the model that describes something? The entire story of renormalization is the relationship between what happens when you coarse-grain the data and what happens to the underlying structure of the models that that coarse-graining demands. Surprisingly, as we'll see, sometimes when you coarse-grain the data the models you need get more complicated. Sometimes, conversely, they simplify. It's that kind of process that we'll study now.
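The idea of chopping off high-frequency components can be sketched crudely. Real JPEG uses a blockwise discrete cosine transform with perceptual quantization tables; this is only a toy version of the same principle, zeroing out everything but the lowest frequencies of a full-image FFT.

```python
import numpy as np

def lowpass_compress(image, keep_fraction=0.1):
    """Toy sketch of the JPEG idea: go to frequency space, discard
    the high-frequency components, and transform back."""
    f = np.fft.fftshift(np.fft.fft2(image))   # low frequencies now at center
    h, w = image.shape
    kh, kw = int(h * keep_fraction), int(w * keep_fraction)
    mask = np.zeros_like(f)
    mask[h//2 - kh : h//2 + kh, w//2 - kw : w//2 + kw] = 1
    f = f * mask                              # chop off the high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```

The long-wavelength structure (where the light is, where the shadow is) survives; the fine oscillations (the pattern on the armchair) do not.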
In order to describe renormalization, you'll see of course that I have to tell you not just what is happening to the data but also what's happening to the model, so I have to give you an example not just of some data that we're coarse-graining but of a model that describes it. The model we'll use is the Markov chain. So how do Markov chains work, what do they operate on, what are they supposed to describe or explain or predict? In general, Markov chains describe time series: a series of observations that evolve or unfold one moment after another. The simplest case is when each observation is a symbolic unit, you know, an A or B or C, the stock went up or the stock went down, the person said this word or that word. A more complicated story is that each of those observations could be a continuous number, like a temperature, or the value of, let's say, the electric field at a certain point in time and a certain point in space. Here we'll just deal with symbolic time series because they're much easier to handle, at least when the number of symbols is small. So that's our basic idea, that's our fine-grained data, and then we're going to imagine coarse-graining it to produce a lower resolution time series. Now of course that's not the only way to coarse-grain a time series. You could also imagine coarse-graining each symbol: let's say at each point in time you chose from a set of 10 symbols. One way to coarse-grain that time series is, instead of choosing from a set of 10, to map those 10 symbols to either a symbol A or a symbol B. That kind of coarse-graining, something we'll see a little bit later, you might think of as a projection: you're projecting down the state space. Here we'll do something a little simpler. Imagine, for example, the time series was gathered at intervals of one second.
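That projection idea is a one-liner. The particular grouping below (symbols 0 through 4 map to A, 5 through 9 map to B) is just an assumed example; any many-to-one map of the state space would do.

```python
def project(series, mapping=None):
    """Projection coarse-graining: collapse a 10-symbol alphabet to {A, B}.
    The default mapping is an arbitrary illustrative choice."""
    if mapping is None:
        mapping = {s: ("A" if s < 5 else "B") for s in range(10)}
    return [mapping[s] for s in series]

project([0, 7, 3, 9, 4])   # -> ['A', 'B', 'A', 'B', 'A']
```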
Now what we're going to imagine is that, you know, you didn't have expensive enough equipment, or your hard drive got full, so instead of keeping every second of the evolution you kept, let's say, every other second, or every third second. In other words, you took a block of the time series and you decimated it: you kept just the first observation within each block, a sort of one-dimensional version of how we coarse-grained the sketch of Alice by John Tenniel. All right, so that's the data we'll be operating on. The Markov chain provides a model of how our time series is produced, and then what we're going to see is what happens to those models as you ask them to describe or predict or explain the lower resolution version. We'll compare one model to another as they operate on different kinds of data. So, here's a Markov chain. On the left-hand side what I've shown you is a depiction of the model; I'll tell you how to read the representation in a second. On the right-hand side there's a sample bit of data that it might predict or explain. In this case what I did was just run the model itself, because it's a generative one: it will tell you what's going to happen next, and so I can start somewhere and allow the model to produce a simulated time series. Markov chains are stochastic, so in any particular case it will generically produce a different sample run. The right-hand side is just one example of the kind of data a Markov chain could produce, or conversely the kind of data a Markov chain could describe or predict. The left-hand side shows the Markov chain itself. What you see there are three nodes, A, B and C, and it's simple enough: when the system is in state A it emits the symbol A, when it's in state B it emits the symbol B, and when it's in state C it emits the symbol C, and then, upon emitting that symbol, it makes a transition.
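Decimation itself is the simplest coarse-graining you could write down: keep the first observation of each block and throw the rest away.

```python
def decimate(series, block=2):
    """Decimation coarse-graining: keep only the first observation
    in each block of `block` consecutive observations."""
    return series[::block]

decimate(list("BCCABCAB"), block=2)   # keeps positions 0, 2, 4, 6
```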
It jumps to one of the other states, and the probability of jumping to each of those other states is dictated by the model itself; I've represented that here by the arrows. So let's say you begin in state A: with 90 percent probability you jump to state B. Say you're in state C: with 30 percent probability you stay in state C, and with 70 percent probability you jump to state A. Specifying those transition probabilities corresponds to specifying the entire set of free parameters for the Markov chain. Once you tell me the transition probabilities, you've told me everything I need to know about the underlying model. Markov chains are fun, but it's also important to realize how limited they are. For example, if I'm at B and then at C, what happens next does not depend upon the fact that I emitted a B. When I'm in state C it doesn't matter how I got there; similarly, if I'm in state A it doesn't matter how I got there. Whatever I do is conditioned entirely upon what's happening at the current time step. There's no nonlocality in the Markov chain, is another way to put it, or you can think of it as a system with no memory. If I'm in state B, that defines everything about what's going to happen next. And so you can see how to read this sample run. In the sample run I begin in state B, and in fact I know deterministically what has to happen next: I go to state C. When I ran this Markov chain forward, it chose randomly to stay in state C, which happens 30% of the time, so having emitted B and then C, it stayed in C, and then on the next time step it jumped to A, and you can see the system evolving like that from one moment to the next.
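Running a chain like this forward is easy to sketch. The transcript only gives some of the transition probabilities (A goes to B with 0.9; C stays put with 0.3 or jumps to A with 0.7; B goes to C deterministically), so the remaining 10% out of A is assumed here to stay in A; that assumption happens to reproduce the limiting 31%/40% figures quoted later, but it is a reconstruction, not something stated in the lecture.

```python
import random

# Transition probabilities for the example chain. Only some entries are
# given in the transcript; A -> A = 0.1 is an assumption.
T = {
    "A": {"A": 0.1, "B": 0.9},
    "B": {"C": 1.0},
    "C": {"A": 0.7, "C": 0.3},
}

def run_chain(start, steps, seed=0):
    """Generate a sample time series: emit the current state, then jump
    according to the current state's transition probabilities."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(steps):
        out.append(state)
        states, probs = zip(*T[state].items())
        state = rng.choices(states, weights=probs)[0]
    return "".join(out)
```

Because the chain has no memory, the next symbol depends only on the current state, which is exactly what the single dictionary lookup `T[state]` expresses.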
So now let's take the observable data associated with that Markov chain, or one instantiation of that data, and coarse-grain it using this decimation transformation. In particular, I'm going to take blocks of two observations in a row and drop the second one, or conversely, I'm going to coarse-grain each block of two observations to a single observation, dictated by the value of the first point in the block. So in this case I've used blocks of size 2: I have coarse-grained the data at, let's say, a two-second time scale. It's easy enough to build the Markov chain associated with that coarse-grained run, and in particular let's just walk through the image we have here to see that the arrows make sense. In the next section I'll tell you how to derive these mathematically, but for now it's a useful exercise to understand the relationship between the one-step and the two-step model. Notice that in the one-step model it's impossible for the system to jump from B to A. But in the coarse-grained system it's in fact very likely that if you see a B, the next symbol you see will be an A: it happens 70 percent of the time, and if you look at the coarse-grained run it's pretty easy to find cases where you have B and then A. In fact, on the first line I see it once and on the second line I see it three times. But in the sample run at the finer-grained time scale that's impossible, and of course it's impossible because when you're in state B you deterministically go to state C. At the coarse-grained timescale, though, it's pretty easy to go from B to A: you go by way of C, but that C step is unobservable; it's coarse-grained away. And in fact, if you start in state B, 100% of the time you go to state C.
And so for it to show up as an A at the next time step, all that has to happen is that when you're in state C you jump to A, and that happens 70 percent of the time. So what I've done there, in a sort of laborious way, is tell you exactly why the arrow from B to A is labeled with a 70% probability. Of course that's just when you coarse-grain from the fine-grained time scale to blocks of two; you can just as easily imagine coarse-graining the system in blocks of three, blocks of four, and so forth. So if we look here at how this model evolves, we're basically telling the parallel story to how the data is being coarse-grained: we coarse-grain in blocks of two steps, or we coarse-grain in blocks of three steps. As you can see, the model changes, and it's a little hard, if you just look at it, to see the exact logic of that change. When you go from one step to two steps, in some sense the model maybe looks like it gets a bit more complicated. All I mean is that some transitions become possible that were impossible before: you become a little less certain about what's going to happen. Except then, when I go to three steps, some of those transitions go away again, so in fact I become a little more certain. So what is the limit of this process? What happens when we continue this and coarse-grain on increasingly longer and longer time scales? It turns out that models tend to flow to what we'll call fixed points. In other words, as you go from one coarse-graining level to the next, the corresponding model tends to change less and less, at least if you wait long enough. The models tend to converge on a single model as the coarse-graining timescale gets longer and longer. And in fact, not only does every Markov chain converge to a particular sort of limit-case Markov chain.
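The walk-through above is really a matrix computation in disguise: decimating in blocks of n replaces the one-step transition matrix by its nth power. Using the same assumed matrix as before (only A→B = 0.9, B→C = 1.0, C→A = 0.7, C→C = 0.3 are given in the transcript; A→A = 0.1 is a reconstruction), the two-step B→A entry comes out to exactly the 70% just derived by hand.

```python
import numpy as np

# Assumed one-step transition matrix, rows and columns ordered A, B, C.
T = np.array([
    [0.1, 0.9, 0.0],   # from A
    [0.0, 0.0, 1.0],   # from B: deterministically to C
    [0.7, 0.0, 0.3],   # from C
])

# Coarse-graining by decimation in blocks of 2 replaces T by T squared.
T2 = np.linalg.matrix_power(T, 2)
print(T2[1, 0])   # P(B -> A) at the two-step scale: 0.7, via the hidden C
```

The B→C→A path is where that 0.7 comes from; the intermediate C is coarse-grained away.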
In fact, the Markov chain that it converges to has a particularly simple structure. Here it is; here's the limit point, if you take that one-step Markov chain and ask what happens when you coarse-grain the data on really long time scales. If you look at the transition probabilities, you'll notice that every state has incoming arrows all with the same weight. The chance of going to C is independent of which state you were in previously: in this limit it's always 40%. Similarly, if you're transitioning to state A it doesn't matter where you come from; the probability is always the same, 31 percent. Think about how many parameters you need to describe the model in the one-step case: you have three states, and each state has three transition probabilities, but since probabilities have to sum to one, each state in fact has only two free parameters. So you have three states with two free parameters each: six parameters describe an arbitrary Markov model on three states. However, if this pattern holds, and it does for nearly every Markov chain that you can write down, not every one, just nearly every one, then when you coarse-grain on sufficiently long time scales you only need two parameters. It's a remarkable compression, in other words, not just of the data but of the model space.
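You can watch this flow to the fixed point numerically: as the block size grows, the powers of the transition matrix converge to a matrix whose rows are all identical, so where you land no longer depends on where you started. Again, the matrix here is the assumed reconstruction from earlier, not something written out in the lecture.

```python
import numpy as np

# Assumed one-step transition matrix, states ordered A, B, C.
T = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.0, 0.3],
])

# Coarse-grain on a very long time scale: a 50-step decimation.
Tn = np.linalg.matrix_power(T, 50)
# Every row has converged to (nearly) the same distribution, so the
# whole model is pinned down by a single row: two free parameters.
print(np.round(Tn, 3))
```

With this matrix the limiting row comes out near 31% for A and 40% for C, matching the figures quoted above; any two of the three entries determine the third.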
You begin in this sort of six-dimensional manifold, but if you coarse-grain the data enough, the models corresponding to that coarse-graining all flow to this very low-dimensional, two-dimensional in fact, manifold, where the model is described simply by the incoming probabilities to each of the three states. Since those incoming probabilities all have to be the same, the chance of going to C is the same no matter where you begin, and the chance of going to B is the same no matter where you begin; and since the probabilities outgoing from each state have to sum to one, even though there are three probabilities to specify, only two of them are needed: the third is just one minus the sum of the other two. In the next module I'll show you how to make these computations exact, but for now this is our first example of how theories change when the data you ask them to describe is coarse-grained.