So, in the previous section of this module we delved a little deeper into the mathematics of how transition matrices like these renormalize when you coarse-grain the data, and what the accompanying model does. What we found is a kind of phase diagram for the transition matrices, and in fact what I've done is produce a general phase diagram that's good for any two-state Markov chain. What we find is that in general, modulo some slightly tricky cases to do with a negative second eigenvalue, if you begin with a model anywhere in this space you eventually flow to some point on this diagonal line. On that diagonal line, no matter where you begin and no matter what your initial distribution is, a single update takes you directly to the stationary distribution, which is now defined not by two parameters but by only one: you go from a two-dimensional manifold down to a one-dimensional manifold and stay there. Once you hit this line you don't flow any further, you don't move back and forth; it's a line of fixed points for the renormalization transformation. So, in this final, somewhat optional section, I'd like to ask a fun question. We begin at this point here: take the matrix where the epsilons are equal to each other, lying somewhere along that line, where the self-loop probability for A equals the self-loop probability for B. Where did that matrix come from? Is it possible to find another matrix, working at a finer time scale than you thought possible, that could have generated it? Another way to think of this: you gathered some data in order to fit this model. But what if you weren't gathering the data rapidly enough to hit the fundamental clock of the system? What if your data was already coarse-grained? You could imagine this.
Imagine that you built this model off of some data, and then your collaborator calls you up and says: I've got great news, we have a new high-resolution time-sensing device that enables us to measure not every 10 minutes but every minute. Well, great. Okay, what kind of model could T have been generated by? The mathematical way to ask this question is: can we find a transition matrix A such that when you square it you get the transition matrix T? This is under the assumption that your collaborator's new device can sample twice as fast as the original one. So let's find it. Let's assume T is given by the matrix form here, with self-loop probability ϵ for both states and switching probability 1 - ϵ. All we have to do now is figure out what a and b are in the new transition matrix. We square this matrix, set it equal to T, and solve for a and b; in fact, I'll write A in a slightly better fashion so that a and b become the self-loop probabilities for the two states, with a and 1 - a in the first column and 1 - b and b in the second. So our challenge becomes: find a and b. Okay, let's do it. Here is our problem. We're going to square A, that is, multiply A by itself, and set the result equal to the transition matrix T. If we do that multiplication we find, in the top-left corner, a² + (1 - b)(1 - a). In the top-right we have a(1 - b) + (1 - b)b. In the bottom-left we have (1 - a)a + b(1 - a), and in the bottom-right corner we have (1 - a)(1 - b) + b². This we set equal to our original and much-beloved transition matrix T. We'll leave the full solution to the homework, but I'll give you a little clue on how to do it, which is that a and b have to be equal. If a and b were not equal, we wouldn't be able to satisfy this set of equations.
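The multiplication above is easy to check numerically. Here is a minimal sketch in Python; the values a = 0.9 and b = 0.8 are arbitrary choices for illustration, and I'm using the column-stochastic convention implied by the matrix form in the text (each column sums to one).

```python
def matmul2(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Self-loop probabilities for the two states (arbitrary test values).
a, b = 0.9, 0.8

# A in the form used in the text: a and 1-a in the first column,
# 1-b and b in the second, so columns sum to one.
A = [[a,     1 - b],
     [1 - a, b]]

A2 = matmul2(A, A)

# The four entries computed by hand in the text.
expected = [[a*a + (1 - b)*(1 - a),     a*(1 - b) + (1 - b)*b],
            [(1 - a)*a + b*(1 - a),     (1 - a)*(1 - b) + b*b]]

for i in range(2):
    for j in range(2):
        assert abs(A2[i][j] - expected[i][j]) < 1e-12
print("A squared matches the hand-computed entries")
```

Note that A² is still column-stochastic, as it must be: squaring a transition matrix gives another transition matrix.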
And once I tell you that a and b are equal, I can say: okay, let's solve part of this problem by setting the top-left term equal to ϵ. So what happens? We have a² + (1 - a)², because I've told you that a is equal to b (you can prove it yourself if you're going to be distressed by what is about to happen). So we have a² + (1 - a)² = a² + 1 - 2a + a² = ϵ. Using my amazing algebra skills, that becomes 2a² - 2a + (1 - ϵ) = 0. Now we use our quadratic-formula skills to solve for a: with coefficients 2, -2, and (1 - ϵ), we get a = [2 ± √(4 - 8(1 - ϵ))] / 4. Pulling a factor of two out of the square root and simplifying, a = ½ ± ½√(1 - 2(1 - ϵ)). Okay, so far so good. What I'd like you to notice is what happens when, say, ϵ is equal to ¼. In that case, under the square root sign we have 1 - 2(3/4) = 1 - 3/2 = -½. In other words, the only way to satisfy the equation A² = T is if some of the components of this matrix have imaginary, or more generally complex, entries. But if A has complex entries then it cannot correspond to any natural stochastic matrix, because we interpret the entries of A as transition probabilities, and now we're being asked to imagine that some of those transition probabilities are equal to things like ½ ± ½√(-½). So what we have found, as you can work out, is that whenever ϵ is less than ½, the matrix T could not have been generated by a coarse-graining of any finer-scaled two-state system. So where did T come from? Well, one way to see where the problem is is the following.
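To see exactly where the square root goes complex, we can solve 2a² - 2a + (1 - ϵ) = 0 directly. A minimal sketch, using Python's standard cmath module so that complex roots are returned rather than raising an error; the function name fine_scale_self_loop is my own label:

```python
import cmath

def fine_scale_self_loop(eps):
    """Roots a = 1/2 ± (1/2)·sqrt(1 - 2(1 - eps)) of 2a² - 2a + (1 - eps) = 0."""
    disc = cmath.sqrt(1 - 2 * (1 - eps))
    return 0.5 + 0.5 * disc, 0.5 - 0.5 * disc

# ε = 1/4: the discriminant is 1 - 3/2 = -1/2, so the roots are complex
# and no finer-scale two-state chain exists.
r1, r2 = fine_scale_self_loop(0.25)
print(r1, r2)

# ε = 3/4: the discriminant is 1 - 1/2 = +1/2, so the roots are real
# and a valid finer-scale self-loop probability exists.
r3, r4 = fine_scale_self_loop(0.75)
print(r3, r4)
```

The crossover sits exactly at ϵ = ½, where the discriminant 1 - 2(1 - ϵ) changes sign, matching the claim in the text.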
Imagine that the system is a good counter, so that it oscillates ABABABAB and only very rarely slips to produce a BB or an AA, or even more rarely a triple A or triple B. If it's a good counter, meaning ϵ is small, what would this look like expanded at the finer timescale? In that case we'd have A ? B ? A ? B ? A, and we'd have to fill in these gaps. But we only have two states, only the possibility of filling in with either A or B. Imagine, for example, we fill one of those question marks with A. In that case at time t the system was in state A and jumped directly to A, and then at time t + ½, on the finer timescale, it jumped from state A directly to state B. But if it's doing this deterministically, how can it know that at this time step it's in A and should jump to A, while at this other time step it's in A again but this time should jump to B? Markov chains are memoryless: the only thing that determines what a Markov chain will do next is the state it happens to be in. And you can instead imagine filling the gap with B, and you have the same problem. Here the state goes from A to B, very good; here the state goes from B to B, okay, good. But if the state deterministically jumps from B to B, we know the next state has to be B as well, and then we have to explain how the system somehow knows that this B is different from the last B, how it knows that two ticks have gone by and now it has to jump to A. Something has gone wrong. In order to get T you have to derive it from an A with more memory. The clock is ticking back and forth, but now it stays for half a time step in one of the ticks, and the system has to remember: okay, now it's time to go. If ϵ is too small, this process is too reliable, and the chain can't reproduce it just by stochastically deciding to jump from B to A instead of from B to B.
Can we find another solution, an A that will work for T? Well, the one thing we know we're probably going to have to do is give A more states than T has. So here is a solution; I'll show how a Markov chain can coarse-grain all the way down to the T matrix you originally saw. It's going to be a little tricky. Instead of a two-state system I'm going to draw a four-state system, with states A, B, C and D. The transitions look like this: A goes to B with probability 1 - ϵ, B always goes to C, C can occasionally slip back to B (with probability ϵ), C usually goes to D (with probability 1 - ϵ), D goes deterministically to A, and A goes back to D with only the tiny remaining probability ϵ. So what do we have here? We have a hidden Markov model: the underlying chain has 4 states, not just 2. What I'm going to do is show what happens if you take the matrix this diagram represents and square it, and what we find is the following. The system actually splits up, like so: if you're in state A, then after two time steps of evolution by this operator A, you jump to C with probability 1 - ϵ and stay in A with probability ϵ. It's also true that if you happen to be in state D, you jump to B with probability 1 - ϵ and stay in D with probability ϵ. This is now equivalent to the original operator, the original model T, as long as we do the following coarse-graining of the states. If we say A and D are both equivalent to state A in the coarse-grained observations, and C and B are both equivalent to state B, then all of a sudden A², this matrix here, reduces down to the original evolution matrix we had for the slippy AB counter. The big lesson we get out of this is that sometimes a matrix, or a model in general, could have been generated by something outside of the original model space.
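The four-state construction can be verified numerically: build the 4×4 matrix, square it, lump {A, D} and {B, C} together, and check that the result is the slippy counter T. A sketch, assuming the slip probabilities implied in the text (C slips back to B with probability ϵ, A slips to D with probability ϵ) and the same column-stochastic convention as before:

```python
def matmul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

eps = 0.25  # ε < 1/2: T has no 2-state square root, yet this 4-state A works

# Columns are the source states, in the order A, B, C, D (column-stochastic).
A4 = [
    [0,       0, 0,       1],  # to A: D -> A deterministically
    [1 - eps, 0, eps,     0],  # to B: A -> B w.p. 1-ε, C slips back w.p. ε
    [0,       1, 0,       0],  # to C: B -> C deterministically
    [eps,     0, 1 - eps, 0],  # to D: A -> D w.p. ε, C -> D w.p. 1-ε
]

A4sq = matmul(A4, A4)

# Coarse-grain by lumping destinations: fine A, D -> coarse A; fine B, C -> coarse B.
block = {0: 0, 1: 1, 2: 1, 3: 0}  # fine index -> coarse index

def coarse_column(j):
    """Two-step outgoing probabilities from fine state j, lumped into coarse states."""
    col = [0.0, 0.0]
    for i in range(4):
        col[block[i]] += A4sq[i][j]
    return col

# Columns for fine A and fine D (both coarse A) agree, likewise fine B and C:
print(coarse_column(0), coarse_column(3))  # both [eps, 1 - eps]
print(coarse_column(1), coarse_column(2))  # both [1 - eps, eps]
```

The fact that the lumped column is identical for every fine state within a block is exactly what makes the coarse-graining consistent: an observer who only sees the coarse labels sees the two-state chain T, with self-loop probability ϵ.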
When you coarse-grain the data and get a new model, the model you get may be a fundamentally different kind of model than the one you began with. In this case we began with a 4-state model and coarse-grained it down to a 2-state model. Looking at it in reverse, you can say that for the 2-state model, the T transition, the slippy AB counter, if the counter is reliable enough, meaning ϵ is less than a half, then this construction will work. But what you have to do is assume that the finer-grained data was created by a process with more memory: four states instead of just two. And you can see that the deterministic transitions here allow the system to count whether or not it's time to jump out of the A or the C state yet.