So, can we say anything general about that process? The answer turns out to be: yes. Rather than writing this matrix in the familiar ϵ and 1 − ϵ form, where the column vector holds the probability of being in state A and the probability of being in state B, what we're going to do is diagonalize it. If the matrix is T, we find its eigenvectors, which are defined by the property that T acting on, say, the first eigenvector gives back that same eigenvector times a constant: Tv₁ = λ₁v₁. Here v₁ is an eigenvector of the transition matrix (the evolution operator) T, and λ₁ is its eigenvalue. A 2 × 2 matrix like this generically has two eigenvectors, v₁ and v₂, with eigenvalues λ₁ and λ₂. Now take an arbitrary probability distribution over states A and B and represent it as a weighted sum over these two eigenvectors: α₁v₁ + α₂v₂. In eigenvector space, that distribution becomes the column vector (α₁, α₂). Once the matrix is diagonalized, its action on this transformed space takes a very simple form: it becomes diagonal, with λ₁ and λ₂ on the diagonal, and acting on (α₁, α₂) it simply rescales each entry, giving (λ₁α₁, λ₂α₂), exactly as you would expect from the definition of the eigenvectors. So what I've done here is go from the original representation of the Markov chain to a diagonalized representation, call it T̂, which acts not on the probability of being in states A and B but on the amount of probability you have in the first eigenvector pattern and the amount in the second eigenvector pattern.
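To make the diagonalization concrete, here is a minimal numerical sketch in Python. I'm assuming the specific parameterization T = [[ϵ, 1 − ϵ], [1 − ϵ, ϵ]], where ϵ is the probability of staying in the current state and each column gives the transition probabilities out of one state; the value ϵ = ¼ is just an illustrative choice.

```python
import numpy as np

eps = 0.25                     # self-loop probability (assumed value)
# Column-stochastic matrix: column j gives P(next state | current state j).
T = np.array([[eps, 1 - eps],
              [1 - eps, eps]])

# Diagonalize: the columns of V are the eigenvectors v1 and v2.
lams, V = np.linalg.eig(T)
print(lams)                    # one eigenvalue is 1, the other is 2*eps - 1

# Represent an arbitrary distribution p in eigenvector coordinates:
p = np.array([0.9, 0.1])
alpha = np.linalg.solve(V, p)  # p = alpha[0]*v1 + alpha[1]*v2

# Acting with T in state space equals rescaling each alpha by its eigenvalue:
T_hat = np.diag(lams)
assert np.allclose(V @ (T_hat @ alpha), T @ p)
```

The final assertion checks the key identity of the lecture: applying T to a distribution is the same as scaling its eigenvector coefficients by the eigenvalues and transforming back.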
What you can see here is that of the two eigenvalues this matrix has, one is equal to 1. In fact, a theorem you can prove in linear algebra says that any stochastic matrix has, as its largest eigenvalue, something whose absolute value is equal to 1. Sometimes an eigenvalue of absolute value 1 can be −1; sometimes it can even be complex. But for most ordinary stochastic matrices, including the one we're looking at here — the precise conditions are somewhat technical, requiring irreducibility and aperiodicity (ergodicity) — the largest eigenvalue equals unity, and all the other eigenvalues have absolute values strictly smaller than 1. Notice that as long as ϵ is less than unity — as long as there is some probability of a transition between A and B — the second eigenvalue, 2ϵ − 1, has absolute value less than 1. If ϵ is less than ½, that eigenvalue is in fact negative, but its absolute value is still below 1. So now we've reduced the problem of computing Tⁿ to the problem of computing T̂ⁿ, where T̂ has 1 and 2ϵ − 1 on its diagonal, acting on the transformed representation of the probabilities of being in states A and B. Raising T̂ to the power n gives a very simple matrix: the first diagonal term, 1ⁿ, is just 1; the off-diagonal terms remain 0; and the other diagonal term becomes (2ϵ − 1)ⁿ, again acting on the transformed representation. As n gets very large, this term, having absolute value less than 1, keeps getting smaller and smaller. So, for example, let's take the simple case where ϵ = ¼.
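The claim that Tⁿ reduces to powers of the eigenvalues can be checked directly. Again assuming the symmetric chain T = [[ϵ, 1 − ϵ], [1 − ϵ, ϵ]] with the illustrative value ϵ = ¼:

```python
import numpy as np

eps = 0.25
T = np.array([[eps, 1 - eps],
              [1 - eps, eps]])
lams, V = np.linalg.eig(T)

# T^n via the diagonalization: T^n = V @ diag(lams**n) @ V^{-1}.
n = 50
Tn_eig = V @ np.diag(lams ** n) @ np.linalg.inv(V)
Tn_pow = np.linalg.matrix_power(T, n)
assert np.allclose(Tn_eig, Tn_pow)

# The second eigenvalue term (2*eps - 1)^n shrinks geometrically:
second = 2 * eps - 1
print(abs(second) ** n)        # about 9e-16 -- effectively gone by n = 50
```

So taking the chain to a large power costs nothing in the diagonal basis: only the eigenvalues get raised to the power n.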
In that case, along the diagonal of T̂ⁿ you'll have a 1 up here and a (−½)ⁿ down here. When this acts upon an arbitrary column vector (α₁, α₂), α₁ remains unchanged but α₂ is suppressed by an enormous factor: already at n = 20, the factor 1/2²⁰ is down by about a million, and at n = 1000 the suppression is astronomically larger. Because of the minus sign, the surviving piece of α₂ is a very tiny positive number or a very tiny negative number depending on whether n is even or odd. But in general, as you take increasingly large powers of T, no matter what combination of α₁ and α₂ you put into the system, out the other side you get a vector that is pure v₁. Another way to say this, in the language of the probability of being in each of these states, is that if you coarse-grain your data enough, the corresponding model will take any probability distribution and map it directly onto that first eigenvector of the system. That first eigenvector has a special name: it is called the stationary distribution of the original stochastic matrix. And so now we know that no matter what you put in — let's say you begin entirely in state A — if the system has been coarse-grained enough, then after one coarse-grained time step, whose evolution operator is Tⁿ with n very large, you will go to a unique probability distribution given by v₁; and if you begin in state B, you will be taken to the identical probability distribution v₁. And so in the end the matrix will take the following form: if you begin in state A, you have some probability, call it P(A), of ending up in state A.
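Here is a quick numerical sketch of this collapse onto the stationary distribution, under the same assumed symmetric chain with ϵ = ¼ (for which v₁ is the uniform distribution):

```python
import numpy as np

eps = 0.25
T = np.array([[eps, 1 - eps],
              [1 - eps, eps]])

# A large power of T, standing in for the coarse-grained evolution operator:
Tn = np.linalg.matrix_power(T, 1000)

start_in_A = np.array([1.0, 0.0])
start_in_B = np.array([0.0, 1.0])
pA = Tn @ start_in_A
pB = Tn @ start_in_B

# Both starting points land on the same stationary distribution v1:
assert np.allclose(pA, pB)
print(pA)                      # [0.5, 0.5] for this symmetric chain
```

Starting entirely in A and starting entirely in B give the same output distribution, exactly as the eigenvalue argument predicts.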
And you have some probability, called P(B), of ending up in state B, and we know, of course, that P(B) = 1 − P(A). Similarly, if you begin in state B, you have some probability of ending up in state B and some probability of ending up back in state A, and again the probability of ending in state B can be written as 1 minus the probability of ending in state A. What this simplification means is that no matter what transition probabilities you put in at short time scales — in particular, we are free to choose an arbitrary self-loop probability for A and an arbitrary self-loop probability for B — if you begin with that short-time-scale description, renormalize enough, coarse-grain the observations enough, and ask what the corresponding model is, you find that every model can be described by a single parameter, P(A). You move from models that live in a two-dimensional space, where I have to tell you the self-loop probability for each of the two states, to a set of models described by a single probability, P(A). So one way to imagine the limit of this coarse-graining process — how all of these different Markov chains flow — is the following diagram, which we sometimes call the phase diagram for this system. On the x-axis we have the self-loop probability for the A state, and on the y-axis the self-loop probability for the B state, so every point in this plane refers to a distinct Markov chain model. All of these models actually flow to a single line on this plane.
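The collapse from two parameters to one can be sketched for an asymmetric chain as well. The self-loop probabilities `a` and `b` below are hypothetical values chosen purely for illustration, and I take the coarse-grained model to be a large power of T:

```python
import numpy as np

# Hypothetical short-time-scale model: arbitrary self-loop probabilities.
a, b = 0.9, 0.6                # P(stay in A), P(stay in B) -- assumed values
T = np.array([[a, 1 - b],
              [1 - a, b]])     # column-stochastic

# Renormalize: after enough coarse-graining the model looks like T^n.
Tn = np.linalg.matrix_power(T, 200)

# The coarse-grained self-loop probabilities collapse onto one parameter:
PA = Tn[0, 0]                  # flows to the stationary probability P(A)
PB = Tn[1, 1]                  # flows to 1 - P(A)
print(PA, PB)                  # 0.8 and 0.2 for these values of a and b
assert np.isclose(PA + PB, 1.0)
```

Although the input model needed two numbers (a, b), the coarse-grained self-loop probabilities sum to 1, so a single parameter P(A) describes the limiting model — the point on the line that this chain flows to.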
All the points on this line have the potential to be fixed points: as you coarse-grain the system further and further, every model ends up on the line where the self-loop probability for the A state equals 1 minus the self-loop probability for the B state. The way to think about this is that P(A | A), when you coarse-grain enough, just becomes the stationary probability of A — the weight of A in the stationary distribution of that Markov chain — and the self-loop probability for B just becomes the stationary probability of B, which we know must equal 1 − P(A). So what we find is that as you continue to renormalize this system — as you continue to coarse-grain and ask what the corresponding model is — you get driven toward this line, and as you do you flow to what we call a fixed point, getting closer and closer to the limit of Tⁿ as n goes to infinity. That's a more formal, though still somewhat cartoonish, version of the story. The reason I say it's cartoonish is that I haven't told you the exact conditions for this flow to work: implicitly, the first eigenvalue has to equal 1, and there can be no other eigenvalues whose absolute value also equals 1. For example, a model that does not flow to somewhere on this line is one where the system continually oscillates back and forth. In that case the self-loop probability for A is zero and the self-loop probability for B is zero, and no matter how many times you multiply the matrix by itself, it never blurs out and gives you a stationary distribution. I haven't told you the exact conditions that rule out models like that, which sit on the corners of this space. But basically, if you're anywhere in the interior, you always flow to a unique fixed point under the coarse-graining operation.
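The oscillating counterexample is easy to exhibit numerically: the deterministic flip chain has eigenvalues 1 and −1, so the second eigenvalue never decays and the powers of T never converge.

```python
import numpy as np

# Deterministic oscillator: zero self-loop probability for both states.
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(np.linalg.eigvals(T))    # two eigenvalues of absolute value 1: +1 and -1

# Powers of T never blur out into a stationary distribution -- they alternate:
p0 = np.array([1.0, 0.0])      # begin entirely in state A
even = np.linalg.matrix_power(T, 1000) @ p0
odd = np.linalg.matrix_power(T, 1001) @ p0
assert np.allclose(even, [1.0, 0.0])
assert np.allclose(odd, [0.0, 1.0])
```

This is exactly the corner of the phase diagram that the flow argument excludes: with two eigenvalues on the unit circle, Tⁿ has no limit as n goes to infinity.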