So, in the previous section of this module
we delved a little bit deeper into the mathematics
of how transition matrices like these
renormalize when you coarse-grain the data,
and what the accompanying model does.
And what we found is a kind of phase diagram here
for the transition matrices,
and in fact what I've done is produced a general phase diagram
that's good for any two-state Markov chain.
And what we find is that in general,
modulo some slightly tricky cases
to do with the second eigenvalue being negative,
if you begin with a model anywhere in this space
you eventually flow to some point
on this diagonal line here,
and on that diagonal line,
no matter where you begin,
no matter what your initial distribution is,
a single update will take you directly
to the stationary distribution, which is
now defined not by two parameters
but in fact by only one:
you go from a two-dimensional manifold
down to a one-dimensional manifold
and stay there.
You don't flow once you hit this line;
you don't move back or forth.
It's a line of fixed points
for this renormalization transformation.
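As an aside, this flow is easy to watch numerically. Here is a minimal sketch, assuming NumPy and a column-stochastic convention (columns sum to one, matching the matrices later in this section); the starting matrix is an arbitrary illustrative choice, not one from the lecture:

```python
import numpy as np

# An arbitrary two-state chain, column-stochastic: entry [i, j] = P(j -> i).
T = np.array([[0.9, 0.3],
              [0.1, 0.7]])

# Coarse-graining in time: each step doubles the timescale, T -> T^2.
for _ in range(20):
    T = T @ T

# On the line of fixed points both columns equal the stationary
# distribution, so one update reaches it from any starting state,
# and further coarse-graining does nothing: T^2 = T.
print(T)
```

The two assertions you would want to check are that the columns agree (both equal the stationary distribution) and that the result is idempotent, i.e. a fixed point of the T → T² map.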
So, in this final, somewhat optional section,
I'd like to ask a kind of fun question.
We begin at this point here:
let's take this matrix here where
the epsilons are equal to each other,
lying somewhere along that line
where the self-loop probability for A
is equal to the self-loop probability for B.
Where did that matrix come from?
Is it possible to find another matrix,
operating at a finer time scale
than you thought possible,
that could have generated it?
Another way to think of this is that
you've gathered some data
in order to get this model here.
But what if you were gathering
the data not rapidly enough
to hit the actual fundamental clock of the system?
What if in fact your data was already
coarse-grained? You could imagine this:
Imagine that you built this model
here off of some data,
then your collaborator calls you up
and says: I've got great news,
we have this new high-resolution
time-sensing device
that enables us to measure not
every ten minutes, but every five minutes.
Well, great. Okay: what kind of model
could T have been generated by?
The mathematical way to answer this
question or to ask this question
is to say can we find a transition matrix A
such that when you square it you get the transition matrix T,
and this is under the assumption that your collaborator has come up
with a resolution device now that can split
twice as fast as it originally could.
So let's find it. Let's assume T
is given by this matrix form here,
and then all we have to do is figure out
what a and b are in this new transition matrix A.
We have to square this matrix,
set it equal to T, and solve for a and b.
In fact, I'll write this
in a slightly better fashion,
so that a and b become
the self-loop probabilities
for the two states.
So our challenge becomes:
find a and b.
Okay, let's do it.
Here is our problem. We're going to
square A, multiplying A by itself,
and we're going to set that equal
at the end to the transition matrix T.
So if we do that multiplication,
we find in the top-left corner
a² + (1 − b)(1 − a).
In the top-right corner
we have a(1 − b) + b(1 − b).
Down in the lower-left corner we have
a(1 − a) + b(1 − a),
and in the far lower-right corner
we have
(1 − a)(1 − b) + b².
This we're going to set equal to our original
and much-beloved transition matrix T.
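If you'd rather not trust the hand multiplication, here is a quick symbolic check, a sketch assuming SymPy and the same column-stochastic layout of A with self-loop probabilities a and b:

```python
import sympy as sp

a, b = sp.symbols('a b')

# A with self-loop probabilities a and b; columns sum to one.
A = sp.Matrix([[a,     1 - b],
               [1 - a, b]])

# Square it and expand each entry; the four entries should match
# the four expressions computed by hand above.
A2 = (A * A).applyfunc(sp.expand)
print(A2)
```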
We'll leave the full solution
to the homework,
but I'll give you a little clue on how to do it,
which is that in fact a and b have to be equal.
If a and b were not equal,
we wouldn't be able to satisfy
this set of equations here.
And so once I tell you that a and b
are equal, then I can say:
OK, let's try to solve
part of this problem
by setting that top-left term equal to ϵ.
So what happens? Because I've told you
that a is equal to b
(you can prove that yourself if you're going
to be distressed by what is about to happen),
I have a² + (1 − a)², so that's
a² + 1 − 2a + a² = ϵ.
Using my amazing algebra skills, we have
2a² − 2a + (1 − ϵ) = 0.
And now we use our quadratic-formula
skills to solve for a.
We have minus the linear coefficient, which gives 2,
plus or minus the square root of
that coefficient squared, which is 4,
minus 4 times the leading coefficient times
the constant term, which is 4 × 2 × (1 − ϵ),
all over twice the leading coefficient,
which is 4:
a = (2 ± √(4 − 8(1 − ϵ))) / 4.
If I simplify that, pulling the factor of 2
out of the square root, I get
a = ½ ± ½ √(1 − 2(1 − ϵ)).
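To sanity-check that formula, here is a minimal numerical sketch, assuming NumPy; ϵ = 0.8 is an illustrative value chosen so that the square root comes out real:

```python
import numpy as np

eps = 0.8  # illustrative; the root is real only when eps >= 1/2
a = 0.5 + 0.5 * np.sqrt(1 - 2 * (1 - eps))

# Fine-scale chain with equal self-loop probabilities a (columns sum to one).
A = np.array([[a,     1 - a],
              [1 - a, a]])
T = np.array([[eps,     1 - eps],
              [1 - eps, eps]])

print(A @ A)  # should reproduce T
```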
Okay, so far so good.
What I'd like you to notice is what
happens when, let's say, epsilon is equal to ¼.
In this case, under the square root sign here
we have 1 − 2 × ¾,
which is 1 − 3/2,
which is −½.
In other words, the only way to satisfy
the equation A² = T
is if some of the components of this matrix here
are imaginary or, more generally, complex.
But if A has complex entries, then it
cannot correspond to any
natural stochastic matrix, because we
interpret the entries of A
as transition probabilities, and now we're
being asked to imagine
that some of those transition probabilities are equal to things
like ½ ± ½ √(−½).
So in fact what we've found, as you can work out,
is that when epsilon is less than ½,
the matrix T could not have been generated by
coarse-graining any finer-scaled two-state system.
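You can see the obstruction directly by building the square root from the eigendecomposition; a sketch assuming NumPy, with ϵ = ¼ as in the example above:

```python
import numpy as np

eps = 0.25
T = np.array([[eps,     1 - eps],
              [1 - eps, eps]])

# T has eigenvalues 1 and 2*eps - 1 = -1/2; any square root of T must
# have eigenvalues that square to these, and sqrt(-1/2) is imaginary.
w, V = np.linalg.eig(T)
A = V @ np.diag(np.sqrt(w.astype(complex))) @ np.linalg.inv(V)

print(A)  # a genuine square root of T, but with complex entries
```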
So where did T come from?
Well, one way to see where the problem lies is the following.
Imagine that the system
is a good counter,
so that it oscillates ABABABAB
and only very rarely slips and produces a BB
or an AA, or even more rarely
a triple A or triple B.
If it's a good counter, meaning epsilon is small,
what would this look like if we expanded it
at the finer timescale?
In that case we'd have A ? B ? A ? B ? A,
and we'd have to fill in these gaps.
But if we only had two states,
if we only had the possibility of it filling in
with either A or B
let's imagine, for example,
we filled that question mark with A.
In that case, at time t
the system was in state A
and jumped directly to A,
and then at time t + ½,
on the finer timescale, it jumped from state A
directly to state B.
But if it's doing this deterministically,
then how can it know that at this time step
it's in A and should jump to A,
while at this other time step here
it's in A again,
but this time it should jump to B?
Markov chains are memoryless, the only thing that
determines what a Markov chain will do next
is the state that it happens to be in.
And again, you can imagine filling this state in here with B.
And now you have the same problem,
because here the state goes from A
jumps to B, very good.
Okay, here the state goes to B;
from B it jumps to B, okay, good.
Now, if the state deterministically jumps from B to B,
we know this one here has to be B.
But then we have a problem, because we have to explain
how the system somehow knows that
this B is different from the last B,
how the system knows that, okay, two ticks have gone
by and now it's time to jump to A.
Something has gone wrong.
In order to get T
you have to derive it from an A with more memory.
The clock is ticking back and forth,
but now it stays for half a time step in one
of the ticks, and the system has to remember:
okay, now it's time to go.
If epsilon is too small, this process here is too reliable,
and the chain can't manage it just by stochastically
deciding to jump from B to A instead of B to B.
Can we find another solution,
can we find an A that will work for T?
Well, we know the one thing we're
going to probably have to do
is give A more states than T.
And so here is a solution: I will show
how a Markov chain can coarse-grain
all the way down to the T matrix you originally saw.
It's going to be a little tricky here.
Instead of drawing a two-state system,
I'm going to draw a four-state system,
with states A, B, C, and D.
The transitions will look like this:
A goes to B with probability 1 minus epsilon,
B always goes to C,
C can occasionally slip back to B,
C usually goes to D,
D goes deterministically to A,
and A goes back to D
with only some tiny probability.
So, what do we have here?
We have a hidden Markov model,
but now it's over four states and not just two.
What I'm going to do is draw what happens
if you take the matrix represented
by this diagram and square it,
and what we find is the following.
The system actually splits up
like so. In fact, if you're in state A,
then after two time steps of evolution
by this matrix operator,
you jump to C with probability 1 − ϵ
and you stay in A with probability ϵ.
It's also true that if you happen to be in state D,
you jump to B with probability 1 − ϵ
and you stay in D with probability ϵ.
This is now equivalent to the original operator,
the original model T,
as long as we do the following coarse-graining of the states.
If we say A and D are both equivalent to state A
in the observations,
and C and B are both equivalent to state B
in the coarse-grained observations,
then all of a sudden A², this matrix here, reduces down
to the original evolution matrix we had
for the slippy AB counter.
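Putting the whole construction together numerically, here is a minimal sketch, assuming NumPy, the column-stochastic convention from earlier, and an illustrative ϵ = 0.2:

```python
import numpy as np

eps = 0.2  # illustrative slip probability
# Column-stochastic 4-state chain (entry [i, j] = P(j -> i)); A,B,C,D = 0..3.
# A -> B w.p. 1-eps, A -> D w.p. eps; B -> C always;
# C -> D w.p. 1-eps, C -> B w.p. eps; D -> A always.
M = np.array([[0,       0, 0,       1],
              [1 - eps, 0, eps,     0],
              [0,       1, 0,       0],
              [eps,     0, 1 - eps, 0]])

M2 = M @ M  # two fine time steps = one coarse time step

# Lump {A, D} -> coarse state A and {B, C} -> coarse state B.
lump = np.array([[1, 0, 0, 1],
                 [0, 1, 1, 0]])
coarse_cols = lump @ M2  # coarse destination probabilities per fine start

# Lumpability: every fine state within a class gives the same coarse column,
# so reading off the columns for representatives A and B gives the coarse T.
T = coarse_cols[:, [0, 1]]
print(T)  # the slippy AB counter: [[eps, 1-eps], [1-eps, eps]]
```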
The big lesson we get out of this
is that sometimes, in fact,
a matrix, or a model in general,
could have been generated by something
outside of the original model space.
When you coarse-grain the data and get a new model,
the model that you get may be a
fundamentally different kind of model
than the one that you began with.
In this case we began with a four-state model
and coarse-grained it down to a two-state model.
Looking at it in reverse,
you can say of the two-state model, that T transition,
the slippy AB counter:
if the counter is reliable enough, that is,
if epsilon is in fact less than a half,
then this construction here will work
where no two-state one could.
But what you have to do is assume
that the finer-grained data was created
by a process with more memory:
four states
instead of just two. And you can see that
the deterministic transitions here
allow the system to count whether or not it's time
to jump out of the A or the C state yet.