So, in the previous section of this module we delved a little deeper into the mathematics of how transition matrices like these renormalize when you coarse-grain the data, and what the accompanying model does. What we found is a kind of phase diagram for the transition matrices, and in fact what I've done is produce a general phase diagram that's good for any two-state Markov chain. What we find is that in general, modulo some slightly tricky cases to do with a negative second eigenvalue, if you begin with a model anywhere in this space you eventually flow to some point on this diagonal line. On that diagonal line, no matter where you begin and no matter what your initial distribution is, a single update takes you directly to the stationary distribution, which is now defined not by two parameters but by only one: you go from a two-dimensional manifold down to a one-dimensional manifold and stay there. Once you hit this line you don't flow any further, you don't move back and forth; it's a line of fixed points for the renormalization transformation. So, in this final, somewhat optional section, I'd like to ask a fun question. We begin at this point here: take the matrix where the epsilons are equal to each other, lying somewhere along that line, where the self-loop probability for A equals the self-loop probability for B. Where did that matrix come from? Is it possible to find another matrix, working at a finer time scale than you thought possible, that could have generated it? Another way to think of this: you gathered some data in order to fit this model. But what if you weren't gathering the data rapidly enough to hit the fundamental clock of the system? What if your data was already coarse-grained? You could imagine this.
Imagine that you built this model off of some data, and then your collaborator calls you up and says: I've got great news, we have a new high-resolution time-sensing device that enables us to measure not every 10 minutes but every minute. Well, great. Okay, what kind of model could T have been generated by? The mathematical way to ask this question is: can we find a transition matrix A such that when you square it you get the transition matrix T? This is under the assumption that your collaborator's new device can sample twice as fast as the original one. So let's find it. Let's assume T is given by the matrix form here, with self-loop probability ϵ for both states and switching probability 1 - ϵ. All we have to do now is figure out what a and b are in the new transition matrix. We square this matrix, set it equal to T, and solve for a and b; in fact, I'll write A in a slightly better fashion so that a and b become the self-loop probabilities for the two states, with a and 1 - a in the first column and 1 - b and b in the second. So our challenge becomes: find a and b. Okay, let's do it. Here is our problem. We're going to square A, that is, multiply A by itself, and set the result equal to the transition matrix T. If we do that multiplication we find, in the top-left corner, a² + (1 - b)(1 - a). In the top-right we have a(1 - b) + (1 - b)b. In the bottom-left we have (1 - a)a + b(1 - a), and in the bottom-right corner we have (1 - a)(1 - b) + b². This we set equal to our original and much-beloved transition matrix T. We'll leave the full solution to the homework, but I'll give you a little clue on how to do it, which is that a and b have to be equal. If a and b were not equal, we wouldn't be able to satisfy this set of equations.
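The multiplication above is easy to check numerically. Here is a minimal sketch in Python; the values a = 0.9 and b = 0.8 are arbitrary choices for illustration, and I'm using the column-stochastic convention implied by the matrix form in the text (each column sums to one).

```python
def matmul2(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Self-loop probabilities for the two states (arbitrary test values).
a, b = 0.9, 0.8

# A in the form used in the text: a and 1-a in the first column,
# 1-b and b in the second, so columns sum to one.
A = [[a,     1 - b],
     [1 - a, b]]

A2 = matmul2(A, A)

# The four entries computed by hand in the text.
expected = [[a*a + (1 - b)*(1 - a),     a*(1 - b) + (1 - b)*b],
            [(1 - a)*a + b*(1 - a),     (1 - a)*(1 - b) + b*b]]

for i in range(2):
    for j in range(2):
        assert abs(A2[i][j] - expected[i][j]) < 1e-12
print("A squared matches the hand-computed entries")
```

Note that A² is still column-stochastic, as it must be: squaring a transition matrix gives another transition matrix.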
And once I tell you that a and b are equal, I can say: okay, let's solve part of this problem by setting the top-left term equal to ϵ. So what happens? We have a² + (1 - a)², because I've told you that a is equal to b (you can prove it yourself if you're going to be distressed by what is about to happen). So we have a² + (1 - a)² = a² + 1 - 2a + a² = ϵ. Using my amazing algebra skills, that becomes 2a² - 2a + (1 - ϵ) = 0. Now we use our quadratic-formula skills to solve for a: with coefficients 2, -2, and (1 - ϵ), we get a = [2 ± √(4 - 8(1 - ϵ))] / 4. Pulling a factor of two out of the square root and simplifying, a = ½ ± ½√(1 - 2(1 - ϵ)). Okay, so far so good. What I'd like you to notice is what happens when, say, ϵ is equal to ¼. In that case, under the square root sign we have 1 - 2(3/4) = 1 - 3/2 = -½. In other words, the only way to satisfy the equation A² = T is if some of the components of this matrix have imaginary, or more generally complex, entries. But if A has complex entries then it cannot correspond to any natural stochastic matrix, because we interpret the entries of A as transition probabilities, and now we're being asked to imagine that some of those transition probabilities are equal to things like ½ ± ½√(-½). So what we have found, as you can work out, is that whenever ϵ is less than ½, the matrix T could not have been generated by a coarse-graining of any finer-scaled two-state system. So where did T come from? Well, one way to see where the problem is is the following.
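To see exactly where the square root goes complex, we can solve 2a² - 2a + (1 - ϵ) = 0 directly. A minimal sketch, using Python's standard cmath module so that complex roots are returned rather than raising an error; the function name fine_scale_self_loop is my own label:

```python
import cmath

def fine_scale_self_loop(eps):
    """Roots a = 1/2 ± (1/2)·sqrt(1 - 2(1 - eps)) of 2a² - 2a + (1 - eps) = 0."""
    disc = cmath.sqrt(1 - 2 * (1 - eps))
    return 0.5 + 0.5 * disc, 0.5 - 0.5 * disc

# ε = 1/4: the discriminant is 1 - 3/2 = -1/2, so the roots are complex
# and no finer-scale two-state chain exists.
r1, r2 = fine_scale_self_loop(0.25)
print(r1, r2)

# ε = 3/4: the discriminant is 1 - 1/2 = +1/2, so the roots are real
# and a valid finer-scale self-loop probability exists.
r3, r4 = fine_scale_self_loop(0.75)
print(r3, r4)
```

The crossover sits exactly at ϵ = ½, where the discriminant 1 - 2(1 - ϵ) changes sign, matching the claim in the text.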
Imagine that the system is a good counter, so that it oscillates ABABABAB and only very rarely slips to produce a BB or an AA, or even more rarely a triple A or triple B. If it's a good counter, meaning ϵ is small, what would this look like expanded at the finer timescale? In that case we'd have A ? B ? A ? B ? A, and we'd have to fill in these gaps. But we only have two states, only the possibility of filling in with either A or B. Imagine, for example, we fill one of those question marks with A. In that case at time t the system was in state A and jumped directly to A, and then at time t + ½, on the finer timescale, it jumped from state A directly to state B. But if it's doing this deterministically, how can it know that at this time step it's in A and should jump to A, while at this other time step it's in A again but this time should jump to B? Markov chains are memoryless: the only thing that determines what a Markov chain will do next is the state it happens to be in. And you can instead imagine filling the gap with B, and you have the same problem. Here the state goes from A to B, very good; here the state goes from B to B, okay, good. But if the state deterministically jumps from B to B, we know the next state has to be B as well, and then we have to explain how the system somehow knows that this B is different from the last B, how it knows that two ticks have gone by and now it has to jump to A. Something has gone wrong. In order to get T you have to derive it from an A with more memory. The clock is ticking back and forth, but now it stays for half a time step in one of the ticks, and the system has to remember: okay, now it's time to go. If ϵ is too small, this process is too reliable, and the chain can't reproduce it just by stochastically deciding to jump from B to A instead of from B to B.
Can we find another solution, an A that will work for T? Well, the one thing we know we're probably going to have to do is give A more states than T has. So here is a solution; I'll show how a Markov chain can coarse-grain all the way down to the T matrix you originally saw. It's going to be a little tricky. Instead of a two-state system I'm going to draw a four-state system, with states A, B, C and D. The transitions look like this: A goes to B with probability 1 - ϵ, B always goes to C, C can occasionally slip back to B (with probability ϵ), C usually goes to D (with probability 1 - ϵ), D goes deterministically to A, and A goes back to D with only the tiny remaining probability ϵ. So what do we have here? We have a hidden Markov model: the underlying chain has 4 states, not just 2. What I'm going to do is show what happens if you take the matrix this diagram represents and square it, and what we find is the following. The system actually splits up, like so: if you're in state A, then after two time steps of evolution by this operator A, you jump to C with probability 1 - ϵ and stay in A with probability ϵ. It's also true that if you happen to be in state D, you jump to B with probability 1 - ϵ and stay in D with probability ϵ. This is now equivalent to the original operator, the original model T, as long as we do the following coarse-graining of the states. If we say A and D are both equivalent to state A in the coarse-grained observations, and C and B are both equivalent to state B, then all of a sudden A², this matrix here, reduces down to the original evolution matrix we had for the slippy AB counter. The big lesson we get out of this is that sometimes a matrix, or a model in general, could have been generated by something outside of the original model space.
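The four-state construction can be verified numerically: build the 4×4 matrix, square it, lump {A, D} and {B, C} together, and check that the result is the slippy counter T. A sketch, assuming the slip probabilities implied in the text (C slips back to B with probability ϵ, A slips to D with probability ϵ) and the same column-stochastic convention as before:

```python
def matmul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

eps = 0.25  # ε < 1/2: T has no 2-state square root, yet this 4-state A works

# Columns are the source states, in the order A, B, C, D (column-stochastic).
A4 = [
    [0,       0, 0,       1],  # to A: D -> A deterministically
    [1 - eps, 0, eps,     0],  # to B: A -> B w.p. 1-ε, C slips back w.p. ε
    [0,       1, 0,       0],  # to C: B -> C deterministically
    [eps,     0, 1 - eps, 0],  # to D: A -> D w.p. ε, C -> D w.p. 1-ε
]

A4sq = matmul(A4, A4)

# Coarse-grain by lumping destinations: fine A, D -> coarse A; fine B, C -> coarse B.
block = {0: 0, 1: 1, 2: 1, 3: 0}  # fine index -> coarse index

def coarse_column(j):
    """Two-step outgoing probabilities from fine state j, lumped into coarse states."""
    col = [0.0, 0.0]
    for i in range(4):
        col[block[i]] += A4sq[i][j]
    return col

# Columns for fine A and fine D (both coarse A) agree, likewise fine B and C:
print(coarse_column(0), coarse_column(3))  # both [eps, 1 - eps]
print(coarse_column(1), coarse_column(2))  # both [1 - eps, eps]
```

The fact that the lumped column is identical for every fine state within a block is exactly what makes the coarse-graining consistent: an observer who only sees the coarse labels sees the two-state chain T, with self-loop probability ϵ.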
When you coarse-grain the data and get a new model, the model you get may be a fundamentally different kind of model than the one you began with. In this case we began with a 4-state model and coarse-grained it down to a 2-state model. Looking at it in reverse, you can say that for the 2-state model, the T transition, the slippy AB counter, if the counter is reliable enough, meaning ϵ is less than a half, then this construction will work. But what you have to do is assume that the finer-grained data was created by a process with more memory: four states instead of just two. And you can see that the deterministic transitions here allow the system to count whether or not it's time to jump out of the A or the C state yet.