1
00:00:08,710 --> 00:00:13,604
So, can we say anything
general about that process?
2
00:00:13,604 --> 00:00:15,734
The answer turns out to be: Yes.
3
00:00:16,869 --> 00:00:19,679
And what we're going to do
is represent this matrix
4
00:00:19,680 --> 00:00:22,159
not in the form
that you're familiar with
5
00:00:22,159 --> 00:00:25,044
the ϵ and 1 - ϵ form
6
00:00:25,044 --> 00:00:28,034
where the column vector refers to
7
00:00:28,034 --> 00:00:30,080
the probability of being in state A
8
00:00:30,080 --> 00:00:31,940
or the probability of being in state B
9
00:00:31,940 --> 00:00:35,000
instead what we're going to do is
diagonalize this matrix
10
00:00:35,000 --> 00:00:38,970
so if the matrix is T what we're
going to do is find its eigenvectors
11
00:00:38,970 --> 00:00:41,754
and the eigenvectors
of the matrix T are defined
12
00:00:41,754 --> 00:00:45,504
by T acting on let's say
the first eigenvector
13
00:00:45,670 --> 00:00:51,124
gives you back the first eigenvector
potentially times some constant λ,
14
00:00:51,124 --> 00:00:53,464
so this is the eigenvector
15
00:00:54,819 --> 00:00:58,249
of the transition matrix or
the evolution operator T
16
00:00:58,249 --> 00:01:00,439
and this is the eigenvalue.
17
00:01:01,629 --> 00:01:04,197
A matrix like this generically has two
18
00:01:05,349 --> 00:01:07,350
eigenvectors
19
00:01:07,600 --> 00:01:09,070
and two
20
00:01:09,070 --> 00:01:11,070
eigenvalues
21
00:01:11,680 --> 00:01:14,609
and so we can write the eigenvectors
22
00:01:14,609 --> 00:01:16,359
as v₁ and v₂.
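As a concrete check of this eigen-decomposition, here is a minimal pure-Python sketch. The explicit matrix form T = [[ϵ, 1-ϵ], [1-ϵ, ϵ]], the value ϵ = ¼, and the eigenvectors v₁ = (1, 1), v₂ = (1, -1) are assumptions reconstructed from the surrounding discussion, not taken verbatim from the board:

```python
# Assumed form of the symmetric two-state chain: self-loop probability
# eps, transition probability 1 - eps, written column-stochastic.
def mat_vec(T, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [T[0][0]*v[0] + T[0][1]*v[1],
            T[1][0]*v[0] + T[1][1]*v[1]]

eps = 0.25
T = [[eps, 1 - eps],
     [1 - eps, eps]]

v1 = [1.0, 1.0]    # eigenvector with eigenvalue 1
v2 = [1.0, -1.0]   # eigenvector with eigenvalue 2*eps - 1

print(mat_vec(T, v1))   # T v1 = 1 * v1      -> [1.0, 1.0]
print(mat_vec(T, v2))   # T v2 = -0.5 * v2   -> [-0.5, 0.5]
```

The eigenvalue relation T v = λ v holds exactly here: v₁ comes back unchanged (λ₁ = 1) and v₂ comes back scaled by 2ϵ - 1 = -½.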
23
00:01:16,364 --> 00:01:19,230
And now what I'm going to do is for an arbitrary
24
00:01:19,230 --> 00:01:23,004
probability distribution over states
A and B, I'm going to represent that
25
00:01:23,004 --> 00:01:24,394
as a sum,
26
00:01:24,789 --> 00:01:27,994
as a weighted sum over these two eigenvectors,
27
00:01:27,994 --> 00:01:33,214
so we'll call that α₁v₁ + α₂v₂.
28
00:01:34,750 --> 00:01:37,780
And so if I now make a column vector
29
00:01:37,780 --> 00:01:40,939
that represents that probability
in eigenvector space
30
00:01:40,939 --> 00:01:45,309
what I'll have is an α₁ and α₂ like this.
31
00:01:46,539 --> 00:01:50,999
So if I have diagonalized
that matrix now
32
00:01:50,999 --> 00:01:52,399
when it acts upon
33
00:01:52,780 --> 00:01:55,014
this transform space
34
00:01:55,014 --> 00:01:57,736
it takes a very simple form,
35
00:01:57,736 --> 00:02:00,606
and in fact, I'll write out
the exact form it takes here
36
00:02:01,329 --> 00:02:04,169
the matrix now becomes diagonal
37
00:02:05,619 --> 00:02:07,619
and when it acts
38
00:02:08,830 --> 00:02:11,460
on the diagonalized version
of the probability vector
39
00:02:13,690 --> 00:02:15,464
as you would expect
40
00:02:15,464 --> 00:02:17,894
given the nature of
the eigenvectors themselves
41
00:02:18,220 --> 00:02:20,220
It turns into
42
00:02:22,010 --> 00:02:25,360
a scaled version of
each of these two entries.
43
00:02:25,365 --> 00:02:27,275
So what I've done here is go from
44
00:02:28,190 --> 00:02:31,879
this way of representing
the Markov chain
45
00:02:31,879 --> 00:02:34,809
to this diagonalized representation here
46
00:02:34,809 --> 00:02:36,714
which we can call T hat
47
00:02:36,714 --> 00:02:40,874
and T hat now acts not on the
probability of being in states A and B
48
00:02:40,874 --> 00:02:43,689
but instead on the amount
49
00:02:43,700 --> 00:02:44,960
of probability that you have
50
00:02:44,960 --> 00:02:47,485
in the first eigenvector pattern
51
00:02:47,485 --> 00:02:51,125
and the amount of probability you have
in the second eigenvector pattern.
52
00:02:51,770 --> 00:02:54,200
What you can see here is that this matrix,
53
00:02:55,430 --> 00:02:57,430
the two eigenvalues it has,
54
00:02:57,560 --> 00:03:00,485
one of these eigenvalues is equal to 1.
55
00:03:00,485 --> 00:03:04,865
And in fact it turns out from a theorem
that you can prove in linear algebra
56
00:03:05,090 --> 00:03:11,710
that any stochastic matrix
will have as its largest eigenvalue
57
00:03:12,980 --> 00:03:17,139
something whose
absolute value is equal to 1.
58
00:03:17,250 --> 00:03:23,150
In fact sometimes this eigenvalue here
can be negative one.
59
00:03:23,150 --> 00:03:25,230
Sometimes it can even be complex.
60
00:03:25,410 --> 00:03:28,750
Generically for most ordinary
stochastic matrices
61
00:03:28,750 --> 00:03:30,858
including the one we're going to look at here
62
00:03:31,370 --> 00:03:33,884
and the conditions on that
are somewhat technical
63
00:03:33,884 --> 00:03:36,044
we need irreducibility and aperiodicity, that is, ergodicity,
64
00:03:37,190 --> 00:03:41,089
but in general what you'll have
is that the largest eigenvalue
65
00:03:41,089 --> 00:03:43,880
for a stochastic matrix is equal to unity.
66
00:03:44,535 --> 00:03:47,407
All the other eigenvalues
will have absolute values
67
00:03:47,407 --> 00:03:48,937
that are smaller than 1.
68
00:03:49,190 --> 00:03:52,570
So notice here as long as ϵ
69
00:03:53,480 --> 00:03:56,210
as long as ϵ is less than unity
70
00:03:56,210 --> 00:04:00,130
right, as long as there's some probability
for a transition between A and B
71
00:04:00,355 --> 00:04:04,335
this term here will have
an absolute value less than 1.
72
00:04:04,335 --> 00:04:09,589
Notice that if ϵ is less than ½ in fact
the second eigenvalue will be negative,
73
00:04:09,589 --> 00:04:13,039
but its absolute value
will still be less than 1.
74
00:04:13,268 --> 00:04:19,849
So now we've reduced the problem
of taking T to the N
75
00:04:19,900 --> 00:04:23,320
to the problem of taking T hat to the N
76
00:04:23,320 --> 00:04:27,040
where T hat is 1 on the diagonal
77
00:04:27,040 --> 00:04:31,710
and then 2ϵ - 1 here
and that is now acting
78
00:04:32,419 --> 00:04:36,690
on this transformed representation of the
probabilities of being in state A and B.
79
00:04:36,690 --> 00:04:40,760
If we raise this here to the power of N
80
00:04:40,760 --> 00:04:43,530
this matrix takes a very simple form,
81
00:04:43,530 --> 00:04:46,979
this first diagonal term,
of course, 1 to the power of n, is just 1
82
00:04:46,979 --> 00:04:48,910
the off diagonal terms remain 0
83
00:04:48,910 --> 00:04:52,400
and now you have 2ϵ - 1 to the power of n
84
00:04:52,400 --> 00:04:57,559
again acting on
the transformed representation.
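In the diagonal basis, raising T̂ to a power is trivial: each diagonal entry is raised to that power independently. A short sketch (ϵ = ¼ and the coefficients (α₁, α₂) = (½, ½) are assumed illustrative values):

```python
eps = 0.25
lam1 = 1.0            # largest eigenvalue of the stochastic matrix
lam2 = 2*eps - 1      # second eigenvalue, here -0.5

def t_hat_n(n, alpha):
    """Apply T_hat^n to the eigenbasis coefficients (alpha1, alpha2)."""
    return [lam1**n * alpha[0], lam2**n * alpha[1]]

print(t_hat_n(1, [0.5, 0.5]))   # [0.5, -0.25]
print(t_hat_n(3, [0.5, 0.5]))   # alpha2 scaled by (-1/2)^3 = -0.125
```

The α₁ component is fixed by λ₁ = 1, while the α₂ component shrinks geometrically with n.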
85
00:04:59,810 --> 00:05:02,830
As n gets very large this term here
86
00:05:03,680 --> 00:05:07,015
because it is less than 1
in absolute value
87
00:05:07,015 --> 00:05:09,100
will continue to get smaller and smaller
88
00:05:09,100 --> 00:05:11,620
so for example let's take
the simple case
89
00:05:11,620 --> 00:05:15,450
where ϵ is equal to, let's say, ¼.
90
00:05:15,450 --> 00:05:20,800
In that case along the diagonal
you'll have
91
00:05:20,810 --> 00:05:24,250
a 1 up here and a negative ½ to the n here
92
00:05:24,280 --> 00:05:29,250
and when you act upon an
arbitrary column vector α₁ α₂
93
00:05:30,110 --> 00:05:33,489
this will turn into α₁
will remain unchanged
94
00:05:34,100 --> 00:05:38,830
but α₂ will be down by a very large factor
95
00:05:38,830 --> 00:05:40,900
so let's say if n is a thousand
96
00:05:41,180 --> 00:05:46,060
then 1 over 2 to the power of a thousand
is down by a factor of roughly 10³⁰¹.
97
00:05:46,190 --> 00:05:48,100
There will be this minus sign here
98
00:05:48,100 --> 00:05:50,359
So depending upon whether n is even or odd
99
00:05:50,359 --> 00:05:54,039
it will be a very tiny positive number
or a very tiny negative number
100
00:05:54,050 --> 00:05:59,340
but in general as you take
increasingly large powers of this T
101
00:06:00,200 --> 00:06:04,620
You will find that no matter what
you put into the system,
102
00:06:04,620 --> 00:06:09,910
no matter what combination
of α₁ and α₂ you put into the system
103
00:06:09,910 --> 00:06:16,510
out the other side you will get
a vector that is purely the α₁v₁ component.
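This convergence is easy to see numerically. A hedged sketch in pure Python, using the assumed symmetric chain with ϵ = ¼ and a start entirely in state A:

```python
def mat_vec(T, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [T[0][0]*v[0] + T[0][1]*v[1],
            T[1][0]*v[0] + T[1][1]*v[1]]

eps = 0.25
T = [[eps, 1 - eps],
     [1 - eps, eps]]

p = [1.0, 0.0]          # start entirely in state A
for _ in range(50):     # apply T fifty times, i.e. compute T^50 p
    p = mat_vec(T, p)

# The alpha_2 component scales as (-1/2)^n, so by n = 50 it is
# negligible and p is essentially the uniform stationary vector.
print(p)                # very close to [0.5, 0.5]
```

Whatever mixture of α₁ and α₂ you start with, only the eigenvalue-1 component survives repeated application of T.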
104
00:06:17,980 --> 00:06:19,575
Another way to say this
105
00:06:19,575 --> 00:06:22,665
using the language of the probability
of being in each of these states
106
00:06:22,665 --> 00:06:27,045
is that if you
coarse-grain your data enough
107
00:06:27,045 --> 00:06:31,549
the corresponding model will take
any probability distribution
108
00:06:31,549 --> 00:06:37,179
and map it directly to that first
eigenvector of the system.
109
00:06:37,179 --> 00:06:41,490
That first eigenvector has a special name.
It's called the stationary distribution.
110
00:06:43,389 --> 00:06:47,200
It's called the stationary distribution
of the original stochastic matrix.
111
00:06:47,950 --> 00:06:49,720
And so now
112
00:06:49,720 --> 00:06:53,080
we know that no matter
what you put in.
113
00:06:53,080 --> 00:06:55,650
Let's say you begin entirely in state A
114
00:06:56,350 --> 00:07:01,000
if the system has been coarse-grained enough,
after one coarse-grain time step
115
00:07:01,010 --> 00:07:06,559
using the evolution operator T²
or rather T to the n as n gets very large.
116
00:07:06,559 --> 00:07:13,239
At the next time step you will go to a unique
probability distribution given by v₁
117
00:07:13,239 --> 00:07:17,200
and in fact if you begin in state B,
you will also be taken
118
00:07:17,200 --> 00:07:20,690
to the identical
probability distribution v₁.
119
00:07:20,690 --> 00:07:27,299
And so in the end
the matrix will take the following form.
120
00:07:27,609 --> 00:07:32,049
If you begin in state A
you have some probability called P(A)
121
00:07:32,260 --> 00:07:33,784
of staying in state A.
122
00:07:33,784 --> 00:07:37,564
And you have some probability of ending up in state B
123
00:07:37,564 --> 00:07:41,649
called P(B) and then we know, of course,
that that's equal to 1 - P(A)
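For a general two-state chain this limiting P(A) can be written down directly. A sketch under stated assumptions: the chain is written column-stochastic as T = [[a, 1-b], [1-a, b]], where a and b are hypothetical self-loop probabilities for A and B (the specific values 0.7 and 0.4 below are illustrative, not from the lecture); solving T π = π gives P(A) = (1-b)/(2-a-b):

```python
def stationary(a, b):
    """Stationary distribution [P(A), P(B)] of the two-state chain
    with self-loop probabilities a (for A) and b (for B)."""
    pi_a = (1 - b) / (2 - a - b)
    return [pi_a, 1 - pi_a]

a, b = 0.7, 0.4              # illustrative self-loop probabilities
pi = stationary(a, b)
T = [[a, 1 - b],
     [1 - a, b]]

# One more time step leaves the stationary distribution unchanged.
Tpi = [T[0][0]*pi[0] + T[0][1]*pi[1],
       T[1][0]*pi[0] + T[1][1]*pi[1]]
print(pi)                    # [2/3, 1/3] for these values
print(Tpi)                   # same vector: T pi == pi
```

This is the fixed vector every starting distribution gets mapped to in the large-n limit.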
124
00:07:42,400 --> 00:07:44,380
Similarly, if you're in state B
125
00:07:44,380 --> 00:07:47,979
you have some probability
to stay in state B
126
00:07:47,979 --> 00:07:52,560
and you have some probability
to end up back in state A
127
00:07:54,039 --> 00:07:58,569
and, again, we can write
the probability of being in state B
128
00:07:58,569 --> 00:08:02,229
as 1 minus the probability
of being in state A
129
00:08:03,190 --> 00:08:05,169
so
130
00:08:05,169 --> 00:08:08,059
What the simplification means
is that no matter
131
00:08:08,059 --> 00:08:13,019
what probabilities
for transitions you put in here
132
00:08:13,019 --> 00:08:16,349
and in particular we are free
133
00:08:16,349 --> 00:08:21,269
at short time scales to put arbitrary
probabilities for the self loop
134
00:08:21,269 --> 00:08:26,209
for A and arbitrary probabilities
for the self loop for B.
135
00:08:28,750 --> 00:08:33,200
If you begin with this description on short time scales,
136
00:08:33,200 --> 00:08:37,060
if you renormalize enough,
if you coarse-grain the observations enough
137
00:08:37,074 --> 00:08:39,031
and ask what the corresponding model is
138
00:08:39,031 --> 00:08:44,648
you find in fact that every model can be
described by a single parameter P(A).
139
00:08:44,648 --> 00:08:50,240
You move from models
that exist in a 2-dimensional space
140
00:08:50,240 --> 00:08:54,740
where I have to tell you the self-loop
probabilities for each of these individual states.
141
00:08:55,830 --> 00:08:58,430
If you coarse-grain enough that takes you
142
00:08:58,430 --> 00:09:05,090
to a set of models that can be described
by a single probability, P(A)
143
00:09:05,090 --> 00:09:12,670
and so in fact one way to imagine the
limit of this coarse-graining process
144
00:09:12,670 --> 00:09:17,940
how all of these different Markov chains
flow, is the following diagram
145
00:09:17,940 --> 00:09:23,380
which we sometimes call
the phase diagram for this system,
146
00:09:23,380 --> 00:09:24,870
which I can draw like this.
147
00:09:24,870 --> 00:09:29,030
On the x-axis we have
the self-loop probability for the A state
148
00:09:29,030 --> 00:09:33,909
and on the y-axis we have
the self-loop probability for the B state
149
00:09:34,940 --> 00:09:43,680
So every point in this space
refers to a distinct Markov chain model.
150
00:09:43,680 --> 00:09:50,090
All of these models actually flow
to a single line on this plane.
151
00:09:50,090 --> 00:09:56,119
All the points on this line here
have the potential to be fixed points
152
00:09:56,119 --> 00:09:59,400
that as you coarse-grain the system
further and further
153
00:09:59,400 --> 00:10:04,230
they all end up on this line
where the probability
154
00:10:05,420 --> 00:10:08,320
of the self-loop for the A state is equal
155
00:10:08,870 --> 00:10:13,589
to 1 minus the probability
for the self-loop for the B state
156
00:10:13,879 --> 00:10:17,519
and the way to think about this is that P(A|A)
157
00:10:18,050 --> 00:10:22,510
when you coarse-grain enough
just becomes the stationary probability,
158
00:10:22,510 --> 00:10:25,660
the stationary distribution for that Markov chain, P(A)
159
00:10:25,660 --> 00:10:31,000
the self-loop probability for B
just becomes the probability for B.
160
00:10:31,000 --> 00:10:35,124
And we know that that now
has to be equal to 1 - P(A)
161
00:10:35,124 --> 00:10:36,960
for that stationary distribution.
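The flow to this fixed-point line can be sketched numerically by repeated squaring of T (an assumed illustration; the starting self-loop probabilities a = 0.7, b = 0.4 are arbitrary interior values, and the column-stochastic convention matches the column vectors used above):

```python
def mat_mul(X, Y):
    """Multiply two 2x2 matrices."""
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0],
             X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0],
             X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

a, b = 0.7, 0.4                 # arbitrary interior starting chain
T = [[a, 1 - b],
     [1 - a, b]]

for _ in range(20):             # T -> T^2 -> T^4 -> ... (huge power)
    T = mat_mul(T, T)

# Both columns are now the stationary distribution, so the two
# self-loop entries land on the line P(A|A) = 1 - P(B|B) = P(A).
print(T[0][0], T[1][1])         # ~2/3 and ~1/3 for these values
```

Whatever interior point (a, b) you start from, the squared-up chain ends on the line where the self-loop entries sum to 1.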
162
00:10:36,960 --> 00:10:43,605
So what we find is that as you
continue to renormalize this system
163
00:10:44,090 --> 00:10:47,184
as you continue to coarse-grain
and ask what the corresponding model is
164
00:10:47,184 --> 00:10:50,210
you get driven towards this line here
165
00:10:50,210 --> 00:10:53,690
and as you get driven to that line here
166
00:10:53,690 --> 00:10:58,450
you flow to
what we call a fixed point
167
00:10:58,450 --> 00:11:03,330
where you get closer and closer
to the limit of T to the n
168
00:11:03,330 --> 00:11:07,020
as n goes to infinity.
169
00:11:07,030 --> 00:11:12,714
So that's a more formal, still somewhat
cartoonish story that I've told
170
00:11:12,714 --> 00:11:16,524
the reason I say it's cartoonish is that
I haven't told you the exact conditions
171
00:11:16,524 --> 00:11:18,800
for this flow to work.
172
00:11:19,250 --> 00:11:22,644
Implicitly what you have to have
is that the first eigenvalue
173
00:11:22,644 --> 00:11:26,224
has to be equal to one and there
can be no other eigenvalues
174
00:11:26,224 --> 00:11:28,430
that also have
absolute value equal to one.
175
00:11:28,430 --> 00:11:33,300
So, for example, a model
176
00:11:33,300 --> 00:11:37,580
that does not flow
to somewhere on this line
177
00:11:37,580 --> 00:11:41,499
is one where the system
continually oscillates back and forth.
178
00:11:43,110 --> 00:11:45,920
In that case the self-loop probability for A is zero,
179
00:11:45,920 --> 00:11:48,160
the self-loop probability for B is zero.
180
00:11:48,160 --> 00:11:50,890
But no matter how many times
you multiply the system by itself
181
00:11:50,890 --> 00:11:54,530
it never kind of blurs out
and gives you a stationary distribution.
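The periodic counterexample is easy to verify directly. A minimal sketch: with both self-loops zero, T is the swap matrix, its eigenvalues are +1 and -1, and its powers alternate forever instead of converging:

```python
def mat_mul(X, Y):
    """Multiply two 2x2 matrices."""
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0],
             X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0],
             X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

swap = [[0, 1],
        [1, 0]]   # zero self-loops: A and B exchange every step

T = swap
powers = []
for _ in range(4):
    powers.append(T)
    T = mat_mul(T, swap)

# T^1 = swap, T^2 = identity, T^3 = swap, T^4 = identity, ...
print(powers)
```

The second eigenvalue sits at -1, exactly on the unit circle, so no amount of coarse-graining blurs this chain into a stationary distribution.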
182
00:11:54,530 --> 00:11:56,865
I haven't told you the exact conditions
183
00:11:56,865 --> 00:11:59,260
that tell you that some
of these models are ruled out
184
00:11:59,260 --> 00:12:01,670
sort of on the corners of this space.
185
00:12:02,120 --> 00:12:05,180
But basically if you're somewhere in the interior
186
00:12:05,180 --> 00:12:09,066
you always flow to a unique fixed point
187
00:12:10,250 --> 00:12:12,609
under the coarse-graining operation.