1
00:00:08,969 --> 00:00:14,440
So in the first module we gave you a
series of examples about how as
2
00:00:14,440 --> 00:00:20,740
scientists or indeed as engineers we try
to simplify our observations. We don't
3
00:00:20,740 --> 00:00:25,960
keep records of everything that happens
at the finest possible detail. Instead
4
00:00:25,960 --> 00:00:31,630
what we do very often is if we have some
data at high resolution either out there
5
00:00:31,630 --> 00:00:35,350
to be gathered or already on our hard
drives, what we're often interested in
6
00:00:35,350 --> 00:00:39,940
doing is simplifying it in some way and
I gave you a generic term for this, which
7
00:00:39,940 --> 00:00:45,820
we call coarse-graining. In particular,
what I did for the case of the etching
8
00:00:45,820 --> 00:00:50,530
of Alice and her kitten Dinah was to show
you different coarse-graining
9
00:00:50,530 --> 00:00:54,670
prescriptions, how you could take that
image and strategically throw out
10
00:00:54,670 --> 00:00:59,590
information, reduce the amount of
information you're keeping about the
11
00:00:59,590 --> 00:01:03,430
original image and still if the coarse-
graining prescription is chosen well,
12
00:01:03,430 --> 00:01:09,399
still keep some kind of sense of what's
happening in the system itself, what's
13
00:01:09,399 --> 00:01:13,420
happening in this image, and so ok, maybe
you're not quite sure what kind of
14
00:01:13,420 --> 00:01:17,649
animal Alice is playing with at this
point but at least you know that she's
15
00:01:17,649 --> 00:01:22,329
playing with some kind of animal. We in
fact gave these three coarse-graining
16
00:01:22,329 --> 00:01:28,149
prescriptions. The first one was majority
vote: I take a little square and I have
17
00:01:28,149 --> 00:01:35,109
all the pixels vote on what color, black
or white, the output pixel should be and
18
00:01:35,109 --> 00:01:38,979
in particular in the example here I took
a square that was 10 by 10 so I took a
19
00:01:38,979 --> 00:01:43,749
hundred data points and just by
majority vote assigned the block either a 0
20
00:01:43,749 --> 00:01:49,179
or 1 and made the pixel in other words
10 times larger and that's what happens
21
00:01:49,179 --> 00:01:53,200
when you do it here. You could also do
something even simpler which is take
22
00:01:53,200 --> 00:01:57,849
that 10 by 10 grid and just have the
pixel in the upper right-hand corner
23
00:01:57,849 --> 00:02:04,389
sort of dictate what the final coarse-
grained pixel is going to be. The final
24
00:02:04,389 --> 00:02:08,169
example I gave you, which we only talked
about without really showing the mathematics
25
00:02:08,169 --> 00:02:11,319
for how it happens, was the compression
algorithm in actual use in the real
26
00:02:11,319 --> 00:02:15,880
world and that's the JPEG. What the JPEG
does in fact is throw out information
27
00:02:15,880 --> 00:02:21,760
that tells you about the very
fine scale oscillations in the data, the
28
00:02:21,760 --> 00:02:28,000
patterns on the back of the armchair for
example and throw those out when it
29
00:02:28,000 --> 00:02:30,760
thinks those are going to be
undetectable to the human eye while at
30
00:02:30,760 --> 00:02:36,640
the same time keeping the overall longer
scale fluctuations in the image the
31
00:02:36,640 --> 00:02:41,650
difference between let's say where the
light is and where the shadow is. It does
32
00:02:41,650 --> 00:02:46,240
that through a Fourier transform and
essentially all it really does is just
33
00:02:46,240 --> 00:02:51,010
chop off the high frequency components
in a way that engineers have decided is
34
00:02:51,010 --> 00:02:57,970
at least not entirely visually
unpleasing. So coarse-graining is
35
00:02:57,970 --> 00:03:01,990
an essential part of renormalization, but it's
not the whole story; in fact, it's only
36
00:03:01,990 --> 00:03:07,810
half, because as scientists what we do is
not just gather data and we don't just
37
00:03:07,810 --> 00:03:13,690
simplify it. Instead what we do is we
tend to build models of that data so
38
00:03:13,690 --> 00:03:17,110
let's take the highest resolution that
you can imagine for a particular system
39
00:03:17,110 --> 00:03:22,090
and take the model that you think best
predicts or describes or explains the
40
00:03:22,090 --> 00:03:28,570
data at that high resolution level. Then
we can ask the obvious question: What
41
00:03:28,570 --> 00:03:33,670
model best describes or predicts or
explains the data at the coarse-grained
42
00:03:33,670 --> 00:03:39,489
level and what's the relationship
between those two models: the model that
43
00:03:39,489 --> 00:03:45,160
describes everything and the model that
describes something. The entire story of
44
00:03:45,160 --> 00:03:48,670
renormalization is the relationship
between what happens when you
45
00:03:48,670 --> 00:03:52,510
coarse-grain the data and what happens
when you look at the underlying
46
00:03:52,510 --> 00:03:58,000
structures of the models that that
coarse-graining demands. Surprisingly as
47
00:03:58,000 --> 00:04:02,590
we'll see, sometimes when you coarse-
grain the data the models that you need
48
00:04:02,590 --> 00:04:08,560
get more complicated. Sometimes
conversely they simplify and it'll be
49
00:04:08,560 --> 00:04:14,200
that kind of process that we'll study
now. In order to describe
50
00:04:14,200 --> 00:04:18,700
renormalization, you'll see, of course, that
I have to tell you not just what is
51
00:04:18,700 --> 00:04:23,530
happening to the data but also what's
happening to the model so I have to give
52
00:04:23,530 --> 00:04:28,630
you an example not just of some data
that we're coarse-graining but of a model
53
00:04:28,630 --> 00:04:33,920
that describes it.
The model that we'll use is the Markov
54
00:04:33,920 --> 00:04:37,070
chain. So how do Markov chains work, what
do they operate on, what are they
55
00:04:37,070 --> 00:04:41,330
supposed to describe or explain or
predict? In general Markov chains
56
00:04:41,330 --> 00:04:45,950
describe time series, a series of
observations that
57
00:04:45,950 --> 00:04:50,930
evolve or unfold one moment after
another. The simplest case is when each
58
00:04:50,930 --> 00:04:57,110
observation is a symbolic unit so you
know an A or B or C, when the stock
59
00:04:57,110 --> 00:05:01,550
went up or the stock went down, the
person said this word or that word. The
60
00:05:01,550 --> 00:05:05,330
more complicated story is that each of
those observations could be a continuous
61
00:05:05,330 --> 00:05:10,580
number, like a temperature or a field: the
value, let's say, of the electric field
62
00:05:10,580 --> 00:05:16,100
at a certain point in time at a certain
point in space. Here we'll just deal with
63
00:05:16,100 --> 00:05:19,550
symbolic time series because they're
much easier to handle at least when the
64
00:05:19,550 --> 00:05:23,450
number of symbols is small. So that's our
basic idea that's our fine-grained data
65
00:05:23,450 --> 00:05:27,229
and then we're going to imagine coarse-
graining that to produce a lower
66
00:05:27,229 --> 00:05:30,260
resolution time series. Now of course
that's not the only way to coarse-grain
67
00:05:30,260 --> 00:05:36,470
a time series. You could also imagine
coarse-graining each symbol. Let's say at
68
00:05:36,470 --> 00:05:40,940
each point in time you chose from a set
of 10 symbols. One way to coarse-grain
69
00:05:40,940 --> 00:05:44,180
that time series is instead of choosing
from a set of 10 you map those 10
70
00:05:44,180 --> 00:05:50,960
symbols to either a symbol A or a symbol
B. That kind of coarse-graining is something
71
00:05:50,960 --> 00:05:53,240
we'll see a little bit later; you might
think of it as a projection: you're
72
00:05:53,240 --> 00:05:56,300
projecting down the state space. Here
we'll do something a little simpler.
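As an aside, that projection-style coarse-graining can be sketched in a few lines of Python. The 10-symbol alphabet and the particular mapping down to A and B below are made up for illustration; the lecture doesn't specify them.

```python
# A sketch of "projection" coarse-graining: each observation is drawn from
# a 10-symbol alphabet, and we project those 10 symbols down to just two.
# The alphabet (digits) and the split at 5 are hypothetical choices.
fine_alphabet = list("0123456789")
projection = {s: ("A" if int(s) < 5 else "B") for s in fine_alphabet}

def project(series):
    """Coarse-grain a time series by projecting down its state space."""
    return [projection[s] for s in series]

fine_series = list("3941778205")
coarse_series = project(fine_series)  # a two-symbol version of the series
```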
73
00:05:56,300 --> 00:06:00,940
Imagine for example the time series was
gathered at intervals of one second.
74
00:06:00,940 --> 00:06:04,669
Now what we're going to imagine is that
you didn't have expensive
75
00:06:04,669 --> 00:06:08,750
enough equipment or your hard drive got
full so instead of keeping every second
76
00:06:08,750 --> 00:06:13,310
of the evolution, what you did
was let's say kept every other second or
77
00:06:13,310 --> 00:06:17,900
every third second. In other words you
took a block of the time series and you
78
00:06:17,900 --> 00:06:23,150
decimated it. You just took the first
observation within each block over time
79
00:06:23,150 --> 00:06:28,340
sort of a one-dimensional version of how
we coarse-grained the sketch of Alice by
80
00:06:28,340 --> 00:06:33,169
John Tenniel. All right so that's the
data we'll be operating on. The Markov
81
00:06:33,169 --> 00:06:38,180
chain provides you a model of how our
time series is produced and then what
82
00:06:38,180 --> 00:06:42,710
we're going to see is what happens to
those models as you ask them to describe
83
00:06:42,710 --> 00:06:46,599
or predict or explain
the lower resolution version. We'll
84
00:06:46,599 --> 00:06:52,589
compare one model to another as they
operate on different kinds of data.
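The decimation just described is easy to sketch in Python: keep only the first observation in each block of k and discard the rest. The sample sequence below is made up; the convention of keeping the first point of each block follows the lecture.

```python
def decimate(series, k):
    """Coarse-grain a time series by keeping only the first
    observation in each block of k (decimation)."""
    return series[::k]

# A made-up second-by-second symbolic time series:
second_by_second = list("ABCCABCABC")

every_other = decimate(second_by_second, 2)  # blocks of two
every_third = decimate(second_by_second, 3)  # blocks of three
```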
85
00:06:52,589 --> 00:06:57,909
So, here's a Markov chain. On the left-hand
side what I've shown you is a depiction
86
00:06:57,909 --> 00:07:01,089
of the model I'll tell you how to read
the representation in a second. On the
87
00:07:01,089 --> 00:07:04,839
right hand side there's a sample bit of
data that it might predict or explain. In
88
00:07:04,839 --> 00:07:08,169
this case what I did was just run the
model itself. Because it's a
89
00:07:08,169 --> 00:07:12,550
generative model, it will tell you what's
going to happen next and so I can just
90
00:07:12,550 --> 00:07:18,669
start somewhere and allow the model to
produce a simulated time series. Markov
91
00:07:18,669 --> 00:07:22,089
chains are stochastic so in any
particular case it will produce
92
00:07:22,089 --> 00:07:25,930
generically a different sample run. On
the right-hand side what you have is
93
00:07:25,930 --> 00:07:30,729
just one example of the kind of data a
Markov chain could produce, conversely
94
00:07:30,729 --> 00:07:36,159
the kind of data that a Markov chain
could describe or predict. So the
95
00:07:36,159 --> 00:07:39,550
left-hand side shows the Markov chain
itself. What you see there are three
96
00:07:39,550 --> 00:07:45,099
nodes A, B and C and it's simple enough
when the system is in state A it emits
97
00:07:45,099 --> 00:07:48,849
the symbol A when it's in state B it
emits the symbol B, and when it's in
98
00:07:48,849 --> 00:07:52,959
state C it emits the symbol C and then
upon emitting that symbol it makes a
99
00:07:52,959 --> 00:07:57,550
transition. It jumps to one of the other
states and the probability of jumping to
100
00:07:57,550 --> 00:08:01,719
each of those other states is dictated
by the model itself and in fact I
101
00:08:01,719 --> 00:08:06,819
represented that here by the arrows. So
let's say you begin in state A. With 90
102
00:08:06,819 --> 00:08:12,459
percent probability you jump to state B.
Say you're in state C, with 30 percent
103
00:08:12,459 --> 00:08:17,309
probability you stay in state C, with 70
percent probability you jump to state A.
104
00:08:17,309 --> 00:08:21,669
Specifying those transition
probabilities corresponds to specifying
105
00:08:21,669 --> 00:08:25,719
the entire set of free parameters for
the Markov chain. Once you tell me the
106
00:08:25,719 --> 00:08:29,499
transition probabilities you've told me
everything I need to know about the
107
00:08:29,499 --> 00:08:34,299
underlying model. Markov chains are fun
but it's also important to realize how
108
00:08:34,299 --> 00:08:39,760
limited they are. For example if I am at
B and then I'm at C, what happens
109
00:08:39,760 --> 00:08:42,669
next
does not depend upon the fact that I
110
00:08:42,669 --> 00:08:48,040
emitted a B. When I'm in state C it
doesn't matter how I got there. Similarly
111
00:08:48,040 --> 00:08:52,959
if I'm in state A it doesn't matter how
I got there. Whatever I do is conditioned
112
00:08:52,959 --> 00:08:58,300
entirely upon what's happening at the
current time step. There's no nonlocality
113
00:08:58,300 --> 00:09:02,079
in the Markov chain is another way to
put it or you can think of it as a
114
00:09:02,079 --> 00:09:07,000
system with no memory. If I'm in state B
that defines everything about what's
115
00:09:07,000 --> 00:09:10,269
going to happen next.
And so you can see how to read this
116
00:09:10,269 --> 00:09:14,230
sample run. In the sample run I begin
in state B and in fact deterministically
117
00:09:14,230 --> 00:09:18,370
I know what has to happen next: I go to
state C. In the sample run when I ran
118
00:09:18,370 --> 00:09:22,059
this Markov chain forward it chose
randomly to stay in state C which will
119
00:09:22,059 --> 00:09:27,819
happen 30% of the time, so it emitted
symbols B and then C; it stayed in C, and
120
00:09:27,819 --> 00:09:32,079
then on the next time step it jumped
to A, and you can see the system as
121
00:09:32,079 --> 00:09:38,850
it evolved from one moment to the next.
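A minimal simulation of this chain, as a sketch: the transition probabilities A→B = 0.9, B→C = 1.0, C→C = 0.3, and C→A = 0.7 are the ones stated in the lecture; where the remaining 10% from A goes is not stated, so the self-loop A→A = 0.1 below is an assumption (it is consistent with the limiting values quoted later, roughly 31% to A and 40% to C).

```python
import random

# Transition probabilities read off the diagram; A->A = 0.1 is an assumption.
T = {
    "A": {"A": 0.1, "B": 0.9, "C": 0.0},
    "B": {"A": 0.0, "B": 0.0, "C": 1.0},
    "C": {"A": 0.7, "B": 0.0, "C": 0.3},
}

def run_chain(start, n, rng=random.Random(0)):
    """Generate a sample time series of n symbols from the Markov chain."""
    state, out = start, []
    for _ in range(n):
        out.append(state)                # emit the current state's symbol
        r, cum = rng.random(), 0.0
        for nxt, p in T[state].items():  # jump according to the arrows
            cum += p
            if r < cum:
                state = nxt
                break
    return "".join(out)

sample = run_chain("B", 30)  # a stochastic sample run, e.g. "BCCA..."
```

Because B goes to C deterministically, no sample run at this time scale can ever contain B followed by A.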
So now let's take the observable data
122
00:09:38,850 --> 00:09:43,600
associated with that Markov chain or one
instantiation of that data associated with
123
00:09:43,600 --> 00:09:47,949
the Markov chain and coarse-grain it
using this decimation transformation and
124
00:09:47,949 --> 00:09:52,180
in particular what I'm going to do is
just take blocks of two observations in
125
00:09:52,180 --> 00:09:56,649
a row and take out the second one; or,
equivalently, I'm going to coarse-grain
126
00:09:56,649 --> 00:10:00,309
that block of two observations to a
single observation and I'll have it
127
00:10:00,309 --> 00:10:04,720
dictated by the value of the first point
in that block. So in this case I've used
128
00:10:04,720 --> 00:10:10,569
blocks of size 2. I have coarse-grained the
data at let's say a two-second time
129
00:10:10,569 --> 00:10:16,059
scale. It's easy enough to build the
Markov chain associated with that
130
00:10:16,059 --> 00:10:21,220
coarse-grained run, and in particular let's just
walk through the image we have here to
131
00:10:21,220 --> 00:10:25,000
see that the arrows make sense. In the
next section I'll tell you how to derive
132
00:10:25,000 --> 00:10:29,800
these mathematically but for now it's a
useful exercise to see if we can
133
00:10:29,800 --> 00:10:33,610
understand the relationship between the
one-step and the two-step model.
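Deriving the two-step model mathematically amounts to squaring the one-step transition matrix: the probability of seeing j two observations after i sums over the hidden middle state. A sketch in Python, with the same assumed A→A = 0.1 self-loop as before (the lecture doesn't state where A's remaining 10% goes):

```python
states = ["A", "B", "C"]

# One-step transition probabilities; A->A = 0.1 is an assumption.
T = {"A": {"A": 0.1, "B": 0.9, "C": 0.0},
     "B": {"A": 0.0, "B": 0.0, "C": 1.0},
     "C": {"A": 0.7, "B": 0.0, "C": 0.3}}

def matmul(P, Q):
    """Multiply transition matrices stored as nested dicts:
    (PQ)[i][j] = sum over k of P[i][k] * Q[k][j]."""
    return {i: {j: sum(P[i][k] * Q[k][j] for k in states)
                for j in states}
            for i in states}

T2 = matmul(T, T)
# B -> A is impossible in one step, but in two steps it goes
# via the unobserved C: T2["B"]["A"] = 1.0 * 0.7 = 0.7
```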
134
00:10:33,610 --> 00:10:37,990
Notice that in the one step model it's
impossible for the system to jump from B
135
00:10:37,990 --> 00:10:43,329
to A. But in fact in the coarse-grained
system it's in fact very likely that if
136
00:10:43,329 --> 00:10:47,589
you see a B the next symbol you'll see
will be an A. It happens 70 percent of
137
00:10:47,589 --> 00:10:51,430
the time and you can see if you look at
the coarse-grained run it's pretty easy to
138
00:10:51,430 --> 00:10:56,620
find cases where you have B and then A.
In fact, on the first line I see it once
139
00:10:56,620 --> 00:11:03,490
and on the second line I see it three
times: BA, BA, CBA. But in the sample run
140
00:11:03,490 --> 00:11:06,250
at the finer-grained time scale that's
impossible and of course that's
141
00:11:06,250 --> 00:11:12,460
impossible because when you're in state B
you deterministically go to state C. At the
142
00:11:12,460 --> 00:11:17,530
coarse-grained timescale though it's
pretty easy to go from B to A. You go via
143
00:11:17,530 --> 00:11:19,300
C,
but that C step is of course
144
00:11:19,300 --> 00:11:24,280
unobservable. It's coarse-grained away.
And in fact in that case if you start in
145
00:11:24,280 --> 00:11:28,960
state B 100% of the time you go to state
C. And so for it to show up as an A at
146
00:11:28,960 --> 00:11:32,350
the next time step all that has to
happen is that when you're in state C
147
00:11:32,350 --> 00:11:36,070
you have to jump to A and that happens 70
percent of the time. And so what I've
148
00:11:36,070 --> 00:11:42,370
done there in a sort of laborious way is
tell you exactly why the arrow from B to
149
00:11:42,370 --> 00:11:49,300
A is labeled with a 70% probability. Of
course that's just when you coarse-grain
150
00:11:49,300 --> 00:11:53,950
from the fine-grained time scale to
blocks of two you can just as easily
151
00:11:53,950 --> 00:11:58,600
imagine coarse-graining that
system from blocks of two to blocks of
152
00:11:58,600 --> 00:12:04,240
three, to blocks of four and so forth
and so if we look here at how this
153
00:12:04,240 --> 00:12:08,320
model evolves over time we're basically
telling the parallel story to how the
154
00:12:08,320 --> 00:12:13,180
data is being coarse-grained: we coarse-
grain in blocks of two steps, or we coarse-
155
00:12:13,180 --> 00:12:17,020
grain in blocks of three steps. As you
can see the model changes and it's a
156
00:12:17,020 --> 00:12:21,550
little bit hard if you look at it to see
exactly the logic of that change. When
157
00:12:21,550 --> 00:12:24,760
you go from one step to two steps, in some
senses the model maybe looks like it
158
00:12:24,760 --> 00:12:28,330
gets a bit more complicated. All I mean
is that there are some transitions that
159
00:12:28,330 --> 00:12:32,050
now become possible that were impossible
before. You become a little less certain
160
00:12:32,050 --> 00:12:36,460
about what's going to happen. Except then
when I go to three steps some of those
161
00:12:36,460 --> 00:12:39,460
transitions go away again so in fact
I become a little bit more certain
162
00:12:39,460 --> 00:12:45,760
now. So what is the limit of this process?
What happens when we start coarse-
163
00:12:45,760 --> 00:12:49,690
graining not on the two-step but on the
three-step time scale, when we continue
164
00:12:49,690 --> 00:12:54,220
this and coarse-grain on increasingly
longer and longer time scales? It turns
165
00:12:54,220 --> 00:12:59,020
out that models tend to flow to what
we'll call fixed points. In other words
166
00:12:59,020 --> 00:13:03,550
as you go from one coarse-graining level
to the next, the corresponding model
167
00:13:03,550 --> 00:13:08,050
tends to change less and less, at least
if you wait long enough. They tend to
168
00:13:08,050 --> 00:13:12,670
converge on a single model as the coarse-
graining timescale gets longer and
169
00:13:12,670 --> 00:13:18,850
longer and longer. And in fact not only
does every Markov chain converge to a
170
00:13:18,850 --> 00:13:25,080
particular sort of limit case
Markov chain. In fact, the Markov chain
171
00:13:25,080 --> 00:13:30,360
that it
converges to has a particularly simple
172
00:13:30,360 --> 00:13:34,980
structure. Here it is. Here's the limit
point. If you take that one step Markov
173
00:13:34,980 --> 00:13:39,120
chain and ask what happens if you coarse-
grain the data on really long time
174
00:13:39,120 --> 00:13:42,570
scales. If you look at the transition
probabilities
175
00:13:42,570 --> 00:13:47,010
you'll notice that every state has
incoming arrows all with the same weight.
176
00:13:47,010 --> 00:13:51,600
The chance of going to a C is
independent of which state you were in
177
00:13:51,600 --> 00:13:58,740
previously. In this limit
it's always 40%. Similarly, if
178
00:13:58,740 --> 00:14:03,000
you're transitioning to state A it
doesn't matter where you come from. The
179
00:14:03,000 --> 00:14:08,100
probability is always the same
31 percent. If you think about how many
180
00:14:08,100 --> 00:14:12,870
parameters you need to describe the
model at the one-step case, well, you have
181
00:14:12,870 --> 00:14:16,590
three states, each state has three
transition probabilities but since
182
00:14:16,590 --> 00:14:19,980
probabilities have to sum to one in fact
for each state you only have two free
183
00:14:19,980 --> 00:14:24,840
parameters. So you have three states, two
free parameters each. That's six
184
00:14:24,840 --> 00:14:30,420
parameters that describe an arbitrary
Markov model on three states. However, if
185
00:14:30,420 --> 00:14:35,010
the pattern holds, and it does for nearly
every Markov chain that you can write
186
00:14:35,010 --> 00:14:41,550
down (not every one, just nearly every one).
But if that pattern holds what that
187
00:14:41,550 --> 00:14:45,390
means is that when you coarse-grain on
sufficiently long time scales you only
188
00:14:45,390 --> 00:14:50,400
in fact need two parameters. It's a
remarkable compression, in other words,
189
00:14:50,400 --> 00:14:55,170
not just of the data but of the model
space. You begin in this sort of six
190
00:14:55,170 --> 00:14:59,610
dimensional manifold but if you coarse-
grain the data enough the models
191
00:14:59,610 --> 00:15:04,500
corresponding to that coarse-graining all
flow to this very low-dimensional, two-
192
00:15:04,500 --> 00:15:10,230
dimensional in fact, manifold,
where the model is described simply by
193
00:15:10,230 --> 00:15:15,060
the incoming probabilities to each of
those three states and since they all
194
00:15:15,060 --> 00:15:18,780
have to be the same the chance of you
going to C is the same no matter where
195
00:15:18,780 --> 00:15:22,290
you begin, and the chance of you going to
B is the same no matter where you begin,
196
00:15:22,290 --> 00:15:26,820
and since the probabilities outgoing
from each state have to sum to one, even
197
00:15:26,820 --> 00:15:31,320
though there are three probabilities to
specify only two of them are needed. The
198
00:15:31,320 --> 00:15:35,770
third one is just one minus
the sum of the other two.
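The flow to the fixed point can be checked numerically: coarse-graining on blocks of 2, 4, 8, and so on corresponds to repeatedly squaring the transition matrix, and the rows converge to a single distribution. This is again a sketch using the assumed A→A = 0.1 self-loop; the limiting values it produces match the roughly 31% (to A) and 40% (to C) quoted above.

```python
states = ["A", "B", "C"]
T = {"A": {"A": 0.1, "B": 0.9, "C": 0.0},   # A->A = 0.1 is assumed
     "B": {"A": 0.0, "B": 0.0, "C": 1.0},
     "C": {"A": 0.7, "B": 0.0, "C": 0.3}}

def matmul(P, Q):
    """(PQ)[i][j] = sum over k of P[i][k] * Q[k][j]."""
    return {i: {j: sum(P[i][k] * Q[k][j] for k in states) for j in states}
            for i in states}

# Square repeatedly: after 6 squarings this is T to the 64th power,
# i.e. the model for coarse-graining on blocks of 64 time steps.
limit = T
for _ in range(6):
    limit = matmul(limit, limit)

# All three rows are now nearly identical: the incoming probability to each
# state no longer depends on where you start, so of the original six free
# parameters only two remain.
```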
199
00:15:35,780 --> 00:15:37,860
In the next module
200
00:15:37,860 --> 00:15:41,120
I'll show you how to make
these computations exact
201
00:15:41,120 --> 00:15:42,870
but for now this is our first
202
00:15:42,870 --> 00:15:47,790
example of how theories change when the
data you ask them to describe is
203
00:15:47,790 --> 00:15:50,510
coarse-grained.