So in the first module we gave you a
series of examples about how as
scientists or indeed as engineers we try
to simplify our observations. We don't
keep records of everything that happens
at the finest possible detail. Instead
what we do very often is if we have some
data at high resolution either out there
to be gathered or already on our hard
drives, what we're often interested in
doing is simplifying it in some way and
I gave you a generic term for this: coarse-graining. In particular,
what I did for the case of the etching
of Alice and her kitten Dinah is show
you different coarse-graining
prescriptions, how you could take that
image and strategically throw out
information, reduce the amount of
information you're keeping about the
original image, and still, if the coarse-graining prescription is chosen well, keep some sense of what's happening in the system itself, what's happening in this image. And so OK, maybe
you're not quite sure what kind of
animal Alice is playing with at this
point but at least you know that she's
playing with some kind of animal. We in
fact gave these three coarse-graining
prescriptions. The first one was majority
vote: I take a little square and I have
all the pixels vote on what color, black
or white, the output pixel should be and
in particular in the example here I took
a square that was 10 by 10, so I took a hundred data points, assigned the output either a 0 or a 1 by majority vote, and in other words made the pixels 10 times larger. That's what happens when you do it here. You could also do
something even simpler which is take
that 10 by 10 grid and just have the
pixel in the upper right-hand corner
sort of dictate what the final coarse-
grained pixel is going to be. The final example I gave you, one we only talked about without really showing the mathematics for how it happens, was a compression algorithm in actual use in the real world: the JPEG. What the JPEG
does in fact is throw out the information that tells you about the very fine-scale oscillations in the data, the patterns on the back of the armchair for example, when it thinks those are going to be undetectable to the human eye, while at
the same time keeping the overall longer
scale fluctuations in the image the
difference between let's say where the
light is and where the shadow is. It does
that through a Fourier transform and
essentially all it really does is just
chop off the high frequency components
in a way that engineers have decided is
at least not entirely visually
unpleasing. So coarse-graining is an essential part of renormalization, but it's not the whole story; in fact it's only half, because as scientists what we do is
not just gather data and we don't just
simplify it. Instead what we do is we
tend to build models of that data so
let's take the highest resolution that
you can imagine for a particular system
and take the model that you think best
predicts or describes or explains the
data at that high resolution level. Then
we can ask the obvious question: What
model best describes or predicts or
explains the data at the coarse-grained
level and what's the relationship
between those two models the model that
describes everything and the model that
describes something. The entire story of
renormalization is the relationship
between what happens when you
coarse-grain the data and what happens
when you look at the underlying
structures of the models that that
coarse-graining demands. Surprisingly, as we'll see, sometimes when you coarse-grain the data the models that you need get more complicated. Sometimes, conversely, they simplify, and it's that kind of process that we'll study now. In order to describe renormalization you'll see of course that
I have to tell you not just what is
happening to the data but also what's
happening to the model so I have to give
you an example not just of some data
that we're coarse-graining but of a model
that describes it.
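Before we switch from images to time series, here's a minimal sketch of the three image coarse-graining prescriptions in code. The function names and the 10-by-10 block size are just illustrative choices, and NumPy's real FFT stands in for the discrete cosine transform the real JPEG applies to 8x8 pixel blocks.

```python
import numpy as np

def majority_vote(image, k=10):
    """Coarse-grain a binary image: each k-by-k block of pixels
    votes on whether the output pixel is 0 or 1 (ties go to 0)."""
    h, w = image.shape
    blocks = image[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return (blocks.mean(axis=(1, 3)) > 0.5).astype(int)

def corner_pick(image, k=10):
    """Coarse-grain by letting the upper right-hand pixel of each
    k-by-k block dictate the output pixel."""
    return image[::k, (k - 1)::k]

def jpeg_like(signal, keep):
    """Toy one-dimensional analogue of the JPEG strategy: go to the
    frequency domain, chop off the high-frequency components, and
    come back, keeping only the longer-scale fluctuations."""
    coeffs = np.fft.rfft(signal)
    coeffs[keep:] = 0  # discard the fine-scale oscillations
    return np.fft.irfft(coeffs, n=len(signal))
```

All three throw out information; they differ only in which information they judge disposable.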
The model that we'll use is the Markov
chain. So how do Markov chains work, what
do they operate on, what are they
supposed to describe or explain or
predict? In general Markov chains
describe time series, a series of
observations that unfold that
evolve or unfold one moment after
another. The simplest case is when each
observation is a symbolic unit so you
know an A or B or C, when the stock
went up or the stock went down, or the person said this word or that word. A more complicated story is when each of
those observations could be a continuous
number like a temperature or a field the
value of let's say of the electric field
at a certain point in time at a certain
point in space. Here we'll just deal with
symbolic time series because they're
much easier to handle at least when the
number of symbols is small. So that's our
basic idea that's our fine-grained data
and then we're going to imagine coarse-
graining that to produce a lower
resolution time series. Now of course
that's not the only way to coarse-grain
a time series. You could also imagine coarse-graining each symbol. So let's say at each point in time you chose from a set of 10 symbols. One way to coarse-grain that time series is, instead of choosing from a set of 10, to map those 10 symbols to either a symbol A or a symbol B. That kind of coarse-graining, something we'll see a little bit later, you might think of as a projection: you're projecting down the state space. Here
we'll do something a little simpler.
Imagine for example the time series was
gathered at intervals of one second.
Now what we're going to imagine is that
you know you didn't have expensive
enough equipment or your hard drive got
full, so instead of keeping every second of the evolution, what you did was, let's say, keep every other second or every third second. In other words you
took a block of the time series and you
decimated it. You just took the first
observation within each block over time
sort of a one-dimensional version of how
we coarse-grained the sketch of Alice by
John Tenniel. All right so that's the
data we'll be operating on. The Markov
chain provides you a model of how our
time series is produced and then what
we're going to see is what happens to
those models as you ask them to describe
or predict or explain
the lower resolution version. We'll
compare one model to another as they
operate on different kinds of data.
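The decimation transformation described above is simple enough to write down; a minimal sketch (the helper name is mine):

```python
def decimate(series, block=2):
    """Coarse-grain a symbolic time series by decimation: split it
    into consecutive blocks and keep only the first observation in
    each block."""
    return series[::block]

# decimate("BCCABC", block=2) keeps the 1st, 3rd, and 5th symbols: "BCB"
```

With block = 2 you've kept every other second; with block = 3, every third second, and so on.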
So, here's a Markov chain. On the left-hand
side what I've shown you is a depiction
of the model I'll tell you how to read
the representation in a second. On the
right hand side there's a sample bit of
data that it might predict or explain. In
this case what I did was just run the model itself: because it's generative, it will tell you what's going to happen next, and so I can just
start somewhere and allow the model to
produce a simulated time series. Markov chains are stochastic, so each particular run will generically produce a different sample. On
the right-hand side what you have is
just one example of the kind of data a
Markov chain could produce, conversely
the kind of data that a Markov chain
could describe or predict. So the
left-hand side shows the Markov chain
itself. What you see there are three
nodes A, B and C, and it's simple enough: when the system is in state A it emits the symbol A, when it's in state B it emits the symbol B, and when it's in state C it emits the symbol C. Then,
upon emitting that symbol it makes a
transition. It jumps to one of the other
states and the probability of jumping to
each of those other states is dictated
by the model itself and in fact I
represented that here by the arrows. So
let's say you begin in state A. With 90
percent probability you jump to state B.
Say you're in state C, with 30 percent
probability you stay in state C, with 70
percent probability you jump to state A.
Specifying those transition
probabilities corresponds to specifying
the entire set of free parameters for
the Markov chain. Once you tell me the
transition probabilities you've told me
everything I need to know about the
underlying model. Markov chains are fun
but it's also important to realize how
limited they are. For example, if I am at B and then I'm at C, what happens next does not depend upon the fact that I first emitted a B. When I'm in state C it
doesn't matter how I got there. Similarly
if I'm in state A it doesn't matter how
I got there. Whatever I do is conditioned
entirely upon what's happening at the
current time step. There's no nonlocality in the Markov chain, is another way to put it; or you can think of it as a system with no memory. If I'm in state B
that defines everything about what's
going to happen next.
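As a sketch, here's that three-state chain as a generative model. One caveat: the lecture specifies A to B at 90% but not where A's remaining 10% goes; the self-loop A to A of 0.1 below is my assumption, chosen to be consistent with the limiting probabilities quoted later (about 31% into A, 40% into C).

```python
import random

# Transition probabilities ("from" state -> "to" state) as given in
# the lecture: A->B is 0.9, B->C is 1.0, C->A is 0.7, C->C is 0.3.
# The A->A entry of 0.1 is an assumption (see above).
T = {
    "A": {"A": 0.1, "B": 0.9, "C": 0.0},
    "B": {"A": 0.0, "B": 0.0, "C": 1.0},
    "C": {"A": 0.7, "B": 0.0, "C": 0.3},
}

def sample_run(start, steps, seed=0):
    """Generate a sample time series: each state emits its own symbol,
    then jumps according to T. The jump depends only on the current
    state -- the chain has no memory of how it got there."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(steps):
        out.append(state)
        nxt = list(T[state])
        state = rng.choices(nxt, weights=[T[state][s] for s in nxt])[0]
    return "".join(out)
```

Running `sample_run("B", 30)` produces exactly the kind of data shown on the right-hand side, and no run will ever contain the substring "BA", since from B the chain must go to C.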
And so you can see how to read this
sample run. In the sample run I begin in state B, and I know deterministically what has to happen next: it goes to state C. When I ran this Markov chain forward, it then chose randomly to stay in state C, which will happen 30% of the time, so it emitted the symbols B, C, C, and then on the next time step it happened to jump to A. And you can see the system as it evolved from one moment to the next.
So now let's take the observable data
associated with that Markov chain, or one instantiation of that data associated with the Markov chain, and coarse-grain it
using this decimation transformation and
in particular what I'm going to do is
just take blocks of two observations in
a row and take out the second one or
conversely I'm going to coarse-grain
that block of two observations to a
single observation and I'll have it
dictated by the value of the first point
in that block. So in this case I've used
blocks of size 2. I have coarse-grained the
data at let's say a two-second time
scale. It's easy enough to build the Markov chain associated with that coarse-grained run, and in particular let's just walk through the image we have here to see that the arrows make sense. In the next section I'll tell you how to derive these mathematically, but for now it's a useful exercise to see if we can understand the relationship between the one-step and the two-step model.
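Before deriving the arrows, you can check them empirically: simulate the fine-grained chain, decimate its output in blocks of two, and count transitions. This sketch again assumes the unspecified A to A entry is 0.1.

```python
import random

# One-step chain from the lecture; the A -> A entry of 0.1 is my
# assumption (the lecture leaves A's remaining 10% unspecified).
T = {"A": {"A": 0.1, "B": 0.9, "C": 0.0},
     "B": {"A": 0.0, "B": 0.0, "C": 1.0},   # B -> C is deterministic
     "C": {"A": 0.7, "B": 0.0, "C": 0.3}}

# Simulate the fine-grained time series.
rng = random.Random(0)
state, series = "B", []
for _ in range(200_000):
    series.append(state)
    nxt = list(T[state])
    state = rng.choices(nxt, weights=[T[state][s] for s in nxt])[0]

# Decimate: keep the first observation in each block of two.
coarse = series[::2]

# Count how often a coarse-grained B is followed by an A.
pairs = list(zip(coarse, coarse[1:]))
from_b = [b for a, b in pairs if a == "B"]
p_ba = from_b.count("A") / len(from_b)
# p_ba comes out close to 0.70, even though the substring "BA"
# never occurs in the fine-grained series.
```

The impossible fine-grained transition becomes a common coarse-grained one, which is exactly the point of the walkthrough that follows.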
Notice that in the one step model it's
impossible for the system to jump from B
to A. But in fact in the coarse grained
system it's in fact very likely that if
you see a B the next symbol you'll see
will be an A. It happens 70 percent of
the time and you can see if you look at
the coarse-grained run it's pretty easy to
find cases where you have B and then A.
In fact, on the first line I see it once, and on the second line I see it three times: BA, BA, CBA. But in the sample run
at the finer grain time scale that's
impossible and of course that's
impossible because when you're in state B
you deterministically go to state C. At the
coarse-grained timescale though it's
pretty easy to go from B to A. You go by
AC
but that C step is of course
unobservable. It's coarse-grained away.
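In matrix language, this bookkeeping is just squaring the one-step transition matrix: the two-step entry from B to A sums over the unobservable middle state. A sketch with NumPy (again assuming A's unspecified 10% is a self-loop):

```python
import numpy as np

# One-step transition matrix, rows and columns ordered A, B, C.
# The 0.1 in the A row is an assumption; the rest is as quoted.
T = np.array([[0.1, 0.9, 0.0],   # from A
              [0.0, 0.0, 1.0],   # from B: deterministically to C
              [0.7, 0.0, 0.3]])  # from C

T2 = T @ T  # the chain you see after decimating in blocks of two

# Row "from B": B -> C (100%), then C -> A (70%) or C -> C (30%),
# so the coarse-grained chain has B -> A with probability 0.7.
print(T2[1])
```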
And in fact in that case if you start in
state B 100% of the time you go to state
C. And so for it to show up as an A at
the next time step all that has to
happen is that when you're in state C
you have to jump to A and that happens 70
percent of the time. And so what I've
done there in a sort of laborious way is
tell you exactly why the arrow from B to
A is labeled with a 70% probability. Of course that's just when you coarse-grain from the fine-grained time scale to blocks of two; you can just as easily imagine coarse-graining that system from blocks of two to blocks of three, to blocks of four, and so forth. And so if we look here at how this model evolves over time, we're basically
telling the parallel story to how the data is being coarse-grained: we coarse-grain in blocks of two steps, or we coarse-grain in blocks of three steps. As you
can see the model changes and it's a
little bit hard if you look at it to see
exactly the logic of that change. When
you go from one step to two step in some
senses the model maybe looks like it
gets a bit more complicated. All I mean
is that there are some transitions that
now become possible that were impossible
before. You become a little less certain
about what's going to happen. Except then
when I go to three steps some of those
transitions go away again so in fact
I become a little bit more certain
now. So what is the limit of this process?
What happens when we go beyond the two-step and three-step time scales, when we continue this and coarse-grain on increasingly longer and longer time scales? It turns
out that models tend to flow to what
we'll call fixed points. In other words
as you go from one coarse-graining level
to the next, the corresponding model
tends to change less and less, at least
if you wait long enough. They tend to
converge on a single model as the coarse-
graining timescale gets longer and
longer and longer. And in fact not only does every Markov chain converge to a particular sort of limiting Markov chain; the Markov chain that it converges to has a particularly simple structure. Here it is. Here's the limit
point, if you take that one-step Markov chain and ask what happens when you coarse-grain the data on really long time scales. If you look at the transition probabilities, you'll notice that every state has incoming arrows all with the same weight. The chance of going to C is independent of which state you were in previously: in this limit it's always 40%. Similarly, if
you're transitioning to state A it
doesn't matter where you come from. The
probability is always the same
31 percent. If you think about how many
parameters you need to describe the
model at the one-step case, well, you have
three states, each state has three
transition probabilities but since
probabilities have to sum to one in fact
for each state you only have two free
parameters. So you have three states, two
free parameters each. There's six
parameters that describe an arbitrary
Markov model on three states. However, if the pattern holds, and it does for nearly every Markov chain that you can write down (not every one, just nearly every one), what that means is that when you coarse-grain on sufficiently long time scales you only in fact need two parameters. It's a
remarkable compression, in other words, not just of the data but of the model space. You begin in this sort of six-dimensional manifold, but if you coarse-
grain the data enough the models
corresponding to that coarse-graining all
flow to this very low-dimensional, two-
dimensional in fact, manifold,
where the model is described simply by
the incoming probabilities to each of
those three states. Since they all have to be the same, the chance of you going to C is the same no matter where you begin, and the chance of you going to B is the same no matter where you begin; and since the probabilities outgoing from each state have to sum to one, even though there are three probabilities to specify, only two of them are needed. The third one is just one minus the sum of the other two.
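You can watch this flow numerically: raise the one-step matrix to a high power, which corresponds to coarse-graining on a very long time scale, and every row converges to the same three numbers, matching the roughly 31% into A and 40% into C quoted above. The A-row assumption from before applies here too.

```python
import numpy as np

# One-step matrix (rows A, B, C); the 0.1 self-loop on A is my
# assumption, consistent with the quoted limiting probabilities.
T = np.array([[0.1, 0.9, 0.0],
              [0.0, 0.0, 1.0],
              [0.7, 0.0, 0.3]])

# Coarse-graining in blocks of n corresponds to the matrix power T^n.
# As n grows, the rows converge to one another: where you land no
# longer depends on where you started.
for n in (1, 2, 4, 64):
    print(n)
    print(np.round(np.linalg.matrix_power(T, n), 2))
# At n = 64 every row reads roughly (0.31, 0.28, 0.40): about 31%
# into A and 40% into C, as in the lecture, leaving two free
# parameters instead of six.
```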
In the next module
I'll show you how to make
these computations exact
but for now this is our first
example of how theories change when the
data you ask them to describe is
coarse-grained.