So in the first module we gave you a series of examples about how, as scientists or indeed as engineers, we try to simplify our observations. We don't keep records of everything that happens at the finest possible detail. Instead, very often, if we have some data at high resolution, either out there to be gathered or already on our hard drives, what we're interested in doing is simplifying it in some way, and I gave you a generic term for this: coarse-graining. In particular, for the case of the etching of Alice and her kitten Dinah, I showed you different coarse-graining prescriptions: how you could take that image and strategically throw out information, reduce the amount of information you keep about the original image, and still, if the coarse-graining prescription is chosen well, retain some sense of what's happening in the system itself, what's happening in the image. Maybe you're not quite sure what kind of animal Alice is playing with at this point, but at least you know that she's playing with some kind of animal. We gave three coarse-graining prescriptions. The first was majority vote: I take a little square and have all the pixels vote on what color, black or white, the output pixel should be. In the example here the square was 10 by 10, so I took a hundred data points, assigned the output either a 0 or a 1 by majority vote, and in other words made the pixel 10 times larger. You could also do something even simpler: take that 10 by 10 grid and just have the pixel in the upper right-hand corner dictate what the final coarse-grained pixel is going to be. The final example, which we only talked about without really showing the mathematics, was a compression algorithm in actual use in the real world: the JPEG.
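The first two prescriptions are simple enough to sketch in a few lines of code. This is a minimal illustration, assuming a binary image stored as a NumPy array of 0s and 1s; the function names are mine, not from the course, and the corner rule here keeps the first pixel of each block rather than specifically the upper right-hand one.

```python
import numpy as np

def majority_vote_coarse_grain(image, block=10):
    """Coarse-grain a binary image: each block x block square of pixels
    "votes", and the output pixel is 1 only if more than half are 1."""
    h, w = image.shape
    # Trim so the image divides evenly into blocks.
    image = image[: h - h % block, : w - w % block]
    blocks = image.reshape(h // block, block, w // block, block)
    votes = blocks.sum(axis=(1, 3))            # number of 1-pixels per block
    return (votes > block * block / 2).astype(int)

def corner_coarse_grain(image, block=10):
    """Even simpler: one corner pixel of each block dictates the output."""
    return image[::block, ::block]
```

Either way, a 10-by-10 block of a hundred data points collapses to a single output pixel.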
What the JPEG does is throw out information about the very fine scale oscillations in the data, the patterns on the back of the armchair for example, when it thinks those are going to be undetectable to the human eye, while at the same time keeping the longer scale fluctuations in the image, the difference between, let's say, where the light is and where the shadow is. It does that through a Fourier transform, and essentially all it really does is chop off the high-frequency components in a way that engineers have decided is at least not entirely visually unpleasing. So coarse-graining is an essential part of renormalization, but it's not the whole story; in fact it's only half, because as scientists we don't just gather data and simplify it. We build models of that data. So take the highest resolution you can imagine for a particular system, and take the model that you think best predicts or describes or explains the data at that high resolution. Then we can ask the obvious question: what model best describes or predicts or explains the data at the coarse-grained level, and what's the relationship between those two models, the model that describes everything and the model that describes something? The entire story of renormalization is the relationship between what happens when you coarse-grain the data and what happens to the underlying structure of the models that that coarse-graining demands. Surprisingly, as we'll see, sometimes when you coarse-grain the data the models you need get more complicated. Sometimes, conversely, they simplify. It's that kind of process that we'll study now.
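The idea of chopping off high-frequency components can be sketched crudely. Real JPEG uses a blockwise discrete cosine transform with perceptual quantization tables; this is only a toy version of the same principle, zeroing out everything but the lowest frequencies of a full-image FFT.

```python
import numpy as np

def lowpass_compress(image, keep_fraction=0.1):
    """Toy sketch of the JPEG idea: go to frequency space, discard
    the high-frequency components, and transform back."""
    f = np.fft.fftshift(np.fft.fft2(image))   # low frequencies now at center
    h, w = image.shape
    kh, kw = int(h * keep_fraction), int(w * keep_fraction)
    mask = np.zeros_like(f)
    mask[h//2 - kh : h//2 + kh, w//2 - kw : w//2 + kw] = 1
    f = f * mask                              # chop off the high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```

The long-wavelength structure (where the light is, where the shadow is) survives; the fine oscillations (the pattern on the armchair) do not.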
In order to describe renormalization, you'll see of course that I have to tell you not just what is happening to the data but also what's happening to the model, so I have to give you an example not just of some data that we're coarse-graining but of a model that describes it. The model we'll use is the Markov chain. So how do Markov chains work, what do they operate on, what are they supposed to describe or explain or predict? In general, Markov chains describe time series: a series of observations that evolve or unfold one moment after another. The simplest case is when each observation is a symbolic unit, you know, an A or B or C, the stock went up or the stock went down, the person said this word or that word. A more complicated story is that each of those observations could be a continuous number, like a temperature, or the value of, let's say, the electric field at a certain point in time and a certain point in space. Here we'll just deal with symbolic time series because they're much easier to handle, at least when the number of symbols is small. So that's our basic idea, that's our fine-grained data, and then we're going to imagine coarse-graining it to produce a lower resolution time series. Now of course that's not the only way to coarse-grain a time series. You could also imagine coarse-graining each symbol: let's say at each point in time you chose from a set of 10 symbols. One way to coarse-grain that time series is, instead of choosing from a set of 10, to map those 10 symbols to either a symbol A or a symbol B. That kind of coarse-graining, something we'll see a little bit later, you might think of as a projection: you're projecting down the state space. Here we'll do something a little simpler. Imagine, for example, the time series was gathered at intervals of one second.
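That projection idea is a one-liner. The particular grouping below (symbols 0 through 4 map to A, 5 through 9 map to B) is just an assumed example; any many-to-one map of the state space would do.

```python
def project(series, mapping=None):
    """Projection coarse-graining: collapse a 10-symbol alphabet to {A, B}.
    The default mapping is an arbitrary illustrative choice."""
    if mapping is None:
        mapping = {s: ("A" if s < 5 else "B") for s in range(10)}
    return [mapping[s] for s in series]

project([0, 7, 3, 9, 4])   # -> ['A', 'B', 'A', 'B', 'A']
```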
Now what we're going to imagine is that, you know, you didn't have expensive enough equipment, or your hard drive got full, so instead of keeping every second of the evolution you kept, let's say, every other second, or every third second. In other words, you took a block of the time series and you decimated it: you kept just the first observation within each block, a sort of one-dimensional version of how we coarse-grained the sketch of Alice by John Tenniel. All right, so that's the data we'll be operating on. The Markov chain provides a model of how our time series is produced, and then what we're going to see is what happens to those models as you ask them to describe or predict or explain the lower resolution version. We'll compare one model to another as they operate on different kinds of data. So, here's a Markov chain. On the left-hand side what I've shown you is a depiction of the model; I'll tell you how to read the representation in a second. On the right-hand side there's a sample bit of data that it might predict or explain. In this case what I did was just run the model itself, because it's a generative one: it will tell you what's going to happen next, and so I can start somewhere and allow the model to produce a simulated time series. Markov chains are stochastic, so in any particular case it will generically produce a different sample run. The right-hand side is just one example of the kind of data a Markov chain could produce, or conversely the kind of data a Markov chain could describe or predict. The left-hand side shows the Markov chain itself. What you see there are three nodes, A, B and C, and it's simple enough: when the system is in state A it emits the symbol A, when it's in state B it emits the symbol B, and when it's in state C it emits the symbol C, and then, upon emitting that symbol, it makes a transition.
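Decimation itself is the simplest coarse-graining you could write down: keep the first observation of each block and throw the rest away.

```python
def decimate(series, block=2):
    """Decimation coarse-graining: keep only the first observation
    in each block of `block` consecutive observations."""
    return series[::block]

decimate(list("BCCABCAB"), block=2)   # keeps positions 0, 2, 4, 6
```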
It jumps to one of the other states, and the probability of jumping to each of those other states is dictated by the model itself; I've represented that here by the arrows. So let's say you begin in state A: with 90 percent probability you jump to state B. Say you're in state C: with 30 percent probability you stay in state C, and with 70 percent probability you jump to state A. Specifying those transition probabilities corresponds to specifying the entire set of free parameters for the Markov chain. Once you tell me the transition probabilities, you've told me everything I need to know about the underlying model. Markov chains are fun, but it's also important to realize how limited they are. For example, if I'm at B and then at C, what happens next does not depend upon the fact that I emitted a B. When I'm in state C it doesn't matter how I got there; similarly, if I'm in state A it doesn't matter how I got there. Whatever I do is conditioned entirely upon what's happening at the current time step. There's no nonlocality in the Markov chain, is another way to put it, or you can think of it as a system with no memory. If I'm in state B, that defines everything about what's going to happen next. And so you can see how to read this sample run. In the sample run I begin in state B, and in fact I know deterministically what has to happen next: I go to state C. When I ran this Markov chain forward, it chose randomly to stay in state C, which happens 30% of the time, so having emitted B and then C, it stayed in C, and then on the next time step it jumped to A, and you can see the system evolving like that from one moment to the next.
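Running a chain like this forward is easy to sketch. The transcript only gives some of the transition probabilities (A goes to B with 0.9; C stays put with 0.3 or jumps to A with 0.7; B goes to C deterministically), so the remaining 10% out of A is assumed here to stay in A; that assumption happens to reproduce the limiting 31%/40% figures quoted later, but it is a reconstruction, not something stated in the lecture.

```python
import random

# Transition probabilities for the example chain. Only some entries are
# given in the transcript; A -> A = 0.1 is an assumption.
T = {
    "A": {"A": 0.1, "B": 0.9},
    "B": {"C": 1.0},
    "C": {"A": 0.7, "C": 0.3},
}

def run_chain(start, steps, seed=0):
    """Generate a sample time series: emit the current state, then jump
    according to the current state's transition probabilities."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(steps):
        out.append(state)
        states, probs = zip(*T[state].items())
        state = rng.choices(states, weights=probs)[0]
    return "".join(out)
```

Because the chain has no memory, the next symbol depends only on the current state, which is exactly what the single dictionary lookup `T[state]` expresses.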
So now let's take the observable data associated with that Markov chain, or one instantiation of that data, and coarse-grain it using this decimation transformation. In particular, I'm going to take blocks of two observations in a row and drop the second one, or conversely, I'm going to coarse-grain each block of two observations to a single observation, dictated by the value of the first point in the block. So in this case I've used blocks of size 2: I have coarse-grained the data at, let's say, a two-second time scale. It's easy enough to build the Markov chain associated with that coarse-grained run, and in particular let's just walk through the image we have here to see that the arrows make sense. In the next section I'll tell you how to derive these mathematically, but for now it's a useful exercise to understand the relationship between the one-step and the two-step model. Notice that in the one-step model it's impossible for the system to jump from B to A. But in the coarse-grained system it's in fact very likely that if you see a B, the next symbol you see will be an A: it happens 70 percent of the time, and if you look at the coarse-grained run it's pretty easy to find cases where you have B and then A. In fact, on the first line I see it once and on the second line I see it three times. But in the sample run at the finer-grained time scale that's impossible, and of course it's impossible because when you're in state B you deterministically go to state C. At the coarse-grained timescale, though, it's pretty easy to go from B to A: you go by way of C, but that C step is unobservable; it's coarse-grained away. And in fact, if you start in state B, 100% of the time you go to state C.
And so for it to show up as an A at the next time step, all that has to happen is that when you're in state C you jump to A, and that happens 70 percent of the time. So what I've done there, in a sort of laborious way, is tell you exactly why the arrow from B to A is labeled with a 70% probability. Of course that's just when you coarse-grain from the fine-grained time scale to blocks of two; you can just as easily imagine coarse-graining the system in blocks of three, blocks of four, and so forth. So if we look here at how this model evolves, we're basically telling the parallel story to how the data is being coarse-grained: we coarse-grain in blocks of two steps, or we coarse-grain in blocks of three steps. As you can see, the model changes, and it's a little hard, if you just look at it, to see the exact logic of that change. When you go from one step to two steps, in some sense the model maybe looks like it gets a bit more complicated. All I mean is that some transitions become possible that were impossible before: you become a little less certain about what's going to happen. Except then, when I go to three steps, some of those transitions go away again, so in fact I become a little more certain. So what is the limit of this process? What happens when we continue this and coarse-grain on increasingly longer and longer time scales? It turns out that models tend to flow to what we'll call fixed points. In other words, as you go from one coarse-graining level to the next, the corresponding model tends to change less and less, at least if you wait long enough. The models tend to converge on a single model as the coarse-graining timescale gets longer and longer. And in fact, not only does every Markov chain converge to a particular sort of limit-case Markov chain.
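The walk-through above is really a matrix computation in disguise: decimating in blocks of n replaces the one-step transition matrix by its nth power. Using the same assumed matrix as before (only A→B = 0.9, B→C = 1.0, C→A = 0.7, C→C = 0.3 are given in the transcript; A→A = 0.1 is a reconstruction), the two-step B→A entry comes out to exactly the 70% just derived by hand.

```python
import numpy as np

# Assumed one-step transition matrix, rows and columns ordered A, B, C.
T = np.array([
    [0.1, 0.9, 0.0],   # from A
    [0.0, 0.0, 1.0],   # from B: deterministically to C
    [0.7, 0.0, 0.3],   # from C
])

# Coarse-graining by decimation in blocks of 2 replaces T by T squared.
T2 = np.linalg.matrix_power(T, 2)
print(T2[1, 0])   # P(B -> A) at the two-step scale: 0.7, via the hidden C
```

The B→C→A path is where that 0.7 comes from; the intermediate C is coarse-grained away.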
In fact, the Markov chain that it converges to has a particularly simple structure. Here it is; here's the limit point, if you take that one-step Markov chain and ask what happens when you coarse-grain the data on really long time scales. If you look at the transition probabilities, you'll notice that every state has incoming arrows all with the same weight. The chance of going to C is independent of which state you were in previously: in this limit it's always 40%. Similarly, if you're transitioning to state A it doesn't matter where you come from; the probability is always the same, 31 percent. Think about how many parameters you need to describe the model in the one-step case: you have three states, and each state has three transition probabilities, but since probabilities have to sum to one, each state in fact has only two free parameters. So you have three states with two free parameters each: six parameters describe an arbitrary Markov model on three states. However, if this pattern holds, and it does for nearly every Markov chain that you can write down, not every one, just nearly every one, then when you coarse-grain on sufficiently long time scales you only need two parameters. It's a remarkable compression, in other words, not just of the data but of the model space.
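You can watch this flow to the fixed point numerically: as the block size grows, the powers of the transition matrix converge to a matrix whose rows are all identical, so where you land no longer depends on where you started. Again, the matrix here is the assumed reconstruction from earlier, not something written out in the lecture.

```python
import numpy as np

# Assumed one-step transition matrix, states ordered A, B, C.
T = np.array([
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.0, 0.3],
])

# Coarse-grain on a very long time scale: a 50-step decimation.
Tn = np.linalg.matrix_power(T, 50)
# Every row has converged to (nearly) the same distribution, so the
# whole model is pinned down by a single row: two free parameters.
print(np.round(Tn, 3))
```

With this matrix the limiting row comes out near 31% for A and 40% for C, matching the figures quoted above; any two of the three entries determine the third.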
You begin in this sort of six-dimensional manifold, but if you coarse-grain the data enough, the models corresponding to that coarse-graining all flow to this very low-dimensional, two-dimensional in fact, manifold, where the model is described simply by the incoming probabilities to each of the three states. Since those incoming probabilities all have to be the same, the chance of going to C is the same no matter where you begin, and the chance of going to B is the same no matter where you begin; and since the probabilities outgoing from each state have to sum to one, even though there are three probabilities to specify, only two of them are needed: the third is just one minus the sum of the other two. In the next module I'll show you how to make these computations exact, but for now this is our first example of how theories change when the data you ask them to describe is coarse-grained.