In order to evolve strategies with a genetic algorithm, we need to encode them in a way that the genetic algorithm can deal with, so here's how this works. Remember, here's our example strategy, with all possible situations represented and an action given for each possible situation. What we can do now is just take the list of actions and remember the order of the 243 possible situations. So now we can cut off this list of actions and remember that the first action corresponds to the first situation, which is everything empty; the second action corresponds to the second situation, which is everything empty except for a can at the current site; and so on. So now we can take just that list of actions and encode it according to a numerical code. What I'm going to do is give each action a number, a code number from 0 to 6; then I can encode this particular strategy by substituting the code number for each of the actions in the list. So now a 0 here in the first position means that the very first situation, that is, all empty, corresponds to the action "move north"; a 6 here means that the third situation, whatever that was in our ordered list, corresponds to the action "move random"; and so on. In this way we now have a list of numbers that represents a specific strategy. When Robbie goes to perform an action, he looks around at his situation, he remembers the ordering of the possible situations, and he says, "Oh, I'm in situation, let's say, 201." Then he looks up the 201st entry in this list, figures out what action the code corresponds to, and performs that action. And he does that again and again and again; that's how he would carry out a whole strategy. So you can think of this strategy, in a kind of metaphorical sense, as Robbie's DNA: this list of numbers is what's going to evolve. And of course this list of numbers has 243 values, so if you want to think of each one of these numbers as a particular gene, you can say that Robbie has 243 genes.
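The encoding described above can be sketched in a few lines of Python. This is a rough illustration, not the lecture's exact scheme: the site ordering (current, north, south, east, west), the state codes (0 = empty, 1 = can, 2 = wall), and the action ordering beyond 0 = "move north" and 6 = "move random" are all assumptions made for the sketch.

```python
# Each situation is 5 sites, each in one of 3 states, giving 3**5 = 243
# possible situations. A strategy is a list of 243 action codes (0..6).

# Hypothetical action table; only code 0 (move north) and code 6
# (move random) are fixed by the lecture, the rest is assumed.
ACTIONS = ["move-north", "move-south", "move-east", "move-west",
           "stay-put", "pick-up-can", "move-random"]

def situation_index(current, north, south, east, west):
    """Map a situation to an index 0..242, treating the five site
    states as digits of a base-3 number (site ordering is assumed)."""
    index = 0
    for digit in (current, north, south, east, west):
        index = index * 3 + digit
    return index

def choose_action(strategy, situation):
    """Look up the action for the current situation in the strategy,
    exactly as Robbie does: situation -> index -> code -> action."""
    return ACTIONS[strategy[situation_index(*situation)]]
```

For example, with a strategy that is all 6s, `choose_action` returns "move-random" in every situation.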
Of course, the analogy with genetics is not meant to be exact; it's only meant to be a very rough conceptual one. Okay, so now here's a question: how many possible strategies are there, given this representation? We'd like to find a good strategy, but the question is how many we would have to sort through in order to find a good one, so we can ask how many possible ones there are. Well, there are 243 different positions, and in each position there could be seven possible actions, since we have the numbers 0 through 6 to go there. So the total number of possible strategies is seven times seven times seven, and so on, one factor of seven for each of the 243 positions. The answer is a stunning 7 raised to the 243rd power, which is one of those astronomical numbers that you could never hope to get through in your lifetime, or even in the lifetime of the whole universe. This is an impossible number; nobody could possibly search through all these possible strategies, and the goal here is to have a genetic algorithm search intelligently in this vast space for a good strategy. Now we're finally ready to figure out how to use a genetic algorithm to evolve strategies. The first thing we're going to do is generate 200 random strategies. The number 200 is just arbitrary, I just picked it, but some number of completely random strategies will be our starting population of strategies. So here's just a list: when I run the GA, I generate 200 individuals. Each individual is a strategy, this sequence of 243 numbers, each number between 0 and 6, so you can see they look kind of random; hopefully, because they are. Okay, so now we're going to calculate the fitness of each strategy by having Robbie use that strategy to perform a bunch of actions in his world, which is filled with empty soda cans, and what I'm going to call an environment is a particular configuration of empty soda cans in his world.
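The size of the search space and the initial random population can be sketched as follows; this is a minimal illustration, with the population size of 200 taken from the lecture and everything else generic.

```python
import random

NUM_SITUATIONS = 3 ** 5   # 243 situations
NUM_ACTIONS = 7           # action codes 0..6
POPULATION_SIZE = 200     # arbitrary, as noted in the lecture

# Seven choices in each of 243 positions: 7**243 possible strategies.
num_strategies = NUM_ACTIONS ** NUM_SITUATIONS

def random_strategy(rng):
    """One completely random strategy: 243 codes drawn from 0..6."""
    return [rng.randrange(NUM_ACTIONS) for _ in range(NUM_SITUATIONS)]

rng = random.Random(0)
population = [random_strategy(rng) for _ in range(POPULATION_SIZE)]
```

Printing `num_strategies` shows a number with over 200 digits, which is the "astronomical" count the lecture refers to.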
When we start out, we just put empty soda cans at random places; then we let Robbie obey a particular strategy for a certain number of actions that he can perform. We calculate his reward minus his penalties, and we do it again for a different environment, that is, a different configuration of cans. We do this many times, and the fitness of the particular strategy is the average reward minus penalties over all those different environments. At this point we've calculated the fitness separately for all 200 strategies, and now comes the amazing part, where they pair up and create offspring via sexual recombination ("sex" in quotation marks here, because they're strings of numbers on a computer) with random mutations. The fitter the parents, the more offspring they create. So what we do is pick two fit parents; we pick them probabilistically, with the fitter ones being more likely to be chosen. Now I'm going to create an offspring by choosing some point here in the strings to split them, and the offspring, the child, is going to get the first part from its parent 1 and the second part from its parent 2. Okay, so I make the child just by copying part of parent 1 and part of parent 2, and then the child can be randomly mutated by choosing a few random positions and changing those to random numbers between 0 and 6; that is, we randomly change some numbers, and now we have our child. And we keep doing this again and again until we have a new generation of offspring; the old generation completely dies out, and then we start again at step 2 with the new generation. We do this for many generations, until we're happy with the strategy the GA has found or we're tired of running it, and then we quit. I wanted to point out that the maximum possible fitness of a strategy is about 500. That's because there are 100 squares total in Robbie's world, and each time Robbie starts out in an environment there are about 50 cans; that is, about half the sites have cans on them.
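The two core GA steps described above, averaging fitness over environments and producing a child by single-point crossover plus mutation, can be sketched like this. The simulator `run_strategy` is a hypothetical stand-in for letting Robbie act in one random can configuration, and the mutation rate is an assumed value, not one given in the lecture.

```python
import random

def fitness(strategy, num_environments, run_strategy):
    """Average (reward minus penalties) over several random environments.
    `run_strategy` is a hypothetical simulator: it drops the strategy into
    one random can configuration and returns Robbie's total score."""
    total = sum(run_strategy(strategy) for _ in range(num_environments))
    return total / num_environments

def make_child(parent1, parent2, rng, mutation_rate=0.005):
    """Single-point crossover followed by random mutation."""
    # Split point: child takes the first part from parent 1,
    # the rest from parent 2.
    split = rng.randrange(1, len(parent1))
    child = parent1[:split] + parent2[split:]
    # Mutation: with small probability, replace a gene (an action
    # code) with a random number between 0 and 6.
    for i in range(len(child)):
        if rng.random() < mutation_rate:
            child[i] = rng.randrange(7)
    return child
```

A full run would repeat `make_child` (with parents chosen probabilistically by fitness) until a new generation of 200 replaces the old one, then loop back to the fitness step.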
Sometimes it's less than 50 and sometimes it's more than 50 because of the randomness, but on average it's 50, and each can is worth 10 points; therefore the maximum number of points Robbie can get in an environment is about 500. So before I ran the GA, I came up with my own hand-designed strategy, which was the simplest reasonable strategy I could think of. Here it is: if there's a can at the current site, pick it up; otherwise, if there's a can at one of the adjacent sites, move to that site; otherwise, choose a random direction to move in. I implemented that strategy as a string of 243 values; I wrote a program to do that. Then I had Robbie use that strategy to collect cans in a number of different environments, and I took his average score over those environments. The average fitness was 346, out of about 500 maximum possible; okay, so not too bad. But then I implemented the genetic algorithm, ran it just as I explained to you, and found that the average fitness of the best evolved strategy was 486, almost perfect. So the question was: why is the genetic algorithm doing so much better than I am?
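The hand-designed baseline can be written out as a full table of 243 values. This sketch assumes a particular encoding (sites ordered current, north, south, east, west, with 0 = empty, 1 = can, 2 = wall, and action codes 0..3 = move north/south/east/west, 5 = pick up, 6 = move random); the lecture does not spell these details out, so treat them as illustrative.

```python
from itertools import product

def hand_designed_strategy():
    """Fill all 243 entries with the simple rule: pick up a can here,
    else move toward an adjacent can, else move randomly."""
    strategy = []
    # product(range(3), repeat=5) enumerates situations in the same
    # base-3 order used to index the strategy (assumed encoding).
    for current, north, south, east, west in product(range(3), repeat=5):
        if current == 1:       # can at the current site: pick it up
            action = 5
        elif north == 1:       # can in an adjacent site: move to it
            action = 0
        elif south == 1:
            action = 1
        elif east == 1:
            action = 2
        elif west == 1:
            action = 3
        else:                  # no can in sight: move randomly
            action = 6
        strategy.append(action)
    return strategy
```

Running Robbie with this table in many random environments is what produced the average score of 346 reported in the lecture.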
What is its strategy doing that is so much better? This turns out to be a non-trivial question to answer. I know the logic behind my strategy, but the genetic algorithm, at the end of its run, spits out a string of 243 numbers and says, "Here's my strategy; this one works well, it has a high fitness," but it's hard for us to see exactly how it works. It's kind of analogous to the human genome: you know that the human genome was sequenced, so we know what the sequence is, but there's a big jump from knowing what the sequence is to knowing how it works, what the functionality is, how the different genes give rise to different functions. So to look at the genetic algorithm's strategy, I first plotted, for one run of the genetic algorithm, the best fitness in the population as a function of generation, so that as time goes on you can see how the fitness of the best individual in the population improves. This is from the version I implemented in the programming language C; you are going to be able to play with the version I implemented in NetLogo in a little while, but that one is relatively slow, so I ran the C version to get these results. You can see that at the very first generation the fitness is very low; it's below 0, it's negative (we'll see why in a minute), but very quickly the best individual in the population starts having higher and higher fitness. Remember that at each generation the whole population turns over, so these individuals are different in each generation. The fitness stays about the same for a little while here, and then it takes a steep jump, until it reaches another plateau, then takes another jump to another plateau. All these little variations here are because the fitness function itself has some randomness in it, because we're randomly generating the cans.
The cans aren't always the same in number, and the layout of cans varies from environment to environment, so you get different fitnesses even for the same individual at different generations, because it's being tried out on different can configurations.