In order to evolve strategies with a genetic algorithm, we need to encode them in such a way that the genetic algorithm can deal with them.
So here's how this works. Remember, here's our example strategy, with all possible situations represented and an action given for each possible situation. What we can do now is just take the list of actions, remembering the ordering of the 243 possible situations. So we can strip off this list of actions and remember that the first action corresponds to the first situation, which is everything empty; the second action corresponds to the second situation, which is everything empty except for a can at the current site; and so on.
Now we can take just that list of actions and encode it numerically. What I'm going to do is give each action a number, its code number, 0 through 6. Then I can encode this particular strategy by substituting the code number for each of the actions in the list. So now a 0 here in this first position means that the very first situation, that is, all empty, corresponds to the action move north. A 6 here means that the third situation, whatever that was in our ordered list, corresponds to the action move random, and so on.
This way we now have a list of numbers that represents a specific strategy. When Robbie goes to perform an action, he looks around at his situation, he remembers the ordering of possible situations, and he says, "Oh, I'm in situation, let's say, 201." Then he looks up the 201st entry in this list, figures out what action the code corresponds to, and performs that action.
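As a sketch of that lookup, here is one way it might work in Python. I'm assuming the five sites are ordered north, south, east, west, current, that each site can be in one of three states (empty, can, wall) so the states act as base-3 digits giving indices 0 through 242, and an action ordering in which only 0 = move north and 6 = move random are fixed by the lecture; the rest of the ordering is my guess.

```python
STATES = {"empty": 0, "can": 1, "wall": 2}   # 3 states per site -> 3**5 = 243 situations
ACTIONS = ["move-north", "move-south", "move-east", "move-west",
           "stay-put", "pick-up-can", "move-random"]   # codes 0..6 (order partly assumed)

def situation_index(north, south, east, west, current):
    # Treat the five site states as the digits of a base-3 number (0..242).
    idx = 0
    for state in (north, south, east, west, current):
        idx = idx * 3 + STATES[state]
    return idx

# A strategy is just a list of 243 action codes; here, a dummy one that
# always moves randomly (code 6).
strategy = [6] * 243

idx = situation_index("empty", "empty", "can", "empty", "can")
action = ACTIONS[strategy[idx]]
print(idx, action)   # situation 10 -> move-random
```

The key point is that the strategy itself is nothing but the flat list of 243 codes; all the structure lives in the agreed-upon ordering of situations.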
He does that again and again and again; that's how Robbie carries out a whole strategy. You can think of this strategy, in a kind of metaphorical sense, as Robbie's DNA. This list of numbers is what's going to evolve. And of course this list of numbers has 243 values, so if you want to think of each one of these numbers as a particular gene, you can say that Robbie has 243 genes. Of course, the analogy with genetics is not meant to be exact; it's only meant to be a very rough conceptual one. Okay, so now here's a question:
How many possible strategies are there, given this representation? We'd like to find a good strategy, but how many would we have to sort through in order to find a good one? Well, we have 243 different positions, and in each position there could be any of seven possible actions, since we have the numbers 0 through 6 to go there. So the total number of possible strategies is seven times seven times seven..., once for each position, 243 times. The answer is a stunning 7 raised to the 243rd power, which is one of those astronomical numbers that you could never hope to get through in your lifetime, or even the lifetime of the whole universe. This is an impossible number; nobody could possibly search through all these possible strategies.
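For a sense of scale, the count is easy to compute exactly, since Python integers have arbitrary precision:

```python
# 7 choices in each of 243 positions, so 7**243 strategies in total.
num_strategies = 7 ** 243
print(num_strategies)
print(len(str(num_strategies)), "digits")   # a 206-digit number
```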
The goal here is to have a genetic algorithm search intelligently in this vast space for a good strategy. Now we're finally ready to figure out how to use a genetic algorithm to evolve strategies.
The first thing that we're going to do is generate 200 random strategies. The number 200 is arbitrary; I just picked it. But some number of completely random strategies will be our starting population of strategies. So here's a list I generated: when I run the GA, I generate 200 individuals. Each individual is a strategy, that is, a sequence of 243 numbers, each number between 0 and 6. You can see they look kind of random, hopefully because they are.
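Generating that initial population is a short sketch in Python (the lecture's own implementations were elsewhere; this is just for illustration):

```python
import random

POP_SIZE = 200      # arbitrary population size, as in the lecture
GENOME_LEN = 243    # one action code per possible situation
NUM_ACTIONS = 7     # codes 0..6

# Each individual is a completely random strategy: 243 codes in 0..6.
population = [[random.randrange(NUM_ACTIONS) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]
print(len(population), len(population[0]))   # 200 individuals, 243 genes each
```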
Okay, so now we're going to calculate the fitness of each strategy by having Robbie use that strategy to perform a bunch of actions in his world, which is filled with empty soda cans. What I'm going to call an environment is a particular configuration of empty soda cans in his world. To start out, we just put empty soda cans at random places. Then we let Robbie obey the particular strategy for a certain number of actions. We calculate his rewards minus his penalties, and we do it again for a different environment, that is, a different configuration of cans.
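The evaluation just described, scoring the strategy in one random environment after another, might be sketched like this. Here `run_episode` is a stand-in: its internals (the world simulation and the exact penalty values) are not shown, and the placeholder body just returns a random score.

```python
import random

NUM_ENVIRONMENTS = 100   # how many random can configurations to score (assumed count)

def run_episode(strategy):
    # Placeholder for one run: in the real simulation this would scatter cans
    # at random, let Robbie follow `strategy` for a fixed number of actions,
    # and return his rewards minus penalties for that environment.
    return random.uniform(-50, 500)

def fitness(strategy):
    # A strategy's fitness is its average score over many random environments.
    scores = [run_episode(strategy) for _ in range(NUM_ENVIRONMENTS)]
    return sum(scores) / len(scores)

print(fitness([6] * 243))   # score the "always move randomly" strategy
```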
We do this many times, and the fitness of the particular strategy is the average of rewards minus penalties over all those different environments. At this point we've calculated the fitness separately for each of the 200 different strategies, and now the amazing part is where they pair up and create offspring via sexual recombination, "sexual" in quotation marks here because they're strings of numbers on a computer, with random mutations. The fitter the parents, the more offspring they create. So what we do is pick two fit parents; we pick them probabilistically, with the fitter ones being more likely to be chosen.
Now I'm going to create an offspring by choosing some point here in the strings at which to split them, and the offspring, the child, is going to get the first part from parent 1 and the second part from parent 2. So I make the child just by copying part of parent 1 and part of parent 2, and then the child can be randomly mutated: we choose a few random positions and change them to random numbers between 0 and 6, that is, we randomly change some numbers. And now we have our child. We keep doing this again and again until we have a new generation of offspring; the old generation completely dies out. Then we start again at step 2 with the new generation, and we do this for many generations, until we're happy with the strategy the GA has found, or until we get tired of running it, and then we quit.
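One generation of that process can be sketched as follows. The exact selection scheme (fitness-proportionate, with scores shifted to be positive) and the mutation rate are my assumptions; the lecture only says that fitter parents are more likely to be chosen and that a few positions get mutated. The fitness values here are fake random numbers, just to make the sketch runnable.

```python
import random

GENOME_LEN = 243
NUM_ACTIONS = 7
MUTATION_RATE = 0.005   # assumed per-position mutation probability

def crossover(parent1, parent2):
    # Single-point crossover: head from parent 1, tail from parent 2.
    point = random.randrange(1, GENOME_LEN)
    return parent1[:point] + parent2[point:]

def mutate(child):
    # Change a few random positions to random codes in 0..6.
    return [random.randrange(NUM_ACTIONS) if random.random() < MUTATION_RATE else g
            for g in child]

def next_generation(population, fitnesses):
    # Shift fitnesses so they are positive selection weights.
    low = min(fitnesses)
    weights = [f - low + 1 for f in fitnesses]
    new_pop = []
    for _ in range(len(population)):
        p1, p2 = random.choices(population, weights=weights, k=2)
        new_pop.append(mutate(crossover(p1, p2)))
    return new_pop   # the old generation dies out completely

population = [[random.randrange(NUM_ACTIONS) for _ in range(GENOME_LEN)]
              for _ in range(200)]
fake_fitnesses = [random.uniform(-50, 500) for _ in population]
population = next_generation(population, fake_fitnesses)
print(len(population), len(population[0]))
```

In a real run you would loop `next_generation` for many generations, recomputing real fitnesses at each step.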
I want to point out that the maximum possible fitness of a strategy is about 500. That's because there are 100 squares total in Robbie's world, and each time Robbie starts out in an environment there are about 50 cans, that is, about half the sites have cans on them. Sometimes it's less than 50, sometimes it's more than 50, because of the randomness, but on average it's 50. Each can is worth 10 points, so the maximum number of points that Robbie can get in an environment is about 500.
Before I ran the GA, I came up with my own hand-designed strategy, which was the simplest reasonable strategy I could think of. Here it is: if there's a can at the current site, pick it up; otherwise, if there's a can in one of the adjacent sites, move to that site; otherwise, choose a random direction to move in. I implemented that strategy as a string of 243 values.
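That three-line rule might be expressed as a function like the following (the site and action names are illustrative assumptions, not the lecture's actual code); filling in the 243-entry string just means applying it to every possible situation in order.

```python
def hand_designed_action(sites):
    # sites: dict with keys "north", "south", "east", "west", "current",
    # each mapped to "empty", "can", or "wall".
    if sites["current"] == "can":          # can here? pick it up
        return "pick-up-can"
    for direction in ("north", "south", "east", "west"):
        if sites[direction] == "can":      # can next door? move toward it
            return "move-" + direction
    return "move-random"                   # otherwise wander

print(hand_designed_action(
    {"north": "empty", "south": "can", "east": "empty",
     "west": "wall", "current": "empty"}))   # -> move-south
```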
I wrote a little program to do that. Then I had Robbie use that strategy to collect cans in a number of different environments, and I took his average score over those environments. The average fitness was 346, out of about 500 maximum possible. Okay, so not too bad. But then I implemented the genetic algorithm, ran it just as I explained to you, and found that the average fitness of the best evolved strategy was 486. Almost perfect. So the question is: why is the genetic algorithm doing so much better than I did? What is its strategy doing that is so much better?
This turns out to be a non-trivial question to answer. I know the logic behind my strategy, but the genetic algorithm, at the end of its run, spits out a string of 243 numbers and says, "Here's my strategy." This one works well, it has a high fitness, but it's hard for us to see exactly how it works. It's kind of analogous to the human genome: the human genome has been sequenced, and we know what the sequence is, but there's a big jump from knowing what the sequence is to knowing how it works, what the functionality is, how the different genes give rise to different functions.
So, to look at the genetic algorithm's strategy, I first plotted, for a run of the genetic algorithm, the best fitness in the population as a function of generation. As time goes on, you can see how the fitness of the best individual in the population improves. This is from the version I implemented in the programming language C; you are going to be able to play with the version I implemented in NetLogo in a little while, but that one is relatively slow, so I ran the C version to get these results. You can see that at the very first generation, fitness is very low; it's below 0, so it's negative, and we'll see why in a minute. But very quickly the best individual in the population starts having higher and higher fitness. Remember that at each generation the whole population turns over, so these best individuals are different in each generation. The fitness stays about the same for a little while here, and then it takes a steep jump, until it reaches another plateau, takes another jump, another plateau. All these little variations you see are because the fitness function itself has some randomness in it: we're randomly generating the cans, the cans aren't always the same in number, and the layout of cans varies from environment to environment. So you get different fitnesses even for the same individual at different generations, because it's being tried out on different can configurations.