In order to evolve strategies with a genetic algorithm, we need to encode them in a way that the genetic algorithm can deal with, so here's how this works. Remember, here's our example strategy, with all possible situations represented and an action given for each possible situation. What we can do now is just take the list of actions and remember the order of the 243 possible situations. So now we can cut off this list of actions and remember that the first action corresponds to the first situation, which is everything empty; the second action corresponds to the second situation, which is everything empty except for a can at the current site; and so on. So now we can take just that list of actions and encode it according to a numerical code. What I'm going to do is give each action a number, a code number from 0 to 6; then I can encode this particular strategy by substituting the code number for each of the actions in the list. So now a 0 here in the first position means that the very first situation, that is, all empty, corresponds to the action "move north"; a 6 here means that the third situation, whatever that was in our ordered list, corresponds to the action "move random"; and so on. In this way we now have a list of numbers that represents a specific strategy. When Robbie goes to perform an action, he looks around at his situation, he remembers the ordering of the possible situations, and he says, "Oh, I'm in situation, let's say, 201." Then he looks up the 201st entry in this list, figures out what action the code corresponds to, and performs that action. And he does that again and again and again; that's how he would carry out a whole strategy. So you can think of this strategy, in a kind of metaphorical sense, as Robbie's DNA: this list of numbers is what's going to evolve. And of course this list of numbers has 243 values, so if you want to think of each one of these numbers as a particular gene, you can say that Robbie has 243 genes.
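The encoding described above can be sketched in a few lines of Python. This is a rough illustration, not the lecture's exact scheme: the site ordering (current, north, south, east, west), the state codes (0 = empty, 1 = can, 2 = wall), and the action ordering beyond 0 = "move north" and 6 = "move random" are all assumptions made for the sketch.

```python
# Each situation is 5 sites, each in one of 3 states, giving 3**5 = 243
# possible situations. A strategy is a list of 243 action codes (0..6).

# Hypothetical action table; only code 0 (move north) and code 6
# (move random) are fixed by the lecture, the rest is assumed.
ACTIONS = ["move-north", "move-south", "move-east", "move-west",
           "stay-put", "pick-up-can", "move-random"]

def situation_index(current, north, south, east, west):
    """Map a situation to an index 0..242, treating the five site
    states as digits of a base-3 number (site ordering is assumed)."""
    index = 0
    for digit in (current, north, south, east, west):
        index = index * 3 + digit
    return index

def choose_action(strategy, situation):
    """Look up the action for the current situation in the strategy,
    exactly as Robbie does: situation -> index -> code -> action."""
    return ACTIONS[strategy[situation_index(*situation)]]
```

For example, with a strategy that is all 6s, `choose_action` returns "move-random" in every situation.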
Of course, the analogy with genetics is not meant to be exact; it's only meant to be a very rough conceptual one. Okay, so now here's a question: how many possible strategies are there, given this representation? We'd like to find a good strategy, but the question is how many we would have to sort through in order to find a good one, so we can ask how many possible ones there are. Well, there are 243 different positions, and in each position there could be seven possible actions, since we have the numbers 0 through 6 to go there. So the total number of possible strategies is seven times seven times seven, and so on, one factor of seven for each of the 243 positions. The answer is a stunning 7 raised to the 243rd power, which is one of those astronomical numbers that you could never hope to get through in your lifetime, or even in the lifetime of the whole universe. This is an impossible number; nobody could possibly search through all these possible strategies, and the goal here is to have a genetic algorithm search intelligently in this vast space for a good strategy. Now we're finally ready to figure out how to use a genetic algorithm to evolve strategies. The first thing we're going to do is generate 200 random strategies. The number 200 is just arbitrary, I just picked it, but some number of completely random strategies will be our starting population of strategies. So here's just a list: when I run the GA, I generate 200 individuals. Each individual is a strategy, this sequence of 243 numbers, each number between 0 and 6, so you can see they look kind of random; hopefully, because they are. Okay, so now we're going to calculate the fitness of each strategy by having Robbie use that strategy to perform a bunch of actions in his world, which is filled with empty soda cans, and what I'm going to call an environment is a particular configuration of empty soda cans in his world.
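The size of the search space and the initial random population can be sketched as follows; this is a minimal illustration, with the population size of 200 taken from the lecture and everything else generic.

```python
import random

NUM_SITUATIONS = 3 ** 5   # 243 situations
NUM_ACTIONS = 7           # action codes 0..6
POPULATION_SIZE = 200     # arbitrary, as noted in the lecture

# Seven choices in each of 243 positions: 7**243 possible strategies.
num_strategies = NUM_ACTIONS ** NUM_SITUATIONS

def random_strategy(rng):
    """One completely random strategy: 243 codes drawn from 0..6."""
    return [rng.randrange(NUM_ACTIONS) for _ in range(NUM_SITUATIONS)]

rng = random.Random(0)
population = [random_strategy(rng) for _ in range(POPULATION_SIZE)]
```

Printing `num_strategies` shows a number with over 200 digits, which is the "astronomical" count the lecture refers to.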
When we start out, we just put empty soda cans at random places; then we let Robbie obey a particular strategy for a certain number of actions that he can perform. We calculate his reward minus his penalties, and we do it again for a different environment, that is, a different configuration of cans. We do this many times, and the fitness of the particular strategy is the average reward minus penalties over all those different environments. At this point we've calculated the fitness separately for all 200 strategies, and now comes the amazing part, where they pair up and create offspring via sexual recombination ("sex" in quotation marks here, because they're strings of numbers on a computer) with random mutations. The fitter the parents, the more offspring they create. So what we do is pick two fit parents; we pick them probabilistically, with the fitter ones being more likely to be chosen. Now I'm going to create an offspring by choosing some point here in the strings to split them, and the offspring, the child, is going to get the first part from its parent 1 and the second part from its parent 2. Okay, so I make the child just by copying part of parent 1 and part of parent 2, and then the child can be randomly mutated by choosing a few random positions and changing those to random numbers between 0 and 6; that is, we randomly change some numbers, and now we have our child. And we keep doing this again and again until we have a new generation of offspring; the old generation completely dies out, and then we start again at step 2 with the new generation. We do this for many generations, until we're happy with the strategy the GA has found or we're tired of running it, and then we quit. I wanted to point out that the maximum possible fitness of a strategy is about 500. That's because there are 100 squares total in Robbie's world, and each time Robbie starts out in an environment there are about 50 cans; that is, about half the sites have cans on them.
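The two core GA steps described above, averaging fitness over environments and producing a child by single-point crossover plus mutation, can be sketched like this. The simulator `run_strategy` is a hypothetical stand-in for letting Robbie act in one random can configuration, and the mutation rate is an assumed value, not one given in the lecture.

```python
import random

def fitness(strategy, num_environments, run_strategy):
    """Average (reward minus penalties) over several random environments.
    `run_strategy` is a hypothetical simulator: it drops the strategy into
    one random can configuration and returns Robbie's total score."""
    total = sum(run_strategy(strategy) for _ in range(num_environments))
    return total / num_environments

def make_child(parent1, parent2, rng, mutation_rate=0.005):
    """Single-point crossover followed by random mutation."""
    # Split point: child takes the first part from parent 1,
    # the rest from parent 2.
    split = rng.randrange(1, len(parent1))
    child = parent1[:split] + parent2[split:]
    # Mutation: with small probability, replace a gene (an action
    # code) with a random number between 0 and 6.
    for i in range(len(child)):
        if rng.random() < mutation_rate:
            child[i] = rng.randrange(7)
    return child
```

A full run would repeat `make_child` (with parents chosen probabilistically by fitness) until a new generation of 200 replaces the old one, then loop back to the fitness step.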
Sometimes it's less than 50 and sometimes it's more than 50 because of the randomness, but on average it's 50, and each can is worth 10 points; therefore the maximum number of points Robbie can get in an environment is about 500. So before I ran the GA, I came up with my own hand-designed strategy, which was the simplest reasonable strategy I could think of. Here it is: if there's a can at the current site, pick it up; otherwise, if there's a can at one of the adjacent sites, move to that site; otherwise, choose a random direction to move in. I implemented that strategy as a string of 243 values; I wrote a program to do that. Then I had Robbie use that strategy to collect cans in a number of different environments, and I took his average score over those environments. The average fitness was 346, out of about 500 maximum possible; okay, so not too bad. But then I implemented the genetic algorithm, ran it just as I explained to you, and found that the average fitness of the best evolved strategy was 486, almost perfect. So the question was: why is the genetic algorithm doing so much better than I am?
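The hand-designed baseline can be written out as a full table of 243 values. This sketch assumes a particular encoding (sites ordered current, north, south, east, west, with 0 = empty, 1 = can, 2 = wall, and action codes 0..3 = move north/south/east/west, 5 = pick up, 6 = move random); the lecture does not spell these details out, so treat them as illustrative.

```python
from itertools import product

def hand_designed_strategy():
    """Fill all 243 entries with the simple rule: pick up a can here,
    else move toward an adjacent can, else move randomly."""
    strategy = []
    # product(range(3), repeat=5) enumerates situations in the same
    # base-3 order used to index the strategy (assumed encoding).
    for current, north, south, east, west in product(range(3), repeat=5):
        if current == 1:       # can at the current site: pick it up
            action = 5
        elif north == 1:       # can in an adjacent site: move to it
            action = 0
        elif south == 1:
            action = 1
        elif east == 1:
            action = 2
        elif west == 1:
            action = 3
        else:                  # no can in sight: move randomly
            action = 6
        strategy.append(action)
    return strategy
```

Running Robbie with this table in many random environments is what produced the average score of 346 reported in the lecture.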
What is its strategy doing that is so much better? This turns out to be a non-trivial question to answer. I know the logic behind my strategy, but the genetic algorithm, at the end of its run, spits out a string of 243 numbers and says, "Here's my strategy; this one works well, it has a high fitness," but it's hard for us to see exactly how it works. It's kind of analogous to the human genome: you know that the human genome was sequenced, so we know what the sequence is, but there's a big jump from knowing what the sequence is to knowing how it works, what the functionality is, how the different genes give rise to different functions. So to look at the genetic algorithm's strategy, I first plotted, for one run of the genetic algorithm, the best fitness in the population as a function of generation, so that as time goes on you can see how the fitness of the best individual in the population improves. This is from the version I implemented in the programming language C; you are going to be able to play with the version I implemented in NetLogo in a little while, but that one is relatively slow, so I ran the C version to get these results. You can see that at the very first generation the fitness is very low; it's below 0, it's negative (we'll see why in a minute), but very quickly the best individual in the population starts having higher and higher fitness. Remember that at each generation the whole population turns over, so these individuals are different in each generation. The fitness stays about the same for a little while here, and then it takes a steep jump, until it reaches another plateau, then takes another jump to another plateau. All these little variations here are because the fitness function itself has some randomness in it, because we're randomly generating the cans.
The cans aren't always the same in number, and the layout of cans varies from environment to environment, so you get different fitnesses even for the same individual at different generations, because it's being tried out on different can configurations.