So how many possible situations are there?
Well, I could certainly
count them this way.
I could say that, well,
there's three possibilities that
I could fill in for north:
empty, can, and wall.
Robby could find himself in an empty
squ-- with an empty square to the north,
with a square that
has a can in it, or with a wall.
So those are the three possibilities
for north.
And south has the same
three possibilities, and so on.
And so I could multiply those together:
three times three times three
five times for the five,
uh, the four directions
and the current site to get 243.
Well, of course, some of those situations
could never occur in the world.
Wall, wall, wall, wall, wall:
all of them being walls.
Well that's just impossible,
the way we've set up our world.
But for the purposes of
the genetic algorithm,
I'm not going to
worry about that.
I'm just going to list
all of those possibilities,
even ones that aren't real possibilities.
So we're going to have a total of 243
possible situations, where some of them
actually aren't really possibilities,
but that's ok, as you'll see.
And a strategy would be a listing
of all of those possibilities,
along with the action that Robby
should take
in each of those possible situations.
So here's an example of the strategy.
Here, I've--well, I'm not going to
list all 243 situations,
but I could have
the computer do that systematically.
And I've filled in a random action that
I've chosen randomly
for each of those situations.
OK, so if we look back at our first
picture, we see that Robby finds himself
in this situation:
wall to the north, empty south,
can to the east, wall to the west,
empty current site.
And I've filled in move west.
So if Robby found himself in this
situation and was obeying this strategy,
he would move west
and crash into a wall.
So I didn't say that this was a good
strategy, but it is a strategy.
It just happens to be a very bad strategy.
And now I can ask a question:
what would Robby's score be
after following this strategy
for three time steps?
We saw at time step one,
he crashed into a wall.
Well, in this world, he bounces back into
the same site he was before,
and he finds himself in exactly
the same situation.
Rather stupidly, I admit, given that
he is in the same situation,
he has to take this action again.
So he would crash into a wall again.
And likewise, on the third time step,
he would crash into the wall again.
He would never learn not to crash
into the wall.
Well that's kind of stupid, but his score
would be a minus five for every time
he crashed into the wall. So he would get
a -15 after three time steps.
So what we want the genetic algorithm
to do is to use evolution
to weed out these kinds of
really stupid strategies
and to find the good strategies.
Before I show you how the genetic
algorithm works, let's do one more quiz.
Here's our quiz: we have a strategy for
Robby, showing only two relevant situations.
Robby starts here in this position,
and the quiz asks what's Robby's score
after performing two actions
according to this above strategy.
So, what you have to do is to find
the situation in the list of situations
that corresponds to Robby's
current situation.
Do the action that is associated
with that situation,
and then look for the new situation
in the list,
do the move that's associated with
that situation,
and give what his resulting score
would be.