In this subunit,
I'm going to go through
a detailed example of a
simple genetic algorithm that evolves
control programs for virtual robots.
Well, our virtual robot is called "Robby".
He's a very simple robot
who lives in a simulated world.
I described him first in my book,
"Complexity: A Guided Tour",
and his job is to go around,
collecting empty soda cans.
Robbie was inspired by a real robot,
called "Herbert",
which operated in the 1980's,
in the MIT Artificial Intelligence Lab.
It went around the lab, on its wheels,
going into the different offices,
and collecting empty soda cans,
and taking them to the recycle bin.
Herbert was the work of Jonathan Connell,
a graduate student at MIT working with
Rod Brooks and Peter Ning,
and was a very impressive robot
that used only very simple algorithms
to control its behaviour.
Our version of "Herbert" is called "Robby",
and he's a virtual robot
who lives this simulated world
consists of a 10 by 10 grid of squares
You can think of the squares as
sort of "virtual offices",
and each office is either empty
or it contains and empty soda can.
So here's an example of a soda can.
And Robby's job is to move around this world,
collecting the empty soda cans.
So we have to give Robby a program
that tells him what to do at each step.
I could program that myself, but
instead I'm going to use a
genetic algorithm to evolve programs to
do this task.
So first I need to tell you what
Robby can see and what he can do.
He's a very simple robot, and he has
very bad vision, so the only thing
he can see are the contents
of his own square
and the squares to the north, south
east and west.
So his current site, north south,
east and west.
In his current state, his current office
or room is empty.
To the north there is a wall,
to the west there is a wall,
to the south there is an empty square
and to the east there is a can.
So that's all he can see.
He uses what he can see to decide which
action to take at each time step.
And he has seven possible actions:
he can move to the north,
move to the south,
move to the east,
move to the west,
that's all moving one square.
He can make a random move:
choose a random direction to move in.
He can stay where he is and do nothing,
or he can try to pick up a can.
And if he tries to pick up a can and
there is a can there, he succeeds.
Or he can try to pick up a can mistakenly
in an empty square.
And he has certain rewards and penalties
that he gets for doing these things.
If he picks up a can, he gets 10 points,
that's a good thing to do.
If he tries to pick up a can on an empty site,
he gets minus one point,
and if he crashes into a wall,
he gets minus five points.
And his overall score
as he moves around is the
sum of the rewards and penalties
that he gets.
The goal is to use a genetic algorithm to
evolve a control program
that is a strategy for Robby.
So let me talk about what a "strategy" is.
The definition that I'm going to use is
a set of rules that specifies an action
for every possible situation.
Where the possible situations are
the inputs to Robby,
the possible inputs that he can see.
So let's make a list of possible situations
that Robby can find himself in.
The possible situations would be
what he sees
to the north, south, east,
west, his current site.
And a strategy would be for each possible
situation an action he would take.
So here's a possible situation--
maybe the simplest one--
everything is empty.
He's in an empty cell and
cells in all directions are empty.
That's one possible situation.
Here's another possible situation:
everything is empty except
in his current site he has a can.
And we could go on and on listing
possible situations.
And in a second, we'll figure out how
many possible situations there are,
but just to make sure that you are getting
all of this, let's stop for a quick quiz.
The quiz has two questions.
The first says look at this situation,
and suppose that Robby has a score
of zero so far,
and he takes some actions:
move east, move east, pick up can,
move east, pick up can, move south.
So he takes those six actions, and
you have to answer what his
situation is after those six actions,
and what his score is.
And here's to remind you what
the scoring process is.