So last time we talked about representing a Go board as a list of numbers describing the board position, and translating that into a supervised learning problem: the input is this list of numbers, and the output is the next move we want to make. Today we're going to talk about how to take that intuition and turn it into a good game-playing system in general.

The first thing to notice is that a board game is actually not just one supervised learning problem. It's not like image classification, where there's a single answer for each image. When you're playing a board game, there are several moves still to come, so there's really a connected series of supervised learning problems, and the first question to ask about board games is: can we exploit that property?

One idea is the notion called tree search, and it's something you've all done if you've played a game: "OK, I'm going to go left, then my opponent goes left, then I'll go right, and then I win. Great! Oh, wait a minute: if my opponent goes right instead, all of a sudden it's really bad for me. So maybe I shouldn't go left at all; maybe I should go right first." That's the idea of thinking a couple of moves ahead, seeing whether a position is good or bad for you, and then working your way back up to figure out whether it's something you really want to do. This is the tree search notion we're going to keep building on.

Now, for something like tic-tac-toe, you can actually just write out a list: I'm going to move here; then if my opponent goes there, I'll move here; if they go there instead, I'll go here. You can write a computer program that is just that series of if statements: if I do this and they do that, then I'll do this. In fact, if you remember from two lectures ago, that's how the original tic-tac-toe program was written, which was a big feat for
1952, but it's not very interesting today, because all we're doing is writing out a list of if statements saying what we should do. The problem is that this doesn't work for larger games like chess and Go. Tic-tac-toe has about 6,000 board positions, but chess has about 10^45 and Go has about 10^170. If you only have to write six thousand if statements, that's not so bad; but if you need more if statements than there are atoms in the universe, and then a lot more than that, it's just physically impossible. So we can't code a champion Go program the same way we can code a champion tic-tac-toe program. As a fun side note, the number on the slide is written with so many decimal places because we actually know the exact number of legal Go board positions, a result that came out in 2016.

So if we can't just write this long list of if statements, and the tree is too deep to think all the way down to winning or losing, how do we actually build a program? One way is the way Deep Blue works: we use human intelligence to encode something called a value function. That is, we can say: it's good if I have my queen; my queen is worth a lot of points. If my opponent still has her queen, that's worth the same number of points to her, but if she's about to lose her queen, that's worth a lot of points to me. To make this concrete, we say each piece on the chessboard is worth some number of points, and a position is good for me if I have more points of material than my opponent. Finally, we add something like 200 points for achieving checkmate, so positions in which I win are strongly rewarded.

At the beginning of the game, for instance, both sides have the same material, so that's an even position with a value of 0. Then, to construct this plot, I went and played a really strong chess program; a couple of moves in, I got trounced, and my board
position was at negative 17 points' worth of material, so that position was really bad for me. Then I hit a little button to flip sides, and using that material advantage I was able to beat the chess program and reach a checkmate position worth 200 points. So that's the idea of taking a chess position and mapping it to a value. It's not the same as winning or losing, but it's something that says this kind of position is good and that kind of position is bad.

What this lets us do is stop thinking all the way down the tree until the game ends in a win or a loss. Instead, we can lop off the entire bottom of the tree and think just a couple of moves ahead: if I do this and they do that, a few moves from now I reach a good position. In fact, this is how Deep Blue works. It uses a human-coded value function, with all that human intelligence baked in, and then does what computers do best: it crunches through millions and millions of positions to see which branch of the tree is best for the computer. That style of program was enough to beat Garry Kasparov, the reigning world chess champion.

Humans tried the same approach for Go: take a board position, judge which shapes are good or bad, and turn that into a number, saying this is a +4 position or this is a -6 position. Unfortunately, this approach really did not work. One problem is that an individual Go stone isn't worth much on its own; it may even be worth negative points if it's about to be captured. The power of Go stones comes from the shapes we talked about last lecture, and those shapes are good or bad partly in relation to the opponent's shapes. Professional Go players do think about this and do make these value judgments, but it's extremely difficult to encode that into a set of rules, a set of if statements, that a computer program can use. In particular, we can
see on the slide a really extreme example of this: Go has what are called non-local effects. In the upper left-hand corner it's the exact same shape, with White making the exact same move, but for reasons that are complicated to explain right now, on the left-hand board this is a really bad move to make, while if White has just one additional stone in the opposite corner of the board, all of a sudden it's a really good move. You can see how these kinds of positions make it very difficult to encode a value function.

The second really big problem with Go is that it has a much larger action space than chess, that is, the number of moves you can make at each turn. In chess, on the very first move, you can do 20 different things: each of your eight pawns can go up one or two squares, and each of your two knights can go to two squares. In Go, on the very opening move, there are 361 different moves you can make. Now, if you're trying to think a couple of moves ahead, this effect compounds and makes a huge difference. In chess, each player making three moves gives about nine million positions; that's a lot, but not so many for a computer to crunch through. In Go, three moves each gives about fifty-eight trillion positions, millions of times more than chess. The Go tree just expands enormously, which makes it very hard for computers to think that many moves ahead.

So next time, we're going to take these two problems with Go, that it's hard to encode a value function and that the tree is really huge, and talk about how we can use machine learning to build a champion Go player.
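To make the two ingredients of this lecture concrete, here is a minimal sketch in Python of tree search with a depth cutoff plus a hand-coded value function. The piece values follow the lecture's chess scheme (checkmate worth 200 points); everything else is illustrative: the function names are made up, and tic-tac-toe stands in for a real game so the whole tree is small enough to actually search. This is not Deep Blue's code, just the shape of the idea.

```python
# Sketch of depth-limited minimax with a hand-coded value function.
# Tic-tac-toe stands in for chess so the example stays small and runnable.

# Hand-coded piece values, as in the lecture's chess example. Checkmate is
# worth 200 points so winning positions dominate any material count.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
WIN_SCORE = 200

def material(my_pieces, opp_pieces):
    """Value of a position = my points minus my opponent's points."""
    return (sum(PIECE_VALUES[p] for p in my_pieces)
            - sum(PIECE_VALUES[p] for p in opp_pieces))

# Tic-tac-toe board: a 9-character string, "." for empty squares.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player, depth, value, memo=None):
    """Score `board` from X's point of view, searching `depth` plies ahead.
    When the depth runs out, we "lop off the tree" and trust `value`
    instead of playing the game all the way to the end."""
    if memo is None:
        memo = {}
    key = (board, player, depth)
    if key in memo:
        return memo[key]
    w = winner(board)
    if w is not None:                   # game already decided
        score = WIN_SCORE if w == "X" else -WIN_SCORE
    else:
        moves = [i for i, s in enumerate(board) if s == "."]
        if not moves:                   # board full: draw
            score = 0
        elif depth == 0:                # cutoff: estimate with the value function
            score = value(board)
        else:                           # recurse: I pick my best, opponent picks theirs
            child_scores = [
                minimax(board[:m] + player + board[m + 1:],
                        "O" if player == "X" else "X",
                        depth - 1, value, memo)
                for m in moves]
            score = max(child_scores) if player == "X" else min(child_scores)
    memo[key] = score
    return score

# Tic-tac-toe is tiny (about 6,000 positions), so depth 9 searches all the
# way to the end of the game: with perfect play it is a draw, score 0.
print(minimax("." * 9, "X", 9, lambda b: 0))   # -> 0
# Queen, rook and pawn versus rook and pawn: up 9 points of material.
print(material("QRP", "RP"))                   # -> 9
```

For a game like chess, depth 9 over the full tree is out of reach, so something like `material` would be plugged in as the `value` argument and the `depth` set to a few plies; for Go, the lecture's point is that nobody found a good function to plug in there.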