Last lecture we talked about how to build a game-playing program and covered the basics of tree search. Today we're going to talk about the AlphaGo program and its derivatives, AlphaGo Master and AlphaGo Zero, which were able to become the strongest Go players in the world.

If we remember from last time, there were two major difficulties with building a Go program. One is that in chess we could build the value function by hand, using human intelligence to say how good each board position was, whereas in Go this turned out to be really difficult: humans struggled to write programs that encoded this human knowledge about what it means to be good at playing Go. The second problem was that in chess there are about 20 moves you can make at any given time, whereas in Go there are something like 300 moves you can make at any given time. In particular, thinking ahead three moves for each player at the beginning of a Go game gives something like fifty-eight trillion positions, which is just far too many to evaluate. So what we're going to do now is talk about how we can get around these problems.

The first step is to realize that there are two major supervised learning problems going on here. We talked before about how we can take each Go board position and represent it as a list of numbers, where each point is either empty, black, or white. So the first learning problem is: can we take this board position and map it to a value? This top position here is worth two points for Black, so it's good, whereas the position below it is bad for Black, so it's negative one point. Instead of having a human-coded value function, with rules like "this stone is worth six points on its own", which didn't work at all, we're going to learn it using a machine learning algorithm. The second insight is that we also need to figure out which moves to look at, because as we said, looking at every possible move is just too many. So the second thing is that, given this board state, this list of numbers, we also want to learn what the likely moves are: which five positions should I think about, in order to prune this tree so it's not branching out so crazily? So again there are these two major machine learning problems: the first is taking the board position and mapping it to a value, and the second is taking the board position and mapping it to which moves we should think about.

The way AlphaGo worked was that they turned this into a machine learning problem: they downloaded essentially all professional games played in the last thirty years, plus a whole bunch of games played between really high-level amateurs, and then they literally made it a supervised learning problem. They learned which board positions meant the player was more likely to win, and, in a given board position, which moves the human player was likely to play. So they machine-learned those two things. The second thing they did is augment this value function with something called Monte Carlo tree search. If you remember from before, tree search is this idea of thinking ahead a couple of moves; but now what we're going to do is, at some point at the bottom of the tree, simulate a bunch of random games in order to estimate my probability of winning given this board position. Then we combine the value function from the machine learning with this specialized algorithm, Monte Carlo tree search, to give us a total likelihood of winning, and we can propagate that back up the tree.
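To make those two ideas concrete, here is a minimal sketch in Python of what the two learned functions and the rollout-based leaf evaluation might look like. This is an illustration, not DeepMind's actual code: `value_net`, `policy_net`, and the board methods (`copy`, `is_over`, `play`, `legal_moves`, `winner`) are all hypothetical placeholders, and the fixed mixing weight is just an assumption for the sketch.

```python
import random

# Each point on the board is encoded as a number, as described above.
EMPTY, BLACK, WHITE = 0, 1, 2

def evaluate_leaf(board, value_net, num_rollouts=8, mix=0.5):
    """Estimate Black's chance of winning from this position.

    Blends two signals, roughly in the spirit described in the lecture:
      1. a supervised-learned value network's prediction for the position, and
      2. the average outcome of a few random playouts from the position.
    """
    learned = value_net(board)                  # learned estimate in [0, 1] (hypothetical)

    wins = 0
    for _ in range(num_rollouts):
        sim = board.copy()                      # hypothetical board API
        while not sim.is_over():
            sim = sim.play(random.choice(sim.legal_moves()))
        wins += 1 if sim.winner() == BLACK else 0
    rollout = wins / num_rollouts               # Monte Carlo estimate in [0, 1]

    return mix * learned + (1 - mix) * rollout  # combined likelihood of winning

def moves_to_consider(board, policy_net, top_k=5):
    """Use the learned move predictor to prune the tree to a few promising moves."""
    probs = policy_net(board)                   # dict: move -> probability a human plays it
    return sorted(probs, key=probs.get, reverse=True)[:top_k]
```

The combined estimate at the leaves is what gets propagated back up the tree, while the pruned move list keeps the branching factor manageable.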
Now the second thing we're going to do is have the computer play against itself a whole bunch of times; this process is known as reinforcement learning. We get the computer to play a million games against itself and update how it thinks: "these are probably going to be good moves", or "these are probably going to be winning positions". By playing itself a lot, it starts from human knowledge but then updates itself through play, ending up with something that's a hybrid of human knowledge and computer knowledge. And in fact this procedure was enough to beat Lee Sedol in the four-to-one match that was played in 2016.

Now, there were two major criticisms of the original AlphaGo program. One was that it needed this human database of knowledge: it took these professional Go players and learned from them, and so a lot of people were asking, does that mean these techniques only work when we have expert data available? There aren't very many domains we care about where humans have dedicated their lives to answering the specific question you're asking. The second thing, which I didn't talk about but was my major concern with the original AlphaGo algorithm, was that it also had hand-coded features that are really particular to the game of Go, without which the tree search procedure is hard to get working correctly; so it relied on domain-specific knowledge specifically tackling features that are hard in Go. My concern was that the same problems exist in real-world cases, but in a lot of real-world cases we don't know how to hand-code this domain-specific knowledge.

So in 2017 DeepMind released AlphaGo Zero, which addressed both of these criticisms. Rather than starting from this human database to run the machine learning, they only did the reinforcement learning procedure: they started with a really terrible Go-playing program and had it play against itself a lot of times, and by doing the reinforcement learning in a really clever way that was new, they were able to build this program. This new AlphaGo Zero program, zero meaning zero human knowledge, was able to beat the original AlphaGo program a hundred to zero in a hundred-game match, and beat the second edition, AlphaGo Master, 89 to 11.
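As a rough sketch of the self-play idea, the loop below shows the general shape: the program plays games against its current self, records which positions led to wins, and updates its value and move predictions from those games. This is only an illustration under assumptions, not the actual AlphaGo Zero algorithm; `new_game`, `pick_move`, and `update_network` are hypothetical placeholders.

```python
def self_play_training(network, num_games=1_000_000):
    """Improve a network purely by playing against itself (illustrative sketch)."""
    for _ in range(num_games):
        game, history = new_game(), []          # hypothetical helpers
        while not game.is_over():
            move = pick_move(game, network)     # search guided by the current network
            history.append((game.encode(), move))
            game = game.play(move)
        winner = game.winner()
        # Positions on the winning side become "good" training examples,
        # positions on the losing side become "bad" ones.
        network = update_network(network, history, winner)
    return network
```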
So, to recap this whole Go section: we have Go as a test case for machine learning, and we like it because it has a constrained set of moves and we know what it means to win, and yet for a long time computers were really bad at playing it. It feels like it has the properties we need in order to build intelligent systems: not looking at every possible thing you could do, but focusing on a couple of things that seem promising, and then, even if you can't read out everything that's going to happen, having some sense of what kinds of features are good for you and what kinds of features are bad for you. And we saw that we went from, in 2015, losing to top humans even when the computer could make five moves at the beginning of the game, to all of a sudden being way better than the top humans, over a span of a couple of months. This was all done by cleverly transforming the problem into a machine learning problem, then taking that machine learning problem and running this reinforcement learning algorithm, and they were finally able to build this AlphaGo Zero that didn't use any human knowledge at all.