So last time we talked about representing a Go board as a list of numbers describing the board position, and translating that into a supervised learning problem: the input is this list of numbers, and the output is the next move we want to make. Today we're going to talk about how to take that intuition and turn it into a good game-playing system in general.

The first thing to notice is that a board game is actually not just one supervised learning problem. It's not like image classification, where there's a single answer for each image. When you're playing a board game, there are several moves still to come, so there's really a connected series of supervised learning problems, and the first question to ask about board games is: can we exploit that property?

One idea is the notion called tree search, and it's something you've all done if you've played a game: "OK, I'm going to go left, then my opponent goes left, then I'll go right, and then I win. Great! Oh, wait a minute: if my opponent goes right instead, all of a sudden it's really bad for me. So maybe I shouldn't go left at all; maybe I should go right first." That's the idea of thinking a couple of moves ahead, seeing whether a position is good or bad for you, and then working your way back up to figure out whether it's something you really want to do. This is the tree search notion we're going to keep building on.

Now, for something like tic-tac-toe, you can actually just write out a list: I'm going to move here; then if my opponent goes there, I'll move here; if they go there instead, I'll go here. You can write a computer program that is just that series of if statements: if I do this and they do that, then I'll do this. In fact, if you remember from two lectures ago, that's how the original tic-tac-toe program was written, which was a big feat for
1952, but it's not very interesting today, because all we're doing is writing out a list of if statements saying what we should do. The problem is that this doesn't work for larger games like chess and Go. Tic-tac-toe has about 6,000 board positions, but chess has about 10^45 and Go has about 10^170. If you only have to write six thousand if statements, that's not so bad; but if you need more if statements than there are atoms in the universe, and then a lot more than that, it's just physically impossible. So we can't code a champion Go program the same way we can code a champion tic-tac-toe program. As a fun side note, the number on the slide is written with so many decimal places because we actually know the exact number of legal Go board positions, a result that came out in 2016.

So if we can't just write this long list of if statements, and the tree is too deep to think all the way down to winning or losing, how do we actually build a program? One way is the way Deep Blue works: we use human intelligence to encode something called a value function. That is, we can say: it's good if I have my queen; my queen is worth a lot of points. If my opponent still has her queen, that's worth the same number of points to her, but if she's about to lose her queen, that's worth a lot of points to me. To make this concrete, we say each piece on the chessboard is worth some number of points, and a position is good for me if I have more points of material than my opponent. Finally, we add something like 200 points for achieving checkmate, so positions in which I win are strongly rewarded.

At the beginning of the game, for instance, both sides have the same material, so that's an even position with a value of 0. Then, to construct this plot, I went and played a really strong chess program; a couple of moves in, I got trounced, and my board
position was at negative 17 points' worth of material, so that position was really bad for me. Then I hit a little button to flip sides, and using that material advantage I was able to beat the chess program and reach a checkmate position worth 200 points. So that's the idea of taking a chess position and mapping it to a value. It's not the same as winning or losing, but it's something that says this kind of position is good and that kind of position is bad.

What this lets us do is stop thinking all the way down the tree until the game ends in a win or a loss. Instead, we can lop off the entire bottom of the tree and think just a couple of moves ahead: if I do this and they do that, a few moves from now I reach a good position. In fact, this is how Deep Blue works. It uses a human-coded value function, with all that human intelligence baked in, and then does what computers do best: it crunches through millions and millions of positions to see which branch of the tree is best for the computer. That style of program was enough to beat Garry Kasparov, the reigning world chess champion.

Humans tried the same approach for Go: take a board position, judge which shapes are good or bad, and turn that into a number, saying this is a +4 position or this is a -6 position. Unfortunately, this approach really did not work. One problem is that an individual Go stone isn't worth much on its own; it may even be worth negative points if it's about to be captured. The power of Go stones comes from the shapes we talked about last lecture, and those shapes are good or bad partly in relation to the opponent's shapes. Professional Go players do think about this and do make these value judgments, but it's extremely difficult to encode that into a set of rules, a set of if statements, that a computer program can use. In particular, we can
see on the slide a really extreme example of this: Go has what are called non-local effects. In the upper left-hand corner it's the exact same shape, with White making the exact same move, but for reasons that are complicated to explain right now, on the left-hand board this is a really bad move to make, while if White has just one additional stone in the opposite corner of the board, all of a sudden it's a really good move. You can see how these kinds of positions make it very difficult to encode a value function.

The second really big problem with Go is that it has a much larger action space than chess, that is, the number of moves you can make at each turn. In chess, on the very first move, you can do 20 different things: each of your eight pawns can go up one or two squares, and each of your two knights can go to two squares. In Go, on the very opening move, there are 361 different moves you can make. Now, if you're trying to think a couple of moves ahead, this effect compounds and makes a huge difference. In chess, each player making three moves gives about nine million positions; that's a lot, but not so many for a computer to crunch through. In Go, three moves each gives about fifty-eight trillion positions, millions of times more than chess. The Go tree just expands enormously, which makes it very hard for computers to think that many moves ahead.

So next time, we're going to take these two problems with Go, that it's hard to encode a value function and that the tree is really huge, and talk about how we can use machine learning to build a champion Go player.
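To make the two ingredients of this lecture concrete, here is a minimal sketch in Python of tree search with a depth cutoff plus a hand-coded value function. The piece values follow the lecture's chess scheme (checkmate worth 200 points); everything else is illustrative: the function names are made up, and tic-tac-toe stands in for a real game so the whole tree is small enough to actually search. This is not Deep Blue's code, just the shape of the idea.

```python
# Sketch of depth-limited minimax with a hand-coded value function.
# Tic-tac-toe stands in for chess so the example stays small and runnable.

# Hand-coded piece values, as in the lecture's chess example. Checkmate is
# worth 200 points so winning positions dominate any material count.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
WIN_SCORE = 200

def material(my_pieces, opp_pieces):
    """Value of a position = my points minus my opponent's points."""
    return (sum(PIECE_VALUES[p] for p in my_pieces)
            - sum(PIECE_VALUES[p] for p in opp_pieces))

# Tic-tac-toe board: a 9-character string, "." for empty squares.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player, depth, value, memo=None):
    """Score `board` from X's point of view, searching `depth` plies ahead.
    When the depth runs out, we "lop off the tree" and trust `value`
    instead of playing the game all the way to the end."""
    if memo is None:
        memo = {}
    key = (board, player, depth)
    if key in memo:
        return memo[key]
    w = winner(board)
    if w is not None:                   # game already decided
        score = WIN_SCORE if w == "X" else -WIN_SCORE
    else:
        moves = [i for i, s in enumerate(board) if s == "."]
        if not moves:                   # board full: draw
            score = 0
        elif depth == 0:                # cutoff: estimate with the value function
            score = value(board)
        else:                           # recurse: I pick my best, opponent picks theirs
            child_scores = [
                minimax(board[:m] + player + board[m + 1:],
                        "O" if player == "X" else "X",
                        depth - 1, value, memo)
                for m in moves]
            score = max(child_scores) if player == "X" else min(child_scores)
    memo[key] = score
    return score

# Tic-tac-toe is tiny (about 6,000 positions), so depth 9 searches all the
# way to the end of the game: with perfect play it is a draw, score 0.
print(minimax("." * 9, "X", 9, lambda b: 0))   # -> 0
# Queen, rook and pawn versus rook and pawn: up 9 points of material.
print(material("QRP", "RP"))                   # -> 9
```

For a game like chess, depth 9 over the full tree is out of reach, so something like `material` would be plugged in as the `value` argument and the `depth` set to a few plies; for Go, the lecture's point is that nobody found a good function to plug in there.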