Last lecture we talked about how to build a game-playing program and covered the basics of tree search. Today we're going to talk about the AlphaGo program and its derivatives, AlphaGo Master and AlphaGo Zero, which were able to become the strongest Go players in the world.

If we remember from last time, there were two major difficulties with building a Go program. One is that in chess we could build the value function by hand, using human intelligence to say how good each board position was, whereas in Go this turned out to be really difficult: humans struggled to write programs that encoded this human knowledge about what it means to be good at playing Go. The second problem was that in chess there are about 20 moves you can make at any given time, whereas in Go there are something like 300 moves you can make at any given time. In particular, thinking ahead three moves for each player at the beginning of a Go game gives something like fifty-eight trillion positions, which is just far too many to evaluate. So what we're going to do now is talk about how we can get around these problems.

The first step is to realize that there are two major supervised learning problems going on here. We talked before about how we can take each Go board position and represent it as a list of numbers, where each point is either empty, black, or white. So the first learning problem is: can we take this board position and map it to a value? This top position here is worth two points for Black, so it's good, whereas the position below it is bad for Black, so it's negative one point. Instead of having a human-coded value function, with rules like "this stone is worth six points on its own", which didn't work at all, we're going to learn it using a machine learning algorithm. The second insight is that we also need to figure out which moves to look at, because as we said, looking at every possible move is just too many. So the second thing is that, given this board state, this list of numbers, we also want to learn what the likely moves are: which five positions should I think about, in order to prune this tree so it's not branching out so crazily? So again there are these two major machine learning problems: the first is taking the board position and mapping it to a value, and the second is taking the board position and mapping it to which moves we should think about.

The way AlphaGo worked was that they turned this into a machine learning problem: they downloaded essentially all professional games played in the last thirty years, plus a whole bunch of games played between really high-level amateurs, and then they literally made it a supervised learning problem. They learned which board positions meant the player was more likely to win, and, in a given board position, which moves the human player was likely to play. So they machine-learned those two things. The second thing they did is augment this value function with something called Monte Carlo tree search. If you remember from before, tree search is this idea of thinking ahead a couple of moves; but now what we're going to do is, at some point at the bottom of the tree, simulate a bunch of random games in order to estimate my probability of winning given this board position. Then we combine the value function from the machine learning with this specialized algorithm, Monte Carlo tree search, to give us a total likelihood of winning, and we can propagate that back up the tree.
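To make those two ideas concrete, here is a minimal sketch in Python of what the two learned functions and the rollout-based leaf evaluation might look like. This is an illustration, not DeepMind's actual code: `value_net`, `policy_net`, and the board methods (`copy`, `is_over`, `play`, `legal_moves`, `winner`) are all hypothetical placeholders, and the fixed mixing weight is just an assumption for the sketch.

```python
import random

# Each point on the board is encoded as a number, as described above.
EMPTY, BLACK, WHITE = 0, 1, 2

def evaluate_leaf(board, value_net, num_rollouts=8, mix=0.5):
    """Estimate Black's chance of winning from this position.

    Blends two signals, roughly in the spirit described in the lecture:
      1. a supervised-learned value network's prediction for the position, and
      2. the average outcome of a few random playouts from the position.
    """
    learned = value_net(board)                  # learned estimate in [0, 1] (hypothetical)

    wins = 0
    for _ in range(num_rollouts):
        sim = board.copy()                      # hypothetical board API
        while not sim.is_over():
            sim = sim.play(random.choice(sim.legal_moves()))
        wins += 1 if sim.winner() == BLACK else 0
    rollout = wins / num_rollouts               # Monte Carlo estimate in [0, 1]

    return mix * learned + (1 - mix) * rollout  # combined likelihood of winning

def moves_to_consider(board, policy_net, top_k=5):
    """Use the learned move predictor to prune the tree to a few promising moves."""
    probs = policy_net(board)                   # dict: move -> probability a human plays it
    return sorted(probs, key=probs.get, reverse=True)[:top_k]
```

The combined estimate at the leaves is what gets propagated back up the tree, while the pruned move list keeps the branching factor manageable.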
Now the second thing we're going to do is have the computer play against itself a whole bunch of times; this process is known as reinforcement learning. We get the computer to play a million games against itself and update how it thinks: "these are probably going to be good moves", or "these are probably going to be winning positions". By playing itself a lot, it starts from human knowledge but then updates itself through play, ending up with something that's a hybrid of human knowledge and computer knowledge. And in fact this procedure was enough to beat Lee Sedol in the four-to-one match that was played in 2016.

Now, there were two major criticisms of the original AlphaGo program. One was that it needed this human database of knowledge: it took these professional Go players and learned from them, and so a lot of people were asking, does that mean these techniques only work when we have expert data available? There aren't very many domains we care about where humans have dedicated their lives to answering the specific question you're asking. The second thing, which I didn't talk about but was my major concern with the original AlphaGo algorithm, was that it also had hand-coded features that are really particular to the game of Go, without which the tree search procedure is hard to get working correctly; so it relied on domain-specific knowledge specifically tackling features that are hard in Go. My concern was that the same problems exist in real-world cases, but in a lot of real-world cases we don't know how to hand-code this domain-specific knowledge.

So in 2017 DeepMind released AlphaGo Zero, which addressed both of these criticisms. Rather than starting from this human database to run the machine learning, they only did the reinforcement learning procedure: they started with a really terrible Go-playing program and had it play against itself a lot of times, and by doing the reinforcement learning in a really clever way that was new, they were able to build this program. This new AlphaGo Zero program, zero meaning zero human knowledge, was able to beat the original AlphaGo program a hundred to zero in a hundred-game match, and beat the second edition, AlphaGo Master, 89 to 11.
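As a rough sketch of the self-play idea, the loop below shows the general shape: the program plays games against its current self, records which positions led to wins, and updates its value and move predictions from those games. This is only an illustration under assumptions, not the actual AlphaGo Zero algorithm; `new_game`, `pick_move`, and `update_network` are hypothetical placeholders.

```python
def self_play_training(network, num_games=1_000_000):
    """Improve a network purely by playing against itself (illustrative sketch)."""
    for _ in range(num_games):
        game, history = new_game(), []          # hypothetical helpers
        while not game.is_over():
            move = pick_move(game, network)     # search guided by the current network
            history.append((game.encode(), move))
            game = game.play(move)
        winner = game.winner()
        # Positions on the winning side become "good" training examples,
        # positions on the losing side become "bad" ones.
        network = update_network(network, history, winner)
    return network
```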
So, to recap this whole Go section: we have Go as a test case for machine learning, and we like it because it has a constrained set of moves and we know what it means to win, and yet for a long time computers were really bad at playing it. It feels like it has the properties we need in order to build intelligent systems: not looking at every possible thing you could do, but focusing on a couple of things that seem promising, and then, even if you can't read out everything that's going to happen, having some sense of what kinds of features are good for you and what kinds of features are bad for you. And we saw that we went from, in 2015, losing to top humans even when the computer could make five moves at the beginning of the game, to all of a sudden being way better than the top humans, over a span of a couple of months. This was all done by cleverly transforming the problem into a machine learning problem, then taking that machine learning problem and running this reinforcement learning algorithm, and they were finally able to build this AlphaGo Zero that didn't use any human knowledge at all.