Okay, thank you for joining me. We are going to talk today about what is called MaxEnt, or the Principle of Maximum Entropy, or sometimes Maximum Entropy Methods. MaxEnt breaks into two parts, and we'll cover those two parts in the first and second halves of this unit. Maximum entropy was invented by a gentleman named E. T. Jaynes. He was the first person to really put this all into a single paper for Physical Review, and Jaynes was after some really deep philosophical, almost epistemological, questions about the nature of reality, about why physical laws took the shape that they did. But in recent years, what we've found is that MaxEnt, the principle of maximum entropy, has found an enormous amount of use in machine learning, in the modelling of real-world processes, as opposed to, let's say, their explanation and the understanding of those processes. So there are some people who are really interested in prediction. For example, they would like to learn what the stock market looks like and predict what it is going to look like tomorrow. Or they would like to learn the nature of, let's say, a particular patient's cancer; they would like to model it in such a way that the model is good enough that tomorrow they can predict what's going to happen next. That's a huge, intellectually incredibly ambitious goal that people in the artificial intelligence and machine learning community have. MaxEnt is huge in that part of the intellectual world, and that is where we will begin. In the second part of the talk, I am going to try to draw connections from what you have learned on the prediction and machine-learning side and apply those to some really exciting problems that we find in the study of biological systems and in the study of social systems. In particular, I will try to get a little bit at some of the deeper philosophical questions that maximum entropy raises for us, particularly when it works as well as it does.
So what I will do is begin with the prediction problem, and in particular with the kind of problem at which maximum entropy excels: the prediction of high-dimensional data. As a working definition, let's say something along the following lines: a system is high-dimensional if the number of possible configurations, which we call N, is much greater than the amount of data that we have, which we call k. In other words, the number of ways the system could look is much greater than the number of times you've actually observed it in the real world. Oftentimes we can talk about the dimensions of a data set. Let's, for example, take a black-and-white image, and let's say the image has 10,000 pixels. Each pixel in your image can take on, let's say, a +1 or -1 value: +1 is black, and -1 is white. So any image here can have any arbitrary combination of pixel values. If there are 10,000 pixels, and each pixel can be +1 or -1, then the total number of images is 2^10,000. Each pixel is a discrete dimension taking the value +1 or -1, and there are 10,000 of them. So suppose you're trying to build a model of, let's say, handwritten words; you would like to model, for example, all the different ways in which I can write the letter 'e'. There is almost no way you are going to acquire enough samples; in fact, it's probably possible to prove that the universe will die a fearsome heat death before you're able to gather enough samples of my handwriting for the amount of data you have, k, to be anywhere comparable to 2^10,000. Just to give you a sense of scale: 2^10,000 = (2^10)^1,000 ≈ 1,000^1,000 = 10^3,000. That's like a googol to the power of 30.
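As a quick sanity check on those numbers, here is a small sketch (not part of the lecture; the variable names are my own) that counts the configurations of a 10,000-pixel binary image and expresses that count in powers of ten:

```python
from math import log10

# Each pixel is a discrete dimension taking one of two values (+1 or -1),
# so an image with n_pixels pixels has 2**n_pixels possible configurations.
n_pixels = 10_000

# log10(2**n_pixels) = n_pixels * log10(2), so we can find the
# power of ten without computing the (enormous) integer itself.
exponent = n_pixels * log10(2)
print(f"2^{n_pixels} is about 10^{exponent:.0f}")

# A googol is 10^100, so this is roughly a googol to the power of ~30.
print(f"that's a googol to the power of about {exponent / 100:.0f}")
```

Running this confirms the rough figures quoted above: 2^10,000 is about 10^3010, i.e. a googol raised to roughly the 30th power.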
So in cases like this, what we would like to do is assign probabilities to particular images, drawn from an observed set whose size is far smaller than the total number of possible images. That's where something like MaxEnt excels...