Just to remind you where we are, we have two constraints: one is that the expectation value of x, the average waiting time, is four minutes; the other is that the probabilities sum to 1. Those are our two constraints, and what we're going to do is maximize the entropy of a distribution subject to those constraints. Okay? So S is going to be the entropy function, and we're not going to have one g but in fact two g's: one g is the constraint on the average, the other g is the normalization.

So, how do you do maximization under constraints when there is more than one constraint? I gave you an intuitive picture of how you could handle one constraint at a time, but how do you handle two? I'm going to tell you the answer, because it's much harder to work through the problem of multiple constraints, but it's an intuitive answer, one that is worth remembering, and if you ever want to work through it, there are plenty of places to find the derivation.

So, here's the method of Lagrange multipliers. You remember the Lagrange multiplier is that lambda term, okay? That's where the method gets its name. You want to maximize a function f subject to a set of constraints, which we'll number g sub i, so g1, g2, all the way through g_n, however many you have, and the way you do it is to set the gradient of the function equal to a linear combination of the gradients of the constraints. So this is the general method of Lagrange multipliers: to maximize the function f subject to these n constraints, set the gradient of f equal to some linear combination of the gradients of the g's, and then the only remaining problem is how to find the coefficients, the lambdas. What you know is that at the maximum point you can add together all of these constraint gradients, with some weights, in such a way that you reproduce the gradient of f, the gradient of the original contours. Okay? So now the problem is: what are these lambdas? What I'm going to do is walk you through the maximum-entropy problem using this formula, and if the lambdas seem mysterious right now, by the end they hopefully should not be. What you're going to do is turn the knobs, twiddle the lambdas around, until you find the lambdas that satisfy the particular constraint values you have in mind.

So, we're going to maximize not an arbitrary function f, but the entropy, and our constraints are going to be a constraint on the average and a constraint on the normalization. So we want the derivative of S with respect to p_i, working term by term in the vector, to be equal to lambda1 times the derivative of g1 with respect to p_i, plus lambda2 times the derivative of g2 with respect to p_i. Okay? To remind you, S is the entropy of the distribution: S is equal to minus the sum, over all possible waiting times, of p_i log p_i. Again, just for convenience's sake, I'm talking about the discrete case; you can take limits, if you have your measures set up correctly, and turn each of these sums into an integral. But it's easier, conceptually, to talk about the discrete case first. So g1, remember, is a function of p, where p is a vector: g1 is just the sum, i from 0 to infinity, of p_i times i. And I'm using i now instead of x, because it's easier for me to write. Okay? The whole setup is written out compactly below.
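Just as a reference while we work through the algebra, here is the same setup written out compactly. This is only a restatement of the quantities defined above; nothing new is being introduced.

```latex
\begin{align*}
S(p)   &= -\sum_{i=0}^{\infty} p_i \log p_i   && \text{(entropy to maximize)} \\
g_1(p) &= \sum_{i=0}^{\infty} i\, p_i = 4     && \text{(average waiting time)} \\
g_2(p) &= \sum_{i=0}^{\infty} p_i = 1         && \text{(normalization)} \\
\frac{\partial S}{\partial p_i}
       &= \lambda_1 \frac{\partial g_1}{\partial p_i}
        + \lambda_2 \frac{\partial g_2}{\partial p_i}
                                              && \text{(Lagrange condition, for every } i\text{)}
\end{align*}
```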
So, this is the constraint function that constrains the average value, and of course what we want in the end is for g1(p) to be equal to 4 minutes. g2(p) is just the normalization constraint, so that function just sums over all the values of p, and of course in the end we'll set g2 = 1. And we previously defined the entropy.

So, what is the derivative of the entropy with respect to a particular probability, the probability of a particular configuration? Let me move this out here: S is equal to minus the sum, i from 0 to infinity, of p_i log p_i. So dS/d(p_i) equals, well, the only term that survives is the one with p_i in it, and the derivative of p_i log p_i has two terms: log p_i, and then p_i times the derivative of log p_i. The derivative of log p_i is 1/p_i, so that second term is just plus 1, and altogether dS/d(p_i) = -log p_i - 1. This is the left-hand side of your Lagrange multiplier equation. And just to remind you, we set the base of the log to e. Now we have to take the derivative of g1 with respect to p_i: take the derivative of the sum with respect to p_i, and what you find is that dg1/d(p_i) = i. And finally, dg2/d(p_i) = 1. In each case there's only one term in the sum that doesn't get killed by the derivative.

So, let's now put all of this together. We have negative log p_i minus 1 equal to lambda1 times the derivative of g1 with respect to p_i, which is i, plus lambda2 times the derivative of g2 with respect to p_i, which is 1. So this is now our equation, the one that is satisfied when you maximize the entropy subject to these constraints, for some value of the constraints. So let's solve for p_i. Moving things around, we have negative 1, minus lambda1 times i, minus lambda2, equals log p_i. Exponentiate both sides, flip them around, and we have p_i equal to e to the negative 1, minus lambda1 i, minus lambda2. Okay? And we can write that somewhat more succinctly as e to the negative lambda1 i, divided by Z, where Z is equal to e to the 1 plus lambda2. The probability of waiting a certain time i is proportional to e to the negative lambda1 times i: it's an exponential distribution of waiting times.

Now, all that remains is to figure out what on earth lambda1 is, and what on earth Z is. What we're going to do is figure out the values you have to set them to in order to satisfy this particular value of the average constraint and this particular value of the normalization constraint. So, we know the functional form of the distribution, and now we just have to figure out the parameters of that function, and there will be two of them. The first thing we know, of course, is that the probability is normalized, and plugging in this functional form, that means the sum from i = 0 to infinity of e to the negative lambda1 i, divided by Z, is equal to 1. So already we can solve for Z in terms of lambda1: eliminating the first variable, Z, is easy, because we can just set Z equal to the sum from i = 0 to infinity of e to the negative lambda1 i. Okay? So we've already eliminated one variable, and now all we have to do is satisfy the other constraint. In particular, let me write it here: we need the sum from i = 0 to infinity of i times e to the negative lambda1 i, all divided by Z, to be equal to 4.
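To make that knob-turning concrete, here is a minimal numerical sketch; it is not from the lecture, and the function names, the cutoff n_max, and the bisection bracket are illustrative assumptions. It truncates the infinite sums at a large cutoff, computes Z and the average waiting time for a trial lambda1, and adjusts lambda1 until the average hits 4. (Summing the geometric series by hand gives lambda1 = ln(5/4) and Z = 5, which the sketch should reproduce.)

```python
import math

def distribution(lambda1, n_max=10_000):
    """Maximum-entropy waiting-time distribution p_i = exp(-lambda1 * i) / Z,
    with the infinite sums truncated at n_max terms (an approximation; fine
    for lambda1 comfortably above zero)."""
    weights = [math.exp(-lambda1 * i) for i in range(n_max)]
    Z = sum(weights)  # normalization constant
    return [w / Z for w in weights], Z

def mean_waiting_time(lambda1, n_max=10_000):
    """Average waiting time sum_i i * p_i under that distribution."""
    p, _ = distribution(lambda1, n_max)
    return sum(i * p_i for i, p_i in enumerate(p))

def solve_lambda1(target_mean=4.0, lo=1e-6, hi=10.0, tol=1e-10):
    """Turn the lambda1 knob (by bisection) until the average constraint holds.
    The mean decreases as lambda1 grows, so the bracket [lo, hi] is safe as
    long as the mean at lo is above the target and the mean at hi is below it."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_waiting_time(mid) > target_mean:
            lo = mid   # mean still too large, so lambda1 must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

lambda1 = solve_lambda1()
p, Z = distribution(lambda1)
print(f"lambda1 ~ {lambda1:.4f}   (ln(5/4) ~ {math.log(1.25):.4f})")
print(f"Z ~ {Z:.4f}   mean ~ {mean_waiting_time(lambda1):.4f}   sum p_i ~ {sum(p):.6f}")
```

Bisection works here because the average waiting time decreases monotonically as lambda1 increases, which is exactly the sense in which turning the lambda1 knob sweeps through the possible constraint values.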