In the previous unit we showed that a naive maximum entropy model for the distribution of language abundances, a model for describing how popular a language is, where popularity is the number of times that language is used in the open source archive, fails to reproduce the data. That model takes the following functional form, and it constrains the expectation value of language abundance: it constrains only the average number of times a language is used in the archive, averaged over all languages. This maximum entropy model is completely unsuccessful in reproducing the data we see. It under-predicts the popular languages and over-predicts the less popular languages. This is the best-fit model, which is what happens when I solve for lambda and Z to get the correct average constraint that we see in the data, OK? So what I'm going to do is ask you to consider a richer model, and again by model I mean probability distribution, and this distribution is going to try to explain not only the popularity of languages but also the amount of time devoted to them, the amount of programmer time devoted. So n here is, as before, the number of projects that appear in the language, and epsilon here is the amount of time devoted per project. You should think of epsilon as modelling some kind of language efficiency, and n as modelling language popularity. Some languages, for example, are extremely popular and extremely efficient, so n is large and epsilon is small. Other languages, perhaps, are not quite as popular and are less efficient, or require more programmer time. And of course there might be extremely efficient languages that are profoundly unpopular, and vice versa. The intuition here is that we are trying to describe two things at once. Previously we were building a model that included only one variable, popularity. Now we are going to build a model that has two variables: popularity, and efficiency, the amount of time devoted per project. So what we're going to do is constrain two quantities.
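To make the naive one-constraint model concrete: with P(n) = e^(-lambda n)/Z over n = 1, 2, 3, ..., the single mean constraint pins down lambda. Here is a minimal sketch in Python of solving for lambda by bisection; the target mean of 50 projects per language is an illustrative number I've made up, not a value from the archive.

```python
import math

def solve_lambda(target_mean, lo=1e-6, hi=10.0, tol=1e-10):
    """Bisect for lambda in P(n) = exp(-lambda*n)/Z over n = 1, 2, 3, ...

    For this geometric series the mean is 1/(1 - exp(-lambda)),
    which decreases as lambda grows.
    """
    def mean(lam):
        return 1.0 / (1.0 - math.exp(-lam))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:  # mean too big -> need larger lambda
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda(50.0)  # hypothetical average of 50 projects per language
Z = math.exp(-lam) / (1.0 - math.exp(-lam))  # closed-form normalization

def p(n):
    """The naive exponential MaxEnt model for language popularity."""
    return math.exp(-lam * n) / Z
```

This is the fit that under-predicts the head and over-predicts the middle of the popularity distribution; the point of the richer model below is to do better with one extra constraint.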
As before, we're going to constrain the average number of projects, and we're also going to constrain the average amount of programmer time devoted to a language. The idea is that there is some intrinsic force that keeps the average number of projects per language constant, some kind of large-scale social factor or set of social factors, and that it also constrains the average amount of programmer time devoted to a particular language. So let's write down the maximum entropy distribution for this two-parameter model. We're constraining n, so we know we have a term that looks like e to the negative lambda one times n, as before. And we're also going to have a term in the exponential that looks like negative lambda two, the second Lagrange multiplier, times n times epsilon. And of course there will be an overall normalization factor of Z. So the only thing that should be mysterious to you is where we got this second term from. We know where the first one comes from, but why do we have this?
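Here is a hedged sketch of that two-constraint model in Python. The multiplier values lambda 1 and lambda 2 are arbitrary illustrative numbers, not fitted values. The normalization Z is computed two ways: as a direct sum, and in closed form using the fact that integrating e^(-lambda2 n epsilon) over epsilon gives 1/(lambda2 n), whose sum over n is the series for -ln(1-x).

```python
import math

# Illustrative Lagrange multipliers; not fitted to the archive data.
lam1, lam2 = 0.05, 2.0

def joint_weight(n, eps):
    """Unnormalised joint weight exp(-lam1*n - lam2*n*eps), n >= 1, eps > 0."""
    return math.exp(-lam1 * n - lam2 * n * eps)

# Normalisation Z: integrate eps over (0, inf), then sum over n >= 1.
# The eps integral gives exp(-lam1*n) / (lam2*n); summing that series
# yields Z = -ln(1 - exp(-lam1)) / lam2 (the Taylor series of -ln(1-x)).
Z_closed = -math.log(1.0 - math.exp(-lam1)) / lam2
Z_series = sum(math.exp(-lam1 * n) / (lam2 * n) for n in range(1, 200000))

def joint_pdf(n, eps):
    """Normalised joint density P(n, eps) for popularity and time-per-project."""
    return joint_weight(n, eps) / Z_closed
```

The closed form and the brute-force sum agree, which is a useful sanity check that the n times epsilon term in the exponential still leaves the distribution normalizable.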
So what I'll do is very quickly remind you of how we got these functional forms in the first place. We take the derivative of the function we wish to maximize, the entropy, and now we have three constraints. We have a constraint on the average number of projects: that's g1(p) = c. We have a constraint g2 on this second term here: g2(p) = c'. And finally we have the overall normalization constraint, g3(p). So let me write these out. g1(p) is the average number of projects for a particular language: the sum over n of P(n) times n, which we can rewrite in terms of the joint distribution as P(n, epsilon) times n, where we not only sum over all values of n but also integrate over all values of programmer time epsilon. g2(p) looks very similar, except that now we're constraining not just n but n times epsilon, so again we integrate over epsilon and sum over all values of n. That's what our constraint looks like for this second quantity. Now, when we take the derivative of g1 with respect to p_i, we know what that looks like: lambda 1 times n. And when we take the derivative of this second constraint with respect to p_i, everything comes out except lambda 2 times n times epsilon. Finally, g3 is just the overall normalization: it says that when I integrate over all epsilon and sum over all n, I get 1, so when I take its derivative with respect to p_i, I get lambda 3 times unity. On the other side, the derivative of the entropy with respect to p_i is negative log p_i minus one. And when I rearrange all this, I get that p_i is proportional to e to the negative lambda 1 n minus lambda 2 n epsilon; everything else can be folded into an underlying normalization. So that's where this functional form comes from, and it's a general principle, a general rule of thumb, that if
you want to see what a maximum entropy distribution is constraining, you look term by term in the exponential to find it. What we have now is the following functional form for the joint distribution of language popularity and language efficiency. And I'm going to do one more thing, which is try to recover the original marginal distribution P(n) by integrating the joint distribution with respect to epsilon. So what I did was build a more sophisticated model with two constraints: it constrained the average number of projects and the average amount of programmer time devoted to a particular language. Now I'm going to integrate out this hidden variable. If I do that, if I do this integral, here's what I get. I still get this factor of e to the negative lambda 1 n, which I'll call lambda prime, and I still get the factor of 1 over Z, and then I simply have to do the following little integral: the integral of e to the negative lambda 2 n epsilon, d epsilon. The antiderivative is negative 1 over lambda 2 n, times e to the negative lambda 2 n epsilon, and evaluating that over my integration range, from zero to infinity, gives 1 over lambda 2 n. So when I put these terms in, my final functional form, and again I'm not keeping track of all the multiplicative factors, so I'm turning Z into Z prime to absorb them, looks like an exponential on top, divided by n. So my new functional form for the language abundance is modified. Just to remind you: previously, when I did Max. Ent.
constraining only the average number of projects, my distribution of language popularity looked like an exponential, and we realized that's a really bad fit; that's the fit you see in the plot here. But now what we've done is produce a joint distribution with two constraints. One of the constraints is the same; the other is a new constraint that involves that hidden variable, programmer efficiency. And when I integrate out that hidden variable, I get a new functional form for language popularity. In particular, we find that it is proportional to e to the negative lambda n, divided by n. This distribution is called the Fisher log series. So here's the functional form, which we got by integrating out this hidden programmer efficiency variable, and what you can see now is that this distribution is a far better model for language popularity. What we've done here, just to be clear, is postulate an additional constraint involving a hidden variable that we can't see, and then integrate out that hidden variable. What we should be impressed by is not only the fact that we're able to get a good fit, but also the underlying mechanisms that the Max. Ent.
model is suggesting to you. It's suggesting that two things are constrained. Languages tend not to get too popular on average; there is some kind of limit to how popular a language can get, on average, and that's the source of this first constraint here. But there's also a limit to how much time, how much effort, can be devoted to projects in a particular language, and that's this second constraint here. So essentially what's happening is that the languages up here are allowed to become more popular because they don't take up too much programmer time per project. In the language of an ecologist, these are the really abundant species with really low metabolic needs. So we're now able to reproduce these really popular languages. We're able to accurately explain, or accurately predict, that there should be really runaway-popular languages like C and C++ and Java, and we're also able to explain why the popularity of the languages in the tail here is lower than you would expect from an exponential model. In particular, presumably a lot of these languages are associated with lower efficiencies, or rather higher epsilon, meaning that a higher amount of time is devoted per project in those languages. And intuitively, if we're programming in Java, that's a really high-efficiency language compared to programming in Haskell, which is a profoundly beautiful language, perhaps, but one that you would not associate with efficiently writing fast libraries for parsing HTML pages. So by postulating the existence of this hidden variable and using a maximum entropy argument, we can actually start reproducing the data, and it feels like we're starting to talk about important constraints in the system. You should be impressed by this in part because it's deeper than the result we got in the taxicab case. In the taxicab case, we picked a particular constraint, and it just so happened to correspond to a really simple mechanistic model, and you might say, well, I could have come up with that mechanistic model
all on my own, thank you very much. Here, what we've done is postulate a set of constraints that are scientifically substantive. The first constraint says that we're going to constrain the average number of projects: the open source movement can do whatever it wants, as long as it keeps the average number of projects per language fixed. People are making decisions, and in fact they are making decisions in a maximally disordered way, that's the maximum entropy part of the assumption, such that they preserve this quantity here on average. And furthermore, the open source community, by whatever method, by whatever group cognitive method it's able to do this, and presumably through a whole lot of different interacting processes, also preserves the average amount of programmer time devoted to a particular language.
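The whole argument can be sketched in a few lines of Python: fit both functional forms, the pure exponential (a geometric distribution over n = 1, 2, ...) and the Fisher log series e^(-lambda n)/n that falls out of integrating away epsilon, to the same mean, and compare how much probability each puts on runaway-popular languages. The target mean of 50 projects per language is a hypothetical number for illustration, not the archive's actual value.

```python
import math

def geometric_pmf(n, x):
    """Pure exponential MaxEnt model: P(n) = (1-x)*x**(n-1), n >= 1; mean 1/(1-x)."""
    return (1.0 - x) * x ** (n - 1)

def logseries_pmf(n, x):
    """Fisher log series: P(n) = -x**n / (n*ln(1-x)); mean -x/((1-x)*ln(1-x))."""
    return -x ** n / (n * math.log(1.0 - x))

def solve_x(mean_fn, target, iters=200):
    """Bisect for the parameter x in (0, 1) whose mean matches the target.

    Both means are increasing in x, so plain bisection works.
    """
    lo, hi = 1e-9, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target_mean = 50.0  # hypothetical average projects per language
x_geo = solve_x(lambda x: 1.0 / (1.0 - x), target_mean)
x_log = solve_x(lambda x: -x / ((1.0 - x) * math.log(1.0 - x)), target_mean)

# With the mean held fixed, the log series puts far more probability on
# very popular languages (the n >= 500 head of the ranking) than the
# exponential does, which is exactly the failure mode of the naive model.
tail_geo = sum(geometric_pmf(n, x_geo) for n in range(500, 5001))
tail_log = sum(logseries_pmf(n, x_log) for n in range(500, 5001))
```

Under the same average-popularity constraint, the log series concentrates much more mass on runaway-popular languages, which is the qualitative difference the lecture points to in the plot.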