In the previous unit we showed that
a naive maximum entropy model
for the distribution
of language abundances,
for describing how popular a language is,
where popularity is
how many times that language
is used in the open source archive,
that model, which takes
the following functional form
and constrains the expectation value
of language abundance,
constrains only
the average number of
times a language is used
in the archive, over all languages
this maximum entropy model
is completely unsuccessful
in reproducing the data we see
it underpredicts the popular languages
and overpredicts
the less popular languages
this is the best fit model, which is what
happens when I solve for lambda and Z
to get the correct average constraint
that we see in the data, ok?
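To make that fitting step concrete, here is a minimal sketch of how lambda can be solved for numerically; the target mean of 50 projects per language is a made-up example value, and P(n) is the one-constraint exponential (geometric) maxent distribution over n = 1, 2, 3, …

```python
import math

def mean_n(lam, n_max=5000):
    """Mean of P(n) ∝ e^(-lam * n) over n = 1..n_max.

    n_max truncates the sum; fine as long as lam * n_max >> 1."""
    weights = [math.exp(-lam * n) for n in range(1, n_max + 1)]
    Z = sum(weights)  # the normalization factor
    return sum(n * w for n, w in zip(range(1, n_max + 1), weights)) / Z

def solve_lambda(target_mean, lo=1e-6, hi=10.0, tol=1e-10):
    """Bisect for the Lagrange multiplier that reproduces the observed mean.

    Larger lambda means faster decay, hence a smaller mean."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_n(mid) > target_mean:
            lo = mid  # mean too large: need more decay
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda(50.0)  # hypothetical: 50 projects per language on average
```

For this particular model the answer is also available in closed form, lambda = -ln(1 - 1/⟨n⟩), which is a handy check on the numerics.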
so, what I'm gonna do is ask you
to consider a richer model
and again
by model I mean probability distribution
and this distribution is
gonna try to explain not only
the popularity of languages, but also
the amount of time devoted to them,
the amount of programmer time devoted
so n here is again, as before
the number of projects
that appear in the language
and this here
epsilon is the amount of
time devoted per project
you should think of epsilon as modelling
some kind of language efficiency
and this n is modelling
language popularity
some languages are, for example,
extremely popular
and extremely efficient, so they have
large n and small epsilon
other languages perhaps
are not quite as popular
and are perhaps less efficient,
or require more programmer time
and of course there might be
extremely efficient languages
that are profoundly unpopular,
and vice versa
the intuition here is that we are trying to
describe two things at once, previously
we were building a model
that only included
one variable, popularity, now we are gonna
try to build a model that
has two variables
popularity and efficiency
or amount of time devoted per project
so, what we're gonna do
is we're gonna constrain
two quantities, as before
we're gonna constrain
the average number of projects
and we're also gonna constrain
the average amount of programmer time
devoted to a language
so the idea is that there is some
intrinsic force that keeps the average
number of projects per language constant
there is some kind of
large scale social factor,
or set of social factors,
that keeps the average
number of projects constant
and also constrains
the average amount of programmer time
devoted to a particular language
so
let's write down a
maximum entropy distribution
for this two parameter model
we're gonna constrain n
so we know
that we have a term that looks like,
as before,
e to the negative lambda one times n
and we're also going to have a term
in the exponential that looks like
lambda two, that's
the second Lagrange multiplier,
times n times epsilon
and of course there will be
an overall normalization factor of Z
so the only thing that should
be mysterious to you
is where we got this second term from
we know where this comes from
but why do we have this?
and so what I'll do is
I'll very quickly remind you
of how we got these functional
forms in the first place
we took the derivative
of the function we wish
to maximize, the entropy
and now we will have three constraints
we'll have a constraint
on the average number of projects
that's constraint g1,
g1 of p is equal to c
we'll have a constraint g2
on this term here,
g2 of p is equal to some c prime
and finally we have the
overall normalization constraint,
g3 of p
and so
I'll write these out here,
g1 of p and g2 of p
so, g1 of p is the average
number of projects
for a particular language
that's P(n) times n, summed over n
and we can write that
out in terms of the joint distribution
as P(n, epsilon) times n
now we not only sum over all values of n
but we also integrate over all
values of programmer time epsilon
g2 of p looks very similar, except now
we're constraining not just n
but n times epsilon
so we have to integrate over epsilon
and we have to sum over all values of n
and that's what our equation looks like
that's what our constraint looks like
for this second quantity here
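Written out, the three constraints on the joint distribution are:

```latex
g_1(p) = \sum_{n} \int_0^\infty P(n,\epsilon)\, n \, d\epsilon = c
\qquad
g_2(p) = \sum_{n} \int_0^\infty P(n,\epsilon)\, n\epsilon \, d\epsilon = c'
\qquad
g_3(p) = \sum_{n} \int_0^\infty P(n,\epsilon)\, d\epsilon = 1
```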
so now, when we take the derivative
of g1 with respect to p i
we know what that looks like
that looks like lambda 1 times n
and when we take the derivative
of this here with respect to p i
everything comes out, except
for lambda 2 times n epsilon
and finally g3
is just a constant,
the overall normalization
it says that
when I integrate over all epsilon
and sum over all n, that's equal to 1
so when I take the derivative of this here
with respect to p
we get lambda 3 times unity
on this other side here
the derivative of the entropy
with respect to p i
is negative log p i
minus one
and when we rearrange this I get p i
is proportional to e to the
negative lambda 1 n
minus lambda 2 n epsilon
everything else can be factored into some
underlying normalization
so that's where this
functional form comes from
and it's a general principle,
a general rule of thumb,
that if you wanna see
what the maximum entropy
distribution is constraining,
look term by term
look term by term in the exponential
to find it
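Putting the derivatives of the entropy and of the three constraints together, the stationarity condition and its solution are:

```latex
-\ln p(n,\epsilon) - 1 - \lambda_1 n - \lambda_2 n\epsilon - \lambda_3 = 0
\quad\Longrightarrow\quad
p(n,\epsilon) = \frac{1}{Z}\, e^{-\lambda_1 n - \lambda_2 n \epsilon}
```

with everything that does not depend on n or epsilon absorbed into the normalization Z.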
what we have now
is the following functional form
for the joint distribution
of language popularity
and language efficiency
and I'm gonna do one more thing
which is I'm gonna try to recover
the original marginal distribution
P(n), by integrating
the joint distribution
with respect to epsilon
so what I did was
I built a more sophisticated model
that had two constraints
it constrained the average
number of projects
and the average amount of programmer time
devoted to a particular language
and now what I'm gonna do is
I'm gonna integrate out
this hidden variable
so if I do that, if I do this integral
here's what I get
I still get this factor of e to
the negative lambda 1 n
I'll call this lambda prime
I still get the factor of 1 over Z
and then I simply have to do
the following little integral
e to the negative lambda 2 n epsilon, d epsilon
whose antiderivative is negative 1 over
lambda 2 n
times e to the negative lambda 2 n epsilon
my integration range is from zero to infinity
so the integral comes out to 1 over lambda 2 n
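That little integral can be sanity-checked numerically; here is a sketch using arbitrary example values lambda 2 = 0.5 and n = 3:

```python
import math

def integrate_exp(lam2, n, steps=100000):
    """Trapezoid-rule estimate of the integral of e^(-lam2 * n * eps) d(eps)
    from 0 to a cutoff far enough out that the neglected tail is negligible."""
    upper = 50.0 / (lam2 * n)  # beyond here e^(-50) ≈ 2e-22, safely ignorable
    h = upper / steps
    total = 0.5 * (1.0 + math.exp(-lam2 * n * upper))  # endpoint terms
    for k in range(1, steps):
        total += math.exp(-lam2 * n * k * h)
    return total * h

approx = integrate_exp(lam2=0.5, n=3)
exact = 1.0 / (0.5 * 3)  # the closed form: 1 / (lam2 * n)
```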
and so when I put these terms in here
my final functional form, and again
I'm not keeping track of
all the multiplicative constants
so I'm turning Z into Z'
to absorb those factors here
the final functional form
looks like an exponential on top
divided by n
so my new functional form
for the language abundance
is modified
previously, just to remind you
previously, when I did Max. Ent.
constraining only the average
number of projects
my distribution of language popularity
looked like an exponential
and we realized this is a really bad fit
this is the fit you see in the plot here
but now, what we've done
is produce a joint distribution
with two constraints
one of the constraints is the same
the other is a new constraint
that involves that hidden variable
programmer efficiency
and when I integrate out
that hidden variable
I get a new functional form
for language popularity
in particular, we find
that it is proportional
to e to the negative lambda n, divided by n
this distribution is called
the Fisher log series
so, here's the functional form
which we got by integrating out
this hidden programmer efficiency variable
and what you can see now is that
that distribution is a far better model
for language popularity
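To see why that extra 1/n factor matters, here is a sketch comparing the two models at the same made-up mean of 50 projects per language: at equal means, the Fisher log series puts noticeably more probability both on rare languages and far out in the tail, which is exactly the pattern in the data.

```python
import math

def geometric_pmf(x, nmax):
    """One-constraint maxent model: P(n) = (1 - x) * x^(n-1), n >= 1."""
    return [(1.0 - x) * x ** (n - 1) for n in range(1, nmax + 1)]

def log_series_pmf(x, nmax):
    """Fisher log series: P(n) = x^n / (n * -ln(1 - x)), n >= 1."""
    norm = -math.log(1.0 - x)
    return [x ** n / (n * norm) for n in range(1, nmax + 1)]

def log_series_mean(x):
    return x / ((1.0 - x) * -math.log(1.0 - x))

def solve_log_series_x(target_mean, tol=1e-12):
    """Bisect for the x that gives the log series the target mean."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_series_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target = 50.0                # hypothetical mean projects per language
x_geo = 1.0 - 1.0 / target   # geometric with mean 1 / (1 - x) = 50
x_ls = solve_log_series_x(target)

nmax = 20000
geo = geometric_pmf(x_geo, nmax)
ls = log_series_pmf(x_ls, nmax)

tail_geo = sum(geo[200:])    # chance a language has more than 200 projects
tail_ls = sum(ls[200:])
```

At equal means the log series assigns more weight both to n = 1 and to the far tail: more obscure languages and more runaway hits.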
and what we've done here
just to be clear
is that we postulated
this additional constraint
involving a hidden variable
that we can't see
and we've integrated out
that hidden variable
what we should be impressed by
is not only
the fact that we are able to get a good fit
but also the underlying mechanisms
that the Max. Ent. model
is suggesting to you
it's suggesting that two
things are constrained
languages tend not to get too popular
on average
there is some kind of limit to how
popular a language can get
on average
that's the source of this constraint, here
but there's also a limit
to how much time or
how much effort can be devoted
to projects in a particular language
and that's this constraint here
so essentially, what's happening
is these languages up here
are allowed to become more popular
because they don't take up too
much programmer time per project
in the language of an ecologist
these are the really abundant species
with really low metabolic needs
so we're now able to reproduce
these really popular languages
we're able to accurately explain, or
we're able to accurately predict
that there should be
really runaway popular languages
like C and C++ and Java
and also we're able to explain
why the popularity of
the languages in the tail here
is lower than you would expect
from an exponential model
in particular, presumably
a lot of these languages are
associated with lower efficiencies
or rather, higher epsilon,
meaning that there is a
higher amount of time
devoted per project to those languages
and intuitively, if we're programming in Java,
that's a really high efficiency language
compared to programming in Haskell
Haskell is a profoundly
beautiful language, perhaps
but it's one that you would
not associate with efficiently
writing fast libraries
for parsing HTML pages
so, by postulating the existence
of this hidden variable
and using a maximum entropy argument
we can actually start reproducing the data
and it feels like we're starting
to talk about
important constraints in the system
so you should be impressed by this
in part because it's deeper
than the result we got
in the taxi cab case
in the taxi cab case we
picked a particular constraint
and it just so happened to correspond
to a really simple mechanistic model
and you might say
well I could have come up
with that mechanistic model all on my own
thank you very much
here what we've done
is postulate a set of constraints
that are scientifically substantive
the first constraint says that
we're gonna constrain
the average number of projects
so the open source movement
can do whatever it wants
as long as it keeps the average
number of projects per language fixed
people are making decisions
in fact, they are making decisions
in a maximally disordered way
that's the maximum entropy
part of the assumption
such that they preserve
this quantity here
on average
and furthermore
the open source community,
by whatever method,
or by whatever group cognitive method
it's able to do this,
and presumably by a whole lot
of different interacting processes,
also preserves the average amount
of programmer time devoted
to a particular language