In this unit, we're going to cover a
different aspect of network structure
often called "scale-free structure, " but
more generally known as
"long-tailed network structure."
Here's the idea. Here's a picture of
typical structure of parts of the world
wide web. Where nodes here are web pages
and links are links between pages and you
can see that they are directional; that
one page links to another page, but not
necessarily in reverse. And you can also
see that there's hubs in this kind of
network. Pages that link to many other
pages and there's also hubs that are
pages that many other pages link to,
like right here.
Ok, so let's look at in contrast a random
network. So this is what a random network
might look like if we just randomly threw
down a bunch of nodes and put random
links between them. So let's look at the
degree distribution of these different
types of networks.
Here's what the degree distribution of a
random network looks like. This is a plot
where I'm not putting numbers on these,
but you can sort of see the shape, this
bell curve shape. This is also called a
"normal distribution." Or a "Gaussian
distribution." Or, to be more exact, this
is really what's called a
"Poisson distribution."
But let's not worry about that. Just be
aware of the general shape of this.
What's interesting about the random
network's distribution is that most nodes
have the same number of links to them.
If we look here in this middle. This
middle degree, sort of average degree
here, most nodes have that degree. There's
a small number of nodes that have very
low degree. A small number of nodes that
have very high degree. And after a certain
numbers out here they quickly go to zero.
Now let's contrast that with the degree
distribution of the world wide web.
Approximately. So here I've drawn a
picture of the degree distribution.
Similarly it has these bars except the
bars here are so narrow that it looks
like a solid black mass. But you can think
of it as this kind of plot, bar graph. But
the bars are really really narrow and
there's a huge number of them. So this
distribution looks really different from
this distribution. There's a very large
number of nodes here that have low
degree and a very small number that
have high degree. There's also some
other interesting things about this shape.
If I plot this from say, a thousand to ten
thousand, and suppose I take this little
square here. Starting at ten thousand
and now plot this little part all the way
to a hundred thousand, but I'm now
going to blow this up to be about the
same size as this. So let's do that.
So here's the ten thousand to a
hundred thousand and I blew the
whole thing up so what we were
seeing before is this little tiny
patch over here becomes larger
here. Now I have this, of course,
is much lower number of nodes
than in the previous plot, but I
just want you to concentrate
on the shape. If we blow up a
little part of this the shape
remains the same. Similarly if I
started at a hundred thousand
and plotted all the way to a million
the shape remains the same.
This might ring some bells in terms of
fractals. We have, a fractal, remember,
is something where if you take a small
part of it and blow it up it looks like
the whole original object. It's self
similar at all levels. And a power law
distribution is in fact a fractal object
in the same way we looked at
cauliflower and trees and so on. In
the sense that if you take a small
part of it and blow it up it has the
same shape as the original. So this
is also called a "scale- free
distribution"
and the reason is that this idea of
scale. At all scales it looks the same
and so it's called "scale-free."
Here's the approximate degree
distribution of the web. This is
empirically what's been reported. That
the number of nodes with degree k is
proportional to 1/(k^2). This kind of
mathematical expression is called a
"power law." Because it involves a
variable raised to some power. Here we
have 1/(k^2), but that's equivalent to
k^(-2). If you ever hear of the term
"scale-free distribution"
you just know that that's
equivalent to a power law distribution.
And what's important to know is what that
exponent is.
So the definition of a "scale-free
network" is a network with a "scale-
free" or "power law" degree distribution.
And we can see that scale-free networks
are fractal-like. Here's an abstract
scale-free network. It's fractal-like in
that you get hubs that have lots of
outgoing links or incoming links. And if
we scale down here we see at this level
little hubs. At each scale we can see
smaller and smaller hubs that have the
same shape as the original network
roughly. So we have self-similarity to
some extent in this network. I'm just
calling it "fractal-like." And that's why
it has a power law-like distribution. And
I say "power law-like" because in the
real world, networks aren't really
scale-free. Just like in the real world
objects aren't really fractals. They're
not mathematical fractals. They're
approximations to fractals. And the same
thing with network degree distributions.
I should point out that scale-free
networks indeed have the small world
property. That is, low average distances
between nodes and high clustering.
In 1999, Albert Lazlo Barabasi and Reka
Albert published a paper in the Journal
Science called "Emergence of Scaling and
Random Networks", in which they introduced
this idea of scale-free networks to a
large audience and this really, similar
to the Watts and Strogatz paper I
mentioned before, this paper sparked a
huge amount of interest in the nascent
field of network science. Barabasi and
Albert looked at many different networks,
looked at the empirical data and proposed
that scale-free degree distributions are
very common in real world complex
networks.
Other papers, such as this one by
Clauset, Shalizi, and Newman, have been a
little bit more skeptical of that claim
that the data is power law distributed.
So I just want you to be aware that there
is some controversy about whether certain
networks are actually scale-free or not.
Evelyn Fox Keller, a biologist who
studies networks, said in 2005, "current
assessments of the commonality of power
laws are probably over estimates" and
that's probably true.
But in general what most people do
accept is that many real world networks,
most real world networks probably, have
long-tailed degree distributions where
"long-tail" just means that it has this
kind of shape. With many nodes having
relatively low degree compared to the
number of nodes that have high degree.
And this is called a tail of a
distribution in probability theory.
Because it kind of looks like a tail.
This is sort of the body and
this is the tail.
And this is called a "long-tailed
distribution" because it's tail keeps
going on and on and on. And that is very
different from what we call "Bell curve"
or normal distributions.
Here's a picture of a normal distribution
of human height. This is a nice example
because people are lined up here by their
heights with the shortest people over
here and the tallest people over here.
And you can see that most people cluster
in the middle. There's a few people who
are very short. There's a few people who
are very tall. Most people are average.
This normal distribution does not have a
long tail. This goes to zero very quickly
on either side. A power law distribution
or a long-tailed distribution does not go
to zero. It has a long tail that drops
off and this has some really important
implications for real world networks.