1
00:00:08,515 --> 00:00:12,437
In this unit, we're going to cover a
different aspect of network structure
2
00:00:12,467 --> 00:00:16,548
often called "scale-free structure, " but
more generally known as
3
00:00:16,569 --> 00:00:19,492
"long-tailed network structure."
4
00:00:19,492 --> 00:00:24,671
Here's the idea. Here's a picture of
typical structure of parts of the world
5
00:00:24,671 --> 00:00:31,996
wide web. Where nodes here are web pages
and links are links between pages and you
6
00:00:31,996 --> 00:00:35,624
can see that they are directional; that
one page links to another page, but not
7
00:00:35,637 --> 00:00:39,853
necessarily in reverse. And you can also
see that there's hubs in this kind of
8
00:00:39,873 --> 00:00:46,741
network. Pages that link to many other
pages and there's also hubs that are
9
00:00:46,761 --> 00:00:50,001
pages that many other pages link to,
like right here.
10
00:00:50,031 --> 00:00:57,039
Ok, so let's look at in contrast a random
network. So this is what a random network
11
00:00:57,049 --> 00:01:01,277
might look like if we just randomly threw
down a bunch of nodes and put random
12
00:01:01,277 --> 00:01:07,323
links between them. So let's look at the
degree distribution of these different
13
00:01:07,333 --> 00:01:09,229
types of networks.
14
00:01:09,239 --> 00:01:13,346
Here's what the degree distribution of a
random network looks like. This is a plot
15
00:01:13,346 --> 00:01:18,627
where I'm not putting numbers on these,
but you can sort of see the shape, this
16
00:01:18,647 --> 00:01:23,818
bell curve shape. This is also called a
"normal distribution." Or a "Gaussian
17
00:01:23,818 --> 00:01:27,649
distribution." Or, to be more exact, this
is really what's called a
18
00:01:27,649 --> 00:01:29,020
"Poisson distribution."
19
00:01:29,020 --> 00:01:32,752
But let's not worry about that. Just be
aware of the general shape of this.
20
00:01:32,752 --> 00:01:38,191
What's interesting about the random
network's distribution is that most nodes
21
00:01:38,191 --> 00:01:43,167
have the same number of links to them.
If we look here in this middle. This
22
00:01:43,177 --> 00:01:50,458
middle degree, sort of average degree
here, most nodes have that degree. There's
23
00:01:50,468 --> 00:01:54,999
a small number of nodes that have very
low degree. A small number of nodes that
24
00:01:55,049 --> 00:01:59,160
have very high degree. And after a certain
numbers out here they quickly go to zero.
25
00:01:59,190 --> 00:02:03,971
Now let's contrast that with the degree
distribution of the world wide web.
26
00:02:03,971 --> 00:02:07,847
Approximately. So here I've drawn a
picture of the degree distribution.
27
00:02:07,847 --> 00:02:13,125
Similarly it has these bars except the
bars here are so narrow that it looks
28
00:02:13,125 --> 00:02:18,897
like a solid black mass. But you can think
of it as this kind of plot, bar graph. But
29
00:02:18,897 --> 00:02:22,153
the bars are really really narrow and
there's a huge number of them. So this
30
00:02:22,173 --> 00:02:28,169
distribution looks really different from
this distribution. There's a very large
31
00:02:28,169 --> 00:02:32,567
number of nodes here that have low
degree and a very small number that
32
00:02:32,567 --> 00:02:37,513
have high degree. There's also some
other interesting things about this shape.
33
00:02:37,513 --> 00:02:45,617
If I plot this from say, a thousand to ten
thousand, and suppose I take this little
34
00:02:45,617 --> 00:02:52,755
square here. Starting at ten thousand
and now plot this little part all the way
35
00:02:52,765 --> 00:02:57,285
to a hundred thousand, but I'm now
going to blow this up to be about the
36
00:02:57,285 --> 00:02:59,421
same size as this. So let's do that.
37
00:02:59,421 --> 00:03:04,489
So here's the ten thousand to a
hundred thousand and I blew the
38
00:03:04,489 --> 00:03:08,823
whole thing up so what we were
seeing before is this little tiny
39
00:03:08,823 --> 00:03:13,949
patch over here becomes larger
here. Now I have this, of course,
40
00:03:13,949 --> 00:03:17,967
is much lower number of nodes
than in the previous plot, but I
41
00:03:17,967 --> 00:03:21,259
just want you to concentrate
on the shape. If we blow up a
42
00:03:21,259 --> 00:03:26,817
little part of this the shape
remains the same. Similarly if I
43
00:03:26,817 --> 00:03:30,555
started at a hundred thousand
and plotted all the way to a million
44
00:03:30,555 --> 00:03:32,697
the shape remains the same.
45
00:03:32,697 --> 00:03:39,048
This might ring some bells in terms of
fractals. We have, a fractal, remember,
46
00:03:39,050 --> 00:03:43,470
is something where if you take a small
part of it and blow it up it looks like
47
00:03:43,480 --> 00:03:50,810
the whole original object. It's self
similar at all levels. And a power law
48
00:03:50,810 --> 00:03:55,270
distribution is in fact a fractal object
in the same way we looked at
49
00:03:55,270 --> 00:03:59,270
cauliflower and trees and so on. In
the sense that if you take a small
50
00:03:59,270 --> 00:04:03,866
part of it and blow it up it has the
same shape as the original. So this
51
00:04:03,866 --> 00:04:06,190
is also called a "scale- free
distribution"
52
00:04:06,190 --> 00:04:13,110
and the reason is that this idea of
scale. At all scales it looks the same
53
00:04:13,110 --> 00:04:15,400
and so it's called "scale-free."
54
00:04:15,400 --> 00:04:19,400
Here's the approximate degree
distribution of the web. This is
55
00:04:19,400 --> 00:04:26,020
empirically what's been reported. That
the number of nodes with degree k is
56
00:04:26,020 --> 00:04:31,540
proportional to 1/(k^2). This kind of
mathematical expression is called a
57
00:04:31,540 --> 00:04:37,483
"power law." Because it involves a
variable raised to some power. Here we
58
00:04:37,483 --> 00:04:48,423
have 1/(k^2), but that's equivalent to
k^(-2). If you ever hear of the term
59
00:04:48,423 --> 00:04:49,623
"scale-free distribution"
60
00:04:49,623 --> 00:04:53,123
you just know that that's
equivalent to a power law distribution.
61
00:04:53,123 --> 00:04:55,383
And what's important to know is what that
exponent is.
62
00:04:55,383 --> 00:05:00,807
So the definition of a "scale-free
network" is a network with a "scale-
63
00:05:00,807 --> 00:05:03,219
free" or "power law" degree distribution.
64
00:05:05,499 --> 00:05:12,039
And we can see that scale-free networks
are fractal-like. Here's an abstract
65
00:05:12,039 --> 00:05:18,139
scale-free network. It's fractal-like in
that you get hubs that have lots of
66
00:05:18,139 --> 00:05:24,879
outgoing links or incoming links. And if
we scale down here we see at this level
67
00:05:24,879 --> 00:05:31,659
little hubs. At each scale we can see
smaller and smaller hubs that have the
68
00:05:31,659 --> 00:05:36,875
same shape as the original network
roughly. So we have self-similarity to
69
00:05:36,875 --> 00:05:41,641
some extent in this network. I'm just
calling it "fractal-like." And that's why
70
00:05:41,641 --> 00:05:46,954
it has a power law-like distribution. And
I say "power law-like" because in the
71
00:05:46,954 --> 00:05:51,464
real world, networks aren't really
scale-free. Just like in the real world
72
00:05:51,464 --> 00:05:54,851
objects aren't really fractals. They're
not mathematical fractals. They're
73
00:05:54,851 --> 00:05:59,369
approximations to fractals. And the same
thing with network degree distributions.
74
00:06:00,321 --> 00:06:03,269
I should point out that scale-free
networks indeed have the small world
75
00:06:03,269 --> 00:06:07,678
property. That is, low average distances
between nodes and high clustering.
76
00:06:07,678 --> 00:06:15,208
In 1999, Albert Lazlo Barabasi and Reka
Albert published a paper in the Journal
77
00:06:15,208 --> 00:06:19,523
Science called "Emergence of Scaling and
Random Networks", in which they introduced
78
00:06:19,523 --> 00:06:25,292
this idea of scale-free networks to a
large audience and this really, similar
79
00:06:25,292 --> 00:06:29,438
to the Watts and Strogatz paper I
mentioned before, this paper sparked a
80
00:06:29,438 --> 00:06:35,148
huge amount of interest in the nascent
field of network science. Barabasi and
81
00:06:35,148 --> 00:06:39,681
Albert looked at many different networks,
looked at the empirical data and proposed
82
00:06:39,681 --> 00:06:44,843
that scale-free degree distributions are
very common in real world complex
83
00:06:44,843 --> 00:06:45,939
networks.
84
00:06:46,364 --> 00:06:50,929
Other papers, such as this one by
Clauset, Shalizi, and Newman, have been a
85
00:06:50,929 --> 00:06:55,538
little bit more skeptical of that claim
that the data is power law distributed.
86
00:06:55,538 --> 00:06:59,889
So I just want you to be aware that there
is some controversy about whether certain
87
00:06:59,894 --> 00:07:02,721
networks are actually scale-free or not.
88
00:07:02,721 --> 00:07:08,777
Evelyn Fox Keller, a biologist who
studies networks, said in 2005, "current
89
00:07:08,777 --> 00:07:13,190
assessments of the commonality of power
laws are probably over estimates" and
90
00:07:13,190 --> 00:07:14,618
that's probably true.
91
00:07:14,618 --> 00:07:21,960
But in general what most people do
accept is that many real world networks,
92
00:07:21,960 --> 00:07:28,919
most real world networks probably, have
long-tailed degree distributions where
93
00:07:28,919 --> 00:07:34,813
"long-tail" just means that it has this
kind of shape. With many nodes having
94
00:07:34,813 --> 00:07:38,872
relatively low degree compared to the
number of nodes that have high degree.
95
00:07:38,872 --> 00:07:43,445
And this is called a tail of a
distribution in probability theory.
96
00:07:43,455 --> 00:07:47,495
Because it kind of looks like a tail.
This is sort of the body and
97
00:07:47,512 --> 00:07:48,550
this is the tail.
98
00:07:48,550 --> 00:07:51,395
And this is called a "long-tailed
distribution" because it's tail keeps
99
00:07:51,395 --> 00:07:57,607
going on and on and on. And that is very
different from what we call "Bell curve"
100
00:07:57,607 --> 00:07:59,052
or normal distributions.
101
00:07:59,052 --> 00:08:03,800
Here's a picture of a normal distribution
of human height. This is a nice example
102
00:08:03,800 --> 00:08:09,118
because people are lined up here by their
heights with the shortest people over
103
00:08:09,118 --> 00:08:13,691
here and the tallest people over here.
And you can see that most people cluster
104
00:08:13,691 --> 00:08:18,166
in the middle. There's a few people who
are very short. There's a few people who
105
00:08:18,166 --> 00:08:22,976
are very tall. Most people are average.
This normal distribution does not have a
106
00:08:22,976 --> 00:08:28,777
long tail. This goes to zero very quickly
on either side. A power law distribution
107
00:08:28,777 --> 00:08:35,625
or a long-tailed distribution does not go
to zero. It has a long tail that drops
108
00:08:35,625 --> 00:08:41,355
off and this has some really important
implications for real world networks.