Mitchell: I'm here with Simon DeDeo, who is a research fellow at SFI. Simon has a background in theoretical physics, but he's worked on all kinds of different problems, and he's going to tell us a little bit about his research. So Simon, how long have you been here at the Institute?

DeDeo: I guess I've been here... I started coming in 2008 or so, and I had a position here in 2010. I've been here ever since.

Mitchell: OK, so Simon, can you tell us a little bit about what you've been doing?

DeDeo: Yeah, absolutely. As Melanie said, I began my work in physics. I actually was a cosmologist, and when I got into cosmology I wanted to be, you know, beginning-of-time, origin-of-space... not even stars, but space itself. And when I started my Ph.D. in 2001, some of the most interesting work actually was not being done in theory, but at the intersection between theory and data. A lot of the most charismatic people in my Ph.D. program were doing data analysis, and it introduced me to this idea that you can learn a lot by looking. So a lot of what I do now, even though I work in a very different field - I work primarily in biological and social systems, and I'll talk a little bit about social systems today - most of what I do is, in some sense, philosophically very similar to what we were doing in cosmology. You gather a lot of data and you look for signatures of really interesting theoretical properties. And I'll tell you one thing I'm really excited by, which is the work on Wikipedia.

Let me give you a little autobiography, 'cause this is sort of fun. Many people become biologists 'cause they grow up in West Virginia and they're surrounded by plants and animals. I grew up in London, which is an enormous city, and there are not that many animals (except, of course, the human animal). And one thing you realize when you're in a large city is that the city works! And the question as to why it works - why the candy store is exactly where it needs to be for you to get candy, and why the streets have the particular pattern they do - you ask yourself, OK, well, why does it work? Who set that up?

One system we can study really closely, that has a similar kind of flavor, is Wikipedia. Wikipedia (you probably know this, of course) is the open-source encyclopedia that anyone can edit. It's remarkably reliable, and certainly on certain topics it's comparable to things like the Encyclopedia Britannica; so maybe four errors per article versus five.

So one thing I've looked at is the structure of cooperation and conflict on Wikipedia, and in particular we're asking the question of how people decide whether to undo the work of others. If you're familiar with Wikipedia, one thing you know is that when you see a page, you can edit it. And there are essentially two kinds of edits you can do, at the most coarse-grained level. The first kind of edit is: you see a problem, you fix it; you see a missing page, you make the page; you see a paragraph that seems a little bit off, you change it. I was just reading the page on Maxwell's Demon and there was some kooky paragraph and it had just been taken out.

Mitchell: Hey, I wrote that!

DeDeo: D'ah! No, it was about vortices; that's always kooky. There's a different kind of edit, which is what we call a revert. In that case, you take the page as it currently is and you roll it back to a previous state.
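One way to picture that distinction in practice: studies of Wikipedia edit histories often flag an edit as a revert when it returns the page to a state that has appeared before, for instance by hashing each revision's text. The sketch below is a minimal illustration of that heuristic and of the coarse-graining into C/R strings that DeDeo describes next; it is an example of the general idea, not the group's actual pipeline.

```python
# Hypothetical illustration (not necessarily the detection rule used in the
# study): treat an edit as a revert, R, if the resulting page text exactly
# matches some earlier revision of the page; otherwise call it cooperative, C.
import hashlib

def coarse_grain(revisions):
    """revisions: full revision texts in chronological order.
    Returns a C/R string with one symbol per edit after the initial version."""
    seen_states = set()
    symbols = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if i > 0:
            # Restoring the page to a previously seen state counts as a revert.
            symbols.append("R" if digest in seen_states else "C")
        seen_states.add(digest)
    return "".join(symbols)

# The third edit rolls the page back to its first state, so it shows up as R.
print(coarse_grain(["intro", "intro + facts", "intro", "intro + better facts"]))  # CRC
```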
And reverting a page is considered... I wouldn't quite say a hostile act, because there are certain cases in which you would revert. If somebody comes in and just vandalizes, puts a few swear words on a page, if you're a good person you'll come in and revert it. But by and large, multiple reverts in a sequence are considered an antisocial act; it's a sign of conflict. So I write something about George Bush, you don't like it, you take it out, and you roll the page back to a previous version. I don't like that, so, hey, I'm going to roll you back to a previous version, to the page I wrote.

An enormous amount of what happens on Wikipedia can be summarized by whether, on a particular page, each edit was what we'll call a cooperative edit, C, or a revert edit, R. That kind of coarse-graining takes all the complexity of what people are doing on Wikipedia, all the complexity of the edits - we're not going to go in and ask deep questions about what's happening in that C or in that R, because sometimes a C is bad and sometimes an R is good - and in general what we're going to look at is these strings that are produced in the editing of any particular page. And just so you know, the most edited page on Wikipedia is the page associated with George W. Bush, which has at this point about 45,000 discrete edits since records started being kept on Wikipedia (which actually don't go back to the beginning of Wikipedia, so presumably a lot was happening beforehand). But we have a record, pretty much through his presidency and the subsequent presidencies, of essentially how people are coming to agreement, or failing to come to agreement, about the structure of the page at any particular point.

One of the things we're asking is: what kind of underlying process, what kind of system, could be producing this sequence of edits? One thing you might say is, look, people just come to the page and they toss a coin, and it's a biased coin - say, 75 percent of the time they come in, they're in a bad mood, and they revert. That would be what we call an IID process - independent and identically distributed. What that really means is just that what happens now has completely forgotten what happened before. So that's one kind of structure you could see. It may not surprise you that this is not, of course, how Wikipedia works. There's an enormous amount of structure, and in particular, what happens now depends a great deal on what happened previously.

So we built a number of different models. One of the things you see - and this is in a paper we just finished that's in review right now - is that the system seems to actually have a non-finite amount of memory. So we'll call it infinite, even though it sounds a bit spooky. What seems to be the case is that the system is able to store information arbitrarily far back in time. One of the ways we show that is, we look at the frequency of repeated cooperative events in the encyclopedia. If the system had a finite amount of memory - in particular, the technical phrase is that it's a finite state machine - if you draw... can I do a log-log plot?

Mitchell: Sure.

DeDeo: OK. If I do a log-log plot here, where I do the probability... (do I want a log-log plot?
No, I want a linear-log.) If I do the log of the probability of a repeated sequence of cooperative events, and on this axis here I'll just put the length of that sequence, so... there's a pretty high probability that you'll see just one C bracketed by Rs. The chance of getting two Cs in a row - the chance of two people doing nice things, or at least apparently nice things under this coarse-graining - is a bit lower. Now, if this process had finite memory, if it was a finite state machine, this here would be a straight line. What that means is there's an exponential decay: the chance of a long cooperative run, say a hundred Cs in a row, is exponentially suppressed. What you actually find is not a straight line; what you actually find is something that levels out.

So if you imagine this system as essentially a gigantic computer made of people interacting (but, you know, so is my machine here; my machine is just a bunch of circuits interacting, dictated by Steve Jobs's ghost, I suppose), you can ask: what kind of computation is that system doing? What's the best way to describe it? This collective information, this persistence of memory - the fact that what happens here depends intrinsically on what's happened all the way in the past, in different ways, in structured ways, so the recent past is more important, but it doesn't forget the distant past - that structure seems to be something that's not contained within any single person's mind. There are multiple people involved in the creation of those long cooperative runs. So there's some kind of social, institutional process being built here.

Mitchell: So Simon, a couple questions here.

DeDeo: OK.

Mitchell: First question... what's the ultimate goal of this research? What are you going to be able to tell us in general?

DeDeo: Ah, good question. One of the things is, we don't understand, I think, how large systems work. We have a sense of how individuals behave; we have a good model for how individuals behave in these systems. But we don't understand the underlying principles for how they behave in groups. We have a lot of folk rules, we have a lot of toy models. But one thing we'd like to ask is, well, how does it actually happen in the real world?

Mitchell: So in Wikipedia the editors are somewhat anonymous. And maybe being anonymous has a lot of effect on cooperation versus non-cooperation, as opposed to a system where everybody can see who everybody is and what they're doing.

DeDeo: That's right. And in fact there's a major distinction if you zoom in on this computational process. If you ask what the underlying sort of program running this system is, what you see right away is that it breaks down into two modular spaces. One modular space is associated with when the page is what's called protected, so that only registered users with persistent user names can edit, versus unprotected. So absolutely, there are two different dynamics happening here: in the case where the page is unprotected, you have actually quite a complicated structure; in the case where the page is protected, you add another module on. Forcing people to use a persistent identity seems not to rule out certain kinds of behavior, but to enlarge the space of behavior. So that was sort of a surprise, I think.
We thought that you would have two modules: one module associated purely with 'anyone can edit', and one module associated purely with 'only trusted people can edit'. In fact, the structure of cooperation when anyone can edit is simpler than when you restrict it. So, in some sense, putting a restriction on actually increases the complexity of the process. And complexity here, I think, cashes out explicitly in terms of how many internal states you have to postulate the system has - how many structures, how many bits of memory it has. So I'm not sure if that answers your question.

Mitchell: I think that gets to it. Thank you, Simon.

DeDeo: That's great. Thank you very much, this is cool. This is super cool, so thank you very much. Yeah. Excellent.
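The run-length test DeDeo sketches at the whiteboard can be made concrete. The sketch below is a minimal illustration (not code from the study): count maximal runs of consecutive Cs in the coarse-grained edit string and look at log P(run length = k) against k. A finite-memory, finite-state process gives an exponential fall-off - a straight line on those linear-log axes - whereas the Wikipedia curve, as he describes, levels out. The toy generator uses his 75-percent bad-mood coin as the memoryless benchmark; for a real page the string would come from the revision history, as in the earlier coarse-graining sketch.

```python
# Hypothetical illustration of the linear-log diagnostic described above:
# tabulate maximal runs of consecutive C's in a coarse-grained edit string
# and examine log P(run length = k) versus k. Any finite-memory process
# decays exponentially, i.e. traces a straight line on these axes.
import math
import random
import re
from collections import Counter

def run_length_distribution(edit_string):
    """P(run length = k) over maximal runs of C's; e.g. 'CCRCCCR' has runs 2 and 3."""
    runs = [len(m.group()) for m in re.finditer(r"C+", edit_string)]
    counts = Counter(runs)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

if __name__ == "__main__":
    # Toy memoryless (IID) benchmark: each edit is a revert with probability
    # 0.75 (the "bad mood" coin), so cooperative runs fall off geometrically.
    random.seed(0)
    iid_string = "".join("R" if random.random() < 0.75 else "C" for _ in range(100_000))
    for k, p in run_length_distribution(iid_string).items():
        print(k, round(math.log(p), 2))  # log P(k) drops linearly in k
    # A curve that bends away from this straight line, like the one DeDeo
    # describes for Wikipedia, is the signature of non-finite memory.
```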