Alright, welcome back to Foundations and
Applications of Humanities Analytics.

As mentioned by Simon, I'm going to be
talking today about patterns

in both science and the humanities from a
more philosophical perspective,

in particular, a sort of "philosophy of
science" perspective.

So, we're going to start by talking about
patterns in science,

and when talking about patterns in science
you really do have to start

with the work of Daniel Dennett, and in
particular, his 1991 paper Real Patterns,

right?

So one of the central functions of empirical
research in any domain,

we're just sort of taking this as a starting point,

is the recognition of patterns in both
data and in theory.

For empirical research, you're going to be
thinking more about patterns in data,

but it's also true that as you start to
build models of data,

you recognize patterns within those
theoretical models themselves,

right?

And so, a central task in philosophy
of science,

where we're trying to gain this sort of
methodological

and sort of foundationally grounded
understanding of how science works,

is to come up with a perspicuous and
general understanding

of what constitutes a pattern.

And for Dennett, a pattern is just going
to be a sequence of data points

that is compressible,

where this means that that sequence of data
points admits of a simpler description

than simply repeating the sequence.

So, to give an example, the sequence of
integers starting with 1, then 2, then 3,

all the way up to 100,000: that's going
to be a pattern, since the description

"the sequence of integers beginning at 1
and ending at 100,000,

where the n-th integer in the sequence is
just the integer n" is simpler than

the full list with 100,000 integers,
right?

It's simpler to say, it's simpler to write
out,

it's simpler for a computer to spit that
out as well.
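To make the "simpler to write out" point concrete, here is a minimal Python sketch that treats simplicity crudely as character count. The names `nth`, `literal`, and `rule_text`, and the English wording of the rule, are hypothetical illustrations rather than anything from Dennett's paper; the sketch just checks that a one-line rule regenerates exactly the same 100,000 integers as the much longer verbatim list.

```python
# Crude proxy for "simplicity": the number of characters in a description.
N = 100_000


def nth(n):
    """The rule: the n-th integer in the sequence is just n."""
    return n


# Describing the sequence verbatim: every integer, separated by commas.
literal = ",".join(str(i) for i in range(1, N + 1))

# Describing it by its rule instead (a hypothetical English phrasing).
rule_text = "the integers 1 to 100000, where the n-th integer is n"

# The rule really does reproduce the data it summarizes.
regenerated = [nth(n) for n in range(1, N + 1)]
assert regenerated == [int(x) for x in literal.split(",")]

print(len(literal))    # characters for the verbatim list
print(len(rule_text))  # characters for the rule
```

The verbatim list runs to 588,894 characters, while the rule fits in one short sentence; that gap is what makes the sequence a pattern in this sense.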

Now, of course notions of relative
simplicity are difficult to pin down,

and so different definitions of simplicity
may yield different results about what

constitutes the optimal compression
of a pattern,

and indeed you can have a debate about
whether patterns even ought to be

optimally compressed in the actual practice
of science and pattern recognition.

The important thing for our purposes is
just that as long as a sequence of data points

can be compressed at all, then it's a
pattern.

If it can't be compressed in any way,
then we're just going to call it "noise,"

right?
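Dennett's notion of compressibility is abstract, and its idealized measure (Kolmogorov complexity) is not computable, so a common practical stand-in is an off-the-shelf compressor. The sketch below assumes Python's standard `zlib` module as that stand-in, which is a choice made for this example rather than part of Dennett's account: data with a pattern should shrink noticeably, while seeded pseudo-random bytes, playing the role of noise, should barely shrink at all.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(data, 9)) / len(data)


# Patterned data: the integers 1..100,000, one per line, as ASCII text.
patterned = "\n".join(str(i) for i in range(1, 100_001)).encode("ascii")

# Stand-in for noise: the same number of pseudo-random bytes (seeded, so repeatable).
n = len(patterned)
noise = random.Random(0).getrandbits(8 * n).to_bytes(n, "big")

print(f"patterned: {compression_ratio(patterned):.3f}")
print(f"noise:     {compression_ratio(noise):.3f}")
```

A compressor only gives an upper bound on how compressible something is. The seeded bytes could in principle be regenerated from a very short program, yet they look like noise to `zlib`, which is one concrete way the choice of simplicity measure affects what counts as a pattern.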

And so what patterns allow for is
explanation and prediction,

at least on Dennett's account,
right?

When you generalize a data set into a
pattern, you can start to explain

why it is that certain data points take
the values that they do,

and you can also start to predict the values
of unobserved data points

coming from the same
data-generating process.
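As a toy illustration of explanation and prediction, the sketch below assumes the simplest possible kind of pattern, a linear rule where the n-th value is a*n + b; the function name `fit_linear_rule` and the observed values are hypothetical. If the rule fits every observed point, it accounts for why each point takes the value it does, and the same rule yields a value for a point we have not observed.

```python
from __future__ import annotations


def fit_linear_rule(values: list[int]) -> tuple[int, int] | None:
    """Try to summarize values as: n-th value = a*n + b, with n starting at 1.

    Returns (a, b) if every observed value fits the rule, otherwise None.
    """
    if len(values) < 2:
        return None
    a = values[1] - values[0]  # step between consecutive values
    b = values[0] - a          # offset so that n = 1 gives values[0]
    if all(v == a * n + b for n, v in enumerate(values, start=1)):
        return a, b
    return None


observed = [3, 5, 7, 9, 11]       # hypothetical observed data points
rule = fit_linear_rule(observed)  # (2, 1): the n-th value is 2n + 1
assert rule is not None
a, b = rule
prediction = a * 6 + b            # predicted value of the unobserved 6th point: 13
```

A sequence such as [1, 4, 9] does not fit a linear rule, so `fit_linear_rule` returns None; a richer family of rules would be needed to capture that pattern.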

Now note that Dennett here is specifically
interested in a particular application

of pattern recognition, namely the psychology
of intelligent agents.

Paradigmatically, that's going to be
human beings,

but it could also be
artificial intelligences,

and indeed some non-human animals.

He's interested in the extent to which,
in these intelligent agents,

basic psychological entities like beliefs
and desires can be understood

as patterns of behavior.

And so this means that while this way of
thinking about patterns isn't necessarily formulated

with the study of human culture
explicitly in mind,

it's obviously related to some degree,
and I would argue to quite a large degree,

since, indeed, culture is something made
by intelligent agents;

it's part of the behavioral suite
of intelligent agents, right?

Now, for all its importance
to the study of patterns

within philosophy of science,

Dennett's account can also be

criticized for not going far enough or for
not being sufficiently comprehensive

in its understanding of how
pattern recognition works,

especially within the sort of behavioral
and human sciences,

and indeed within the broader study of
human culture

that would include the humanities,
right?

And so, there are other sides to this notion
of a pattern

that are highlighted in various
criticisms of and commentaries on Dennett,

and that includes a really nice paper from
2007 by Victoria McGeer, right?

So McGeer agrees with Dennett that
patterns can play a role

in predicting and explaining behavior,

but she also argues, convincingly,
I think,

that patterns also play a regulative role
(I'll sometimes say a regulatory role)

within human behavior and within the
behavior of intelligent entities more broadly.

And what that means is that agents can
recognize patterns in their own behavior

and start behaving in accordance with
these patterns,

creating a feedback loop, whereby the
pattern not only sort of

explains and predicts the agent's behavior,
but also regulates and controls

that behavior.

We can also use our understanding of
patterns of behavior in agents

to control other agents, right?

We can predict and explain what agents
will do, and in doing so,

control them to some degree; we can also
control ourselves, as I was discussing before,

and this broad capacity for control is
really central, McGeer argues,

to our understanding of agency.

And what I'll argue here is that this
regulatory role of a pattern is central

to understanding how patterns work in the
humanities,

and it's particularly important for the
quantitative study of the humanities,

which we're talking about here,
right?

So, here's an example to sort of try and
drive this point home:

so let's consider topic modeling, one of
the core methodologies used in the

quantitative study of the humanities.

What do topic modeling programs do?
Well, they find groups of words such that

words in the same group are more likely to
appear in the same document

than words in different groups, right?

So, "sock" and "shoe" are more likely to
appear together than "shoe" and "shuffle,"

for instance.

And these groups of words that are more
likely to appear together than separately?

We'll call them the topics, right?
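The intuition that "sock" and "shoe" turn up together more often than "shoe" and "shuffle" can be checked with plain document co-occurrence counts. This is a minimal sketch on a tiny invented corpus, and `cooccurrence` is a hypothetical helper; real topic models such as latent Dirichlet allocation are probabilistic and considerably more involved, so treat this only as the kind of co-occurrence signal such models exploit.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations

# A hypothetical toy corpus; each document is reduced to its set of words.
docs = [
    "put on a sock and a shoe before the walk",
    "the shoe store sells every sock size",
    "lost one sock near the shoe rack",
    "they shuffle the cards before the game",
    "shuffle the playlist and deal the cards",
    "a worn shoe and a slow shuffle down the hall",
]


def cooccurrence(documents: list[str]) -> Counter:
    """Count, for each unordered word pair, how many documents contain both."""
    counts: Counter = Counter()
    for doc in documents:
        # Sorting makes each pair alphabetical, so ("shoe", "sock") is one key.
        words = sorted(set(doc.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence(docs)
print(counts[("shoe", "sock")])     # 3
print(counts[("shoe", "shuffle")])  # 1
```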

And some documents contain many more words
in one topic than words in another topic.

And that's when we say those documents are
about the topics

whose words they're much more likely
to contain.
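Saying a document is "about" a topic can be sketched as counting which topic its words fall into. Here the two topics are hand-written, hypothetical word lists rather than the output of a topic model, and `topic_shares` is an illustrative helper; a probabilistic topic model would instead estimate a mixture of topics for each document.

```python
from __future__ import annotations

# Hypothetical topics written by hand; a real topic model would learn them.
topics = {
    "footwear": {"sock", "shoe", "boot", "lace"},
    "cards": {"shuffle", "deal", "cards", "deck"},
}


def topic_shares(doc: str, topics: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of a document's topic words that fall in each topic."""
    words = doc.lower().split()
    hits = {name: sum(w in vocab for w in words) for name, vocab in topics.items()}
    total = sum(hits.values())
    if total == 0:
        return {name: 0.0 for name in topics}  # no topic words at all
    return {name: count / total for name, count in hits.items()}


doc = "shuffle the deck then deal the cards and lace one boot"
shares = topic_shares(doc, topics)  # cards: 4/6, footwear: 2/6
about = max(shares, key=shares.get)  # the topic the document is mostly "about"
```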

And this amounts to a fairly simple kind
of pattern recognition:

grouping words into topics and looking
at which documents are more likely to

contain words from those topics.

When we feed a corpus into a topic
modeling algorithm,

that's basically what we're doing.

We're recognizing these patterns within
the corpus that allow us

to identify these topics.

Now, of course, within most corpora of
interest,

the authors of documents are aware,
to some degree, of the kinds of topics

that documents in the corpus tend to be
about; that is, they have some awareness

of the very patterns that a third-party
observer will come to understand those documents

as instantiating.

To put it another way, an author
usually has some understanding

of what kind of document they are producing,
right?

There's a kind of authorial
self-consciousness

wherein the sort of patterns that we'll
eventually recognize in the corpus

aren't just something we recognize
after the fact;

rather, what we're discovering are the
very mechanisms of regulation

that produce that corpus
in the first place, right?

So when we observe a pattern
in the textual corpus,

we aren't just getting a better handle
on how to explain its content,

we're also getting a handle on the
mechanisms through which

the production of the corpus is regulated
and controlled.

And that's really going to be the core
argument that I'm making here

in this lecture.

And it's worth noting that, in this controlling
or regulatory capacity, patterns in text

can serve positive, negative, and
neutral ends,

and what kinds of ends they serve
are going to depend on the context, indeed,

on the research you're doing, right?

So one example of a positive use
of this controlling or regulatory
capacity of patterns

is going to be the presence of narrative
in a text:

whether that's a novel, newspaper article,
diary entry, et cetera.

We can think of a narrative as a kind of
mechanism on the part of the author

for controlling the text's production,
right? At least to the extent

that we think narrative is
a good thing.

So in some texts, we're happy
to embrace this

regulative role of patterns.

Now on the negative side, you know,

in history you might find self-conscious
attempts on the part of authors

to cloud the historical record,

and this can hinder
historical understanding,

and that's also a pattern of regulation that
we're seeing manifest in a given corpus,

and indeed, if we care about sort of
an accurate historical record,

then we might see that as a negatively
valenced role

for this regulatory capacity of patterns.

And then of course there are going to be
regulatory patterns in text,

such as the construction of genre,
that we might not obviously think of

as having a positive or negative valence,
whether that's a moral valence

or an epistemological valence,

and again that's going to depend on the
context; so,

I don't want to give the impression here
that this regulative role of patterns

is all good or all bad; it's all going to
depend on the circumstances.

And so how does this connect to your
research and thinking about yourselves

as sort of quantitative humanities
researchers and professionals?

So, you know, one way of framing a
research question,

in keeping with some of Simon's previous
lectures where he talked a lot about

this sort of central role that research
questions play in a quantitative project,

is to ask, you know, "what sorts of
patterns will we find in this corpus

of documents?"

Now that question, asked simply in that way,
is probably a little too broad;

rather, what you need to do is
narrow this down a little bit,

so that the set of patterns actually being
searched for

is still broad enough that you're not
excluding potentially interesting patterns

or pre-determining your results,

but still narrow enough that it amounts
to a tractable project.

A good example that Simon gave is this
question:

"what are the tragical elements within
King Lear?" Right?

That's a case where you're looking for
patterns;

you don't know exactly what they are.

They're patterns of text associated with
tragedy.

But we're still keeping an open mind about
exactly what those might look like,

and that's an example of a better research
question

in which patterns are playing
a central role.

Now, as a researcher formulating a
research question,

you can reflect on the extent to which
the patterns you find

are going to play either an explanatory,
a predictive, or a regulatory role

within that corpus, right?

And I think, in particular, paying
attention to that regulatory role

is going to be very interesting in terms
of generating some really cool

new humanities research.

So, to sum this all up, I started from the
sort of premise that

understanding better the general nature of
patterns has been a crucial recent task

for contemporary philosophers of science
that I think is very relevant

to a foundational study of
humanities analytics.

On Dennett's influential account, a pattern
is anything that can be compressed,

and just as a historical note, this draws
on early work in information theory

and computer science by people like
Chaitin and Shannon,

although that's not as directly important
for our purposes here.

Dennett understands patterns as playing
a mostly explanatory and predictive role,

but I've argued here, building on work by
Victoria McGeer,

that there's also an alternative
understanding of patterns

as playing a regulatory and controlling
role within data sets,

and I think that this particular role is
very important for pattern recognition

in the humanities,
right?

And more broadly, I think that, you know,
understanding humanities projects,

and particularly quantitative humanities
projects in terms of patterns,

can help structure a research program,

but this should still involve plenty of
critical thinking about the role played

by patterns in a given context.

So I'll conclude by just giving some
bibliographical information

on the two main essays that I've drawn
from here.

This will also be available in
course materials.

Thank you very much.