Alright, welcome back to Foundations and Applications of Humanities Analytics. As mentioned by Simon, I'm going to be talking today about patterns in both science and the humanities from a more philosophical perspective, in particular, a "philosophy of science" perspective. So, we're going to start by talking about patterns in science, and when talking about patterns in science you really do have to start with the work of Daniel Dennett, and in particular, his 1991 paper "Real Patterns," right? So one of the central functions of empirical research in any domain, and we're just taking this as a starting point, is the recognition of patterns in both data and in theory. For empirical research, you're going to be thinking more about patterns in data, but it's also true that as you start to build models of data, you recognize patterns within those theoretical models that are also interesting, right? And so, a central task in philosophy of science, where we're trying to gain this methodological and foundationally grounded understanding of how science works, is to come up with a perspicuous and general understanding of what constitutes a pattern. And for Dennett, a pattern is just going to be a sequence of data points that is compressible, where this means that the sequence of data points admits of a simpler description than simply repeating the sequence. So, to give an example, the sequence of integers starting with 1, then 2, then 3, all the way up to 100,000: that's going to be a pattern, since the description "the sequence of integers beginning at 1 and ending at 100,000, where the n-th integer in the sequence is just the integer n" is simpler than the full list with 100,000 integers, right? It's simpler to say, it's simpler to write out, and it's simpler for a computer to spit out as well.
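To make the integer example a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not something taken from Dennett's paper: the `RULE` string stands in for the short description, and `zlib` is used as a rough, practical proxy for compressibility, which is only one of many possible ways of measuring simplicity. The random bytes are a hypothetical contrast case with no regularity to exploit.

```python
import os
import zlib

# The short description: a rule that regenerates the whole sequence.
# (Illustrative assumption: we treat this string as the "simpler description".)
RULE = '" ".join(str(n) for n in range(1, 100_001))'

# The full listing: every integer from 1 to 100,000 written out in order.
listing = eval(RULE).encode("ascii")

# A general-purpose compressor as a rough stand-in for compressibility.
compressed_listing = zlib.compress(listing, 9)

# Contrast case: random bytes of the same length, which have no
# regularity for the compressor to exploit.
noise = os.urandom(len(listing))
compressed_noise = zlib.compress(noise, 9)

print("rule length (characters):  ", len(RULE))
print("full listing (bytes):      ", len(listing))
print("listing after compression: ", len(compressed_listing))
print("noise after compression:   ", len(compressed_noise))
```

The rule is a few dozen characters while the listing runs to hundreds of thousands of bytes, and the compressor shrinks the listing but not the random bytes. Note that `zlib` finds a much less compact description than the rule itself, which is one way of seeing that what counts as the "simplest" description depends on how simplicity is measured.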
Now, of course, notions of relative simplicity are difficult to pin down, and so different definitions of simplicity may yield different results about what constitutes the optimal compression of a pattern, and indeed you can have a debate about whether patterns even ought to be optimally compressed in the actual practice of science and pattern recognition. The important thing for our purposes is just that as long as a sequence of data points can be compressed at all, then it's a pattern. If it can't be compressed in any way, then we're just going to call it "noise," right? And what patterns allow for is explanation and prediction, at least on Dennett's account, right? When you generalize a data set into a pattern, you can start to explain why it is that certain data points take the values that they do, and you can also start to predict the values of unobserved data points coming from the same data-generating process. Now note that Dennett here is specifically interested in a particular application of pattern recognition, namely the psychology of intelligent agents. Paradigmatically, that's going to be human beings, but it could also be artificial intelligences, and indeed some non-human animals. He's interested in the extent to which, in these intelligent agents, basic psychological entities like beliefs and desires can be understood as patterns of behavior. And so this means that while this way of thinking about patterns isn't formulated with the study of human culture explicitly in mind, it's obviously related to some degree, and I would argue to quite a large degree, since, indeed, culture is something made by intelligent agents; it's part of the behavioral suite of intelligent agents, right?
Now, for all its importance within the study of patterns in philosophy of science, it's also the case that Dennett's account can be, you know, criticized for not going far enough or for not being sufficiently comprehensive in its understanding of how pattern recognition works, especially within the behavioral and human sciences, and indeed within the broader study of human culture that would include the humanities, right? And so, there are other sides to this notion of a pattern that are highlighted in various criticisms and commentaries on Dennett, and that includes a really nice paper from 2007 by Victoria McGeer, right? So McGeer agrees with Dennett that patterns can play a role in predicting and explaining behavior, but she also argues, and I think argues convincingly, that patterns also play a regulative role, or as I'll sometimes say, a regulatory role, within human behavior and within the behavior of intelligent entities more broadly. And what that means is that agents can recognize patterns in their own behavior and start behaving in accordance with these patterns, creating a feedback loop, whereby the pattern not only explains and predicts the agent's behavior, but also regulates and controls that behavior. We can also use our understanding of patterns of behavior in agents to control other agents, right? We can predict and explain what agents will do, and in doing so, control them to some degree; we can also control ourselves, as I was discussing before, and this broad capacity for control is really central, McGeer argues, to our understanding of agency. And what I'll argue here is that this regulatory role of a pattern is central to understanding how patterns work in the humanities, and it's particularly important for the quantitative study of the humanities, which we're talking about here, right?
So, here's an example to try and drive this point home: let's consider topic modeling, one of the core methodologies used in the quantitative study of the humanities. What do topic modeling programs do? Well, they find groups of words such that words in the same group are more likely to appear in the same document than words in different groups, right? So, "sock" and "shoe" are more likely to appear together than "shoe" and "shuffle," for instance. And these groups of words that are more likely to appear together than separately? We'll call them the topics, right? And some documents contain many more words in one topic than words in another topic. And that's when we say those documents are about the topics whose words they're much more likely to contain. And this amounts to a fairly simple kind of pattern recognition: grouping words into topics and looking at which documents are more likely to contain words within those topics. When we feed a corpus into a topic modeling algorithm, that's basically what we're doing. We're recognizing these patterns within the corpus that allow us to identify these topics. Now, of course, within most corpora of interest, the authors of documents are aware, to some degree, of the kinds of topics that documents in the corpus tend to be about; that is, they have some awareness of the very patterns that a third-party observer will come to understand them as instantiating. To put it another way, an author usually has some understanding of what kind of document they are producing, right? There's a kind of authorial self-consciousness wherein the patterns that we'll eventually recognize in the corpus aren't just what we're recognizing after the fact; rather, what we're discovering are the very mechanisms of regulation that produce that corpus in the first place, right?
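The description of topic modeling above can be sketched in a few lines of Python. This is a deliberately simplified toy, not a real topic model such as LDA: it only counts how often pairs of words share a document, and the topics are written in by hand, whereas an actual algorithm would infer them from the corpus. The corpus, the topic names, and the helper functions are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each string is one "document".
documents = [
    "sock shoe lace sock",
    "shoe sock heel",
    "shuffle deck card",
    "card deck shuffle deal",
    "shoe lace heel",
]

def cooccurrence_counts(docs):
    """Count, for each pair of words, how many documents contain both."""
    counts = Counter()
    for doc in docs:
        vocabulary = sorted(set(doc.split()))
        for pair in combinations(vocabulary, 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(documents)

def together(a, b):
    """Number of documents in which words a and b both appear."""
    return counts[tuple(sorted((a, b)))]

# Hand-written topics (an assumption: a real topic model infers these groups).
topics = {
    "footwear": {"sock", "shoe", "lace", "heel"},
    "cards": {"shuffle", "deck", "card", "deal"},
}

def dominant_topic(doc):
    """Return the topic that accounts for the most words in the document."""
    words = doc.split()
    scores = {name: sum(w in vocab for w in words) for name, vocab in topics.items()}
    return max(scores, key=scores.get)

print("sock + shoe:   ", together("sock", "shoe"))
print("shoe + shuffle:", together("shoe", "shuffle"))
for doc in documents:
    print(f"{doc!r} is mostly about {dominant_topic(doc)}")
```

In this toy corpus "sock" and "shoe" share documents while "shoe" and "shuffle" never do, which is the kind of regularity a topic modeling algorithm picks up on at scale and in a probabilistic way.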
So when we observe a pattern in a textual corpus, we aren't just getting a better handle on how to explain its content; we're also getting a handle on the mechanisms through which the production of the corpus is regulated and controlled. And that's really going to be the core argument that I'm making here in this lecture. And it's worth noting that, in this controlling and regulatory capacity, patterns in text can serve positive, negative, and neutral ends, and indeed what kinds of ends they serve is going to depend on the context, and indeed on you as the researcher, right? So one example of what we might take to be a positive use of this controlling and regulatory capacity of patterns is going to be the presence of narrative in a text, whether that's a novel, a newspaper article, a diary entry, et cetera. A narrative, we can think of as a kind of mechanism on the part of the author for controlling the text's production, right? To the extent that we think narrative is a good thing in some texts, you know, we're happy to embrace this regulative role of patterns. Now on the negative side, you know, in history you might find self-conscious attempts on the part of authors to cloud the historical record, and this can hinder historical understanding, and that's also a pattern of regulation that we're seeing manifest in a given corpus, and indeed, if we care about an accurate historical record, then we might see that as a negatively valenced role for this regulatory capacity of patterns. And then of course there are going to be regulatory patterns in text, such as the construction of genre, for instance, that we might not obviously think have a positive or negative valence, whether that's a moral valence or an epistemological valence, and again that's going to depend on the context. So, I don't want to give the impression here that this regulative role of patterns is all good or all bad; it's all going to depend on the circumstances.
And so how does this connect to your research and to thinking about yourselves as quantitative humanities researchers and professionals? So, you know, one way of framing a research question, in keeping with some of Simon's previous lectures where he talked a lot about the central role that research questions play in a quantitative project, is to ask, you know, "what sorts of patterns will we find in this corpus of documents?" Now that question, simply asked in that way, is probably a little too broad. Rather, what you need to do is narrow this down a little bit, so the set of patterns actually being searched for is still broad enough that you're not excluding potentially interesting patterns or pre-determining your results, but still narrow enough that it amounts to a tractable project. A good example that Simon gave is this question that's, you know, "what are the tragical elements within King Lear?" Right? That's a case where you're looking for patterns, but you don't know exactly what they are. They're patterns of text associated with tragedy. But we're still keeping an open mind about exactly what those might look like, and that's an example of a better research question in which patterns are playing a central role. Now, as a researcher formulating a research question, you can reflect on the extent to which the patterns you find are going to play either an explanatory, a predictive, or a regulatory role within that corpus, right? And I think, in particular, paying attention to that regulatory role is going to be very interesting in terms of generating some really cool new humanities research. So, to sum this all up, I started from the premise that better understanding the general nature of patterns has been a crucial recent task for contemporary philosophers of science, and one that I think is very relevant to a foundational study of humanities analytics.
On Dennett's influential account, a pattern is anything that can be compressed, and just as a historical note, this draws on early work in information theory and computer science by people like Chaitin and Shannon, although that's not as directly important for our purposes here. Dennett understands patterns as playing a mostly explanatory and predictive role, but I've argued here, building on work by Victoria McGeer, that there's also an alternative understanding of patterns as playing a regulatory and controlling role within data sets, and I think that this particular role is very important for pattern recognition in the humanities, right? And more broadly, I think that, you know, understanding humanities projects, and particularly quantitative humanities projects, in terms of patterns can help structure a research program, but this should still involve plenty of critical thinking about the role played by patterns in a given context. So I'll conclude by just giving some bibliographical information on the two main essays that I've drawn from here. This will also be available in the course materials. Thank you very much.