Complexity Explorer, Santa Fe Institute

Foundations & Applications of Humanities Analytics (spring 2022)

Lead instructor:

This course is no longer in session.

13.1 Applications: Linkages » Test Your Knowledge: Explanations

Q1. How are abstracts stored within the HTML code used on the PhilPapers website (philpapers.org)?

A.  As .txt files
B.  As a div class
C.  As a span class
D.  None of the above

Correct Answer: (B)  As shown in the lecture, abstracts are stored within a div class, that is, a div element marked with a class attribute. A div is a block-level container used to group whole paragraphs or sections of a web page. By contrast, a span is an inline container used to mark text within a paragraph, rather than entire paragraphs. While the scraping technique used in the project saves each abstract locally as a .txt file, this is not how abstracts are stored on the PhilPapers website.
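The div-versus-span distinction matters when writing a scraper: the lookup has to target div elements with the right class. The project itself used BeautifulSoup for this step (there, the equivalent lookup would be along the lines of `soup.find_all("div", class_="abstract")`). The minimal sketch below swaps in Python's standard-library `html.parser` so it runs without extra installs. The class name `abstract`, the sample HTML, and the helper names `DivClassExtractor`, `extract_abstracts` and `save_abstracts` are hypothetical illustrations, not PhilPapers' actual markup or the course's code.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Optional, Tuple


class DivClassExtractor(HTMLParser):
    """Collect the text of every <div> whose class attribute contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # 0 = outside a matching div; >0 = div nesting depth inside one
        self.buffer: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "div":
            return  # <p>, <span>, etc. never start a match
        if self.depth:
            self.depth += 1  # a div nested inside a matching div
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag: str) -> None:
        if tag != "div" or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace left over from the HTML layout.
            self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data: str) -> None:
        if self.depth:
            self.buffer.append(data)  # includes text inside nested <p>/<span> tags


def extract_abstracts(html: str, target_class: str = "abstract") -> List[str]:
    """Return the text of each matching div (the class name is a hypothetical default)."""
    parser = DivClassExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


def save_abstracts(abstracts: List[str], out_dir: str) -> List[Path]:
    """Write each abstract to its own .txt file (hypothetical file-naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(abstracts):
        path = out / f"abstract_{i:04d}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical sample markup: one abstract div, plus a span that reuses the class name.
SAMPLE_HTML = """
<div class="abstract">
  <p>Placeholder abstract text with an <span class="term">inline phrase</span> inside it.</p>
</div>
<div class="meta"><span class="abstract">Not a div, so not matched</span></div>
"""

if __name__ == "__main__":
    print(extract_abstracts(SAMPLE_HTML))
```

Because the extractor only starts a match on div tags, the span carrying the same class name is skipped, which mirrors the div/span distinction in the answer. To save results locally, one could call `save_abstracts(extract_abstracts(page_html), "abstracts")`, where `page_html` is the downloaded page source.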


Q2. What Python package was used to write the code to scrape abstracts from PhilPapers?

A. Natural Language Toolkit (nltk)
B. Matplotlib
C. BeautifulSoup
D. None of the above

Correct Answer: (C)  While all three of the named packages are used in the full project, only BeautifulSoup is used for scraping. The Natural Language Toolkit is used to analyze the text data collected with BeautifulSoup, and Matplotlib is used together with NetworkX to visualize the data.


Q3. Which word grouping in the mainstream ethics corpus is most closely linked with discussions of sex and sexuality in the feminist philosophy corpus?

A. Morality
B. Politics
C. Naturalistic Terms
D. None of the above

Correct Answer: (A)  In the graphical depiction of the linkage matrix, the thickest edge connecting the Sex and Sexuality node to any other node runs to the node representing Morality terms in the mainstream ethics corpus. By contrast, Political language in the mainstream ethics corpus is most closely linked with Core Feminist Concepts, and the Naturalistic Terms used in the mainstream corpus are most closely linked with Psychological terms in the feminist corpus.
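Reading the strongest link off a linkage matrix can be sketched in a few lines of Python. This assumes the matrix holds non-negative link strengths, with one row per mainstream-ethics word group and one column per feminist word group, and that edge thickness in the graph is proportional to those values. The function name `strongest_links`, the group labels and the weights below are invented placeholders, not values from the course's linkage matrix.

```python
from typing import Dict, Sequence


def strongest_links(
    weights: Sequence[Sequence[float]],
    row_labels: Sequence[str],
    col_labels: Sequence[str],
) -> Dict[str, str]:
    """For each column group, return the row group joined to it by the heaviest edge.

    weights[i][j] is the (assumed) link strength between row group i and column
    group j, so the thickest edge touching column node j is the largest weights[i][j].
    """
    if len(weights) != len(row_labels):
        raise ValueError("expected one row of weights per row label")
    result: Dict[str, str] = {}
    for j, col in enumerate(col_labels):
        column = [row[j] for row in weights]
        # max() returns the first maximal index, so ties go to the earlier row.
        best_i = max(range(len(column)), key=column.__getitem__)
        result[col] = row_labels[best_i]
    return result


# Placeholder data only: generic labels and made-up weights, not course results.
ROWS = ["ethics group A", "ethics group B", "ethics group C"]
COLS = ["feminist group X", "feminist group Y"]
TOY_WEIGHTS = [
    [0.9, 0.2],
    [0.1, 0.7],
    [0.3, 0.4],
]

if __name__ == "__main__":
    print(strongest_links(TOY_WEIGHTS, ROWS, COLS))
    # {'feminist group X': 'ethics group A', 'feminist group Y': 'ethics group B'}
```

Ties go to the first row listed because Python's built-in `max` returns the first maximal element; an analysis that cares about near-ties might report every group within some margin of the maximum instead.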