Computational Text Analysis for Political Science

[UPDATE: all slides and code are available here!]

First draft of the course program. No previous knowledge of programming or natural language processing is required; just be curious.

Goal of the course. To offer a broad overview of natural language processing approaches and tools, together with their applications in the social sciences, and to teach how to use and evaluate them properly.


Take-aways. At the end of the course the students will be able to:

a) critically analyse a computational political science paper in all its aspects

b) re-implement the approaches presented in this research area

c) adopt and adapt NLP approaches for their own research


Evaluation – two options:

A) presentation of a paper in class (25% grade) + overview of a Python library in class (25% grade) + written exam (25% grade) + code (25% grade)

B) written exam (50% grade) + code (50% grade)


1. Overview of the course, intro to Python and web scraping.

Before coming to the first class, try to install Jupyter Notebook. I highly recommend installing Anaconda, which contains Jupyter among many other things that we will need. If you have any problems, just drop me an email.

In the first two weeks we will mainly focus on this paper. Give it a quick read before coming to the first class.


2. Intro to Computational Text Analysis, intro to Python.

The first two weeks are mainly focused on establishing common ground: introducing natural language processing, learning Python syntax, and web scraping a corpus that we will use in the following classes.
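To give a first idea of what building a corpus from web pages can look like, here is a minimal sketch that pulls the paragraph text out of an HTML page using only Python's standard library. The sample HTML, the `ParagraphExtractor` class and the `extract_paragraphs` helper are made up for illustration; they are not the course material or the corpus we will actually scrape.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # A new paragraph starts: open an empty text buffer for it.
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Keep text only while we are inside a paragraph.
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    """Return the non-empty paragraph texts of an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


# Invented sample page, standing in for a downloaded document.
sample_html = """
<html><body>
<h1>Plenary debate</h1>
<p>The committee adopted the <b>amended</b> proposal.</p>
<p>Two members abstained.</p>
</body></html>
"""

corpus = extract_paragraphs(sample_html)
for paragraph in corpus:
    print(paragraph)
```

In practice the HTML string would come from a downloaded page, for instance via `urllib.request.urlopen`, and dedicated scraping libraries are usually more convenient than a hand-written parser.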

A slightly more computer-science-oriented point of view on Computational Social Science.

Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data [link]

Robust De-anonymization of Large Sparse Datasets [link]


3. Text processing (tokenization, lemmatization, POS-Tagging, NER) 

Foundations of Statistical Natural Language Processing is freely available online. For these two classes I suggest you skim through chapters 3, 4 and 10.
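As a small preview of the first processing step, tokenization, here is a minimal sketch that splits a sentence into lowercase word tokens with a regular expression and counts them. The pattern and the sample sentence are assumptions chosen for illustration; lemmatization, POS tagging and NER need proper NLP libraries and are not shown here.

```python
import re
from collections import Counter

# Illustrative pattern: runs of letters, optionally followed by an
# apostrophe and more letters (so "didn't" stays one token).
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return its word tokens as a list."""
    return TOKEN_PATTERN.findall(text.lower())


# Invented example sentence.
sample_text = "The Parliament voted. The Council didn't vote, but the Parliament did."

tokens = tokenize(sample_text)
frequencies = Counter(tokens)

print(tokens)
print(frequencies.most_common(3))
```

A rule this simple discards numbers, hyphenated words and non-English characters, which is one reason why tokenization choices should be checked against the corpus at hand.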


4. Text processing (tokenization, lemmatization, POS-Tagging, NER)

Legislative amendments and informal politics in the European Union: A text reuse approach [link]


5. Text processing (Word Embeddings and Entities)

Don’t count, predict! [link]

Entity Linking with a Knowledge Base [link]


6. Text processing (Word Embeddings and Entities)


7. Text Classification and Sentiment Analysis

General overview of Classification, Clustering and Extraction. [link]

Sentiment analysis algorithms and applications: A survey [link]


8. Text Classification and Sentiment Analysis

Event Analysis on the 2016 U.S. Presidential Election Using Social Media [link]

Classifying Topics and Detecting Topic Shifts in Political Manifestos [link]

Political Ideology Detection Using Recursive Neural Networks [link]

Predicting Elections with Twitter [link]

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series [link]

Twitter as a Corpus for Sentiment Analysis and Opinion Mining [link]


9. Clustering and Topic Models

Topic Modeling: A Basic Introduction [link]

Getting Started with Topic Modeling and MALLET [link]


10. Clustering and Topic Models

Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach [link]

A Bayesian Hierarchical Topic Model for Political Texts [link]

Multidimensional Topic Analysis in Political Texts [link]


11. Scaling

Scaling Policy Preferences from Coded Political Texts [link]


12. Scaling

Understanding Wordscores [link]

A Scaling Model for Estimating Time-Series Party Positions from Texts [link]

Unsupervised Cross-Lingual Scaling of Political Texts [link]


13. Information Retrieval and Collection Building

Book: Introduction to Information Retrieval. Skim through chapters 6, 9 and 11.

Can the Internet be Archived? [link]


14. Information Retrieval and Collection Building

The .GOV Internet Archive: A Big Data Resource for Political Science [link]

What does the Web remember of its deleted past? An archival reconstruction of the former Yugoslav top-level domain [link]