Computational Text Analysis – Syllabus

[All slides and code from previous years are available here]

No previous knowledge of programming or natural language processing is required; just be curious.

Goal of the course. To offer a broad overview of natural language processing approaches and tools, together with their applications in the social sciences, and to teach how to use and evaluate them properly.

 

Take-aways. By the end of the course, students will be able to:

a) critically analyse a computational social science paper in all its aspects

b) re-implement the approaches presented in this research area

c) adopt and adapt NLP approaches for their own research

 

Evaluation:

written exam (6 CTS) + code (4 CTS)

 

1. Overview of the course, intro to Python.

Before coming to the first class, try to install Jupyter Notebook (http://jupyter.org/install.html). I highly recommend installing Anaconda, which includes Jupyter along with many other things that we will need. If you have any problems, just drop me an email.

Reading list:

  • Grimmer, Justin, and Brandon M. Stewart. “Text as data: The promise and pitfalls of automatic content analysis methods for political texts.” Political Analysis 21.3 (2013): 267-297.
  • O’Connor, Brendan, David Bamman, and Noah A. Smith. “Computational text analysis for social science: Model assumptions and complexity.” (2011).

 

2. Intro to Computational Text Analysis, intro to Python.

The first two weeks focus on establishing common ground in natural language processing, learning Python syntax, and web scraping a corpus that we will use in the following classes.
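To give a flavour of the web-scraping step, here is a minimal sketch that uses only Python's standard library. It is an illustration under stated assumptions, not necessarily the approach we will follow in class: the sample page is made up, the names `ParagraphExtractor` and `extract_paragraphs` are hypothetical, and a real scraper would first download the page and would have to cope with messier HTML.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start a fresh buffer whenever a paragraph opens.
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while we are inside a paragraph.
        if self._in_paragraph:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Collapse line breaks and repeated spaces into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_paragraph = False


def extract_paragraphs(html):
    """Return the non-empty paragraphs of an HTML string as a list."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Made-up page standing in for a downloaded speech or press release.
SAMPLE_PAGE = """
<html><body>
  <h1>Press release</h1>
  <p>The committee met on <b>Monday</b> to discuss the budget.</p>
  <p>   </p>
  <p>A vote is expected
     next week.</p>
</body></html>
"""

corpus = extract_paragraphs(SAMPLE_PAGE)
for i, paragraph in enumerate(corpus, start=1):
    print(i, paragraph)
```

Each element of `corpus` is one cleaned paragraph. Applying the same function to many pages and storing the results would yield the kind of small corpus the later classes can build on.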

Reading list:

  • Barberá, Pablo. “Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data.” Political Analysis 23.1 (2015): 76-91.
  • Narayanan, Arvind, and Vitaly Shmatikov. “Robust de-anonymization of large sparse datasets.” Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008.

 

3. Text processing (tokenization, lemmatization, POS-Tagging, NER) 

Reading list:

  • Foundations of Statistical Natural Language Processing (Manning and Schütze) is freely available online; for these two classes I suggest skimming chapters 3, 4 and 10.
  • Cross, James P., and Henrik Hermansson. “Legislative amendments and informal politics in the European Union: A text reuse approach.” European Union Politics 18.4 (2017): 581-602.

 

4. Text processing (tokenization, lemmatization, POS-Tagging, NER)

Reading list:

  • Schrodt, Philip A., and David Van Brackle. “Automated coding of political event data.” Handbook of computational approaches to counterterrorism. Springer, New York, NY, 2013. 23-49.
  • O’Connor, Brendan, Brandon M. Stewart, and Noah A. Smith. “Learning to extract international relations from political context.” Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.

 

5. Text processing (Word Embeddings and Entities)

Reading list:

  • Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. “Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors.” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014.
  • Shen, Wei, Jianyong Wang, and Jiawei Han. “Entity linking with a knowledge base: Issues, techniques, and solutions.” IEEE Transactions on Knowledge and Data Engineering 27.2 (2015): 443-460.

 

6. Text processing (Word Embeddings and Entities)

Reading list:

  • Kraft, P., Jain, H., & Rush, A. M. (2016). An Embedding Model for Predicting Roll-Call Votes. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2066-2070).
  • Glavaš, Goran, Federico Nanni, and Simone Paolo Ponzetto. “Unsupervised Cross-Lingual Scaling of Political Texts.” EACL 2017 (2017): 688.
  • Menini, Stefano, et al. “Topic-based agreement and disagreement in US electoral manifestos.” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.

 

7. Text Classification and Sentiment Analysis

Reading list:

  • Allahyari, Mehdi, et al. “A brief survey of text mining: Classification, clustering and extraction techniques.” arXiv preprint arXiv:1707.02919 (2017).
  • Medhat, Walaa, Ahmed Hassan, and Hoda Korashy. “Sentiment analysis algorithms and applications: A survey.” Ain Shams Engineering Journal 5.4 (2014): 1093-1113.

 

8. Text Classification and Sentiment Analysis

Reading list (sentiment analysis):

  • Soroka, S. N. (2006). Good news and bad news: Asymmetric responses to economic information. Journal of Politics, 68(2), 372-385.
  • Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.
  • Murthy, D. (2015). Twitter and elections: are tweets, predictive, reactive, or a form of buzz? Information, Communication & Society, 18(7), 816-831.
  • Soroka, S., Young, L., & Balmas, M. (2015). Bad news or mad news? Sentiment scoring of negativity, fear, and anger in news content. The ANNALS of the American Academy of Political and Social Science, 659(1), 108-121.

Reading list (text classification):

  • Hillard, D., Purpura, S., & Wilkerson, J. (2008). Computer-assisted topic classification for mixed-methods social science research. Journal of Information Technology & Politics, 4(4), 31-46.
  • Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1), 229-247.
  • Conover, M. D., Gonçalves, B., Ratkiewicz, J., Flammini, A., & Menczer, F. (2011). Predicting the political alignment of Twitter users. In 2011 IEEE Third International Conference on Social Computing (SocialCom).
  • Zirn, C., Glavaš, G., Nanni, F., Eichorst, J., & Stuckenschmidt, H. (2016). Classifying topics and detecting topic shifts in political manifestos. PolText.

 

9. Clustering and Topic Models

Reading list:

  • Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet allocation.” Journal of Machine Learning Research 3 (2003): 993-1022.
  • Chang, Jonathan, et al. “Reading tea leaves: How humans interpret topic models.” Advances in Neural Information Processing Systems (2009).
  • Brett, Megan R. “Topic modeling: a basic introduction.” Journal of digital humanities 2.1 (2012): 12-16.
  • Graham, Shawn, Scott Weingart, and Ian Milligan. Getting started with topic modeling and MALLET. The Editorial Board of the Programming Historian, 2012.

 

10. Clustering and Topic Models

Reading list:

  • Grimmer, J. (2010). A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis, 18(1), 1-35.
  • Yano, T., Cohen, W. W., & Smith, N. A. (2009). Predicting response to political blog posts with topic models. In NAACL.
  • Roberts, Margaret E., et al. “Structural Topic Models for Open‐Ended Survey Responses.” American Journal of Political Science 58.4 (2014): 1064-1082.
  • Greene, D., & Cross, J. P. (2017). Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach. Political Analysis, 25(1).
  • Menini, Stefano, et al. “Topic-based agreement and disagreement in US electoral manifestos.” Proceedings of EMNLP.

 

11. Scaling

Reading list:

  • Scaling Policy Preferences from Coded Political Texts [link]
  • A Scaling Model for Estimating Time-Series Party Positions from Texts [link]

12. Scaling

Reading list:

  • Understanding Wordscores [link]
  • Unsupervised Cross-Lingual Scaling of Political Texts [link]

 

13. Information Retrieval and Collection Building

Reading list:

  • Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. Skim through chapters 6, 9, 11.
  • Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 11-21.
  • Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. SIGIR.
  • Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
  • Liu, T. Y. (2009). Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3), 225-331.

 

14. Information Retrieval and Collection Building

Reading list:

  • Lepore, Jill. “The cobweb: Can the Internet be Archived?” The New Yorker (2015): 33-41.
  • The .GOV Internet Archive: A Big Data Resource for Political Science [link]
  • What does the Web remember of its deleted past? An archival reconstruction of the former Yugoslav top-level domain [link]