All slides and code from previous years are available here.
No previous knowledge of programming or natural language processing is required; just be curious.
Goal of the course. To offer a broad overview of natural language processing approaches and tools, together with their applications in the social sciences, and to teach how to use and evaluate them properly.
Take-aways. At the end of the course the students will be able to:
a) critically analyse a computational social science paper in all its aspects
b) re-implement the approaches presented in this research area
c) adopt and adapt NLP approaches for their own research
written exam (6 CTS) + code (4 CTS)
1. Overview of the course, intro to Python.
Before coming to the first class, try to install Jupyter Notebook (http://jupyter.org/install.html). I highly recommend installing Anaconda (which contains Jupyter, among many other things that we will need). If you run into any problems, just drop me an email.
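Once the installation is done, a quick check along the following lines can confirm that Python finds the packages you expect. This is only a minimal sketch: the helper name `check_packages` and the list of packages are illustrative assumptions, not an official list of course requirements.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    # The package list below is an assumption; adjust it to what the course uses.
    for name, found in check_packages(["notebook", "numpy", "pandas"]).items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

If a package is reported as missing, installing Anaconda (or the package itself) and re-running the check is usually enough.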
- Grimmer, Justin, and Brandon M. Stewart. “Text as data: The promise and pitfalls of automatic content analysis methods for political texts.” Political Analysis 21.3 (2013): 267-297.
- O’Connor, Brendan, David Bamman, and Noah A. Smith. “Computational text analysis for social science: Model assumptions and complexity.” (2011).
2. Intro to Computational Text Analysis, intro to Python.
The first two weeks are mainly focused on setting up a common ground on topics such as natural language processing, on learning Python syntax and on web scraping a corpus that we will use in the following classes.
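To give a flavour of what building a corpus from web pages involves, here is a small sketch that extracts paragraph text from HTML using only the Python standard library. The class and function names and the sample page are hypothetical; the tools used in class may well differ.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside every <p> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._inside_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._inside_p:
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._inside_p = False

    def handle_data(self, data):
        if self._inside_p:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the list of paragraph texts contained in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical page content; a real scraper would first download the HTML,
# for example with urllib.request.urlopen(url).read().decode("utf-8").
sample_html = """
<html><body>
  <h1>Speech archive</h1>
  <p>First   speech of the <b>session</b>.</p>
  <p>Second speech.</p>
</body></html>
"""

corpus = extract_paragraphs(sample_html)
print(corpus)
```

Each extracted paragraph becomes one document of the corpus; nested tags such as `<b>` are dropped while their text is kept.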
- Barberá, Pablo. “Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data.” Political Analysis 23.1 (2014): 76-91.
- Narayanan, Arvind, and Vitaly Shmatikov. “Robust de-anonymization of large sparse datasets.” Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008.
3. Text processing (tokenization, lemmatization, POS-Tagging, NER)
- Foundations of Statistical Natural Language Processing is freely available online; for these two classes I suggest you skim through chapters 3, 4 and 10.
- Cross, James P., and Henrik Hermansson. “Legislative amendments and informal politics in the European Union: A text reuse approach.” European Union Politics 18.4 (2017): 581-602.
4. Text processing (tokenization, lemmatization, POS-Tagging, NER)
- Schrodt, Philip A., and David Van Brackle. “Automated coding of political event data.” Handbook of computational approaches to counterterrorism. Springer, New York, NY, 2013. 23-49.
- O’Connor, Brendan, Brandon M. Stewart, and Noah A. Smith. “Learning to extract international relations from political context.” Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2013.
5. Text processing (Word Embeddings and Entities)
- Baroni, Marco, Georgiana Dinu, and Germán Kruszewski. “Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors.” Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014.
- Shen, Wei, Jianyong Wang, and Jiawei Han. “Entity linking with a knowledge base: Issues, techniques, and solutions.” IEEE Transactions on Knowledge and Data Engineering 27.2 (2015): 443-460.
6. Text processing (Word Embeddings and Entities)
- Kraft, P., Jain, H., & Rush, A. M. (2016). An Embedding Model for Predicting Roll-Call Votes. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2066-2070).
- Glavaš, Goran, Federico Nanni, and Simone Paolo Ponzetto. “Unsupervised Cross-Lingual Scaling of Political Texts.” EACL 2017 (2017): 688.
- Menini, Stefano, et al. “Topic-based agreement and disagreement in US electoral manifestos.” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
7. Text Classification and Sentiment Analysis
- Allahyari, Mehdi, et al. “A brief survey of text mining: Classification, clustering and extraction techniques.” arXiv preprint arXiv:1707.02919 (2017).
- Medhat, Walaa, Ahmed Hassan, and Hoda Korashy. “Sentiment analysis algorithms and applications: A survey.” Ain Shams Engineering Journal 5.4 (2014): 1093-1113.
8. Text Classification and Sentiment Analysis
Reading list (sentiment analysis):
- Soroka, S. N. (2006). Good news and bad news: Asymmetric responses to economic information. Journal of Politics, 68(2), 372-385.
- Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.
- Murthy, D. (2015). Twitter and elections: are tweets, predictive, reactive, or a form of buzz? Information, Communication & Society, 18(7), 816-831.
- Soroka, S., Young, L., & Balmas, M. (2015). Bad news or mad news? Sentiment scoring of negativity, fear, and anger in news content. The ANNALS of the American Academy of Political and Social Science, 659(1), 108-121.
Reading list (text classification):
- Hillard, D., Purpura, S., & Wilkerson, J. (2008). Computer-assisted topic classification for mixed-methods social science research. Journal of Information Technology & Politics, 4(4), 31-46.
- Hopkins, D. J., & King, G. (2010). A method of automated nonparametric content analysis for social science. American Journal of Political Science, 54(1), 229-247.
- Conover, M. D., Gonçalves, B., Ratkiewicz, J., Flammini, A., & Menczer, F. (2011). Predicting the political alignment of twitter users. In 2011 IEEE Third International Conference on Social Computing (SocialCom).
- Zirn, C., Glavaš, G., Nanni, F., Eichorst, J., & Stuckenschmidt, H. (2016). Classifying topics and detecting topic shifts in political manifestos. PolText.
9. Clustering and Topic Models
- Blei, David M., Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (2003): 993-1022.
- Chang, Jonathan, et al. Reading tea leaves: How humans interpret topic models. Advances in Neural Information Processing Systems (2009).
- Brett, Megan R. “Topic modeling: a basic introduction.” Journal of digital humanities 2.1 (2012): 12-16.
- Graham, Shawn, Scott Weingart, and Ian Milligan. Getting started with topic modeling and MALLET. The Editorial Board of the Programming Historian, 2012.
10. Clustering and Topic Models
- Grimmer, J. (2009). A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis, 18(1), 1-35.
- Yano, T., Cohen, W. W., & Smith, N. A. (2009). Predicting response to political blog posts with topic models. In NAACL.
- Roberts, Margaret E., et al. “Structural Topic Models for Open‐Ended Survey Responses.” American Journal of Political Science 58.4 (2014): 1064-1082.
- Greene, D., & Cross, J. P. (2017). Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach. Political Analysis, 25(1).
- Menini, Stefano, et al. “Topic-based agreement and disagreement in US electoral manifestos.” Proceedings of EMNLP.
- Scaling Policy Preferences from Coded Political Texts
- A Scaling Model for Estimating Time-Series Party Positions from Texts
13. Information Retrieval and Collection Building
- Schütze, Hinrich, Christopher D. Manning, and Prabhakar Raghavan. Introduction to information retrieval. Vol. 39. Cambridge University Press, 2008. Skim through chapters 6, 9, 11.
- Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1), 11-21.
- Ponte, J. M., & Croft, W. B. (1998). A language modeling approach to information retrieval. SIGIR.
- Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
- Liu, T. Y. (2009). Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3), 225-331.
14. Information Retrieval and Collection Building