UKParl Dataset

The dataset presented in the paper “UKParl: A Semantified and Topically Organized Corpus of Political Speeches”, written by Federico Nanni, Mahmoud Osman, Yi-Ru Cheng, Simone Paolo Ponzetto and Laura Dietz, is available for download here.

The dataset follows the structure of the original collection, which is divided into three sessions: 2013-14, 2014-15 and 2015-16.

Each session is divided into a set of topics. For each topic-speech pair we provide i) the original text of the speech; and ii) the list of entities identified in the text (we use TagMe with standard settings).

Use this file to align topics with the related Wikipedia page. Note that this is the first version of the alignment file and that it has been created entirely automatically, so treat the mappings with caution. We are currently working on improving the alignment between topics and related Wikipedia pages in a semi-automatic way; the new alignment will be available soon. The file is structured as follows:

topic \t Wikipedia page \n
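
As an illustration, the tab-separated format above could be loaded into a topic-to-page lookup along the following lines. This is only a minimal Python sketch: the function name parse_alignment and the sample rows are hypothetical, and the file encoding and the handling of malformed lines are assumptions, not part of the dataset documentation.

```python
import io


def parse_alignment(lines):
    """Build a {topic: Wikipedia page} dict from 'topic \\t page' lines.

    Hypothetical helper: blank lines and lines that do not have exactly
    two tab-separated fields are skipped (an assumption about how to
    treat malformed rows, not documented behaviour).
    """
    alignment = {}
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 2:
            continue  # ignore rows without exactly two fields
        topic, page = (field.strip() for field in parts)
        alignment[topic] = page
    return alignment


# Made-up rows for illustration only; real topic names come from the dataset.
sample = io.StringIO("Example topic A\tExample_page_A\nExample topic B\tExample_page_B\n")
alignment = parse_alignment(sample)
print(alignment["Example topic A"])
```

With the real file, the same function can be given an open file handle, for example open(path, encoding="utf-8"), where path is wherever the alignment file was saved and UTF-8 is an assumed encoding.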