This International Replication (IRep) dataset is a human-annotated dataset (approximately 15 MB in CSV format). It contains a list of items (item_IDs), each annotated with a list of 30 emotion labels from a set of emotion classes defined in Cowen & Keltner, 2019, plus two additional labels, ‘error’ and ‘unsure’. During the data collection process, raters used this set of facial expression labels (FEP labels) to annotate their perception of the facial expression present in a video.

Because the purpose of this dataset is to illustrate how replications of human labeling experiments can be used to determine the overall quality of the resulting annotations, we have omitted the actual video content and any reference (e.g. URL) to the videos themselves. The annotated videos are therefore referred to as ‘items’ with a set of indices, e.g. item 1, item 2, etc.

To produce the replications for this labeling experiment, we used rater pools across different geo locations: (1) International (covering various countries), (2) Budapest, (3) Kuala Lumpur, and (4) Mexico City. The data collected in the three geo locales (i.e. Budapest, Kuala Lumpur, and Mexico City) are considered strict replications of the same experiment, as the only thing that changes between them is the rater pool. The international data, however, is not a strict replication, as it was collected on a different platform and with slightly different answer options (including ‘unsure’ vs. ‘error.others’) than the other three geo locales. We’ll report its xRR and IRR in the table for reference, but will not include it in our further analysis.

Every item is annotated in up to 4 rater pools, with a maximum of 2 raters from each pool. Raters are referred to as Rater_1 or Rater_2 across all rater pools; their unique IDs are not stored as part of this dataset. The dataset contains annotations on 41,685 items, for a total of 223,059 annotations (the total number of labels on items).
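For concreteness, the sketch below shows one way the CSV could be loaded and a simple within-pool IRR estimate computed. The file name and the column names (`item_id`, `rater_pool`, `rater`, `label`) are assumptions for illustration, not the dataset’s actual schema, and the kappa step assumes one label per rater per item; if raters can assign multiple labels to an item, a per-label (binarized) agreement measure would be needed instead.

```python
# A minimal sketch for inspecting the IRep CSV, assuming hypothetical
# column names (item_id, rater_pool, rater, label); the real schema may differ.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

df = pd.read_csv("irep_annotations.csv")  # hypothetical file name

# Annotation counts per rater pool (these should sum to 223,059 overall).
print(df.groupby("rater_pool")["label"].count())

# Within-pool inter-rater reliability (IRR) sketch: for one geo locale,
# align Rater_1 and Rater_2 labels per item and compute Cohen's kappa.
# Assumes a single label per rater per item.
pool = df[df["rater_pool"] == "Budapest"]
wide = pool.pivot_table(index="item_id", columns="rater",
                        values="label", aggfunc="first").dropna()
print(cohen_kappa_score(wide["Rater_1"], wide["Rater_2"]))
```

The same pairing logic extends to cross-replication reliability (xRR) by comparing raters across two different pools rather than within one.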