Diversity in datasets is a key component of building responsible AI/ML. Despite this recognition, we know little about the diversity among the annotators involved in data production. Moreover, although data annotation is an indispensable part of AI, it is often cast as simple, standardized, and even low-skilled work. In this talk, I present a series of studies that aim to unpack the data annotation process, with an emphasis on the data workers who carry the weight of data production. These include interview studies to uncover both the data annotator’s perspective on their work and the data requestor’s approach to the diversity and subjectivity that workers bring; an ethnographic investigation in data centers to study the work practices around data annotation; and a mixed-methods study to explore the impact of worker demographic diversity on the data they annotate. While practitioners described nuanced understandings of annotator diversity, they rarely designed dataset production to account for diversity in the annotation process. This calls for more attention to a pervasive logic of representationalist thinking and counting that is intricately woven into the day-to-day work practices of annotation. By examining the structures in which annotation is done and diversity is seen, this talk aims to recover annotation and diversity from their reductive framing and to seek alternative approaches to knowing and doing annotation.