
Hidden humans: exploring perceptions of user-work and training artificial intelligence in Aotearoa New Zealand

Pages 443-456 | Received 07 Feb 2023, Accepted 05 May 2023, Published online: 15 May 2023

ABSTRACT

Artificial intelligence systems require large amounts of data to allow them to learn and achieve high performance. That data is increasingly collected in extractive and exploitative ways, which transfer value and power from individuals to AI system owners. Our research focuses on data that is collected from users of digital platforms, through direct and indirect interaction with those platforms, in ways that are not communicated to users, without consent or compensation. This paper presents our findings from a series of interviews and workshops in the Aotearoa New Zealand context to identify common themes and concerns from a variety of perspectives. Reframing this type of interaction as work or labour brings into view an otherwise unrecognised harm of using this data for training AI systems, and illustrates a new class of exploitative data practices that have become normalised in the digital age. We found that participants particularly emphasised moral or ethical justifications for intervention over financial or economic reasons to act.

Introduction

New breakthroughs in artificial intelligence (AI) are now a regular occurrence, from systems that can process images and describe the objects in a scene, to biometric technologies that recognise individuals based on their voice or face, to chatbots that can maintain convincing conversations with humans. While these technical achievements are very impressive, their development depends on massive datasets that allow these systems to be trained (or to 'learn'). There are many issues with AI development and deployment, but in this paper we question the ethical legitimacy of how these training datasets are collected and curated.

Much of the recent attention has been directed towards the use of training data collected (or 'scraped') from public and private internet sources, such as Flickr images (licensed under Creative Commons) being used in image datasets to train computer vision systems (Almeida et al. 2022), or Clearview AI's use of images and names from public-facing social media to train facial recognition systems (Hill 2022), where ethical debates are focused on privacy and copyright. However, data of all types (video/images, audio, text, metadata) harnessed from the users of digital platforms, such as Facebook, Spotify, Netflix, and others, is comparatively overlooked.

The data that concerns us is supplied by what we refer to as 'user-work', where the actions of the user create data of value, which can then be leveraged to train AI systems. Users are often unaware that their actions contribute towards the development of AI systems, and are arguably under-compensated, or not compensated at all, for the value that they create. While the broader processes of data collection often raise issues relating to privacy and data protection, our framing focuses the ethical discussion on the question of labour exploitation, although there are overlaps with privacy principles. Our research seeks to understand perspectives on these issues in an Aotearoa New Zealand context, and whether the issues are significant enough to warrant intervention or correction. Our use of AI systems, particularly those controlled by the large tech companies, is subject to a cultural hegemony in which users come to accept the values and processes of Silicon Valley as the norm – we believe it is useful to explore what an Aotearoa New Zealand approach would look like.

In this paper, we present our preliminary research on understanding the perceptions of individuals from different backgrounds. The research was conducted by a transdisciplinary group of researchers from the Universities of Auckland and Otago with backgrounds in computer science and engineering, philosophy, anthropology, law, international business, and public policy, providing a wide variety of perspectives in the development of the research programme. The programme involved scoping interviews, a preliminary test workshop, and two full-day workshops with key stakeholders and opinion leaders, seeking their perspectives on the topic and understanding whether or not these issues are important. This paper discusses some of our findings from these workshops and scoping interviews, with the aim of highlighting further avenues of focus and future research, including a recently published paper (Morreale et al. 2023). Our hope is that our work in this area can raise awareness and contribute towards decision making, both in terms of public sector policy and at the individual level, to address the surreptitious use of individuals' data to train AI systems.

Defining user-work on digital platforms

AI systems are extremely powerful, both from a technical and an economic perspective, in that they have the potential to carry out tasks much more effectively and efficiently than humans, sometimes generating significant profit. The use of AI is dramatically and quickly reshaping our society, from the meaning of 'work' to the dynamics of power and control between humans and computers. Walton and Nayak (2021) write that 'AI is a tool of capitalism which transforms our societies … that helps in the expansion of the capitalist model of economic development'.

Our research is focused on data generated by platform users, which is then used to train AI and machine learning systems that emulate human capabilities such as recognition or reasoning. In order to successfully train these systems, huge amounts of high-quality data are needed - for example, the Google Books Ngram dataset is built from text extracted from books published between 1500 and 2019 in multiple languages, and has 468,000,000,000 data points, which correspond to roughly 2.2TB of text (Google 2022). The developers at Google (2022) observe that 'simple models on large data sets generally beat fancy models on small data sets'. Indeed, the largest datasets and AI models are mostly developed or supported by large corporations such as Google/Alphabet, Facebook/Meta, Microsoft, and Apple, which have access to significant volumes of data, rather than by academic research groups who rely on publicly-available data sources and 'can no longer keep up' (The Economist 2022; Edwards 2022).

A critical challenge with collecting such a large amount of data is ensuring that it is of sufficient quality to reliably train an AI system and get good/correct outcomes. Software engineers and developers are an expensive resource, and manually collecting, labelling, or reviewing data is considered a poor use of their time. This has led to the well-established practice of using crowdworking platforms like Mechanical Turk to complete these tasks at lower costs, which is not without its own ethical concerns (Fort et al. 2011; Barbosa and Chen 2019). OpenAI, the creator of popular generative AI tools ChatGPT and DALL-E, used an outsourcing partner with employees in Kenya to label data at pay rates lower than USD$2 per hour (Perrigo 2023). However, given the massive amounts of data that need to be prepared, even crowdworking could be considered too costly. This has led to even more exploitative practices, such as the use of prison labour in Finland to label data for AI systems (Lehtiniemi and Ruckenstein 2022). This has been broadly termed 'heteromation', meaning 'the extraction of economic value from low-cost or free labour in computer-mediated networks' (Ekbia and Nardi 2017).

With these cost pressures, platform users have commonly become part of the data collection pipeline to provide quick, cheap labour. While an individual user may not produce much data, collectively millions of users can produce or process a huge amount. To our knowledge, the most pernicious example of this is the reCaptcha system, which was nominally designed to allow websites to distinguish between human users and automated bots. Users would be presented with images of words, which were generally difficult to read (and intended to be impossible for a computer to read automatically), and asked to type in the words. If they typed the words in correctly, then they would be allowed to progress on the website, whether that was to complete a ticket purchase, submit a form, or just to log in. What users were not told was that while reCaptcha knew the correct answer for some of the images, there were also 'unlabelled' images that had no associated answer - the human users would provide the answers for the dataset. In 2007, the reCaptcha algorithm was capturing 60 million entries per day, which subsequently allowed for the digitisation of the entire Google Books archive, one word at a time (Gilbertson 2007). This system was then repurposed to ask users to decipher house numbers from Google Street View images (Perez 2012), and later to label images of streets for autonomous vehicles - clicking on the images where there are traffic lights, stop signs, or bicycles (Healy and Flores 2018).
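As an illustration of the mechanism described above, the following sketch shows how a challenge can pair a 'control' word with a known answer (which gates access) with an unknown word that quietly collects human labels. The function names, vote threshold, and example words are our own illustrative assumptions, not details of the actual reCaptcha implementation.

```python
from collections import Counter

# Illustrative sketch only (not the actual reCaptcha implementation): each challenge
# pairs a "control" word with a known answer and an "unknown" word that no system
# has transcribed yet. Passing the control word grants access; the answer to the
# unknown word becomes a human-provided label.
def check_challenge(control_answer, user_control, user_unknown, unknown_votes):
    if user_control.strip().lower() != control_answer.lower():
        return False  # failed the control word; discard the other answer
    # The user passed, so treat their reading of the unknown word as a label vote.
    unknown_votes[user_unknown.strip().lower()] += 1
    return True

def resolve_label(unknown_votes, min_votes=3):
    """Accept a transcription once enough independent users agree (threshold assumed)."""
    if not unknown_votes:
        return None
    word, count = unknown_votes.most_common(1)[0]
    return word if count >= min_votes else None

# Example: three users type the same answers for a challenge.
votes = Counter()
for typed_control, typed_unknown in [("quixotic", "harbour")] * 3:
    check_challenge("quixotic", typed_control, typed_unknown, votes)
print(resolve_label(votes))  # -> "harbour"
```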

Systems like this rely on data generated by users with little awareness, consent, or recognition of the work they are performing for AI owners - sharing some of the characteristics of labour exploitation. Users are generally not given a choice about whether or not they would like to participate - through engagement with the platform (and only sometimes ticking a box to accept terms and conditions) their consent is deemed to have been given. While the level of harm experienced by an individual user may be infinitesimally small, it is not negligible, and at the collective level there is a significant opportunity cost and transfer of value from the users to the platform owners. Arguably, the data is obtained as a by-product of a task that the user would otherwise be completing online - the user may not be asked to do something 'extra' - and this is therefore not an inconvenience to the user. Additionally, it can be argued that the users receive a service in exchange for the data they produce, often for free (although this is not always the case, such as on content platforms like Spotify or Netflix where users may be paying to access the service). However, we posit that there is an ethical imbalance where the surplus value captured by the platform is not equitably or fairly shared with the user-workers who create that value.

While describing this type of interaction as labour exploitation may feel hyperbolic, it is useful to consider a relevant definition. New Zealand is currently developing modern slavery legislation that seeks to make a formal legal distinction between labour exploitation and modern slavery. Labour exploitation is construed as 'behaviour that causes, or increases the risk of, material harm to the economic, social, physical or emotional well-being of a person' (MBIE 2022). At the collective level, we argue that the extractive relationship between digital platforms and their users could be considered a form of labour exploitation, albeit at the lower end of materiality.

Another useful concept is how employment law in Aotearoa conceives of whether someone is performing 'work' for an organisation in situations where they have not signed a formal employment agreement. For instance, one common law test, the 'integration' test, is whether someone's work is part of an organisation's daily operations or whether they are otherwise 'integrated' into that organisation. Another test, the 'control' test, asks whether an organisation can control what the prospective worker does. While it is unlikely that individual platform users could succeed in arguing under New Zealand law that they are 'employees' who should benefit from rights under employment law, there are elements of integration and control in the relationship between digital platforms and their users, and these tests can help sharpen our analysis of whether user-work could be thought of as 'labour'.

To be clear, the scope of our research does not focus only on demographic or descriptive data about the user, such as their birthdate or e-mail address. While user profiling based on this type of data has been foundational in digital marketing for several decades and comes with its own ethical issues (Wiedmann et al. 2002; Eke et al. 2019; Vlačić et al. 2021), we are focused on data produced by users through engagement with a digital platform, which may be combined with demographic or descriptive data. Further examples of exploitative data collection for AI training include recommendation engines that use activity data from users to curate suggestions for media consumption, social media platforms that harvest user posts to perform sentiment analysis, and autocomplete systems that use text from e-mails and other messages to learn how humans communicate and provide prompts. By engaging with these applications, the users are effectively providing data that, when combined with the data from many other users, becomes powerful and of significant economic value. A much more in-depth discussion on user-work and 'unwitting labourers' can be found in our other writing, developed as a result of these workshops (Morreale et al. 2023).

Scoping interviews and workshops

As part of our research project, in 2022 we ran two full-day workshops (one each in Auckland and Wellington), each with approximately twenty participants, to explore perspectives on how users are hidden humans and workers in AI systems. These workshops brought together academics, students, policymakers, civil society representatives, computing experts, data scientists, lawyers, and other industry professionals to discuss how data is harnessed for AI platforms. A workshop setting was preferred to give the participants the opportunity to build on each other's ideas, as well as for the researchers to provide inputs and context to facilitate discussion. Prior to these workshops, we completed six scoping interviews with relevant experts across data science, consumer rights, technology education, and public policy, which helped us shape our workshops and ensure we were asking the right questions. We also held a small practice workshop with university colleagues in Auckland to test our activities and get feedback on the preliminary framing of topics for discussion.

Individuals were recruited through outreach to various organisations to ask for representatives, direct connections to the digital technology ethics community, and a public call for participants through social media. Our recruitment strategy was informed by a desire to represent as wide a range of perspectives as possible, while trying to find individuals who had previously engaged with digital technology issues to reduce the learning curve, with the intention of engaging the broader public in future work. Appropriate human participants ethics applications were made and approved, and relevant COVID-19 protocols were observed.

For the scoping interviews, we began discussions with an explanation of the research project and asked general questions about the interviewee's perspective on the project. We also explained how we were testing the hypothesis that engaging with digital platforms, which leads to the creation of data for training AI systems, could be framed as 'work', and asked the interviewee to reflect on this from their perspective and area of expertise. The conversations also involved general discussions about the state of privacy and the perspectives of Aotearoa New Zealanders towards AI-driven digital platforms.

For the workshops, we used a few exercises to help participants understand the context and issues more deeply, interspersed with discussions on key ideas and topics. After introducing the research team and participants, the first exercise involved participants emulating the role of a recommendation engine. In groups, participants were given a set of data and graphs that described the demographics (age, gender, location, employment status, genre preferences) and behaviours (previous clicks, time spent on certain pages, commenting) for a set of fictional users on a video platform. Each group was then given a set of cards depicting videos with corresponding descriptions, and participants were asked to rank the order in which those videos should be recommended to a particular user. The exercise progressed through multiple rounds with the participants being given more and more data about the users until it was simply overwhelming. This encouraged participants to think through the logical processes used by recommendation engines, and also to appreciate the complexity of those processes when applied at scale to the massive volumes of content on real-world platforms, and the necessity of using algorithms to process that data in real-time.

In the exercise debrief, we discussed the question of what the recommendation engines are optimising towards; in a commercial context this is generally ‘engagement’ to maintain or increase user attention on the platform, but participants were open to making moral judgements, such as formulating low-quality recommendations to encourage a user to stop watching videos and go to sleep. We also discussed how people tend to talk about the ‘algorithm’ making recommendations based on their own characteristics and actions (also known as ‘content-based filtering’), when in reality the behaviour of other users can affect what they are being recommended (also known as ‘collaborative filtering’). This led into a facilitated discussion about the differences between framing our identified issues at the individual-level versus the collective-level, and a separate facilitated discussion about user-work as a concept.
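For readers less familiar with that distinction, the following minimal sketch contrasts the two approaches on an invented toy ratings matrix. The data, genre tags, and scoring functions are illustrative assumptions only, and are far simpler than what any real platform deploys.

```python
import numpy as np

# Toy example only: three fictional users, four videos (0 = not yet watched).
ratings = np.array([
    [5, 0, 0, 1],   # user 0
    [4, 5, 1, 0],   # user 1
    [1, 0, 5, 4],   # user 2
], dtype=float)

# Invented item features for the content-based approach (e.g. genre tags).
item_features = np.array([
    [1, 0],   # video 0: comedy
    [1, 0],   # video 1: comedy
    [0, 1],   # video 2: documentary
    [0, 1],   # video 3: documentary
], dtype=float)

def content_based_scores(user):
    """Score unseen videos by similarity to the features of videos this user rated highly."""
    profile = ratings[user] @ item_features       # taste profile from the user's own history
    scores = item_features @ profile              # similarity of each video to that profile
    scores[ratings[user] > 0] = -np.inf           # exclude videos already watched
    return scores

def collaborative_scores(user):
    """Score unseen videos using the ratings of users with similar viewing histories."""
    norms = np.linalg.norm(ratings, axis=1) * np.linalg.norm(ratings[user]) + 1e-9
    sims = ratings @ ratings[user] / norms        # cosine similarity to every user
    sims[user] = 0.0                              # ignore the user's similarity to themselves
    scores = sims @ ratings                       # similarity-weighted sum of others' ratings
    scores[ratings[user] > 0] = -np.inf
    return scores

print("content-based recommendation for user 0:", int(np.argmax(content_based_scores(0))))
print("collaborative recommendation for user 0:", int(np.argmax(collaborative_scores(0))))
```

On toy data the two approaches will often agree; the relevant difference for the discussion above is that the collaborative score for one user changes whenever other users' behaviour changes, whereas the content-based score depends only on that user's own history.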

The second exercise involved participants identifying the power dynamics and human influences present in AI systems. These systems are developed by people, incorporate data from people, are owned by people, and affect people with their outputs. There are many financial, political, and social relationships that exist in the exchange of data for a service, and participants were split up into new groups to consider those relationships. Each group was assigned an AI platform (Netflix’s media content platform, Facebook’s social network platform, voice assistants like Google Assistant or Amazon Alexa, and Amazon’s e-commerce platform), and asked to identify the roles of humans behind those platforms through a visual diagram or map. This encouraged participants to consider the multiple motivations and exchanges of value behind these platforms, and how the decision of one user can affect multiple users. Participants discussed their own use of the platform, the value they get out of it, and the way they understand what other people get out of it.

In the exercise debrief, we discussed how the data exchange for one person is only one part that feeds into a whole system, as well as the significance of data accumulation over time. While a single piece of data from a user may be of very little value, that data can be combined with data from many others to create something of significant value. This was discussed in terms of that data being used to train AI systems, which could go beyond the context that the data was originally provided for. We also discussed where power lies, and how different groups of people are affected by data collection and use in different ways. This led to a final facilitated discussion about the issues of user-work in an Aotearoa New Zealand context, what appropriate responses or interventions might be, and where the research team should direct their focus going forward.

In each of our discussions, we faced a tricky challenge of needing to provide context and educational resources without introducing undue researcher bias; the researchers were all skilled facilitators and conscious of this in their approach, encouraging discussion between participants while also prompting quieter participants for their thoughts. Different researchers led each part of the workshop to allow for a variety of styles and to limit the impact of any poor facilitation. A mix of one-on-one, small group, and whole workshop discussions was held to allow participants to make contributions when they felt comfortable, with a variety of verbal, visual, and tactile tasks, and we encouraged participants to provide written feedback after the workshops as well. We also provided refreshments and breaks to allow for more casual conversation and networking between participants, which helped with establishing trust ahead of the challenging conversation at the end of the day on responses and interventions.

Key discussion themes

Based on the discussions during the scoping interviews and workshops, we conducted a thematic analysis to identify common points and arguments, as well as to preserve particularly notable comments even if they were made by only one person. A thematic analysis supports our initial exploratory approach, but is not a replacement for further engagement with broader audiences to understand the relative importance of these themes across varying social contexts. Limitations include the relatively small sample size, the potential for groupthink in a workshop setting, changes in the way we talked about the issues between the scoping interviews and the workshops, and potential researcher bias in how we ran the workshops and subsequently how we grouped the themes together. Below, we present a summary of some key ideas from our analysis.

User-work

We asked participants whether or not they thought users were doing work when companies harness user data derived from engagement with digital platforms to train AI systems - also described as ‘labour through consumption’. There was no consensus, as some thought that it was most certainly work, some felt that it depended on the purpose of the task at hand, and others said that if it did not ‘feel’ like work then it shouldn’t be considered work. In particular, participants with a legal or policy background felt that it did not meet a legal definition of work or labour. However, others argued that the data generated by users is replacing data that would otherwise need to be generated by human workers (e.g. through a crowdworking platform), and if it is genuinely substitutive then it should be considered work.

Participants drew analogies to other forms of labour, such as emotional labour and reproductive labour; there could be parallels to feminist theories of unpaid labour and social reproduction theory (Jarrett 2016), and there is interesting work on data feminism (D'Ignazio and Klein 2020). It can also be difficult for users to feel that they are doing work when they are socially and politically positioned as 'consumers', in a similar way that mothers are positioned as 'carers'. However, discussion about labour and work was one of the most challenging parts of this research project, as different academic fields and ideologies have different conceptions of those terms. Rather than framing the discussion around work, it was suggested that we focus on the value extraction process to emphasise that there is a problem, regardless of whether we enjoy engaging with the platforms (and therefore provide data) or not.

Saliency

One common theme was the notion of whether people would care that they were producing data that was then harnessed by big technology companies (i.e. how important is this problem really?). Participants suggested that business models relying on these extractive processes had already been accepted by users and governments. Particular attention was given to the idea of whether people would care when they were receiving a product in return, especially where they receive a net benefit. Some participants considered how to make the perceived issue more salient to those consuming online services, but found this challenging as users already face many other pressing concerns (specific examples included the cost of living and climate change), and it was hard to justify this issue as being of significant importance. We observe that this is similar to the 'privacy paradox', where individuals intend to protect their online privacy but do not actually take action to do so, leading to a gap between attitude and behaviour (Gerber et al. 2018).

There was an astute observation that people's preparedness to share their data can be strongly linked to their economic circumstances. Some people can protect their data because they have the wealth and class status to do so, and can 'afford' to opt out of exploitative data practices. For example, a sufficiently wealthy person can purchase DVDs of movies and TV shows rather than having to subscribe to streaming services like Netflix if they want to. Exclusive social media networks cater to the wealthy and balance their need to influence and promote with their desire for privacy - for example, The Marque is an invite-only network where a managed profile costs almost USD$2000 per year (Rigby 2020). Other people have to make financial choices about whether they care about keeping their data out of the hands of a supermarket, or using a loyalty card to save money on the bread they use to feed their family. The importance of their data being used to create value is minor relative to their need to survive.

Awareness and consent

Participants also discussed transparency and awareness. Some participants pointed out that most digital platforms are very opaque, making it very difficult for a user of a platform to have any idea of how their data is being used. When users do manage to gain an idea of how their data is being used, and find that it goes beyond the reach of what they thought, this often makes them uncomfortable (The Chartered Institute of Marketing 2018; Auxier et al. 2019). Some participants noted that the pernicious aspect of user-work lies in not informing the user that their usage of the platform leads to the training of AI systems, rather than in the lack of compensation. Participants commented on the lack of transparency about data use across both the private and public sectors, and that more should be done to make users aware that they are part of the processes that make AI models powerful. Participants noted that policymakers also needed to be more aware of these issues.

However, we note that even with awareness, users often find themselves 'locked in' and have little choice but to continue to use the platform (for example, because their social network is using the same platform, or because data related to their content consumption preferences is already captured). Many platforms are designed to reduce 'churn' (the rate at which users leave the platform) and to maximise user addiction (Seaver 2022) through embedding personal data and high switching costs (Prud'homme 2019). To simply opt out of a system might hamper an individual's ability to fully participate in society. Participants pointed out examples of where individuals are forced to engage with these platforms through work or school. For example, if a workplace uses the Google Workspace suite, then people cannot avoid certain types of data being extracted in that context. Similarly, many schools now use a Microsoft or Google platform, and students have no ability to consent to the use of these technologies in their schools. In both of these cases, text typed by users is being used to train autocomplete and text generation AI systems (Chen et al. 2019). In these examples, consent is illusory and cannot be freely given.

Trust and fairness

Participants discussed trust in platforms and the fairness of the exchange when people give their data (knowingly or unknowingly) to platforms. Many agreed that increasing trust around these systems was both needed and valuable. However, trust is a multifaceted issue and there are a wide variety of perspectives. Some people may not trust the government with their data and would trust private industry more, to the extent that they would prefer overseas cloud companies to store their data so that it is out of the jurisdiction of the New Zealand government. Others would prefer that we maintain digital sovereignty physically within our borders, arguing that we need to build homegrown trust. For example, the idea of a locally operated social media platform, as theorised by Muldoon (2022), whether owned by the government or by a private sector entity, was floated by participants several times, acknowledging that building social licence and trust for such a platform would be a slow and difficult process.

Economic cost and benefit

One topic that we discussed with the participants was framing the issue in terms of financial or economic value at the collective level. While the data derived from a single interaction from an individual user may be worth a fraction of a cent to a platform, the value could be very significant when aggregated across all of a user's interactions and across all users. We may not see or feel the harm because it is meaningless for an individual, and therefore remains invisible. We posited that Aotearoa New Zealanders could be collectively performing millions of dollars of unpaid labour for digital platforms - economic value that is not then recognised locally and does not generate tax revenue. However, many participants thought that framing the issue in terms of collective monetary damage was too limiting without an appeal to morality, and that we might struggle to demonstrate that the financial harm is sufficiently large to justify intervention.
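As a back-of-the-envelope illustration of how negligible per-interaction value can aggregate into a large collective figure, consider the following sketch; every number in it is an assumption chosen for illustration, not a measured figure.

```python
# Back-of-the-envelope sketch of how tiny per-interaction value aggregates.
# All numbers below are illustrative assumptions, not measurements.
value_per_interaction_nzd = 0.001      # a tenth of a cent of training-data value
interactions_per_user_per_day = 50     # clicks, plays, searches, etc.
users = 3_000_000                      # an assumed share of Aotearoa NZ's population online
days_per_year = 365

annual_value = value_per_interaction_nzd * interactions_per_user_per_day * users * days_per_year
print(f"Illustrative annual aggregate value: NZD {annual_value:,.0f}")
# -> roughly NZD 55 million per year under these assumptions
```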

There was also discussion about the balance between the concerns around user-work and the economic value that this type of innovation and data can produce. Data is valuable as an instrument for digital innovations and some felt that Aotearoa New Zealand should be at the forefront of that innovation for national economic interest. Participants from the public sector particularly expressed their interest in the potential of machine learning to make the provision of public services more efficient and reduce financial costs, and were concerned about the potential for regulation to hamper that innovation. However, participants acknowledged that very little of the economic benefit derived by the major platforms would land in Aotearoa New Zealand.

We were also reminded that the benefits that these technologies bring may outweigh the problems created. For example, in Pasifika communities, the use of social media platforms to communicate with family members overseas has significantly reduced the cost of maintaining those social bonds. Not having this option would be immensely damaging to community and society. This also reminded us that the impacts, both positive and negative, may not be equally felt across society.

Responses and interventions

Having discussed the issues widely with the participants, we asked them what they would want to do about the problem, assuming they accepted that it was real. Participants agreed that building awareness is the first step, but were in less agreement over whether or not regulatory intervention was justified or what that might look like. Suggestions included introducing stronger privacy protections, establishing a duty of care on digital platforms, targeting taxation towards digital services, using antitrust and competition law, and improving media regulation. However, these were generally considered to be relatively controversial interventions that would need to be carefully examined to ensure that they would achieve the intended effect. Explainability to the public was considered an important part of any regulation.

There was significant pushback against the idea that these issues should be simply dealt with at the individual-level through more informed consent. Some participants emphasised that this would lead to those worst impacted being forced to act, because they feel those impacts more strongly than the general public. We agree that responsibility should not fall upon those who have less power, and individuals are relatively powerless against the corporations who are collecting the data (Graef and van der Sloot 2022). Participants felt that since the harm often occurs at the community level, any solutions would need to be at that level too. In Aotearoa New Zealand, a Te Tiriti-led approach is also paramount, and Māori Data Sovereignty is a key element (West et al. 2020).

In a Te Ao Māori context, we discussed the concept of rāhui with participants, a restriction that is introduced to protect the future wellbeing of people and the environment, with the intent of restoring mauri or life essence (Maxwell and Penetito 2007). While generally these have been applied to the physical world (e.g. prohibiting harvest of certain seafood species to allow stocks to rebuild, or restricting access to an area to prevent the spread of disease), it was suggested that, where sufficient harm is being created by a digital platform, a rāhui could be an appropriate collective action to protect individual users. However, just because a rāhui is in place does not mean that it will be observed by all people, and there was also discussion about using regulation and legislation to strengthen a rāhui in this context.

There was also discussion about whether or not communities, or even Aotearoa New Zealand as an entire country, have enough power to influence the actions of the large digital platforms. Comparisons were made to the European Union's introduction of the General Data Protection Regulation (GDPR) as an act of moral leadership, and we noted the subsequent effects of regulatory diffusion globally (Greenleaf 2018; Hu 2019). Recent efforts in international co-ordination on digital issues, such as the Christchurch Call, have provided a model for how an appeal to morals and values can lead to greater co-operation. At the same time, the diffusion of regulatory processes is very slow and relies on the goodwill of individual leaders and decisionmakers.

Some participants suggested that we build 'homegrown' alternatives to some of the more common platforms, such as a domestic social media network. For example, China has essentially been forced to establish a self-sufficient platform economy (Hermes et al. 2020). However, other participants pointed out that this may be tricky to achieve, because it is the aggregation of data and users at scale that leads to the emergence of a good, accurate, and enticing product of value. Therefore, it may be difficult to construct an Aotearoa alternative that can provide the same level of value as one of the larger platforms. Some locally developed platforms do exist, such as Neighbourly, but their popularity relative to international platforms like Facebook indicates that there is something lacking in local offerings.

One of the key themes that appeared multiple times was that this area has little funding to support the development of new responses and interventions. As there has been more interest in recent years, there was hope that this may lead to more funding, but without that funding it is difficult to develop the depth of thinking or expertise needed to produce meaningful, locally-relevant solutions. Participants also found it difficult to identify which government agency would be responsible for these types of issues and funding, acknowledging that there are multiple agencies each doing a little bit at the moment. It was noted that the Algorithm Charter has helped establish common principles between government agencies, but that there is still a lot of work to be done in building capacity to understand these types of issues (Chen 2022). There was discussion about establishing a cross-agency community of practice to help raise awareness – at the time of writing several months later, there is work at Stats NZ to establish a Centre for Data Ethics and Innovation that could provide that hub for the community.

There was also some discussion about where the research team should go from here. Some participants suggested that writing a report about the issue of user-work with practical policy suggestions could be helpful for the public sector. Others focused on raising awareness generally, such as by releasing podcasts or other shareable content, and continuing to produce outputs through writing articles and organising workshops. In a separate paper, we aim to further develop the theoretical framework that underpins the relationship of labour exploitation between the digital platform and the user. In terms of translating our work into tangible impact, continuing to engage with the public sector and raising awareness amongst policymakers is critically important.

Conclusions

We have examined the ethical issues associated with how digital platforms use data generated by users to train AI systems to create value, leveraging a lack of transparency and awareness to alter the power dynamics in favour of system owners. Through our workshop exercises, we sought to pragmatically demonstrate how powerful AI systems are, and to help participants gain a tactile understanding of how AI systems make decisions differently to humans. We have promoted the idea that interactions with digital platforms may be considered 'work' in the most literal sense of the word, and that the harm of unrecognised work is meaningful in aggregate across all users over time. While not all participants agreed with that idea, it prompted useful discussions about the way we describe our interactions with digital platforms and served to illustrate exploitative data practices. Participants focused on values, emphasising moral or ethical reasons for users to be compensated over financial or economic justifications.

Most importantly, we discussed how this type of value transfer affects people in Aotearoa New Zealand, and brought people together to have discussions about what is being done and what can be done. Part of the value of the research was in bringing a diverse range of people into the same room to discuss these issues. For example, policymakers gained a better understanding of some of these issues, and the research team were also able to benefit from the knowledge and perspectives of people in the room. We acknowledge that we could have improved our engagement with Māori: while we had some interaction with Māori, we feel that we did not achieve an appropriate level of Māori representation in the research. To understand and make decisions that appropriately represent the values of Aotearoa New Zealand, it is critical that all voices are heard.

Due to the nature of a large transdisciplinary research team, we all came into the project with different understandings of this topic. While transdisciplinary research is difficult and can be slow, the most useful thing about incorporating multiple perspectives is how rigorously our ideas and solutions are tested. Aotearoa New Zealand is a country with many and diverse perspectives, and while it is important to have a unified direction with which we can approach these issues, we also need to be aware of all the people we are asking to come on the journey with us, and how they will be affected by policy and collective actions.

Acknowledgements

We thank all of the participants in our scoping interviews and workshops, who shared their time and expertise to have conversations with us and discuss these issues. We also acknowledge the University of Auckland Transdisciplinary Ideation Fund for providing research funding for this project.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the University of Auckland Transdisciplinary Ideation Fund.

References

  • Almeida D, Shmarko K, Lomas E. 2022. The ethics of facial recognition technologies, surveillance, and accountability in an age of artificial intelligence: a comparative analysis of US, EU, and UK regulatory frameworks. AI and Ethics. 2:377–387.
  • Auxier B, Rainie L, Anderson M, Perrin A, Kumar M, Turner E. 2019 Nov 15. Americans and privacy: concerned, confused and feeling lack of control over their personal information. Washington (D.C.): Pew Research Center. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/.
  • Barbosa NM, Chen M. 2019. Rehumanized crowdsourcing: a labeling framework addressing bias and ethics in machine learning. 2019 CHI Conference on Human Factors in Computing Systems, (543). doi:10.1145/3290605.3300773.
  • Chen AT-Y. 2022. The Algorithm Charter / He Tūtohi Hātepe mō Aotearoa. In: A. Pendergast, K. Pendergast, editors. More zeros and ones: digital technology, maintenance, and equity in Aotearoa New Zealand. Wellington: Bridget Williams Books; p. 135–150.
  • Chen MX, Lee BN, Bansal G, Cao Y, Zhang S, Lu J, Tsay J, Wang Y, Dai AM, Chen Z, et al. 2019, July. Gmail smart compose: real-time assisted writing. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2287–2295.
  • D'Ignazio C, Klein LF. 2020. Data Feminism. Cambridge: MIT Press.
  • Edwards C. 2022. Competition makes big datasets the winners. Communications of the ACM. 65(9):11–13.
  • Ekbia HR, Nardi BA. 2017. Heteromation, and other stories of computing and capitalism. Cambridge: MIT Press.
  • Eke CI, Norman AA, Shuib L, Nweke HF. 2019, September. A survey of user profiling: state-of-the-art, challenges, and solutions. IEEE Access. 7:144907–144924.
  • Fort K, Adda G, Cohen KB. 2011. Amazon mechanical turk: gold mine or coal mine? Computational Linguistics. 37(2):413–420.
  • Gerber N, Gerber P, Volkamer M. 2018, August. Explaining the privacy paradox: a systematic review of literature investigating privacy attitude and behavior. Computers & Security. 77:226–261.
  • Gilbertson S. 2007, May 25. Recaptcha: fight spam and digitize books. WIRED; [accessed 2023 Jan 3]. https://www.wired.com/2007/05/recaptcha-fight-spam-and-digitize-books/.
  • Google. 2022, July 18. The size and quality of a data set | machine learning. Google Developers; [accessed Jan 3 2023]. https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality.
  • Graef I, van der Sloot B. 2022. Collective data harms at the crossroads of data protection and competition law: moving beyond individual empowerment. European Business Law Review. 33(4):513–536.
  • Greenleaf G. 2018, June 12. Global convergence of data privacy standards and laws. UNSW Law Research Paper, (18-56).
  • Healy M, Flores M. 2018. Is reCaptcha training robocars? Ceros; [accessed 4 Jan 2023]. https://www.ceros.com/inspire/originals/recaptcha-waymo-future-of-self-driving-cars/.
  • Hermes S, Clemons EK, Schreieck M, Pfab S, Mitre M, Böhm M, Wiesche M, Krcmar H. 2020. Breeding grounds of digital platforms: exploring the sources of American platform domination, China's platform self-sufficiency, and Europe's platform gap. Proceedings of the 28th European Conference on Information Systems (ECIS). 132.
  • Hill K. 2022. The secretive company that might end privacy as we know it. In: Martin K, editor. Ethics of data and analytics: concepts and cases. New York: Auerbach Publications.
  • Hu IY. 2019. The global diffusion of the 'General Data Protection Regulation' (GDPR) [Master's thesis]. Erasmus University Rotterdam.
  • Jarrett K. 2016. Feminism, labour and digital media: The digital housewife. New York: Routledge, Taylor & Francis.
  • Lehtiniemi T, Ruckenstein M. 2022. Prisoners Training AI: ghosts, humans, and values in data labour. In: D. Lupton, S. Pink, M. Berg, editors. Everyday automation: experiencing and anticipating emerging technologies. London: Routledge; p. 184–196.
  • Maxwell KH, Penetito W. 2007. How the use of rāhui for protecting taonga has evolved over time. MAI Review. 2:1–15.
  • MBIE (Ministry of Business, Innovation, and Employment). 2022. Consultation on modern slavery and worker exploitation; [accessed Feb 7 2023] https://www.mbie.govt.nz/have-your-say/modern-slavery.
  • Morreale F, Bahmanteymouri E, Burmester B, Chen A, Thorp M. 2023. The unwitting labourer: extracting humanness in AI training. AI & Society: Knowledge, Culture and Communication. (forthcoming)
  • Muldoon J. 2022. Platform socialism: how to reclaim our digital future from big tech. London: Pluto Press.
  • Perez S. 2012, March 29. Google now using ReCAPTCHA to decode street view addresses. TechCrunch; [accessed Jan 4 2023]. https://techcrunch.com/2012/03/29/google-now-using-recaptcha-to-decode-street-view-addresses/.
  • Perrigo B. 2023. Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. TIME. Jan 18. [accessed 2023 Jan 3]. https://time.com/6247678/openai-chatgpt-kenya-workers/.
  • Prud'homme D. 2019 Sep. How digital businesses can leverage the high cost for consumers to switch platforms. London: London School of Economics Business Review.
  • Rigby R. 2020, March 3. Society will pay the price if the rich want to buy privacy. Financial Times. https://www.ft.com/content/16ca7362-2272-11ea-b8a1-584213ee7b2b.
  • Seaver N. 2022. Computing taste: algorithms and the makers of music recommendation. Chicago: University of Chicago Press.
  • The Chartered Institute of Marketing. 2018, November 27. Trust in business use of data low despite GDPR. https://www.cim.co.uk/newsroom/release-trust-in-business-use-of-data-low-despite-gdpr/.
  • The Economist. 2022, June 11. Huge “foundation models” are turbo-charging AI progress; [accessed 2023 Jan 4]. https://www.economist.com/interactive/briefing/2022/06/11/huge-foundation-models-are-turbo-charging-ai-progress.
  • Vlačić B, Corbo L, Costa e Silva S, Dabić M. 2021, May. The evolving role of artificial intelligence in marketing: a review and research agenda. Journal of Business Research. 128:187–203.
  • Walton N, Nayak BS. 2021, May. Rethinking of Marxist perspectives on big data, artificial intelligence (AI) and capitalist economic development. Technological Forecasting and Social Change. 166:120576. https://www.sciencedirect.com/science/article/abs/pii/S0040162521000081.
  • West K, Hudson M, Kukutai T. 2020. Data ethics and data governance from a Māori world view. In: George L, Tauri J, MacDonald L, editors. Indigenous research ethics: claiming research sovereignty beyond deficit and the colonial legacy. Vol. 6. Bingley: Emerald Publishing Limited; p. 67–81.
  • Wiedmann K-P, Buxel H, Walsh G. 2002, January. Customer profiling in e-commerce: methodological aspects and challenges. Journal of Database Marketing & Customer Strategy Management. 9:170–184.