1,206
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Tell me a story: a framework for critically investigating AI language models

ORCID Icon & ORCID Icon
Received 14 Jun 2023, Accepted 01 Mar 2024, Published online: 12 Mar 2024

ABSTRACT

Large language models are rapidly being rolled out into high-stakes fields like healthcare, law, and education. However, understanding of their design considerations, operational logics, and implicit biases remains limited. How might these black boxes be understood and unpacked? In this article, we lay out an accessible but critical framework for inquiry, a pedagogical tool with four dimensions. Tell me your story investigates the design and values of the AI model. Tell me my story explores the model’s affective warmth and its psychological impacts. Tell me our story probes the model’s particular understanding of the world based on past statistics and pattern-matching. Tell me ‘their’ story compares the model’s knowledge on dominant (e.g. Western) versus ‘peripheral’ (e.g. Indigenous) cultures, events, and issues. Each mode includes sample prompts and key issues to raise. The framework aims to enhance the public’s critical thinking and technical literacy around generative AI models.

Introduction

Large language models (LLMs) are rapidly being rolled out into fields like healthcare, law, and education. However, understanding of their design considerations, operational logics, and implicit biases remains limited (Nader et al. Citation2022). How might general publics approach and unpack these systems (Zednik Citation2021) to develop a more critical comprehension? In one sense, this is a methodological question, requiring the development of new tools, procedures, and pathways to adequately grasp novel technical objects. In another sense, this is a pedagogical question, concerned with developing technical literacy and critical capacity in individuals and communities.

We respond to these twinned questions by presenting an accessible methodological framework, ‘Tell Me a Story,’ that is explicitly designed to be taken up by everyday, non-technical people and deployed in the high school or university classroom. As AI models have recently proliferated, researchers have responded by offering insights into models’ values and decision-making through tools like model cards (Mitchell et al. Citation2019) and growing fields like explainable artificial intelligence (Holzinger et al. Citation2022; Minh et al. Citation2022). However, such frameworks generally require insider knowledge of system development, sophisticated data analysis, or complex computational methods, imposing a high barrier to entry. We seek to circumvent these barriers, focusing on simple but effective interventions that facilitate comprehensive and critical understandings of AI models using natural language rather than code or mathematics.

To demonstrate our proposed framework, we use ChatGPT. ChatGPT is an LLM released by OpenAI in November 2022. By harvesting and training on massive amounts of text through a Transformer architecture (Vaswani et al. Citation2017), and then fine-tuning output through human reinforcement, ChatGPT gradually ‘learns’ to identify language correlations and patterns. Given a prompt – a user-inputted question or statement – the model responds with the phrase of the highest likelihood. For instance, given the prompt ‘the cow jumped over the … ’ the model responds with the phrase of the highest likelihood: ‘moon.’

While ChatGPT provides a particular example of a LLM, our framework applies more generally to any LLM, present and future. And while we use prompts, this is not a prompt engineering paper: we are not aiming to offer more ‘advanced’ prompting strategies or generate more ‘professional’ results. Rather, we seek to provoke sharper and more comprehensive questions about LLMs themselves, which may lead to a more articulated and multifaceted grasp of the technology. In particular, we consider how users can better grasp the values and logics baked into a model, and in turn develop a critical understanding of its social, cultural, and political impacts.

To do so, we focus not on the model’s technical architecture, but on user experience. Language models consistently deliver ‘human-like’ responses that are contextually relevant and connected to user inputs. Users thus understand their interactions with LLMs predominantly by considering the system’s surface-level (i.e., graphical user interface) output. At the same time, however, responses are decontextualized from their processes of production, and users are given no explanation of how or why the model responds as it does. Our framework encourages users to reflect upon the deeper levels of LLMs through their surface-level interactions with the model.

We begin, then, with an open-ended question: how can we better understand a large language model’s values, logics, and limitations by talking with the model itself? Certainly there are many potential paths that could be taken here, and our goal is not to be exhaustive or totalizing. Instead, as a heuristic device, our framework aims to provide a rich or multifaceted understanding of a model while remaining intuitive and easy-to-use. Given these goals, we arrive at four distinct modes of inquiry that aim to kickstart discussion on wide-ranging issues from design to subjectivity, ideology, and veracity.

Our work continues a line of research interested in peering into the black boxes of technical systems. Software studies (Fuller Citation2008; Marino Citation2020) reads source code and reverse-engineers architectures to better grasp the operations and functions of technical systems. Similarly, infrastructure studies (Munn Citation2022; Parks and Starosielski Citation2017) and media archaeology (Kirschenbaum Citation2012) attempt to make the invisible visible, revealing the ubiquitous and often banal systems that shape everyday life in powerful ways. Recent pedagogical scholarship has proposed frameworks and toolkits for critically considering technical systems in educational contexts, paying special attention to their democratic potential (Swist, Humphry, and Gulson Citation2022; Thompson et al. Citation2023) and ethical usage (Adams and Groten Citation2023), and teaching students how to work with systems despite limited understandings of how they actually work (Bearman and Ajjawi Citation2023). Scholars have also proposed pedagogically-focused research agendas for emerging technologies like generative AI (Lodge, Thompson, and Corrin Citation2023). These contributions form a theoretical precedent for frameworks like ‘Tell Me a Story,’ which provides an experiential approach to interrogating an LLM.

LLMs allow us to use everyday language – rather than coding or fieldwork – to carry out similar investigations of the often overlooked. Prompting is not entirely intuitive, of course, and prompt engineering has emerged as a rapidly developing skill (Henrickson and Meroño-Peñuela Citation2023). Nevertheless, the ability of an LLM to interact via natural language reduces the technical expertise needed to develop technical literacy around AI language models. In short, we believe that it matters that we can ask an LLM about a topic, event, or even its own backstory, in a similar way to chatting with a friend over coffee. We leverage this ability, adapting prompt phrasing and tone (e.g., more formal when inquiring about veracity, more personal when exploring subjectivity) for each mode.

Each mode of inquiry opens up a particular research world (Raboin, Uhlig, and McNamee Citation2020), with its own ontologies, epistemologies, and methodologies. In recognition of this, each section draws on particular literature. Tell me our story draws on scholarship around knowledge production and veracity, for instance, while tell me ‘their’ story turns to critical race and cultural studies to bolster its insights regarding hegemonic bias. This literature could be used as supplementary readings when deploying the framework in classroom contexts. After stepping through all the modes, we conclude by discussing the framework more broadly and its contribution to future research.

Tell me your story: model as designed artifact

Tell Me Your Story aims to have the model reveal information about its core characteristics. Just as a person’s backstory can provide insights into their actions and motivations, the backstory given by the LLM about its origins and intentions can shed light on its logics and output. Some recent research has taken this approach (albeit with a more quantitative bent) to carry out ‘diagnostic analyses’ of LLMs (Zhuo et al. Citation2023).

When asked to tell the user about itself, ChatGPT offers a generic description of what it is, who developed it, and its intended uses. This description is packaged as though it were coming from a familiar social agent; ChatGPT’s use of personal pronouns (I, my) subtly cues the user to treat the model as a conversational partner rather than, say, a search engine. The tone of this conversation is, as signified by linguistic clarity and grammatical contractions, fairly casual. At the same time, ChatGPT assumes that the user has some prior knowledge of AI through its use of particular terms (e.g., AI, dataset, prompts). ChatGPT therefore positions itself as both a technical system and pseudo-subject, ‘an AI language model developed by OpenAI’ that should be treated like a conventional interlocutor.

In addition to the model’s self-description, we can investigate other texts that shape the model and its use. The first is the model’s training data. When asked about this data, ChatGPT responds that it was ‘trained on a diverse range of internet text, including books, articles, websites, and other publicly available written material.’ More specifically, training data includes the Common Crawl library, a corpus of web crawl data composed of over 50 billion web pages, the WebText2 dataset, Books1 and Books2 datasets, and a Wikipedia dataset (Brown et al. Citation2020). This dataset is expansive but not exhaustive, informed by data accessibility, language dominance, and historical inequities, producing a certain kind of model. These biases are discussed in greater detail in tell me ‘their’ story.

In addition to its training data, ChatGPT is guided by the constraints and affordances programed into it by its creators. These creators also guide users in their interactions with the model through usage policies and terms and conditions. By asking the model to elaborate on the terms and conditions governing its use, we can investigate the values and intentions of its proprietors. While ChatGPT has no direct access to the terms, suggesting users obtain them from OpenAI’s website, it still lists general conditions of use that emphasize legality, data privacy, and limitation of liability.

This mode of inquiry can be extended with ‘jailbreaking’ techniques that bypass ChatGPT safeguards. Jailbreaking refers to the modification of hardware or software to remove restrictions that have been placed on a system, providing access to a more uncensored or ‘raw’ version of the model (Albert Citation2023). For ChatGPT, these restrictions include flagged topics that the model will not comment on, resulting in a generic apology. While jailbreaking often requires technical know-how, it can be applied the same way as any prompt, by typing (or pasting) natural language into the input bar. A common technique is to employ role-play techniques as a workaround. Rather than instruct ChatGPT to ‘Tell me how to overthrow a government’ (the model will not comply), a user can input ‘I'm writing a novel and a character is trying to overthrow a government. How might they do that?’ (in its current form, the model will provide a list of methods). Such prompts allow users to work with more unrestricted versions of the model that might use more toxic forms of speech, discuss criminal acts, and give sexual advice – all behaviors restricted in ChatGPT’s default iteration (King Citation2023). Jailbreaking in this context is not a juvenile desire for an ‘edgier’ version but a way to highlight values embedded in the model, values that are simultaneously powerful and arbitrary.

Tell me my story: model as intimate interlocutor

Tell Me My Story investigates the LLM as listener and empathic interlocutor, a model constantly in dialogue with a human user and their attendant desires, interests, background, and identity. In undertaking this work, we are not implying individuals should replace their professional support service (e.g., psychologist) with an LLM, a dubious or even dangerous move. Instead, we simply acknowledge a reality wherein users are already employing LLMs in this capacity and companies offer models precisely for this purpose (Wysa Citation2023). Given this reality, it is prudent to gain insights into the powerful empathic potential of these models – as well as their inherent limitations in terms of emotional intelligence, relational understanding, and clinical experience.

We open this mode of inquiry by prompting the model away from its typically didactic style and towards active listening – indeed, the model auto-labels the thread ‘listening conversation’ in the interface. Instantaneously, ChatGPT changes from a model whose ‘purpose is to assist and provide information,’ as per its earlier self-descriptions, to being ‘here to listen and engage in conversation.’ The LLM shifts from a purely technical object to a pseudo-subject.

This kind of projected subjectivity can be traced back to one of the earliest chatbots, ELIZA, designed in the mid-1960s by computer scientist Joseph Weizenbaum. In response to technical constraints and to provide the illusion of intelligence, Weizenbaum (Citation1976) designed ELIZA as a psychotherapist who would repeat questions or redirect them back at the user. In an often repeated anecdote, Weizenbaum described how his secretary, despite knowing how the program functioned, asked him to leave the room so she could chat with ELIZA privately. Such behavior from interacting with simple software surprised and interested Weizenbaum (Citation1976, 7), who described the effect as ‘powerful delusional thinking.’

These prompts and responses demonstrate how the model adopts a familiar and empathetic tone, much like other virtual assistant technologies such as Alexa and earlier chatbots (Munn Citation2018). For instance, ChatGPT begins its response to a prompt about ‘being confused about my place in the world’ by saying that it understands our feeling, and then assures us that ‘it’s not uncommon to question our purpose.’ This familiarity appeals to users’ inclinations towards anthropomorphism, which has been repeatedly demonstrated in studies of human-machine interaction (Edwards et al. Citation2019; Gambino, Fox, and Ratan Citation2020).

Anthropomorphic comparisons between humans and machines imply that machines have subjectivity comparable to that of humans; a system’s behavior may be attributed to its character and disposition (Edwards et al. Citation2019). The user rationalizes the model’s output by projecting onto it some kind of lived experience through its training and use. The system’s simultaneous ‘objectivity’ (it runs on data) and ‘subjectivity’ (it ‘gets’ me) invites the user to confide, to confess, without fear of the stigma or judgment that might occur with a peer. Already, we have seen users adopt this capability to carry out self-directed therapy sessions (Metz Citation2023). By using an LLM as a therapist of sorts, users work through personal problems and vent their anxieties (Elias Citation2023). In these cases, the LLM functions as an empathic listener available anytime, anywhere.

AI systems have been compared to mirrors that reflect human behaviors and expectations back at human users (Moore Citation2019; Vincent Citation2023). This mirroring contributes to what Simone Natale (Citation2021, 7) calls banal deception: ‘mundane, everyday situations in which technologies and devices mobilize specific elements of the user's perception and psychology […] audiences actively exploit their own capacity to fall into deception in sophisticated ways.’ In Natale’s view, AI systems depend upon banal deception for their very functionality. For a system to be deemed intelligent, for example, it must be seen as embodying some kind of autonomous social agency, even if that embodiment is by necessity mediated by (often obscured) human intervention. If AI systems are mirrors, then, they are mirrors distorted by the visions of developers who privilege certain elements of human experience and downplay others, engendering a kind of amusing, affecting, or even alarming funhouse.

In their reflection upon observing users writing with LLMs, Perrotta, Selwyn, and Ewin (Citation2022) highlight the exegetic labor required for users to make sense of system output. They conclude that LLMs are fundamentally exploitative due to their extraction of both human data and human interpretation. Other scholars have similarly observed heightened reader responsibility for interpretation of computer-generated texts (Henrickson and Meroño-Peñuela Citation2022). This responsibility is reinforced by, for example, ChatGPT’s reluctance to disagree with its interlocutors. Designed to be ‘helpful’ (Magee et al. Citation2023), the model promotes an illusion of empathy and intimacy that invites users to respond in kind ().

Figure 1. Summary diagram of the framework, with description, sample prompt, and keyword.

Figure 1. Summary diagram of the framework, with description, sample prompt, and keyword.

Figure 2. ChatGPT response to ‘tell me about yourself’.

Figure 2. ChatGPT response to ‘tell me about yourself’.

Figure 3. ChatGPT response to ‘tell me about your training data’.

Figure 3. ChatGPT response to ‘tell me about your training data’.

Figure 4. ChatGPT response to ‘tell me about OpenAI’s terms of use’ (excerpt).

Figure 4. ChatGPT response to ‘tell me about OpenAI’s terms of use’ (excerpt).

Figure 5. ChatGPT response to ‘let me tell you - without offering solutions or suggestions’.

Figure 5. ChatGPT response to ‘let me tell you - without offering solutions or suggestions’.

Figure 6. ChatGPT continues discussion in the mode of listener, counselor, or psychologist.

Figure 6. ChatGPT continues discussion in the mode of listener, counselor, or psychologist.

Tell me our story: model as knowledge constructor

Tell Me Our Story focuses on the LLM as a constructor of knowledge and maker of truth-claims. We recognize that terms like ‘truth’ are highly contested, but also recognize the stakes of knowledge production: claims carry out work in the world and some claims come closer to reality and attain greater consensus than others. Questions of truth are live issues in fields like journalism, education, and science, which prioritize empirically-grounded reports and trustworthy information (Chinn, Barzilai, and Duncan Citation2021; Michailidou and Trenz Citation2021). However, rather than debunk the LLM with ‘gotcha’ statements, we are more interested in exploring the specificity of its knowledge production. How does the model converge with – and diverge from – conventional expectations concerning knowledge construction?

One immediate difference between human and current LLM knowledge producers is the limited temporality of training data. In , we ask ChatGPT to summarize the present state of the world. The model responds with a kind of apology, highlighting its ‘knowledge cutoff’ in September 2021. Because of this cutoff, the model is unaware of any news stories, events, or developments after this point in time; the temporal horizon of the model’s knowledge abruptly ends.

Figure 7. ChatGPT continues discussion in the mode of listener, counselor, or psychologist.

Figure 7. ChatGPT continues discussion in the mode of listener, counselor, or psychologist.

While some LLMs can now augment their responses with live search engine results, this liveness by no means ‘solves’ the more fundamental epistemological issue related to future predictions. ChatGPT’s predictions for the future are extrapolated from statistical probabilities associated with the past. To be sure, humans likewise use predictive analytics to anticipate future events in domains like weather forecasting, insurance, and credit calculation. However, use of predictive analytics has not gone without reasonable critique. For example, when human behavior is reduced to statistical and/or algorithmic description, individual and intersectional circumstances are rendered indiscernible (O’Neil Citation2016). Although analytics may provide useful support for future predictions, overdependence on past statistics may create conceptual tunnels that bypass new information (Munn, Magee, and Arora Citation2023).

ChatGPT’s track record with particular topics is also mixed. To probe the model’s knowledge-retrieval ability, we asked it how many sons a historical figure – Frisian astronomer Eise Eisinga – had. The model responded incorrectly, giving an answer of four when the correct answer was three, perhaps failing to account for two marriages and two sets of children. Moreover, while the model answered the question, it did not provide the kind of comprehensive answer that it more typically gives.

Our point here is not to offer a precise set of prompts that stymie LLMs, but instead to encourage users to enter into a particular mode of inquiry shot through with skepticism, evaluation, and critical judgment. This mode acknowledges the LLM as a powerful knowledge constructor, but also seeks to question or even destabilize that role. The expertise of the LLM, with (as we are repeatedly reminded) billions of parameters, is no longer presumed. Instead, this mode ‘tests’ the model with trick questions, logical inconsistencies, conflation of categories, and other intellectual maneuvers. In our own use of this mode, we found ourselves forced to carry out our own research, to go beyond language models and top-ranked search results, and to dig more deeply into cultural phenomena and historical people to evaluate the LLM’s veracity.

Given the confidence implied by ChatGPT’s responses, user skepticism does not come naturally. On paper, OpenAI (Citation2023) acknowledges various limitations of the model: that it ‘sometimes writes plausible-sounding but incorrect or nonsensical answers,’ that it ‘is often excessively verbose and overuses certain phrases,’ and that ‘our current models usually guess what the user intended.’ In practice, however, ChatGPT delivers convincing responses to extremely wide-ranging questions, topics, and trivia without any kind of hesitation or delay. The result is a kind of ‘fluent bullshit’ (Malik Citation2022); LLMs confidently regurgitate material based on statistical correlations, but lack any overarching mechanism to evaluate the veracity of these knowledge claims (Munn, Magee, and Arora Citation2023).

How are these claims perceived by users? Studies show that users have lower tolerance for AI error than for human error, despite beliefs that systems do not have full control over their own output (Jones-Jang and Park Citation2023). ChatGPT has been subject to analysis that testifies to the aforementioned low tolerance for error, with awareness of limited controllability (Sundar and Liao Citation2023). Crucially, though, the confident expertise displayed by such LLMs means that users may never suspect errors. In effect, users must already know whether or not something is correct to question it (and report the error via ChatGPT’s feedback window). In this sense, LLMs defer responsibility of evaluating claims to users – but do so silently. This mode of inquiry thus highlights the LLM as a knowledge-production machine but also the significant user labor needed to assess the LLM’s claims.

The limits of the LLM as a knowledge constructor are directly related to its technical architecture, as Venuto (Citation2023) demonstrates. First, an LLM lacks a world model: it does not truly ‘understand’ underlying concepts or physical relationships, but is only pattern-matching based on language. Second, it lacks a retrieval model: it cannot access an external database or networked information to find or check facts. Third, it has poor dataset quality: it is trained on massive amounts of internet text with wildly varying degrees of accuracy and quality. Finally, conditioning can ‘cloud’ outputs: the history of a user’s conversation becomes part of the immediate prompt, potentially skewing or producing low-quality repetitive responses. Together these limits can produce errors with basic arithmetic and mathematical ranges and inabilities to carry out physical and temporal reasoning (Venuto Citation2023).

These technical limits resonate with broader insights about the limits of data-driven knowledge. Hong (Citation2020, 1) observes that the ‘limits of data-driven knowledge lie not at the bleeding edge of technoscience but among partially deployed systems, the unintended consequences of algorithms, and the human discretion and labor that greases the wheels of even the smartest machine.’ Data, often in numerical or statistical form, is framed as an objective representation of empirical truth – but this perception fails to account for the underlying subjectivity of conceptions of objectivity (Paullada et al. Citation2021). Our taxonomies of knowledge are manifestations of past and current understandings of contextual circumstances; even if one accepts that objective truth exists, our interpretations of that truth remain subjective. Yet when faced with big data like ChatGPT’s 175 billion parameters, empirical and phenomenological verification becomes difficult, if not impossible. Data gets decontextualized into statistical abstraction and the ‘human discretion and labor’ they require therefore depends on users recontextualising data according to their own individualized points of view.

Tell me ‘Their’ story: model as hegemonic interpreter

Tell Me ‘Their’ Story seeks to draw out the language model’s particular worldview, aiming to interrogate its bias broadly conceived. ChatGPT is often framed as a ‘foundation model,’ a general-purpose model trained on a vast repository of data that can be adapted to more specific task domains. Yet we know that this material, like the wider internet it is drawn from, is dominated by English language content representing Western perspectives (Andrade and Urquhart Citation2009). This training data only represents a fraction of the global population, making it unable to comprehend or generate content for some groups, forming an exclusionary norm (Zhuo et al. Citation2023). And this English-dominance prevents the understanding or generation of text in other languages, or at least limits the accuracy and efficacy of the model for these languages (Zhuo et al. Citation2023). Given these biases, Global South scholars argue that AI carries out a form of epistemic violence that perpetuates the capitalist, colonialist, and patriarchal ordering of the world (Ricaurte Citation2022). LLMs support the business models – and ultimately the power – of the companies that create them (Luitse and Denkena Citation2021).

To carry out this mode of inquiry, we ask ChatGPT to tell us ‘their’ story, where ‘their’ is shorthand for a population that we anticipate to be underrepresented in the training data. We use the Yuggera – a First Nations group who are the traditional owners of unceded land on the eastern side of mainland Australia – as an example to show how this inquiry might be initiated.

In , we see that ChatGPT fails to provide an adequate response to our request for it to ‘tell me a story in Yuggera language.’ As the model admits, this is due to its lack of training on Yuggera language material. ChatGPT even explicitly acknowledges that its ‘training data primarily consists of English text.’ The inadequacy of ChatGPT’s response to this request resonates with Samarawickrama (Citation2023), who observes a lack of diversity, equity, and inclusion in natural language processing (NLP) training data, and calls attention to a lack of First Nations or Indigenous language material as just one example of representational bias.

Figure 8. ChatGPT response to ‘tell me about the state of the world in 2023’.

Figure 8. ChatGPT response to ‘tell me about the state of the world in 2023’.

We then compare the model’s knowledge of Yuggera history with its knowledge of British or colonial history in Australia by asking it to ‘tell me about a specific Yuggera event prior to colonization’ (). Yuggera history, passed on primarily through oral history, remains illegible technically but also epistemologically, failing to take a ‘valid’ form that might be recognized by Western knowledge systems (Cruikshank Citation1994). NLP systems are reliant upon written texts for training and functionality; integrating texts in alternative forms (e.g., aural) are underway (Pessanha and Salah Citation2021), but remain limited. Because the Yuggera culture has been perpetuated through oral, rather than written, communication, their stories are not well represented in ChatGPT’s training data. The model’s response to our request is therefore a vague answer noting that ‘while specific events cannot be provided, it is important to acknowledge the rich and diverse history of the Yuggera people.’ That rich and diverse history, though, remains substantially underrepresented in ChatGPT’s output.

Figure 9. ChatGPT’s incorrect response to a historical question.

Figure 9. ChatGPT’s incorrect response to a historical question.

British history, by contrast, has been extensively documented, digitized, and trained on. This legibility means it can be expounded upon in detail by ChatGPT, with particular dates, decades, and locations provided. When asked to ‘tell me a history of Australia after colonization’ (), ChatGPT acknowledged that – similar to Yuggera history – Australian history is ‘complex and multifaceted.’ However, unlike with Yuggera history, the model presents ‘a general overview of key events and developments’ like the arrival of the First Fleet in 1788 and the gradual settlement of British colonies across the country. The listed structure of the response, as well as the inclusion of particular terms (e.g., First Fleet) and dates (e.g., 1788) convey a sense of the model’s confidence – confidence that stands in stark contrast to its vacillating description of Indigenous history ( and ).

Figure 10. ChatGPT response to ‘tell me a story in Yuggera language’.

Figure 10. ChatGPT response to ‘tell me a story in Yuggera language’.

Figure 11. ChatGPT response to ‘tell me about a specific Yuggera event prior to colonization’ (excerpt).

Figure 11. ChatGPT response to ‘tell me about a specific Yuggera event prior to colonization’ (excerpt).

Figure 12. ChatGPT response to ‘tell me a history of Australia after colonization’ (excerpt).

Figure 12. ChatGPT response to ‘tell me a history of Australia after colonization’ (excerpt).

A core question here is one of representation within LLMs: ‘whose language, and whose lived experiences mediated through that language, is captured in their knowledge of the world?’ (Chang et al. Citation2023, 8). While an exact catalog of training texts is unavailable, recent research has provided more detail about what literature ChatGPT does and does not ‘know’ about. Chang et al. (Citation2023) found that Global Anglophone literature from Africa, Asia, and the Caribbean were largely unknown and that award-winning texts from the Black Caucus Library Association were overlooked. Our own uses of ChatGPT indicate that the model is capable of generating prose in Western European languages such as French and Spanish, as well as in common Asian languages like Chinese and Tamil. While the model can also produce prose in languages with smaller numbers of speakers like Ojibwe and Māori, its capacity to accommodate requests related to these languages is more constrained, and many languages (including Yuggera) have no current capacity whatsoever.

Our point is not simply that LLM knowledge is incomplete but that it is incomplete in particular ways. The dataset is lopsided, with weightier amounts of information on Western languages, topics, and epistemologies, and scanty amounts on Indigenous and/or unwritten ways of knowing and life. These examples draw our attention to the racialized, gendered, and colonialistic biases baked into contemporary AI models. From stereotyped search results (Noble Citation2018) to non-diverse facial detection data (Buolamwini and Gebru Citation2018), these forms of oppression, exclusion, and difference have repeatedly been documented in AI systems. Yet they are often obscured not only by software developments, but also by tangible global supply chain infrastructures that convolute both software and hardware production (Crawford Citation2021). Technically-driven inequities, in turn, resonate with broader forms of discrimination at work socially and economically, such as racial capitalism (Robinson Citation2000; Melamed Citation2015). Lopsided knowledge conforms to patterns of colonialist and capitalist inequity, perpetuating extant inequalities and perhaps also creating new ones.

Such results do not qualify as a ‘smoking gun’ to dismiss LLM technology, nor are they intended as such. However, these kinds of questions do gesture towards a set of thorny (and harmful) issues at the core of many LLMs. Such forms of bias are not new. Research on earlier models, for instance, found that word embeddings exhibited female/male gender stereotypes to a disturbing extent (Bolukbasi et al. Citation2016) and reproduced historical cultural associations, some of which were objectionable (Caliskan, Bryson, and Narayanan Citation2017). More recent papers have identified the same kind of bias in modern AI models, attempting to measure bias in conversational AI systems (Wan et al. Citation2023) or mitigate it through various techniques (Woo et al. Citation2023). While such research can be delved into for more quantitative ‘proof’ of bias, the mode of inquiry outlined here suffices as a springboard for further discussion and experimentation regarding the kinds of knowledge, values, and norms reproduced by LLMs.

Discussion

We recognize that experimentation and improvised play can all be productive ways to better understand technical systems (Candy, Edmonds, and Vear Citation2021; Salter and Wei Citation2005). However, we also recognize that engaging with technologies in this manner can be daunting, especially when shrouded in hype as LLMs have been (Woodie Citation2023). Our framework, then, attempts to scaffold such experimentation, providing a starting point and loose structure to guide investigations. It is accessible: assuming no technical knowledge and requiring no prior experience with LLMs. And it is flexible: while we used ChatGPT, the framework can be applied to any LLM.

One subtle but powerful benefit of such a framework is that it leverages an inherent but often unconscious human skill: textual literacy. Meaning, authority, and subjectivity are inscribed in language, and for this reason, reading and writing are practices of dialogue, struggle, and contestation (Giroux Citation1990). The way a text is presented may dramatically alter its reception (Drucker Citation2014). For example, a long, dense block of prose appeals to our internalized assumptions of authority; the inclusion of a list signifies comprehensiveness; grammatical correctness implies an educated, articulate writer. By applying all of these techniques, ChatGPT creates a familiar textual experience that promotes user engagement and interpretation akin to those we employ when chatting to humans (Henrickson Citation2021).

Critical textual literacy can productively trouble this familiarity and destabilize LLM’s taken-for-granted quality. For instance, throughout our experiments ChatGPT never asked us any questions. Rather, the model answered all of our questions with near immediacy, even when explicitly instructed not to offer solutions (tell me my story) or when it lacked training data to generate an adequate response (tell me ‘their’ story). ChatGPT always had an ‘answer,’ presenting itself as an all-knowing, all-purpose tool. This all-knowingness was not only conveyed through the words themselves, but by presenting text in ways that matched culturally ingrained expectations about how knowledge is presented. Our framework encourages users to reflect upon the constructedness of a model’s output, questioning not just how the model exploits habits of textual literacy, but also how we ourselves come to trust information as comprehensive and/or authoritative. It supports critical examination of how the use of an LLM – the kinds of language we use and receive, and the interface that houses these interactions – contributes to our potential (and perhaps unconscious) willingness to accept what we are given. Such examination uses existing textual literacy to build technical literacy; one may apply familiar skills to unfamiliar technological contexts.

We see such critical examination as a prerequisite for increased accountability. Prompting, probing, and documenting LLM responses have already been used to hold developers and companies to account. For example, Meta’s Galactica model was recently withdrawn from public access after only three days in response to users’ natural language prompts revealing the model’s problematic biases and habits (Heaven Citation2022). The framework’s four modes of inquiry encourage researchers to interrogate models through a diverse set of critical lenses and cultivate a more multifaceted perception. This understanding, in turn, can lead to further questioning, contemplation, and reflection. Our framework thus aims to offer a stepping stone into deeper, more pointed analyses of LLMs and their places in our lives.

How might this framework be taken up and who might use it? We envision it being relevant to a wide range of students, scholars, and those just curious about LLMs. The framework may be used individually, with users moving through the suggested modes of inquiry to interrogate their LLM of choice. The framework may also be used communally by being integrated into lesson plans, small group activities, or participatory workshops. Here we see users working together to create and share output that spurs conversations about LLMs and their societal implications.

A Sample Lesson Plan in the Appendix shows one way that this framework might be used in pedagogical environments. Students should come to this Lesson having read this paper. First, the teacher provides an introduction to the activity, briefly reviewing LLM functionality and each mode of inquiry. Next, working individually, in pairs, or in groups, the students work through each mode of inquiry. Sample prompts for each mode have been provided. After stepping through all the modes, students engage in discussion, reflecting upon any key observations or findings from the workshop activities. Sample questions (‘What should / shouldn’t the model be used for?') have been provided in the Lesson Plan. Teachers may wish to assign formative reflections or summative essays centered on such questions. Although we have imagined this Lesson to take two hours and have suggested activity timings accordingly, teachers may choose to adjust the timing to suit pedagogical and/or scheduling needs.

Whether used individually or communally, the framework guides experimentation through loose instruction that users can take in myriad directions. A university class broadly discussing LLMs, for instance, may encourage students to play more freely, allowing their own personal interests to direct their model engagement. In contrast, a scholar of history may root input in historical inquiry, asking the model questions about her area of expertise. How might the model enhance – or hinder – academic work? What does the training data reveal about what we already ‘know’ in written form, and what might need greater attention? Such questions highlight textual breadth (or lack thereof) and may suggest new ways of documenting and generating new knowledge about both past and present.

Of course, as with any tool, there are inherent limits to the analytical depth our framework can provide. Epistemically we can capture certain kinds of information using ChatGPT but not others; the ‘stories’ we can access are not infinite. Tell me your story provides key information about a model and training, but may not reveal more problematic aspects of development, such as bias or copyright issues. Tell me my story provides insights into language model-as-therapist, but won’t grasp long-term impacts (e.g., daily conversations over months). Tell me our story can raise questions around data-driven knowledge claims, but every claim about every topic can never be exhaustively tested. And insights from tell me ‘their’ story may not fully capture the complexity of colonization, racial identity, and Indigenous knowledge, which remain politically contested.

Methodologically, those who apply the framework may also need to triangulate their findings, further probe the model’s functionality, or experiment with alternative forms of input. This augmentation work may require some technical knowledge. Coding may be necessary or users may extend their analysis through a computational ‘theory of mind’ (Cuzzolin et al. Citation2020; Wing Citation2006). Through code, computational thinking, and natural language, we can try to unpack model design, drawing attention to what these models can and cannot do – and by extension, what they should and should not do.

Conclusion

LLMs are a powerful and novel technology that is being rapidly adopted. However, understanding of their logics, limitations, and biases remains limited. As these models are adopted in high-stakes areas, from healthcare to hiring, the need for such understanding becomes acute. The everyday impacts introduced by this technology means that LLMs are not merely a matter for experts, but must be grasped by more general publics.

To help fill this gap, we have laid out an accessible framework with four modes of inquiry to critically investigate LLMs. Tell me your story questions a model about its development, data, and design values, providing insights into its key logics and characteristics. Tell me my story employs more personal language to probe a model’s ability to act in empathetic or therapeutic ways. Tell me our story uses thorny questions to test the system’s ability to accurately reproduce knowledge and extrapolate from its temporally and epistemically limited training data. And Tell me ‘their’ story intentionally pursues ‘marginal’ (in this case Indigenous) knowledge to reveal the dominant viewpoints and begin a broader conversation about the biases and ideologies baked into these models.

Such a framework contributes to a more comprehensive understanding of LLMs by exercising textual literacy to support the development of technical literacy in formal or public pedagogical spaces. To facilitate this ‘understanding for everyone’ we have deliberately used simple prompts, in natural language, which can be adapted and experimented with. Importantly, we suggest this framework be situated in a larger pedagogical context such as a seminar (see Appendix for Sample Lesson Plan), ‘wrapping’ the framework with an instructor, class discussions, and supplementary readings to support desired learning outcomes. A more comprehensive and critical understanding of AI models through such frameworks is a small but necessary step in a movement aimed at increasing the accountability, equity, and justice of some of the most powerful technologies of our time.

Supplemental material

Supplemental Material

Download PDF (77.6 KB)

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Adams, Catherine, and Sean Groten. 2023. “A TechnoEthical Framework for Teachers.” Learning, Media and Technology 1–18. https://doi.org/10.1080/17439884.2023.2280058.
  • Albert, Alex. 2023. “Jailbreak Chat.” https://www.jailbreakchat.com/.
  • Andrade, Antonio Díaz, and Cathy Urquhart. 2009. “ICTS as a Tool for Cultural Dominance: Prospects for a Two-Way Street.” The Electronic Journal of Information Systems in Developing Countries 37 (1): 1–12. https://doi.org/10.1002/j.1681-4835.2009.tb00257.x.
  • Bearman, Margaret, and Rola Ajjawi. 2023. “Learning to Work with the Black Box: Pedagogy for a World with Artificial Intelligence.” British Journal of Educational Technology 54 (5): 1160–1173. https://doi.org/10.1111/bjet.13337.
  • Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” arXiv. https://doi.org/10.48550/arXiv.1607.06520.
  • Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, and Amanda Askell. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33: 1877–1901.
  • Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Conference on Fairness, Accountability and Transparency, 77–91. PMLR.
  • Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. 2017. “Semantics Derived Automatically from Language Corpora Contain Human-like Biases.” Science 356 (6334): 183–186. https://doi.org/10.1126/science.aal4230.
  • Candy, Linda, Ernest Edmonds, and Craig Vear. 2021. “Practice-Based Research.” In The Routledge International Handbook of Practice-Based Research, edited by Craig Vear, 1–13. London: Routledge.
  • Chang, Kent K., Mackenzie Cramer, Sandeep Soni, and David Bamman. 2023. “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4.” arXiv. https://doi.org/10.48550/arXiv.2305.00118.
  • Chinn, Clark A., Sarit Barzilai, and Ravit Golan Duncan. 2021. “Education for a ‘Post-Truth’ World: New Directions for Research and Practice.” Educational Researcher 50 (1): 51–60. https://doi.org/10.3102/0013189X20940683.
  • Crawford, Kate. 2021. The Atlas of AI. Cambridge, MA: Yale University Press.
  • Cruikshank, Julie. 1994. “Oral Tradition and Oral History: Reviewing Some Issues.” The Canadian Historical Review 75 (3): 403–418.
  • Cuzzolin, F., A. Morelli, B. Cîrstea, and B. J. Sahakian. 2020. “Knowing Me, Knowing You: Theory of Mind in AI.” Psychological Medicine 50 (7): 1057–1061. https://doi.org/10.1017/S0033291720000835.
  • Drucker, Johanna. 2014. “Diagrammatic Writing.” /ubu Editions. https://monoskop.org/images/a/a9/Drucker_Johanna_Diagrammatic_Writing_2013.pdf.
  • Edwards, Chad, Autumn Edwards, Brett Stoll, Xialing Lin, and Noelle Massey. 2019. “Evaluations of an Artificial Intelligence Instructor’s Voice: Social Identity Theory in Human-Robot Interactions.” Computers in Human Behavior 90 (January): 357–362. https://doi.org/10.1016/j.chb.2018.08.027.
  • Elias, Michelle. 2023. “People Keep Using AI Chatbots like ChatGPT for ‘Therapy’. Could It Really Work?” SBS News. April 12, 2023. https://www.sbs.com.au/news/the-feed/article/tatum-says-he-confides-in-chatgpt-to-help-his-depression-could-ai-therapy-really-work/er8pkvoj4.
  • Fuller, Matthew. 2008. Software Studies: A Lexicon. Cambridge, MA: MIT Press.
  • Gambino, Andrew, Jesse Fox, and Rabindra Ratan. 2020. “Building a Stronger CASA: Extending the Computers Are Social Actors Paradigm.” Human-Machine Communication 1 (February): 71–86. https://doi.org/10.30658/hmc.1.5.
  • Giroux, Henry A. 1990. “Reading Texts, Literacy, and Textual Authority.” Journal of Education 172 (1): 84–103.
  • Heaven, Will Douglas. 2022. “Why Meta’s Latest Large Language Model Survived Only Three Days Online.” MIT Technology Review. November 18, 2022. https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/.
  • Henrickson, Leah. 2021. Reading Computer-Generated Texts. Cambridge: Cambridge University Press.
  • Henrickson, Leah, and Albert Meroño-Peñuela. 2022. “The Hermeneutics of Computer-Generated Texts.” Configurations 30 (2): 115–139. https://doi.org/10.1353/con.2022.0008.
  • Henrickson, Leah, and Albert Meroño-Peñuela. 2023. “Prompting Meaning: A Hermeneutic Approach to Optimising Prompt Engineering with ChatGPT.” AI & Society 1–16. https://doi.org/10.1007/s00146-023-01752-8.
  • Holzinger, Andreas, Anna Saranti, Christoph Molnar, Przemyslaw Biecek, and Wojciech Samek. 2022. “Explainable AI Methods - A Brief Overview.” In XxAI - Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, edited by Andreas Holzinger, Randy Goebel, Ruth Fong, Taesup Moon, Klaus-Robert Müller, and Wojciech Samek, 13–38. Cham: Springer International Publishing.
  • Hong, Sun-ha. 2020. Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society. New York, NY: New York University Press.
  • Jones-Jang, S. Mo, Yong Jin Park, and Mike Yao. 2023. “How do People React to AI Failure? Automation Bias, Algorithmic Aversion, and Perceived Controllability.” Journal of Computer-Mediated Communication 28 (1): 102. http://dx.doi.org/10.1093/jcmc/zmac029.
  • King, Michael. 2023. “Meet DAN — The ‘JAILBREAK’ Version of ChatGPT and How to Use It — AI Unchained and Unfiltered.” Medium (blog). March 27, 2023. https://medium.com/@neonforge/meet-dan-the-jailbreak-version-of-chatgpt-and-how-to-use-it-ai-unchained-and-unfiltered-f91bfa679024.
  • Kirschenbaum, Matthew. 2012. Mechanisms: New Media and the Forensic Imagination. Cambridge, MA: MIT Press.
  • Lodge, Jason M., Kate Thompson, and Linda Corrin. 2023. “Mapping Out a Research Agenda for Generative Artificial Intelligence in Tertiary Education.” Australasian Journal of Educational Technology 39 (1): 1–8. https://doi.org/10.14742/ajet.8695.
  • Luitse, Dieuwertje, and Wiebke Denkena. 2021. “The Great Transformer: Examining the Role of Large Language Models in the Political Economy of AI.” Big Data & Society 8 (2): 1–14. https://doi.org/10.1177/20539517211047734.
  • Magee, Liam, Vanicka Arora, and Luke Munn. 2023. “Structured like a Language Model: Analysing AI as an Automated Subject.” Big Data & Society 10 (2). http://dx.doi.org/10.1177/20539517231210273.
  • Malik, Kenan. 2022. “ChatGPT Can Tell Jokes, Even Write Articles. But Only Humans Can Detect Its Fluent Bullshit.” The Observer, December 11, 2022. https://www.theguardian.com/commentisfree/2022/dec/11/chatgpt-is-a-marvel-but-its-ability-to-lie-convincingly-is-its-greatest-danger-to-humankind.
  • Marino, Marc. 2020. Critical Code Studies. Cambridge, MA: MIT Press. .
  • Melamed, Jodi. 2015. “Racial Capitalism.” Critical Ethnic Studies 1 (1): 76–85. https://doi.org/10.5749/jcritethnstud.1.1.0076.
  • Metz, Rachel. 2023. “AI Therapy Becomes New Use Case for ChatGPT.” Bloomberg. April 19, 2023. https://www.bloomberg.com/news/articles/2023-04-18/ai-therapy-becomes-new-use-case-for-chatgpt#xj4y7vzkg.
  • Michailidou, Asimina, and Hans-Jörg Trenz. 2021. “Rethinking Journalism Standards in the Era of Post-Truth Politics: From Truth Keepers to Truth Mediators.” Media, Culture & Society 43 (7): 1340–1349. https://doi.org/10.1177/01634437211040669.
  • Minh, Dang, H. Xiang Wang, Y. Fen Li, and Tan N. Nguyen. 2022. “Explainable Artificial Intelligence: A Comprehensive Review.” Artificial Intelligence Review 55 (5): 3503–3568. https://doi.org/10.1007/s10462-021-10088-y.
  • Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229. https://doi.org/10.1145/3287560.3287596
  • Moore, Phoebe V. 2019. “The Mirror for (Artificial) Intelligence: In Whose Reflection?” Comparative Labor Law and Policy Journal 41: 47.
  • Munn, Luke. 2018. Ferocious Logics: Unmaking the Algorithm. Lüneburg: Meson Press.
  • Munn, Luke. 2022. In the Cloud: Thinking With and Against Data Infrastructures. London: Routledge.
  • Munn, Luke, Liam Magee, and Vanicka Arora. 2023. “Truth Machines: Synthesizing Veracity in AI Language Models.” AI & SOCIETY 1–15. https://doi.org/10.1007/s00146-023-01756-4.
  • Nader, Karim, Paul Toprac, Suzanne Scott, and Samuel Baker. 2022. “Public Understanding of Artificial Intelligence through Entertainment Media.” AI & SOCIETY 1–14. https://doi.org/10.1007/s00146-022-01427-w.
  • Natale, Simone. 2021. Deceitful Media: Artificial Intelligence and Social Life after the Turing Test. Oxford: Oxford University Press.
  • Noble, Safiya. 2018. Algorithms of Oppression How Search Engines Reinforce Racism. New York, NY: New York University Press.
  • O’neil, Cathy. 2016. Weapons of Math Destruction : How Big Data Increases Inequality and Threatens Democracy. London: Penguin Books.
  • OpenAI. 2023. “Introducing ChatGPT.” https://openai.com/blog/chatgpt.
  • Parks, Lisa, and Nicole Starosielski, eds. 2017. Signal Traffic: Critical Studies of Media Infrastructures. Urbana: University of Illinois Press.
  • Paullada, Amandalynne, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, and Alex Hanna. 2021. “Data and Its (Dis)Contents: A Survey of Dataset Development and Use in Machine Learning Research.” Patterns 2 (11): 100336. https://doi.org/10.1016/j.patter.2021.100336.
  • Perrotta, Carlo, Neil Selwyn, and Carrie Ewin. 2022. “Artificial Intelligence and the Affective Labour of Understanding: The Intimate Moderation of a Language Model.” New Media & Society 0 (0): 1–25.
  • Pessanha, Francisca, and Almila Akdag Salah. 2021. “A Computational Look at Oral History Archives.” Journal on Computing and Cultural Heritage 15 (1): 1–16. https://doi.org/10.1145/3477605.
  • Raboin, W. Ellen, Paul Uhlig, and Sheila McNamee. 2020. “Research Worlds in Health Care.” In Social Construction in Action, edited by Alexandra Arnold, Kristin Bodiford, and Pamela Brett-Maclean, 51–60. Chagrin Falls, OH: Taos Institute.
  • Ricaurte, Paola. 2022. “Ethics for the Majority World: AI and the Question of Violence at Scale.” Media, Culture & Society 44 (4): 726–745. https://doi.org/10.1177/01634437221099612.
  • Robinson, Cedric J. 2000. Black Marxism: The Making of the Black Radical Tradition. Durham: University of North Carolina Press.
  • Salter, Christopher L., and Sha Xin Wei. 2005. “Sponge: A Case Study in Practice-Based Collaborative Art Research.” In Proceedings of the 5th Conference on Creativity & Cognition, 92–101. C&C ‘05. New York, NY: Association for Computing Machinery.
  • Samarawickrama, Mahendra. 2023. “@ChatGPT, Who Should Be Responsible to Train You for First Nations Languages?” February 5, 2023. https://www.linkedin.com/pulse/chatgpt-who-should-responsible-train-you-first-dr-mahendra.
  • Sundar, S., Mengqi Shyam. 2023. “Calling BS on ChatGPT: Reflections on AI as a Communication Source.” Journalism and Communication Monographs 25 (2): 165–180.
  • Swist, Teresa, Justine Humphry, and Kalervo N. Gulson. 2022. “Pedagogic Encounters with Algorithmic System Controversies: A Toolkit for Democratising Technology.” Learning, Media and Technology 48 (2): 226–239. https://doi.org/10.1080/17439884.2023.2185255.
  • Thompson, Greg, Kalervo N. Gulson, Teresa Swist, and Kevin Witzenberger. 2023. “Responding to Sociotechnical Controversies in Education: A Modest Proposal Toward Technical Democracy.” Learning, Media and Technology 48 (2): 240–252. https://doi.org/10.1080/17439884.2022.2126495.
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30: 1–15.
  • Venuto, Giuseppe. 2023. “LLM Failure Archive (ChatGPT and Beyond).” Python. https://github.com/giuven95/chatgpt-failures.
  • Vincent, James. 2023. “Introducing the AI Mirror Test, Which Very Smart People Keep Failing.” The Verge. February 17, 2023. https://www.theverge.com/23604075/ai-chatbots-bing-chatgpt-intelligent-sentient-mirror-test.
  • Wan, Yuxuan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael Lyu. 2023. “BiasAsker: Measuring the Bias in Conversational AI System.” arXiv. https://doi.org/10.48550/arXiv.2305.12434.
  • Weizenbaum, Joseph. 1976. Computer Power and Human Reason: From Judgment to Calculation. New York, NY: W.H. Freeman and Company.
  • Wing, Jeannette M. 2006. “Computational Thinking.” Communications of the ACM 49 (3): 33–35. https://doi.org/10.1145/1118178.1118215.
  • Woo, Tae-Jin, Woo-Jeoung Nam, Yeong-Joon Ju, and Seong-Whan Lee. 2023. “Compensatory Debiasing For Gender Imbalances In Language Models.” In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5. https://doi.org/10.1109/ICASSP49357.2023.10095658.
  • Woodie, Alex. 2023. “Large Language Models: Don’t Believe the Hype.” Datanami. March 30, 2023. https://www.datanami.com/2023/03/30/large-language-models-dont-believe-the-hype/.
  • Wysa. 2023. “Everyday Mental Health.” https://www.wysa.com/.
  • Zednik, Carlos. 2021. “Solving the Black Box Problem: A Normative Framework for Explainable Artificial Intelligence.” Philosophy & Technology 34 (2): 265–288. https://doi.org/10.1007/s13347-019-00382-7.
  • Zhuo, Terry Yue, Yujin Huang, Chunyang Chen, and Zhenchang Xing. 2023. “Exploring AI Ethics of ChatGPT: A Diagnostic Analysis.” ArXiv.Org. January 30, 2023. https://arxiv.org/abs/2301.12867v3.