From manual to machine: assessing the efficacy of large language models in content analysis: Communication Research Reports: Vol 0, No 0

ABSTRACT

This study compares the performance of Large Language Models (LLMs) and human coders in predicting relational uncertainty from textual data. Employing various LLMs (gpt-4.0-turbo, gpt-3.5-turbo, Claude 2, llama7b-v2-chat, and llama13b-v2-chat), we found that these models perform comparably to human coders, with only minor differences in Mean Squared Error (MSE) values. However, not all LLMs performed equally, underscoring the importance of model selection. Our findings highlight the potential of LLMs as a scalable tool for content analysis, but also emphasize their nuanced application based on the specific research context. The study advances the discourse on the use of LLMs in content analysis and provides insights for future research in this rapidly evolving field.

Acknowledgments

The authors would like to thank the late Klaus Krippendorff for his earlier comments on the start of this project. “If I have seen further, it is by standing on the shoulders of Giants.”—Isaac Newton

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability

The data will be made available upon request by contacting the corresponding author.

Open scholarship

This article has earned the Center for Open Science badges for Open Data and Open Materials through Open Practices Disclosure. The data and materials are openly accessible at https://colab.research.google.com/drive/1PO_hJJ6_tGPhYXaswSjuI3UHevUz-_rZ?usp=sharing

Notes

1. Below are the exact instructions used for the qualitative relational uncertainty question. “In this text box, we would like you to describe the status of your relationship. For instance, it’s normal for partners to have questions about their relationships. Individuals can experience uncertainty about their own thoughts, feelings, and behaviors in their relationship. They can have questions about their partner’s thoughts, feelings, and behaviors in their relationship. They can be unsure about the nature of the relationship itself. On the other hand, there are also things that they are certain about (e.g., their commitment, their goals, feelings, etc.). Our goal on this page is to identify the issues partners are uncertain and certain about. In one paragraph (at least 200 words to receive payment), please describe your relationship status by describing the things you are uncertain and certain about.”

2. The following procedures were used for each LLM.

GPT: We used a Python script to access the GPT API, as documented in our materials. This allowed for an automated and efficient process to run our data through the model.
Llama Models: Initially, we downloaded the Llama models to run them locally. However, we quickly encountered limitations due to GPU memory constraints. To overcome this, we used a service called Replicate, which provided access to more powerful GPUs. This approach, while costly, enabled us to process our data using the Llama models.
Claude: Our attempt to gain access to the Claude API was unsuccessful. Nevertheless, given our relatively small sample size, we were able to manually upload our data in a CSV file and provide the necessary prompt to the Claude system for classification.
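
The automated procedure described above, together with the MSE comparison reported in the abstract, can be illustrated with a minimal Python sketch. This is not the authors' script, which is available in their linked materials. The prompt wording, the 1 to 6 rating scale, and the names `build_prompt`, `parse_score`, `score_texts`, and `call_model` are all assumptions made for illustration. The model call is left as a caller-supplied function so that no particular vendor API is implied.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical prompt wording; the study's actual prompt is in its materials.
PROMPT_TEMPLATE = (
    "Rate the relational uncertainty expressed in the text below "
    "on a scale from {low} to {high}. Reply with a single number.\n\n"
    "Text: {text}"
)


def build_prompt(text: str, low: int = 1, high: int = 6) -> str:
    """Fill the (assumed) prompt template with one participant response."""
    return PROMPT_TEMPLATE.format(low=low, high=high, text=text.strip())


def parse_score(reply: str, low: int = 1, high: int = 6) -> Optional[float]:
    """Return the first number in the reply, or None if absent or off-scale."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    score = float(match.group())
    return score if low <= score <= high else None


def score_texts(
    texts: Iterable[str],
    call_model: Callable[[str], str],
    low: int = 1,
    high: int = 6,
) -> List[Optional[float]]:
    """Send each text to a caller-supplied model function and parse the reply."""
    return [
        parse_score(call_model(build_prompt(t, low, high)), low, high)
        for t in texts
    ]


def mean_squared_error(
    predicted: Iterable[Optional[float]], observed: Iterable[float]
) -> float:
    """MSE over the pairs where the model returned a usable score."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if p is not None]
    if not pairs:
        raise ValueError("no valid predictions to evaluate")
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)
```

In use, `call_model` would wrap whichever API or hosted service is available, and the same `mean_squared_error` function could be applied to human coders' ratings so that both are compared against the same reference scores.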

3. Data, analysis, and example prompt. It is important to note that 7 participants were removed from the previous study, as their responses to the text were found to be nonsensical and not related to relational uncertainty. http://tiny.cc/w84fwz.

Additional information

Funding

This work was supported by the University of Kentucky [Research and Creative Activities Program].

Notes on contributors

Andrew Pilny

Andrew Pilny is an Associate Professor in the Department of Communication at the University of Kentucky. Dr. Pilny’s research interests include social networks and artificial intelligence.

Kelly McAninch

Kelly McAninch is an Associate Professor in the Department of Communication. Dr. McAninch’s research interests include interpersonal communication and romantic relationships.

Amanda Slone

Amanda Slone is a doctoral candidate in the Graduate Program in Communication and a Faculty Lecturer for the School of Information Science (SIS) in the College of Communication and Information at the University of Kentucky (UK). Her research interests lie at the intersection of organizational and instructional communication, namely training and development.

Kelsey Moore

Kelsey Moore is a Lecturer in Communication Studies at Texas A&M. Dr. Moore’s research focuses on using communication principles from instructional communication and persuasion to address real-world issues in the context of the college classroom, training, and public health.
