Editorial

Using AI to write scholarly publications

Received 11 Jan 2023, Accepted 11 Jan 2023, Published online: 25 Jan 2023

Artificial intelligence (AI) natural language processing (NLP) systems, such as OpenAI’s generative pre-trained transformer (GPT) model (https://openai.com) or Meta’s Galactica (https://galactica.org/), may soon be widely used in many forms of writing, including scientific and scholarly publications (Heaven 2022; see Note 1). While computer programs (such as Microsoft Word and Grammarly) have incorporated automated text-editing features (such as checking for spelling and grammar) for many years, these programs are not designed to create content. New and emerging NLP systems, however, are designed to do exactly that, which raises important issues for research ethics and research integrity (see Note 2).

NLP is a way of enabling computers to interact with human language. A key step in NLP, known as tokenization, involves converting unstructured text into structured text suitable for computation. For example, the sentence “The cat sat on the mat” can be structured by tagging its parts: “the [article] cat [noun] sat [verb, past tense] on [preposition] the [article] mat [noun].” Once the parts of the text have been tagged, they can be processed by algorithms designed to produce appropriate responses to text (i.e., language generation). Rudimentary NLP systems, such as the first generation of chatbots that assisted customers on websites, operated according to thousands of human-written rules for processing and generating text.
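
To make tokenization and tagging concrete, the following minimal sketch uses the open-source NLTK library (our choice for illustration; commercial NLP systems rely on their own, far more elaborate pipelines) to split the example sentence into tokens and label each token’s grammatical role:

```python
# A minimal sketch of tokenization and part-of-speech tagging using NLTK
# (an illustrative choice, not the pipeline used by any particular NLP system).
import nltk

# Download the tokenizer and tagger models on first use.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The cat sat on the mat"

# Tokenization: split the unstructured string into individual tokens.
tokens = nltk.word_tokenize(sentence)

# Tagging: label each token with its grammatical role.
tagged = nltk.pos_tag(tokens)

print(tagged)
# e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```

The resulting tags (e.g., DT for determiner/article, NN for noun, VBD for past-tense verb) are the structured representation that downstream algorithms operate on.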

Recent advances in computational speed and capacity and the development of machine-learning (ML) algorithms, such as neural networks, have led to tremendous breakthroughs in NLP (Mitchell 2020). Today’s NLP systems use ML to produce and refine statistical models (with billions of parameters) for processing and generating natural language. NLP systems are trained on huge databases (45 terabytes or more) of text drawn from the internet or other sources. Initial training (or supervised learning) involves giving the system the text and then “rewarding” it for giving correct outputs, as determined by human trainers (see Note 3). Over time, NLP systems reduce their percentage of erroneous outputs and learn from the data (Mitchell 2020). While NLP systems continue to learn as they receive and process data beyond their initial training data, they do not “know” the meaning or truth-value of the text they receive, process, and generate. Their function is simply to generate understandable (i.e., grammatically correct) and appropriate (i.e., highly probable) text outputs in response to text inputs.
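
As a toy illustration of the underlying statistical idea (our own example; today’s systems use neural networks with billions of parameters rather than simple word counts), the sketch below builds a word-level bigram model from a tiny corpus and generates text by repeatedly sampling a probable next word:

```python
# A toy bigram model (illustrative only, not how GPT or Galactica work) showing
# the statistical idea behind language generation: learn which words tend to
# follow which, then produce highly probable continuations of an input.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count, for each word, which words follow it and how often.
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def generate(start, length=8):
    """Generate text by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        counts = next_word_counts[words[-1]]
        if not counts:
            break
        choices, weights = zip(*counts.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the rug . the dog"
```

The model has no notion of what a cat or a mat is; it only tracks how often words co-occur, which is why fluent output does not guarantee factual accuracy.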

That said, NLP systems can generate coherent and informative text, ranging from a few sentences or paragraphs to an entire essay, in response to user prompts that specify, for instance, the topic, length, or writing style. For example, in response to the query, “Write a 300 word piece about the difference between research ethics and research integrity,” the OpenAI chatbot generated the following 356 words:

Research ethics and research integrity are two concepts that are closely related but not identical. Both are concerned with ensuring that research is conducted in a responsible and accountable manner, but they focus on different aspects of the research process.

Research ethics refers to the moral principles that guide the conduct of research. These principles include respect for human subjects, integrity in the research process, and the responsible use of research results. In other words, research ethics is concerned with the moral aspects of research and how it impacts those involved in the research and the broader community.

Research integrity, on the other hand, is concerned with the honesty and reliability of the research process. This includes ensuring that research is conducted in a transparent and unbiased manner, that data is accurately collected and analyzed, and that research findings are reported honestly and without distortion.

One key difference between research ethics and research integrity is that research ethics focuses on the moral principles that should guide research, whereas research integrity focuses on the specific practices and processes that ensure the reliability and honesty of research. Another difference is that research ethics is concerned with the impact of research on individuals and society, whereas research integrity is primarily concerned with the reliability and trustworthiness of the research itself.

Despite these differences, research ethics and research integrity are closely related and often overlap in practice. For example, the principle of respect for human subjects in research ethics is closely related to the need for transparency and honesty in the research process, which is a key aspect of research integrity. Similarly, the responsible use of research results, which is a key principle of research ethics, is closely tied to the reliability and trustworthiness of the research, which is an essential aspect of research integrity.

Overall, research ethics and research integrity are both important for ensuring that research is conducted in a responsible and accountable manner. While they focus on different aspects of the research process, they are closely related and must be considered together to ensure that research is conducted in a way that is ethical, reliable, and trustworthy

(OpenAI ChatGPT 2022a).

This response is remarkably good. However, the OpenAI chatbot can also produce answers that contain erroneous, misleading, or irrelevant information. For example, in response to the query: “Did Immanuel Kant believe that ought implies can?” the OpenAI chatbot wrote:

Immanuel Kant did not believe that ought implies can. In fact, he argued that the opposite was true: he believed that can implies ought. This means that, according to Kant, if someone is able to do something, then they have a moral obligation to do it. In other words, if someone has the ability to do something good or moral, then they ought to do it

(OpenAI ChatGPT 2022b).

The OpenAI chatbot gave a completely incorrect answer to this query, because Kant explicitly said that ought implies can (Kohl 2015). The converse of ought implies can, proposed by the OpenAI chatbot, could lead to disastrous outcomes. Just because I can cheat on an exam, steal money, or kill someone does not mean that I should do these things.

In response to the query, “Why do some ethicists and researchers believe that there should be upper limits on the risks that healthy adults can be exposed to in research that does not provide them a direct medical benefit?” the OpenAI chatbot discussed how principles of beneficence and respect for persons apply to the ethics of research and then concluded:

Overall, some ethicists and researchers believe that there should be upper limits on the risks that healthy adults can be exposed to in research that does not provide them a direct medical benefit because they believe that these individuals should not be subjected to unnecessary or excessive risks without their informed consent

(OpenAI ChatGPT 2022c).

The problem with this answer is not that it is incorrect but that it is irrelevant to the main issue, i.e., whether there should be ethical limits to risks that healthy, freely consenting volunteers can be exposed to in research. Some have argued that limits on risks can be justified by appealing to strong paternalism or the need to safeguard public trust in research (Resnik 2012).
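
The outputs quoted above were obtained through the ChatGPT web interface. For readers who wish to experiment with similar queries programmatically, the sketch below shows how a comparable prompt might be submitted through OpenAI’s Python SDK; the model identifier, parameters, and interface details are assumptions for illustration and may change as the service evolves:

```python
# A rough sketch (assumptions for illustration; the quoted examples were
# produced through the ChatGPT web interface) of submitting a prompt to a
# hosted language model via OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model identifier
    messages=[
        {
            "role": "user",
            "content": (
                "Write a 300 word piece about the difference "
                "between research ethics and research integrity."
            ),
        }
    ],
)

# The generated text, which an author would still need to check for accuracy,
# bias, relevance, and reasoning before using any of it in a manuscript.
print(response.choices[0].message.content)
```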

NLP systems raise some very interesting philosophical problems: Are they intelligent, and what does this mean in terms of human intelligence? Can they think? Do they have moral agency? Furthermore, NLP systems might help researchers rewrite manuscripts, which would be especially useful for non-native (English) speakers. However, these uses of NLP would challenge our current understanding of originality and/or the author’s intellectual contribution to the task of writing. These are important questions for philosophers, computer scientists, and sociologists of science to ponder, but we will not address them here. Our concerns in this editorial are more practical.

First, using NLP systems raises issues related to accuracy, bias, relevance, and reasoning. As illustrated by the examples described above, these systems are impressive but can still make glaring mistakes (Heaven 2022). Galactica’s developers warn that their language models can “Hallucinate,” “are Frequency-Biased,” and “are often Confident But Wrong” (Galactica 2022; Heaven 2022). These flaws may stem from the fact that NLP systems deal only with statistical relationships among words, not with relationships between language and the external world, which can lead them to make errors related to facts and commonsense reasoning (AI Perspectives 2020). Another well-known problem with many AI/ML systems, including NLP systems, is the potential for bias, because AI systems reflect biases in the data they are trained on (Lexalytics 2022). For example, AI systems trained on data that includes racial, gender, or other biases will generate outputs that reproduce or even amplify those biases. NLP systems are also not very good at solving some mathematics problems (Lametti 2022) or evaluating text for relevance and coherence, and they may inadvertently plagiarize (AI Content Dojo 2021; Venture Beat 2021).

While NLP systems are likely to become better at minimizing bias, doing math, making relevant connections between concepts, and avoiding plagiarism, they are likely to continue to make factual and commonsense reasoning mistakes because they do not (yet) have the type of cognition or perception needed to understand language and its relationship to the external physical, biological, and social world. NLP systems can perform well when working with text already created or curated by humans, but can perform (dangerously) poorly when they lack human-generated data related to a topic and try to piece together text from different sources. Thus, any section of a manuscript written by an NLP system should be checked by a domain expert for accuracy, bias, relevance, and reasoning.

Second, use of NLP systems raises issues of accountability. If a section of a manuscript written by an NLP system contains errors or biases, coauthors need to be held accountable for its accuracy, cogency, and integrity. While it is tempting to assign blame to the NLP systems and/or their developers for textual inaccuracies and biases, we believe that authors are ultimately responsible for the text generated by NLP systems and must be held accountable for inaccuracies, fallacies, or any other problems in manuscripts. We take this position because 1) NLP systems respond to prompts provided by researchers and do not proactively generate text; 2) authors can juxtapose text generated by an NLP system with other text (e.g., their own writing) or simply revise or paraphrase the generated text; and 3) authors will take credit for the text in any case. Researchers who use these NLP systems to write text for their manuscripts must therefore check the text for factual and citation accuracy; bias; mathematical, logical, and commonsense reasoning; relevance; and originality. If NLP systems write in English and authors have limited English proficiency, someone who is fluent in English must help them spot mistakes. If an NLP system makes a mistake (of omission or commission), authors need to correct it before the manuscript is published. Reviewers and editors can and should help catch mistakes, but they often do not have the time or resources to check every claim made in a manuscript.

Third, use of NLP systems raises issues of transparency in relation to requirements for authorship credit and contributions. Since participation in the writing process is a requirement for becoming an author according to guidelines adopted by most journals (Resnik et al. 2016), and widely used contributor role taxonomies (e.g., CRediT) make clear distinctions between writing the first draft and revising it (Hosseini et al. 2022), use of NLP systems should be acknowledged in the text (e.g., the methods section) and mentioned in the references section. Because NLP systems may be used in ways that may not be obvious to the reader, researchers should disclose their use of such systems and indicate which parts of the text were written or co-written by an NLP system. The issue here is similar to the ghostwriting/contribution problem in scientific publications, except that we are not (yet) ready to say that AIs should be listed as authors on manuscripts when they make substantial contributions. Even so, transparency requires that contributions by NLP systems be specifically disclosed so that the reader has an accurate understanding of how the paper was written.

Fourth, use of NLP systems raises issues of data integrity for research that involves the analysis of text, such as surveys, interviews, or focus groups. It is possible to use NLP systems to fabricate transcripts of interviews or answers to open-ended questions. While it has always been possible for researchers to fabricate or falsify text, NLP systems make it much easier to do this, because they can generate narratives quickly from a few simple prompts. Since we trust that readers of Accountability in Research (AiR) understand that any form of data fabrication or falsification is unethical and is prohibited by the journal, we see no need to issue a separate policy on data fabrication or falsification related to the use of AI to write text, but we would still like to call attention to this issue and stress that researchers should not use NLP systems to fabricate empirical data or falsify existing data.

Fifth, ethical issues are not restricted to NLP-generated text. It is possible, even likely, that researchers may employ these systems to generate an initial literature survey, find references, or synthesize ideas related to their work (e.g., https://elicit.org/), and then revise these suggestions to disguise their use (thereby making the human input look more impressive) and to prevent them from being identified by systems that detect NLP-generated content. Just as plagiarism can involve the misappropriation or theft of ideas as well as words, NLP-generated ideas can affect the integrity of publications. When NLP assistance has impacted the content of a publication (even in the absence of direct use of NLP-generated text), this should be disclosed.

Finally, the issues discussed here go far beyond the use of AI to write text and impact research more generally. For a couple of decades now, researchers have used statistics programs, such as SPSS, to analyze data, and graphics programs, such as Photoshop, to process digital images. Ethical problems related to the misuse of statistics programs and digital image manipulation are well known and have unfortunately been the subject of numerous research misconduct investigations (Gardenier and Resnik 2002; Rossner and Yamada 2004; Cromey 2013; Shamoo and Resnik 2022). Many biomedical journals have developed guidelines for using computer programs to process digital images (see Cell Press 2022), and the International Committee of Medical Journal Editors (2023) recommends that authors disclose the use of statistical software. We think that all uses of computer programs that substantially impact the content of a manuscript should be disclosed, but we will limit our focus here to uses of programs for writing or editing text.

In light of the rapidly evolving nature of NLP systems and ethical concerns about their use in research, the Editors of Accountability in Research are planning to adopt a policy on the inclusion of text and ideas generated by such systems in submissions to the Journal. The general goals of the policy will be, at a minimum, to ensure transparency and accountability related to use of these systems, while also being practical and straightforward. A draft of such a policy, along with an invitation for submissions about this draft policy and these systems in general, appears below.

Draft policy

All authors submitting manuscripts to Accountability in Research must disclose and describe the use of any NLP systems in writing the manuscript text or generating ideas used in the text and accept full responsibility for the text’s factual and citation accuracy; mathematical, logical, and commonsense reasoning; and originality.

“NLP systems” are those that generate new content. For example, software that checks for spelling or offers synonyms or grammar suggestions does not generate new content per se, but NLP systems that develop new phrases, sentences, paragraphs, or citations related to specific contexts can influence the meaning, accuracy, or originality of the text, and should be disclosed.

Disclosures can be made in the methods section AND among the references, as appropriate. Authors should specify: 1) who used the system, 2) the time and date of the use, 3) the prompt(s) used to generate the text, 4) the section(s) containing the text, and/or 5) ideas in the paper resulting from NLP use. Additionally, the text generated by NLP systems should be submitted as supplementary material. While this topic is a moving target and it may not be possible to anticipate all possible violations, an example of such a disclosure in the methods section could be: “In writing this manuscript, M.H. used the OpenAI chatbot on 9 December 2022 at 1:21 pm CST. The following prompt was used to write the introduction section: ‘Write a 300 word piece about the difference between research ethics and research integrity.’ The generated text was copied verbatim and is submitted as supplementary material.”

Accountability in Research is issuing a call for submissions focusing on the intersection of ethics, research integrity, and policy related to NLP systems. We also invite commentary, exploration, and suggestions for improvements to our own policy draft above.

We encourage the editors of other journals to consider adopting policies on the use of AI in research, given the rapid and unpredictable advances in this technology. In the future, use of AI in research may raise issues of authorship, but that day has not yet arrived because today’s computing systems do not have the type of cognition, perception, agency, and awareness needed to be recognized as persons with authorship rights and responsibilities.

Acknowledgments

We are grateful for helpful comments from Laura Biven and Toby Schonfeld and members of the Accountability in Research editorial board.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was supported by the National Institute of Environmental Health Sciences (NIEHS) and the National Center for Advancing Translational Sciences (NCATS, UL1TR001422), National Institutes of Health (NIH). The funders have not played a role in the design, analysis, decision to publish, or preparation of the manuscript. This work does not represent the views of the NIEHS, NCATS, NIH, or the US government.

Notes

1. Blanco-González, Cabezón, Seco-González, et al. (2022) have recently posted a preprint on arXiv that tests the ability of ChatGPT to write a scientific paper. They describe how the AI program was used.

2. NLP systems also raise important issues for academic integrity in colleges, universities, and K-12 education, but we will not consider those here. For more on this, see Stokel-Walker (2022).

3. While the ethics of employing human trainers and the massive human and financial resources that NLP systems require (for training and improvement) are outside the scope of this editorial, future studies should explore these issues. For more on this, see Perrigo (2023).

