Using Supervised Machine Learning in Automated Content Analysis: An Example Using Relational Uncertainty

, , &
Pages 287-304 | Published online: 08 Aug 2019

ABSTRACT

The goal of this research is to make progress toward using supervised machine learning for automated content analysis that involves complex interpretations of text. In Step 1, two human coders coded a subsample of online forum posts for relational uncertainty. In Step 2, we evaluated reliability by training three different classifiers to learn from those subjective human interpretations. Reliability was established when two different metrics of intercoder reliability could not distinguish whether a human or a machine had coded the text in a separate hold-out set. Finally, in Step 3, we assessed validity. To accomplish this, we administered a survey in which participants described their own relational uncertainty/certainty in text and completed a questionnaire. Once the text was classified, the machine's classifications of participants' text correlated positively with the participants' own self-reported relational uncertainty and relational satisfaction. We discuss our results in relation to computational communication science, content analysis, and interpersonal communication.
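The abstract does not name the two intercoder reliability metrics used in Step 2. Purely as an illustration, the sketch below assumes percent agreement and Cohen's kappa, and compares a hypothetical human-human coder pair with a hypothetical human-machine pair on the same binary codes (1 = relational uncertainty present). The coders, labels, and function names are invented for the example.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of units on which two coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two coders' marginal counts per category
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten posts (1 = relational uncertainty present)
coder_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
coder_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
machine = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]

for label, other in [("human-human", coder_2), ("human-machine", machine)]:
    print(label, percent_agreement(coder_1, other), round(cohens_kappa(coder_1, other), 2))
```

In this toy example both pairs reach 0.9 agreement and a kappa of 0.8, so neither metric separates the human-human pair from the human-machine pair, which is the kind of result the abstract describes.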

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. Given the popularity of trace data, especially in communication research (Choi, 2018), it is important to determine whether a website prohibits the use of crawling agents to collect data. The terms of service for both websites were carefully reviewed. Neither website made explicit statements regarding robots.txt or web-scraping policies. We therefore concluded that collecting data from these two sites did not violate their terms of service.
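Alongside reading the terms of service, a site's robots.txt rules can also be checked programmatically. The sketch below uses Python's standard-library `urllib.robotparser`; the robots.txt content, the user-agent string `research-bot`, and the forum URLs are hypothetical, and this is not a procedure the authors report using.

```python
from urllib.robotparser import RobotFileParser

def crawl_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules already in memory (no network request)
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt for an example forum: all agents barred from /private/
example_robots = [
    "User-agent: *",
    "Disallow: /private/",
]

print(crawl_allowed(example_robots, "research-bot", "https://forum.example.com/threads/123"))   # True
print(crawl_allowed(example_robots, "research-bot", "https://forum.example.com/private/inbox"))  # False
```

On a live site one would instead call `set_url(...)` and `read()` to fetch the actual file; parsing in-memory lines keeps the example self-contained.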

2. IDF for any term $t$ is defined as $IDF_t = \log\frac{N}{DF_t}$, where $N$ is the number of documents and $DF_t$ is the number of documents that contain term $t$. The transformation process is called TF-IDF weighting: $TFIDF_{t,d} = TF_{t,d} \times IDF_t$. It assigns a higher weight to a term $t$ in a document $d$ when the term occurs often in $d$ but appears in only a small number of documents. Conversely, lower weights are assigned to terms that occur often but appear in a large number of documents.
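A minimal sketch of the unsmoothed weighting $TF_{t,d} \times \log\frac{N}{DF_t}$, assuming $TF_{t,d}$ is the raw count of $t$ in $d$ and the natural logarithm (the note specifies neither). The three tokenized posts are made up. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their weights will differ from these.

```python
import math
from collections import Counter

def tfidf(docs):
    """Unsmoothed TF-IDF: TF_{t,d} * log(N / DF_t) for each term in each document."""
    n_docs = len(docs)
    # DF_t: number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # TF_{t,d}: raw count of term t in document d
        weights.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return weights

# Hypothetical tokenized forum posts
docs = [
    ["i", "feel", "unsure"],
    ["i", "feel", "happy"],
    ["i", "am", "unsure", "unsure"],
]

for doc_weights in tfidf(docs):
    print({t: round(w, 3) for t, w in doc_weights.items()})
```

The term "i", which appears in every post, gets weight 0 because $\log(3/3) = 0$, while "unsure", used twice in the third post but present in only two posts, gets $2\log(3/2) \approx 0.811$.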

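3. Precision is defined as $\frac{\text{true positives}}{\text{true positives} + \text{false positives}}$. Recall is defined as $\frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$. The F-measure is defined as $2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$.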
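The three formulas can be computed directly from two label lists. The sketch assumes binary codes with 1 as the positive class and treats the human codes as the reference labels; the function name and example labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no actual positives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical human reference codes and machine predictions for eight posts
human = [1, 1, 1, 0, 0, 0, 1, 0]
machine = [1, 1, 0, 0, 1, 0, 1, 0]

print(precision_recall_f1(human, machine))  # (0.75, 0.75, 0.75)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 3/4 and their harmonic mean is also 0.75.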

Additional information

Funding

This work was supported by the University of Kentucky [Research and Creative Activities Program].
