Methodological Studies

Text as Data Methods for Education Research

Pages 707-727 | Received 01 Aug 2018, Accepted 02 Jun 2019, Published online: 06 Dec 2019
 

Abstract

Recent advances in computational linguistics and the social sciences have created new opportunities for the education research community to analyze relevant large-scale text data. However, the uptake of these advances in education research is still nascent. In this article, we review recent automated text methods relevant to educational processes and determinants. We discuss both lexical-based and supervised methods, which expand the scale of text that researchers can analyze, as well as unsupervised methods, which allow researchers to discover new themes in their data. To illustrate these methods, we analyze the text interactions from a field experiment in the discussion forums of online classes. Our application shows that respondents provide less assistance and discuss slightly different topics with the randomized female posters, but respond with similar levels of positive and negative sentiment. These results demonstrate that combining qualitative coding with machine learning techniques can provide a rich understanding of text-based interactions.

Acknowledgments

We thank June John for her assistance in data collection.

Notes

1 In keeping with the literature, we use the term “document” to refer to one observation of text (e.g., one essay, one text message, or one discussion board post).

2 Hand-coding is a traditional method frequently employed in qualitative research and content analysis in which a person reads and manually codes words or phrases with specific themes.

3 The harmonic mean is the reciprocal of the arithmetic mean of reciprocals and is more conservative than using the arithmetic mean (i.e., it produces a lower F1-measure).
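The F1-measure referenced here is the harmonic mean of precision and recall. A minimal sketch in Python (the `f1_score` helper is illustrative, not a specific package's API) shows why the harmonic mean is more conservative than the arithmetic mean:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean penalizes imbalance between the two components:
# with precision = 0.9 and recall = 0.5, the arithmetic mean is 0.7,
# but F1 is about 0.643.
print(f1_score(0.9, 0.5))
```

When one component is much lower than the other, F1 is pulled toward the lower value, which is why it is a stricter summary than the arithmetic mean.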

4 The R package “readme2” implements this; see https://github.com/iqss-research/readme-software.

5 With computer-assisted clusterings, the researcher can explore many possible computer-generated clusterings (Grimmer & King, Citation2011).

6 We should note that topic models are multimodal, meaning that topics can be sensitive to the starting values used. One way to account for this is using a spectral initialization, which is deterministic and globally consistent (Roberts, Stewart, & Tingley, Citation2016).

7 The FREX measure was developed by Roberts, Stewart, and Airoldi (Citation2016), and is the weighted harmonic mean of the word’s rank in terms of exclusivity and frequency.
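As a rough sketch of the idea behind FREX, the score below takes the weighted harmonic mean of each word's empirical-CDF rank in exclusivity and in frequency. This is an illustrative Python simplification, not the stm package's implementation; the `ecdf_rank` and `frex` helpers and the default weight of 0.7 toward exclusivity are assumptions for the sketch:

```python
import numpy as np

def ecdf_rank(scores: np.ndarray) -> np.ndarray:
    """Empirical CDF value of each score among its peers, in (0, 1]."""
    ranks = np.argsort(np.argsort(scores)) + 1
    return ranks / len(scores)

def frex(frequency: np.ndarray, exclusivity: np.ndarray, w: float = 0.7) -> np.ndarray:
    """Weighted harmonic mean of ECDF ranks in exclusivity and frequency."""
    f = ecdf_rank(frequency)
    e = ecdf_rank(exclusivity)
    return 1.0 / (w / e + (1 - w) / f)
```

Because the harmonic mean is dominated by the smaller component, a word scores high on FREX only if it ranks well on both frequency and exclusivity, rather than excelling on just one.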

8 Some methods also account for word order in a circumscribed manner through using “bigrams” or “trigrams” as opposed to “unigrams.” For example, the sentence “This class is hard” could produce unigrams (“this,” “class,” “is,” “hard”), bigrams (“this class,” “class is,” “is hard”), or trigrams (“this class is,” “class is hard”).
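The note's example can be reproduced with a short sketch (Python; simple whitespace tokenization and lowercasing are assumed, and the `ngrams` helper is illustrative):

```python
def ngrams(text: str, n: int) -> list:
    """Return the list of n-grams (joined as strings) from whitespace tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "This class is hard"
print(ngrams(sentence, 1))  # ['this', 'class', 'is', 'hard']
print(ngrams(sentence, 2))  # ['this class', 'class is', 'is hard']
print(ngrams(sentence, 3))  # ['this class is', 'class is hard']
```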

9 Popular packages in R for analyzing text data will perform these preprocessing steps for you. The tidytext and text mining (“tm”) packages in R are particularly popular. Some analysis packages will also preprocess the data before conducting the analysis (e.g., the STM package in R; Roberts, Stewart, & Tingley, Citation2018).

10 These steps depend on the analysis being conducted. Researchers interested in linguistic style, for instance, may be primarily interested in function words instead of content words.
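A minimal preprocessing sketch along these lines (in Python; the tiny stopword list and the `preprocess` helper are illustrative only — the R packages named above implement far more complete versions). The `remove_stopwords` flag reflects the note's point that researchers studying linguistic style may want to keep function words rather than discard them:

```python
import re

# Tiny illustrative stopword list; real analyses use much longer ones.
STOPWORDS = {"this", "is", "a", "the", "and", "of"}

def preprocess(document: str, remove_stopwords: bool = True) -> list:
    """Lowercase, strip punctuation, tokenize, and optionally drop stopwords."""
    tokens = re.findall(r"[a-z']+", document.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

print(preprocess("This class is hard!"))                          # ['class', 'hard']
print(preprocess("This class is hard!", remove_stopwords=False))  # ['this', 'class', 'is', 'hard']
```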

11 We received institutional review board approval for this study, and we worked in close consultation with the board to determine the number of comments placed in each course in order to minimize the costs placed on field participants.

12 We used names that were recently used in studies that have experimentally manipulated perceptions of race and gender (e.g., Bertrand & Mullainathan, Citation2004; Milkman, Akinola, & Chugh, Citation2015; Oreopoulos, Citation2011). We chose a set of four first names and four last names for each gender-race combination (128 names in total).

13 For comparison purposes, we convert the sentiment scores into positive/negative/neutral classifications. For LIWC, posts that have a higher percentage of positive terms than negative terms are classified as positive, posts that have a higher percentage of negative terms than positive terms are classified as negative, and posts that have an equal percentage of positive and negative terms are classified as neutral. For SEANCE, messages with a positive score above 0 and a negative score less than or equal to 0 are classified as positive, messages with a negative score above 0 and a positive score less than or equal to 0 are classified as negative, and all other posts are classified as neutral.
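The classification rules in this note can be written out directly. A Python sketch (the `classify_liwc` and `classify_seance` names are illustrative, not part of either tool, and each function takes the scores those tools output):

```python
def classify_liwc(pos_pct: float, neg_pct: float) -> str:
    """LIWC-style rule: compare the percentages of positive and negative terms."""
    if pos_pct > neg_pct:
        return "positive"
    if neg_pct > pos_pct:
        return "negative"
    return "neutral"

def classify_seance(pos_score: float, neg_score: float) -> str:
    """SEANCE-style rule: positive needs pos > 0 and neg <= 0, and vice versa."""
    if pos_score > 0 and neg_score <= 0:
        return "positive"
    if neg_score > 0 and pos_score <= 0:
        return "negative"
    return "neutral"
```

Note that the two rules can disagree: under the SEANCE-style rule, a post with both scores above zero is neutral, whereas the LIWC-style rule classifies it by whichever percentage is larger.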

14 See pages 4–5 for an introduction to the confusion matrix.

15 The performance for the negative codes is worse, likely due to how rare the negative codes were (3% of posts were negative).

16 Meyer and Mittag (Citation2017) also show how to estimate the degree of bias due to measurement error in binary dependent variables without having the true variable (i.e., the variable without measurement error).

17 Recall that there are 241 positive posts and only 26 negative posts in our corpus (out of 798 posts).

18 This scale is designed for student-to-student confirmation, and we apply it to both students and instructors. The instructor confirmation scale includes response to questions (which is very similar to the combination of assistance and acknowledgment and includes answering students’ questions fully and indicating that they appreciate student questions), demonstrating interest in the student (which is very similar to individual attention), teaching style (which does not apply in our online discussion forum context), and disconfirmation (which is included in the student-to-student confirmation scale; Ellis, Citation2000).

19 We subset our sample to Black and White posters because a substantial number of MOOC participants were from the United States and may not be able to discern gender in Asian names.

20 See page 6 for an introduction to k-fold cross-validation.

21 Disconfirmation may also be a more complex task to predict. Disconfirmation sometimes includes clear terms of disagreement, like “no” and “not really,” but often is more complex. For instance, one disconfirming post simply counters “highly subjective” when a fictitious poster complained about the lectures, and another states that it is the “calm before the storm” when a fictitious poster stated that they were feeling confident about the course. None of these terms (subjective, calm, storm) shows up in any of the other disconfirming posts.

22 As noted earlier, this is primarily an issue with binary variables, which do not exhibit classical measurement error.

23 See pages 7–8 for an introduction to topic models.

Additional information

Funding

This research was supported by a grant from the Institute of Education Sciences [Award No. R305B140009].
