Articles

Content Analysis by the Crowd: Assessing the Usability of Crowdsourcing for Coding Latent Constructs


ABSTRACT

Crowdsourcing platforms are commonly used for research in the humanities, social sciences, and informatics, including having crowdworkers annotate textual material or visuals. Drawing on two empirical studies, this article systematically assesses the potential of crowdcoding for less manifest contents of news texts, focusing here on political actor evaluations. Specifically, Study 1 compares the reliability and validity of crowdcoded data to that of manual content analyses; Study 2 then investigates the effects of material presentation, different types of coding instructions, and answer option formats on data quality. We find that the crowd's performance recommends crowdcoded data as a reliable and valid alternative to manually coded data, even for less manifest contents. While scale manipulations affected the results, minor modifications of the coding instructions or material presentation did not significantly influence data quality. In sum, crowdcoding appears to be a robust instrument for collecting quantitative content data.

This article is part of the following collections:
Communication Methods and Measures Article of the Year Award

Funding

This research was conducted under the auspices of the Austrian National Election Study (AUTNES), sponsored by the Austrian Science Fund (FWF): S10908-G11.

Notes

1 If the data are already gathered, “bad” responses can also be filtered out through an examination of response patterns (Zhu & Carterette, Citation2010) and of task duration (Kittur, Chi, & Suh, Citation2008). We analyzed response patterns and used a CrowdFlower setting that automatically rejects the work of contributors who fall below a minimum time threshold for a task.
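Neither the exact response-pattern checks nor the platform's minimum-time setting are reproduced here; as a minimal sketch of this kind of post hoc quality filtering (the data frame, column names, threshold, and the uniform-label heuristic are illustrative assumptions, not the procedure actually used), one might proceed as follows:

```python
import pandas as pd

# Hypothetical annotation data: one row per crowd judgment, with the time
# spent on the unit (in seconds) and the label assigned to the sentence.
annotations = pd.DataFrame({
    "worker_id":   ["w1", "w1", "w2", "w2", "w3", "w3"],
    "sentence_id": [1, 2, 1, 2, 1, 2],
    "label":       ["negative", "neutral", "negative", "positive", "neutral", "neutral"],
    "seconds":     [14, 12, 3, 2, 11, 10],
})

MIN_SECONDS = 5  # illustrative minimum time per judgment

# Duration filter: drop judgments completed faster than the threshold,
# analogous to a platform-side minimum-time specification.
too_fast = annotations["seconds"] < MIN_SECONDS

# Response-pattern check: flag workers who assign the same label to every sentence.
uniform_worker = annotations.groupby("worker_id")["label"].transform("nunique") == 1

clean = annotations[~too_fast & ~uniform_worker]
print(clean)
```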

2 Although it would have been preferable to use coding instructions identical to those of the AUTNES study, we were not able to implement the AUTNES coding procedure in the crowdsourcing task’s design, given its length and complexity and crowdworkers’ likely unfamiliarity with the process of content analysis. Therefore, instructions that differed from the AUTNES original with respect to the fixation of the target of opinion and the ease of language were used to guide the online contributors.

3 Variable V31 (Schönbach et al., Citation2016).

4 The ICR assessment for the AUTNES data is the one reported for variable V31 “object evaluation” in the AUTNES documentation (Kleinen-von Königslöw et al., Citation2016). For AUTNES overall, more than 50,000 sentences were coded by seven coders; the ICR measures were calculated based on their coding of 790 sentences.
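The documentation reports the ICR values themselves; purely to illustrate how agreement on such a reliability subsample can be computed (the coder matrix below is invented, and the specific ICR coefficients used for AUTNES may differ), a simple pairwise percent agreement could be calculated like this:

```python
from itertools import combinations

# Hypothetical reliability subsample: each coder's evaluation codes for the
# same four sentences (in reality, seven coders and 790 sentences).
codings = {
    "coder1": ["neg", "neu", "pos", "neg"],
    "coder2": ["neg", "neu", "pos", "pos"],
    "coder3": ["neg", "pos", "pos", "neg"],
}

def pairwise_percent_agreement(codings):
    """Average share of identical codes across all coder pairs."""
    shares = []
    for a, b in combinations(codings.values(), 2):
        shares.append(sum(x == y for x, y in zip(a, b)) / len(a))
    return sum(shares) / len(shares)

print(round(pairwise_percent_agreement(codings), 2))  # 0.67 for this toy data
```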

5 The crowdworkers’ trust scores ranged from 0.73 to 1 (M = 0.90; SD = 0.07); all offline coders received a trust score of 1. As a consequence, 4,538 annotations (crowdworkers weighted with trust scores) instead of 5,059 annotations (crowdworkers unweighted) were taken into account for the chi-square test, and 3,361 annotations (crowdworkers weighted with trust scores) instead of 3,744 annotations (crowdworkers unweighted) for the t-test.
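Read this way, each crowd judgment enters the tests with its worker’s trust score as a weight, so the effective number of annotations is the sum of the weights rather than the raw count. A minimal sketch of that weighting step under these assumptions (the data and column names are hypothetical, and CrowdFlower’s own aggregation may differ):

```python
import pandas as pd

# Hypothetical crowd judgments with each worker's trust score attached.
crowd = pd.DataFrame({
    "worker_id":   ["w1", "w2", "w3", "w4"],
    "sentence_id": [1, 1, 2, 2],
    "label":       ["negative", "negative", "neutral", "positive"],
    "trust":       [0.95, 0.80, 1.00, 0.73],
})

raw_n = len(crowd)                 # unweighted annotation count
weighted_n = crowd["trust"].sum()  # effective count after trust weighting

# Trust-weighted label distribution per sentence, e.g. as input to a chi-square test.
weighted_dist = crowd.groupby(["sentence_id", "label"])["trust"].sum()

print(raw_n, round(weighted_n, 2))
print(weighted_dist)
```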

6 In 54 cases we had to dismiss the AUTNES results as a source of comparison, as the ratings differed in terms of the target of opinion.

7 We pasted the remark “Important: This question is a test to check if the sentences are read carefully. Please select ‘I do not know’, to pass the test. If you select ‘evaluation’ or ‘neutral-no evaluation’, you will have to finish the job without extra payment” directly after a sentence that was ostensibly meant to be rated.
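The note does not spell out how failed test items were handled beyond withholding the extra payment; as a small illustrative sketch (the test-item id, required answer, and exclusion rule are assumptions), judgments from workers who missed the screener could be excluded like this:

```python
import pandas as pd

TEST_SENTENCE_ID = 999              # hypothetical id of the embedded test sentence
REQUIRED_ANSWER = "I do not know"   # the answer the remark asks for

annotations = pd.DataFrame({
    "worker_id":   ["w1", "w1", "w2", "w2"],
    "sentence_id": [999, 1, 999, 1],
    "label":       ["I do not know", "negative", "evaluation", "neutral"],
})

# A worker passes the screener only by picking the required option on the test item.
passed = set(
    annotations.loc[
        (annotations["sentence_id"] == TEST_SENTENCE_ID)
        & (annotations["label"] == REQUIRED_ANSWER),
        "worker_id",
    ]
)

# Keep substantive judgments (test item removed) from workers who passed.
kept = annotations[
    annotations["worker_id"].isin(passed)
    & (annotations["sentence_id"] != TEST_SENTENCE_ID)
]
print(kept)
```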

8 All 30 sentences were rated by 505 of the 510 workers, hence 15,150 annotations. The remaining five workers together contributed another 74 annotations. Their judgements are part of the sample since they passed the test (see endnote 7) before they canceled the job.
