Abstract
This study aims to predict audience-rated news quality with journalistic values and linguistic/formal features of news articles, based on the theoretical rationales derived from information processing models, journalism and news consumption literature, and linguistic studies. We employed a traditional social science survey of over 7,800 news audience members and implemented natural language processing, text-mining, and neural network analyses on 1,500 news articles concerning public affairs. Results suggest that the journalistic values of news articles are stronger predictors of audience-rated news quality than their linguistic/formal features. The impact of journalistic values overrode that of the news audience attributes, which served as a baseline for comparison. Specifically, believability, depth, and diversity were more important in predicting audience-rated news quality than readability, objectivity, factuality, and sensationalism. Regarding linguistic/formal features, bylines, sources, subjective expressions, and article similarities were influential. This study provides additional support for the view that news audiences regard journalistic values highly as substantial factors of news quality. It also provides empirical evidence for the normative news reporting guidelines. Methodologically, it serves as an example of integrating computational and textual methods with a traditional social science approach.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
1 Note that we are interested in explaining and quantifying news quality as evaluated by news audiences, not in establishing a gold standard of what so-called “quality journalism” should pursue or in proposing a quantification of “quality journalism” per se. As illustrated in the Introduction section, the role of news audiences in distributing news articles is becoming more important in a news environment mediated by digital platforms. Given this circumstance, we aim to identify the factors that predict audience-rated news quality.
2 The MAIN here represents Modality, Agency, Interactivity, and Navigability, the four types of affordance that digital technologies may provide (Sundar 2008). For reference, this model puts emphasis on the influence of technological affordance on the credibility judgment of Web sites, along with its incorporation of dual information processing.
3 In this paper, the term “journalistic value” does not refer to news values or news worthiness such as unexpectedness, prominence, controversy, and human interest. By journalistic values, we mean normative values that journalists should keep in mind when writing news articles. Here, these are individual-level, rather than organizational-level, concepts.
4 Retrieved from the homepage of the American Society of Newspaper Editors on January 25, 2017: http://asne.org/content.asp?pl=24&sl=171&contentid=171
5 Retrieved from the homepage of the BBC on January 25, 2017: http://downloads.bbc.co.uk/aboutthebbc/insidethebbc/howwework/reports/pdf/neil_report.html
6 Retrieved from the homepage of the Canadian Broadcasting Corporation on January 25, 2017: www.cbc.radio-canada.ca/en/reporting-to-canadians/acts-and-policies/programming/journalism/
7 Retrieved on July 8, 2019 from: https://www.journalism.org/topics/state-of-the-news-media/2005/
8 KOSAC (Shin and Kim 2013; Shin et al. 2012) was developed by one of the coauthors of the present study, based on the Multiperspective Question Answering Opinion Corpus (Wiebe 2002). It contains fine-grained annotations for the 7,744-sentence corpus of Korean news articles. Because the present analyses used Korean news articles, it was necessary to employ KOSAC, which adequately reflects the characteristics of the Korean language.
9 K-LIWC (Lee, Sim, and Yoon 2005) is a Korean version of the Linguistic Inquiry and Word Count software developed by Pennebaker, Francis, and Booth (2001). It conducts a dictionary-based, automated text analysis and classifies words into cognitive and emotional categories.
10 These 11 issues—the most recent, attention-grabbing issues at the time of data collection—were as follows: minimum wage policy, revision of comprehensive real estate holding tax, the first South–North Korean summit conference, the Yemeni refugee problem on Jeju island, the president’s constitutional amendment proposal and rejection, fine-dust policy measures, the secret agreement on sexual slavery with the Japanese government, the resumption of the Shin-kori nuclear power plant construction, the College Scholastic Ability Test reform, the repeal of the abortion law, and the Constitutional Court’s decision on conscientious objection to military service.
14 Because of space limits, we cannot illustrate all linguistic and formal features here. For more detailed information, please contact the corresponding author.
15 In conducting an online survey, we systematically set a minimum time required for reading news articles to prevent respondents from skipping or scanning articles. The order of the 10 news articles displayed onscreen for each individual respondent was randomized to avoid any potential unknown order effect.
16 Even with this single-item measurement, respondents ended up having to answer 80 questions about journalistic values and news quality because they had to answer all eight questions after reading each article and had to repeat this procedure 10 times.
17 Please refer to Appendix B (Supplementary material) for detailed explanations.
18 The full issue-knowledge question lists can be provided upon request.
19 Further explanation about the neural network can be found in Prakash and Rao (2017) and Wiley (2016). In Appendix C (Supplementary material), we provide a more detailed description of the ANN’s general mechanism.
20 Among traditional statistics, polynomial regression can handle the curvilinear relationships among variables but has to satisfy its basic residual assumptions of independence, normality, and homoscedasticity. Having 99 predictors, the present study cannot satisfy the requirements of traditional regression analysis.
21 The optimal number of hidden layers in this study is determined by comparing the performance of many models built with different hyperparameter options (see Appendix D, Supplementary material).
22 Among other traditional methods, the hierarchical linear model (or mixed model) can include different levels of unit of analysis (e.g., individuals [level 1] and groups [level 2] composed of those individuals). The present study has two different levels of variables (i.e., news article level [level 1] and news article evaluator level [level 2]). However, the level 1 variables overlap with the level 2 variable because each news article is evaluated by around 50 evaluators. Thus the observations at level 2 are not independent of each other, preventing us from adopting the hierarchical linear model.
23 When the number of outcome variable classes is balanced, the log loss expected by random chance is calculated using the following formula: Logloss = -ln(1 / N) (N: the number of classes of the outcome variable). In the present case, the log loss expected by random chance was 1.61 because N = 5.
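As a quick check on the chance-level figure in note 23, the following minimal sketch computes -ln(1 / N) for N = 5 and confirms that a uniform prediction over balanced classes yields the same average log loss. The function names and the toy labels are hypothetical and serve only as illustration; this is not the authors' code.

```python
import math


def chance_log_loss(n_classes: int) -> float:
    """Log loss of a uniform guess over n_classes balanced classes: -ln(1 / N)."""
    return -math.log(1.0 / n_classes)


def mean_log_loss(true_labels, predicted_probs) -> float:
    """Average negative log-probability assigned to each true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        total += -math.log(probs[label])
    return total / len(true_labels)


# Hypothetical toy data: 50 ratings spread evenly over five outcome classes.
n = 5
labels = [i % n for i in range(50)]
# A "random chance" model assigns probability 1/N to every class.
uniform_probs = [[1.0 / n] * n for _ in labels]

baseline = chance_log_loss(n)
empirical = mean_log_loss(labels, uniform_probs)
print(round(baseline, 2))   # 1.61
print(round(empirical, 2))  # 1.61
```

A fitted model is informative only to the extent that its log loss falls below this baseline.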