ABSTRACT
In recent years, the volume of clickstream and user data collected by news organizations has reached enormous proportions. As a result, news organizations, as well as journalism scholars, face novel methodological challenges in describing and analyzing this wealth of information. To move forward, we demonstrate a computational approach to understanding the news journeys that Web users take to find the news they want to read. We propose the use of Markov chains. These models provide an effective and compact way to discover meaningful patterns in clickstream data. In particular, they capture the sequentiality in news use patterns. We illustrate this approach with an analysis of more than 1 million Web pages from 175 websites (news websites, search engines, social media), collected over 8 months in 2017/18. The analysis of such data is of high interest to journalism scholars, but it can also help news organizations design sales strategies, provide more personalized content, and find the most effective structure for their websites.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
ORCID
Susan Vermeer http://orcid.org/0000-0002-9829-8057
Damian Trilling http://orcid.org/0000-0002-2586-0352
Notes
1 For more information on the Python module, see https://github.com/uvacw/df2markov.
2 In addition to having their online media use tracked, respondents filled out an online survey: 48.5% were male; mean age was 47.2 (SD = 19.2); 15.7% had a low level of education (e.g., primary school), 38.3% had a medium level of education (e.g., college), and 44.6% had a high level of education (e.g., university).
3 To guarantee respondents’ privacy as much as possible, we filtered the raw data to exclude sensitive information. We stored the data in an Elasticsearch database on a server that is not directly accessible to the researchers. Instead, Robout, a Python library, is made available on another secured server to complement Robin. We conducted the analyses using Robout and an Elasticsearch database on the second server so that no sensitive data would leave the environment.
4 This means examining the probability of users moving from one website to another (e.g., social media → tabloid → tabloid → broadsheet) or from one Web page to another within the same website (e.g., homepage → section page → news article → news article).
5 For more information, see https://doi.org/10.6084/m9.figshare.7314896.v1.
6 We are grateful to an anonymous reviewer for their suggestion.
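To illustrate the transition probabilities described in note 4, the following is a minimal sketch of how first-order Markov transition probabilities could be estimated from sequences of visited website categories. It is not the df2markov module's interface; the function name `transition_probabilities` and the example sessions are hypothetical and introduced here for illustration only.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    `sessions` is a list of sequences of states (here: website categories).
    Returns a nested dict: probs[current][next] = P(next | current).
    This is an illustrative helper, not part of any published module.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive pair (current state -> next state).
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        # Normalize counts so that outgoing probabilities sum to 1.
        probs[current] = {nxt: n / total for nxt, n in nexts.items()}
    return probs


# Hypothetical example sessions (made-up data, not from the study).
sessions = [
    ["social media", "tabloid", "tabloid", "broadsheet"],
    ["search engine", "broadsheet", "broadsheet"],
    ["social media", "tabloid", "broadsheet"],
]

probs = transition_probabilities(sessions)
for current, nexts in probs.items():
    for nxt, p in nexts.items():
        print(f"{current} -> {nxt}: {p:.2f}")
```

Each row of the resulting table gives, for one starting state, the estimated probability of each possible next state. The same function applies to within-website journeys if the states are page types (e.g., homepage, section page, news article) rather than website categories.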