ABSTRACT
This study aims to extend the infodemiology framework by postulating that effective use of digital data sources for cancer communication should consider four components: (a) content: key topics that people are concerned with, (b) congruence: how interest in cancer topics differs between public posts (i.e., tweets) and private web searches, (c) context: the influence of the information environment, and (d) information conduits. We compared tweets (n = 36,968) and Google web searches on breast, lung, and prostate cancer between National Cancer Prevention Month and a non-cancer awareness month in 2018. There are three key findings. First, reliance on public tweets alone may result in lost opportunities to identify potential cancer misinformation that is detectable from private web searches. Second, lung cancer tweets were the most sensitive to the external information environment: they became substantially more pessimistic after the end of the cancer awareness month. Finally, the cancer communication landscape was largely democratized, with no prominent conduits dominating conversations on Twitter.
Notes
1. A tweet-retweet network consists of all the nodes in a given network, with directed edges that indicate whether a node has retweeted another node. In other words, a directed edge from node A to node B means that node A has retweeted a post from node B.
2. While we make a conceptual distinction between the three types of cancer information conduits, we do not claim that they are mutually exclusive categories, as a highly influential Twitter user could also be an important information broker.
3. The six corpora were: (a) February breast cancer (n = 11,482); (b) February lung cancer (n = 4,104); (c) February prostate cancer (n = 3,207); (d) March breast cancer (n = 11,526); (e) March lung cancer (n = 3,854); (f) March prostate cancer (n = 2,795).
4. A document-term matrix is a way of representing textual data for LDA, where rows are documents (i.e., tweets), columns are terms (i.e., individual words), and each cell shows how frequently a term appears in a document (Welbers et al., 2017).
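The two data structures described in Notes 1 and 4 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy user names and tweets, and the lowercase whitespace tokenization are all assumptions made for illustration, and a real LDA workflow would normally add further preprocessing.

```python
from collections import Counter


def build_retweet_edges(retweets):
    """Map each retweeter to the set of users whose posts they retweeted.

    `retweets` is an iterable of (retweeter, original_author) pairs, so an
    entry A -> B means node A retweeted a post from node B (Note 1).
    """
    edges = {}
    for retweeter, author in retweets:
        edges.setdefault(retweeter, set()).add(author)
    return edges


def build_document_term_matrix(documents):
    """Return (vocabulary, matrix) with documents as rows and terms as columns.

    Each cell holds the number of times a term occurs in a document (Note 4).
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical toy data, not taken from the study's corpora.
retweets = [("user_a", "user_b"), ("user_c", "user_b"), ("user_a", "user_c")]
edges = build_retweet_edges(retweets)

tweets = [
    "lung cancer screening",
    "breast cancer awareness month",
    "cancer screening saves lives",
]
vocabulary, dtm = build_document_term_matrix(tweets)
```

Here `edges["user_a"]` holds the accounts that `user_a` retweeted, and each row of `dtm` is one tweet's term counts in the column order given by `vocabulary`.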