Risk-informed decisions for epidemics

Abstract

Social media, an open and free platform containing a large volume of user-generated content (UGC), is an ideal data source for reaching risk-informed decisions about epidemics. How social systems deal with epidemics, and how predictable their behaviour is, can be studied conceptually and empirically by monitoring social media data, and the results can inform decisions that improve preparedness and response. Surveillance of influenza-like illness (ILI) through social media offers an opportunity to provide early warning signs that improve public health interventions. In this research, we monitored posts about swine flu in 2011 on Weibo, a Chinese social media platform, and analysed the post content, the correlation with official surveillance data and the geographic distribution in order to verify whether Weibo is an effective platform for making risk-informed decisions about epidemics.

Introduction

With the advancement of information and communication technology, online social networks have become an important way for people to share their daily lives and interact with others. Self-presentation and self-disclosure are considered key elements of social media (Kaplan & Haenlein, 2010). Many people post about their daily lives and feelings on social websites, including health-related information. The large quantity of self-reported health information provides a significant volume of data for making causal inferences about the significance of physical factors (i.e., numbers of people, distance, and time) in social processes (Barnes & Wilson, 2014). Detecting early risk signals reported by users on social media is vital for improving the response to epidemics.

Swine flu is a respiratory disease caused by swine influenza virus (SIV) and is a zoonotic infection in humans. It is common among pig populations, but the virus can be transmitted to people as well. People with more exposure to pigs are at higher risk of swine flu. Swine influenza viruses form a family of influenza viruses that includes the subtypes H1N1, H3N2 and H1N2. The worldwide outbreak in 2009 was caused by an H1N1 virus. Patients are contagious during the period in which symptoms develop. It is therefore important to conduct timely surveillance of swine flu so that it can be contained early, before a large-scale outbreak occurs. Since the H1N1 influenza pandemic of 2009, governments and related authorities have begun to recognise the importance of influenza-like illness (ILI) surveillance using social media.

In China, ILI surveillance is conducted year-round by all national sentinel hospitals and network laboratories. Weekly reports on the surveillance data are produced and disseminated by the Chinese National Influenza Centre. Besides official sentinel surveillance, digital surveillance through social media has emerged in recent years. Weibo, launched in September 2009, is the most popular social media platform in China; with about 500 million users and 30 million posts every day, it is useful for conducting online surveillance in China. In this study, we monitor a sample of Weibo posts on swine flu in 2011 to test the reliability of Weibo as a platform that provides reference information for risk-informed decisions about epidemics in China. The question guiding this study is how a web-mediated environment such as Weibo can provide early risk signs that support timely decisions on preparedness and response to epidemics. The paper is organised as follows: (i) a review of risk-informed decisions; (ii) a description of data collection and data preprocessing, including data cleaning and data classification; (iii) data analysis and results, covering correlation analysis, geographic distribution and content analysis; and (iv) research conclusions and future directions. Our aim is to use the Weibo data on swine flu as a test of whether risk-informed decisions about epidemics can be based on Weibo in China.

Risk-informed decision

Risk can be conceptualised as the probability and severity of an adverse event that has consequences for human life, the health burden, and the natural and built environment. Our societies are increasingly faced with natural disasters and disasters arising from deliberate attacks, which require a trans-disciplinary understanding aimed at improving preparedness and response in a proactive manner. Preparedness and response in recent disasters, such as the Christchurch and Nepal earthquakes, the Tianjin fire and explosion, the Australian bushfires, the West African Ebola outbreak and the South Korean Middle East Respiratory Syndrome outbreak, to name a few, suggest a greater need to develop risk-informed decision criteria that make social and organisational systems far more resilient. Risk-informed decisions (RID) therefore need to balance the social, environmental, ecological and economic dimensions of preparedness for and response to such disasters.

Risk-informed decision-making (RIDM) represents a paradigm shift from traditional risk-based decision-making: it calls for a participatory procedure in which stakeholders perform risk analysis as a precondition for characterising risks prior to formal assessment (Amendola, 2002). Within the context of medical decision-making, RIDM refers to the patient’s understanding of the disease or condition being addressed and comprehension of the services and risks associated with their decisions (Rimer, Briss, Zeller, Chan, & Woolf, 2004). RIDM therefore offers an opportunity to balance the social, environmental, ecological and economic dimensions when dealing with preparedness for and response to natural disasters and disasters arising from deliberate attacks.

Since 1996, with the development of internet and computer technology, disease surveillance has gradually moved away from traditional hierarchical reporting systems and laboratory data, harnessing the internet as a medium for collecting and sharing information in order to achieve real-time and accurate surveillance of epidemics. The advent of ‘big data’, brought about by the internet, has also promoted the development of digital disease surveillance. For example, Lampos, De Bie, and Cristianini (2010) analysed Twitter content to track the prevalence of influenza-like illness (ILI) in several regions of the UK. Signorini, Segre, and Polgreen (2011) tracked public sentiment about H1N1 through Twitter, demonstrating its usefulness for measuring public concern about health-related events. Aramaki, Maskawa, and Morita (2011) also used Twitter to detect flu. The high correlation they obtained showed that Twitter texts can reflect real events efficiently and that natural language processing (NLP) technology is applicable to extracting Twitter content. As one of the most notable micro-blogging services, Twitter serves as a medium for research on social networks, and much disease surveillance on social networks is based on Twitter data (Aramaki et al., 2011).

Within the Chinese context, a few health studies have also focused on Sina Weibo, which has a large number of users in China, for disease surveillance (Chun-Hai Fung & Wong, 2013; Vong & Feng, 2014; Wang, Paul, & Dredze, 2014). Elsewhere, Corley, Mikler, Singh, and Cook (2009) tracked blogs that mentioned flu and pointed out that trends in influenza-related blog posts correlate significantly with the spread of flu in the US in the fall season. Polgreen (2009) used Twitter data to track evolving public sentiment with regard to swine flu. Lampos et al. (2010) presented an automatic tool for tracking the spread of ILI in the UK through the content of Twitter. Sadilek, Kautz, and Silenzio (2012) developed a framework to accurately identify sick individuals based on the content of online communications.

Computational modelling of social, environmental, ecological and economic behaviour needs to be guided by formal methods drawn from social physics for characterising and optimising the interactions that take place within the social and organisational systems supporting RIDM during disasters. Social physics provides a new opportunity to observe empirically the significance of physical factors such as numbers of people, distance and time in social processes (Stewart, 1950). Social media could therefore be a good option for acquiring the relevant data and making risk-informed decisions about epidemics.

Data collection

We collected two sample sets of data: Weibo data on swine flu and official influenza data from the Chinese National Influenza Centre. Weibo is the most popular social media platform in China, with around 127 million active users and millions of self-reported messages. Health-related data on Weibo could be an effective proxy for disease surveillance in China. Our data consist of 2000 posts on swine flu in 2011, used as a sample data-set of online surveillance data.

To test the reliability of the Weibo platform, we also collected official surveillance data from the Chinese National Influenza Centre in order to analyse the correlation between the two data sets. The Chinese National Influenza Centre (http://www.cnic.org.cn/eng/) is the official influenza surveillance system in China. In each national sentinel hospital, staff of the surveillance department record ILI cases and total cases for the outpatient and emergency departments by age group. Departments collect the data weekly and enter them into the Chinese influenza surveillance information system every Monday. The sentinel surveillance data are reliable and authoritative.

For the Weibo data, we performed preliminary processing to screen out spam and manually classified the data in preparation for the correlation analysis. We retrieved the sample data-set using ‘swine flu’ as the keyword. After screening out spam unrelated to the disease, we classified all posts into four categories: (i) symptom description (e.g., headache, cough, fever, cold); (ii) feelings and sentiment expression (e.g., terrible, afraid, worried, careless, humorous); (iii) objective expression (e.g., news reports, treatment, prevention knowledge); and (iv) reviewing past experience (e.g., mentioning one’s own experience during the 2009 swine flu outbreak). After computing basic statistics on all the data, we display the geographic distribution of the posts to show the level of attention in different provinces in China, which can also indicate how swine flu is distributed across provinces. Then we treat the posts on symptom description as self-reported health data and test their correlation with the official surveillance data. Finally, we conduct content analysis on the feelings and sentiment category to discover public sentiment on swine flu.

Data analysis and results

We crawled 2000 Weibo posts on swine flu in 2011 as a sample data-set and collected the user name, content, time, device and location of each record. Of all crawled posts, 92% were posted from mobile devices. The data-set contains 321 spam posts that are unrelated to the disease: some user names include ‘swine flu’, and some posts mention swine flu merely as a stunt to attract others’ attention. We screened out all the spam posts and classified the remaining 1680 posts into four categories. The monthly trend of all useful data in 2011 is shown in Figure 1. The number of posts shows an upward tendency overall. It increases significantly in March and October and remains stable at a relatively low level from April to August.

Figure 1. Tendency of number of sample data on Swine flu in 2011.

Among all effective posts, posts on feelings and sentiment expression account for the largest share, with 796 posts. Posts on objective expression come second, with 574 posts. There are 238 self-reported health posts describing symptoms and 72 posts in which users review their own experience of swine flu in 2009. Figure 2 presents the percentages of the four categories of Weibo posts in 2011. The posts on feelings and sentiment expression could be investigated further using sentiment analysis to monitor public concerns and rumours about epidemics. Similarly, the objective expressions, including news and treatment and prevention knowledge, can be used to improve the diffusion of information to a wider audience. The posts reviewing past experience can be used to develop effective public health intervention strategies. In this research, we focus our data analysis on the symptom descriptions, covering headache, cough, fever and cold, as they form the basis for risk-informed decisions.
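The category shares follow directly from the counts reported above. As a minimal sketch (the dictionary and function names are illustrative assumptions, not part of the original analysis), the percentages can be recomputed as follows:

```python
# Category counts as reported in the text (the names below are illustrative).
category_counts = {
    "feelings and sentiment expression": 796,
    "objective expression": 574,
    "symptom description": 238,
    "reviewing past experience": 72,
}


def category_shares(counts):
    """Return each category's share of all posts in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = category_shares(category_counts)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On these counts, the symptom descriptions used for the correlation analysis make up roughly 14% of the 1680 effective posts.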

Figure 2. Pie chart of four-category posts.

Geographic distribution

According to the number of posts in each province, we visualise the geographic distribution in different colours to represent different levels of public attention (shown in Figure 3). Guangdong province, marked in red, ranks first with 586 posts. Beijing comes after Guangdong with 321 posts. The figure shows that public attention to swine flu comes mainly from the southern coastal provinces compared with inland cities. In fact, according to the official data (http://www.cnic.org.cn/eng/), coastal provinces in China have more infected cases during epidemic outbreaks because of their larger floating populations and different weather conditions. In past epidemics in China, Guangdong has always been the place of origin and the most severely infected area. The geographic distribution of posts can reflect differences in public attention across provinces in China, and it can also reflect the severity of the epidemic in different areas, which makes it a good way to monitor how severely an epidemic spreads across areas.
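The per-province counts behind such a map can be obtained with a simple tally. The sketch below assumes each crawled record is a dictionary with a `location` field holding the province name; the field name and the sample records are hypothetical placeholders, not the study's data.

```python
from collections import Counter


def posts_per_province(posts):
    """Count posts by province, skipping records without a usable location."""
    return Counter(p["location"] for p in posts if p.get("location"))


# Hypothetical sample records for illustration only.
sample_posts = [
    {"location": "Guangdong"},
    {"location": "Guangdong"},
    {"location": "Beijing"},
    {"location": ""},           # missing location, ignored by the tally
    {"location": "Guangdong"},
    {"location": "Sichuan"},
]

# Provinces ordered from most to fewest posts.
ranking = posts_per_province(sample_posts).most_common()
for province, n_posts in ranking:
    print(province, n_posts)
```

The resulting counts can then be binned into colour levels for the map; the choice of bins is a presentation decision that the text does not specify.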

Figure 3. Geographic distribution of posts in China.

Correlation analysis

Among the four categories of Weibo posts, posts on symptom description are regarded as self-reported health data, or risk-informed data. The interactive and user-generated nature of Weibo contributes to the large volume of self-reported information it contains. We collected the monthly official surveillance data from the Chinese National Influenza Centre and analysed their correlation with the Weibo data to further test the reliability of Weibo data as a monitoring index for epidemics in China. Judging by the correlation coefficient (shown in Figures 4 and 5), the Weibo data have a significant positive correlation with the official surveillance data, which indicates that self-reported health data can serve as a surrogate for risk-informed information and as an effective index for epidemic surveillance in China.
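The text does not state which correlation measure was used. Assuming the Pearson correlation coefficient, a minimal sketch of the comparison between two monthly series looks like this. The two input lists are made-up toy values for illustration only, not the Weibo or official surveillance counts.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Toy monthly series (hypothetical values, NOT the study's data).
toy_weibo_symptom_posts = [5, 8, 20, 12, 10, 9, 11, 10, 15, 30, 28, 32]
toy_official_ili_cases = [50, 70, 160, 110, 95, 90, 100, 92, 130, 240, 230, 250]

r = pearson_r(toy_weibo_symptom_posts, toy_official_ili_cases)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 would correspond to the positive association described above; whether it is statistically significant also depends on the number of monthly data points.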

Figure 4. Tendency of official surveillance data and Weibo sample data on swine flu in 2011.

Figure 5. Correlation coefficient between official surveillance data and Weibo sample data.

Conclusions

We analysed a sample data-set of Weibo posts on swine flu in 2011, which shows that most people prefer to express their feelings or sentiment about swine flu and to post from mobile applications. The geographic distribution of Weibo posts shows that public attention to swine flu comes mainly from southern coastal provinces compared with inland cities, which is consistent with the actual distribution of infected cases across provinces. The significant correlation between official surveillance data and Weibo data indicates that Weibo could be a reliable social media platform for epidemic surveillance based on self-reported health data. Furthermore, Weibo, with its large volume of self-reported health data and its interactive, user-generated content, could serve as an effective platform for surveillance of the geographic distribution and spread of epidemics in China. The results also suggest that health-related data generated by the public are a valuable source for disease surveillance and early detection, improving public health preparedness and response. In future research, a larger data-set can be tested to further estimate the effect of Weibo data on disease surveillance in China. The Chinese government could benefit from considering Weibo as an effective supplementary channel for tracking influenza in China, combined with the current sentinel surveillance, to achieve real-time monitoring, early detection and timely response.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Amendola, A. (2002). Recent paradigms for risk informed decision making. Safety Science, 40, 17–30.
  • Aramaki, E., Maskawa, S., & Morita, M. (2011, July). Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the conference on empirical methods in natural language processing (pp. 1568–1576). Association for Computational Linguistics.
  • Barnes, T. J., & Wilson, M. W. (2014). Big data, social physics, and spatial analysis: The early years. Big Data & Society, 1, 1–14.
  • Chun-Hai Fung, I., & Wong, K. (2013). Efficient use of social media during the avian influenza A (H7N9) emergency response. Western Pacific Surveillance and Response Journal, 4(4), 1–3.
  • Corley, C., Mikler, A. R., Singh, K. P., & Cook, D. J. (2009, July). Monitoring Influenza Trends through Mining Social Media. In BIOCOMP (pp. 340–346).
  • Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53, 59–68.
  • Lampos, V., De Bie, T., & Cristianini, N. (2010). Flu detector: Tracking epidemics on Twitter. In Machine learning and knowledge discovery in databases (pp. 599–602). Berlin, Heidelberg: Springer.
  • Polgreen, P. M. (2009, October). The use of Twitter to track public concerns about novel H1N1 influenza. In 47th Annual Meeting, Friday, October 30, 2009. IDSA.
  • Rimer, B. K., Briss, P. A., Zeller, P. K., Chan, E. C., & Woolf, S. H. (2004). Informed decision making: What is its role in cancer screening? Cancer, 101, 1214–1228.
  • Sadilek, A., Kautz, H. A., & Silenzio, V. (2012, June). Modeling Spread of Disease from Social Interactions. In ICWSM- Sixth International AAAI Conference on Weblogs and Social Media, June 4-8, Trinity College in Dublin, Ireland.
  • Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE, 6, e19467, 1–10.
  • Stewart, J. Q. (1950). The development of social physics. American Journal of Physics, 18, 239–253.
  • Vong, S., & Feng, M. (2014). Early response to the emergence of influenza A (H7N9) virus in humans in China: The central role of prompt information sharing and public communication. Bulletin of the World Health Organization, 92, 303–308.
  • Wang, S., Paul, M. J., & Dredze, M. (2014). Exploring health topics in Chinese social media: An analysis of Sina Weibo. AAAI Work World Wide Web Public Heal Intell. Retrieved from http://www.cs.jhu.edu/~mdredze/publications/2014_w3phi_weibo.pdf
