Book Review

Subjective Well-Being and Social Media: Reconciling Big Data and Statistics

by Stefano M. Iacus and Giuseppe Porro. Chapman and Hall/CRC, Taylor & Francis Group. Boca Raton, FL, 2021, ISBN 9781138393929, xii + 206 pp., $99.95 (Hardback).

The book is devoted to developing the concept and measures of well-being, the state of good health and good fortune that produces a general feeling of wellness. The underlying project on sentiment analysis and a well-being index has been carried out since 2012 at universities in Italy and Japan. The book is organized in five chapters, each with multiple sections and subsections.

Chapter 1, “Subjective and Social Well-Being,” presents different approaches to estimating well-being, in particular through macro-economic definitions, survey analysis, big data, and social networks. Subjective well-being is estimated by measures of life satisfaction and mood, the cognitive and emotional components of the evaluative and hedonic domains, respectively, and also in the so-called eudaimonic dimension related to a sense of purpose and meaning in life. The emotional component of well-being corresponds to short-run feelings, life satisfaction to a medium- or long-run horizon, and the eudaimonic dimension to a long-run, forward-looking perspective. Wide-ranging aggregated indicators of social and subjective well-being include the gross domestic product (GDP), the Subjective Well-Being (SWB) index, the Gross National Happiness (GNH) index adopted in Bhutan (note: this country is called Buthan in the book: p. 13 and further – S.L.), the Human Development Index (HDI) of the UN, the Better Life Index (BLI) of the Organization for Economic Cooperation and Development (OECD), the Happy Planet Index (HPI) of the New Economics Foundation in England, the BES index used in Italy, the Canadian Index of Well-Being (CIW), the Australian National Development Index (ANDI), and many others from different countries. A wide range of indicators for estimating subjective well-being is produced in various surveys, to give just a few examples: Gallup World Poll, World Database of Happiness, World Values Survey (WVS), Gallup-Sharecare Well-Being Index, British Household Panel Survey, European Social Survey (ESS), Eurobarometer, European Quality of Life Survey (EQLS), Global Health & Well-Being Survey, National Child Development Survey, Survey of Well-Being of Young Children (SWYC), Social-Emotional Well-Being (SEW) Survey, GA Releases Graduate Student Happiness & Well-Being Report, and more. All these aggregated indices include multiple socio-economic, demographic, and other characteristics.
Self-reported evaluations are elicited by the survey-based Experience Sampling Method (ESM), Daily Reconstruction Method (DRM), and Event Recall Method (ERM). Social media data are available via Social Networking Sites (SNS), which allow researchers and policy makers to learn in real time how people perceive their quality of life, without the interaction between researchers and respondents through surveys and questionnaires that could induce the so-called observer bias. Sentiment analysis, performed to monitor public opinion on well-being, consists of the systematic extraction of texts posted autonomously on different internet platforms, including blogs, forums, and SNS messages. These methods are unsupervised and automated, using ontological dictionaries of words and expressions that either appear or do not appear in the text. “Websites such as Twitter, Facebook and Instagram or services like Google Trends provide a huge amount of potentially new information to the assessors of social sentiment” (p. 32). For the evaluation of subjective well-being, they use language patterns, for instance, those given in the closed dictionary of the Linguistic Inquiry and Word Count (LIWC), which in its 2015 version contains almost 6400 words, word stems, and emoticons. The Gross National Happiness index was used by Facebook and Twitter, and other happiness indicators, the “Hedonometer” and the Satisfaction with Life Scale (SWLS), were applied to study the well-being of Twitter users. More advanced, or supervised, methods are also based on automated classification, but with the intervention of human coders who can appreciate nuances of meaning, informal speech, jargon, and paradoxical or ironic expressions that are often misinterpreted in a totally automated analysis.

Chapter 2, “Text and Sentiment Analysis,” reviews the most commonly used techniques and describes a newly proposed method, integrated Sentiment Analysis (iSA), that extracts meaning from SNS texts without relying only on automatic computer classification rules, assigning instead a crucial role to human supervision. The problem is that any text can be understood differently depending on context, irony, special terms, slang, semantics, semiotics, etc. Several principles for a meaningful interpretation of a text are stated: every quantitative linguistic model is wrong, but some can be useful; quantitative methods help, but cannot replace humans; there is no best or ideal technique of text analysis; and the analysis, the method, and every model must be validated by the data itself. The transformation of documents, or corpora, into the machine-digestible numeric form of the so-called Document-Term matrices is described, together with the preparation of social network data. Unsupervised learning algorithms used for individual estimation are presented in detail, including the Corpora approach, NLP, WordFish, topic models, word2vec, clustering methods, etc. Machine learning methods are supervised methods based on a training set and a test set. They include Support Vector Machine (SVM), Decision Trees and Random Forests, Artificial Neural Networks, Deep Learning, WordScores, and others for individual classification, while ReadMe and iSA are employed for aggregated classification. Each of the methods is presented with formulae and graphs. The advantages of iSA and of the iSAX algorithm for sequential sampling are discussed in detail. The performance of various machine learning approaches, with confidence intervals, is assessed through empirical comparisons on different datasets.
It is concluded that machine learning methods for individual classification work well when the classical assumptions of statistical inference hold: random sampling, a correctly specified reference population, no noise in the data, and a large training set. However, in big data coming from social media, the reference population is often misspecified, the data contain noise, and tagging is performed sequentially until a certain amount of data is reached, with the length of this process depending on the level of noise in the data. For well-being measurements, the iSA estimation of the aggregated distribution of opinions is preferable.
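To make the Document-Term matrix mentioned above more concrete, here is a minimal Python sketch; it is not taken from the book, whose scripts are in R. The function name `document_term_matrix`, the whitespace tokenization, and the two example posts are assumptions made only for illustration.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    # Naive tokenization: lowercase and split on whitespace (an assumption;
    # real pipelines also strip punctuation and stop words and apply stemming).
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Made-up example posts, not data from the book.
posts = ["happy day happy life", "sad day"]
vocabulary, dtm = document_term_matrix(posts)
print(vocabulary)  # ['day', 'happy', 'life', 'sad']
print(dtm)         # [[1, 2, 1, 0], [1, 0, 0, 1]]
```

Each row is one document and each column one term, so both supervised and unsupervised methods can work on the same numeric table.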

Chapter 3, “Extracting Subjective Well-Being from Textual Data,” deals with SNS data, with Twitter as the data source. Several methods and algorithms designed to categorize and rank many thousands of words and to measure average happiness scores are described. They include the Hedonometer, the Gross National Happiness index, the World Well-Being Project, and the Twitter Subjective Well-Being Index. The last index, on which the authors have been working and on which the book is based, is described in detail. For example, the keywords for selecting the training data cover such aspects of personal well-being as the emotional component, satisfying life and vitality, resilience and self-esteem, and many more. The chapter describes the coding rules, the gathering of positive, neutral, negative, and off-topic tweets, the application of the iSA algorithm to daily test sets of data, and the construction of the index as the share of positive within the positive and negative components. Such an index lies in the interval from 0 to 1 and measures, for each day and region, the degree of well-being for the topic under consideration. Averaging over time and within a country yields, for instance, the Subjective Well-Being indices SWB-I and SWB-J, for Italy and Japan, respectively. These indexes, together with some others, are compared and analyzed by their weekly or yearly behavior over the last several years. Regression and structural equation modeling (SEM) for the cross-country analysis are performed and the results interpreted.
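The index construction summarized above, the share of positive within the positive and negative components, can be written as a short Python sketch. This follows the reviewer's one-line description only and is not the authors' code; the function name `swb_index`, the dictionary layout, and the example shares are hypothetical.

```python
def swb_index(shares):
    """Share of positive within positive plus negative; a value in [0, 1].

    `shares` maps category names to estimated aggregate shares for one day
    and region. Neutral and off-topic categories are ignored.
    """
    positive = shares.get("positive", 0.0)
    negative = shares.get("negative", 0.0)
    if positive < 0 or negative < 0:
        raise ValueError("shares must be non-negative")
    total = positive + negative
    if total == 0:
        return None  # no positive or negative content: index undefined
    return positive / total


# Hypothetical daily category shares, not results from the book.
daily_shares = {"positive": 0.30, "neutral": 0.25,
                "negative": 0.20, "off-topic": 0.25}
print(f"{swb_index(daily_shares):.2f}")  # 0.60
```

Because neutral and off-topic shares drop out of the ratio, the value depends only on the balance between positive and negative opinions.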

Chapter 4, “How to Control for Bias in Social Media,” focuses on the problem of the representativeness of data elicited from SNS platforms: such samples represent the users of those social media rather than the whole population, so the results cannot be directly extended to it. Different types of bias are considered, and various models of data adjustment are described, including the application of the penetration rate as a proxy for representativeness, small area estimation (SAE) methods, sample balancing, propensity score weighting, and a spatial-temporal SAE model with weights. The last approach is applied to estimating measures of well-being at work on Italian data, and the results are compared with official statistics.
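The general idea of reweighting a non-representative sample can be illustrated with simple post-stratification, a more basic technique than the SAE and propensity score models listed above and not the method used in the book. In this Python sketch the function name `poststratified_mean`, the group labels, and all numbers are hypothetical.

```python
def poststratified_mean(sample, population_shares):
    """Average the group means, weighted by known population shares.

    `sample` is a list of (group, value) pairs; `population_shares` maps
    each group to its share of the target population.
    """
    values_by_group = {}
    for group, value in sample:
        values_by_group.setdefault(group, []).append(value)

    weighted_sum = 0.0
    covered_share = 0.0
    for group, share in population_shares.items():
        values = values_by_group.get(group)
        if not values:
            continue  # group absent from the sample: cannot be estimated
        weighted_sum += share * sum(values) / len(values)
        covered_share += share
    if covered_share == 0:
        raise ValueError("no overlap between sample and population groups")
    # Renormalize over the groups that were actually observed.
    return weighted_sum / covered_share


# Hypothetical data: younger users are over-represented among posters.
sample = [("young", 0.8), ("young", 0.6), ("young", 0.7), ("old", 0.4)]
population_shares = {"young": 0.4, "old": 0.6}

naive = sum(value for _, value in sample) / len(sample)
adjusted = poststratified_mean(sample, population_shares)
print(f"naive={naive:.3f} adjusted={adjusted:.3f}")
```

In this made-up example the adjusted mean is lower than the naive one, because the over-represented group reports higher values than the under-represented one.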

Chapter 5, “Subjective Well-Being and the COVID-19 Pandemic,” is devoted to assessing the perception of life after the virus outbreak in 2020. Studies in Italy and Japan, and also in the USA and South Africa, on the pandemic's effects on feelings and mood, health status, and mental health are considered for the general population and for special categories, such as health care workers, pregnant women, vulnerable categories, etc. Potential economic losses caused by the global impact of the pandemic are discussed under different scenarios. Large negative peaks reflecting the lockdown effects are described for the Gross National Happiness index, Hedonometer, SWB-I and SWB-J, and other indices built from Twitter, Google, Facebook, and other data sources. Correlation analysis of the indices with potentially explanatory covariates, monthly regression analysis of the indices on their predictors, classical LASSO, a dynamic elastic net with a time-varying window, and SEM are performed. The results of all the modeling for Italy and Japan are presented in many tables and charts. They show that high-frequency indicators of well-being are needed for an adequate description and understanding of how people feel and how these feelings change over time.

Finally, a 25-page bibliography of hundreds of recent sources and a comprehensive index are supplied. The book comes with two sets of R scripts and data, one for implementing the techniques described in Chapter 2 and one for replicating the analyses given in Chapters 3–5. All the data and scripts will be available at https://github.com/siacus/swb-book.

Besides considering the problem of well-being estimation per se, the book presents a great compendium of methods helpful for students and specialists working on various projects that require obtaining big data from online sources for statistical research in social studies.

Further references on the topics considered can also be found in the books reviewed in the references given below.

Stan Lipovetsky
Minneapolis, MN

References

  • Lipovetsky, S. (2019), “The Palgrave Handbook of Indicators in Global Governance,” Technometrics, 61, 278–280.
  • Lipovetsky, S. (2019), “Applied Data-Centric Social Sciences: Concepts, Data, Computation, and Theory,” Technometrics, 61, 424–425.
  • Lipovetsky, S. (2020), “Lexical Collocation Analysis: Advances and Applications,” Technometrics, 62, 137.
  • Lipovetsky, S. (2021), “Big Data and Social Science: Data Science Methods and Tools for Research and Practice,” Technometrics, 63, 421–423.
