ABSTRACT
The relation between mental health and the language used by social media users has been the focus of a wide range of studies in Natural Language Processing and related fields that aim to predict early signs of mental disorders. In many cases, however, text data alone is treated as a sufficient source of learning features, and other sources available from social media are often neglected. To address this gap, the present work investigates the use of social media connection information (represented by the identities of Twitter friends, followers and mentions) to predict depression and anxiety disorder. Using a large multimodal dataset of over 31K unique Twitter timelines (555 million words) and associated network data, we built a number of network-based models (using network embedding representations) and text-based alternatives (bag-of-words and LIWC baseline models) for these tasks. Results suggest not only that network connections may act as strong predictors of mental health, but also that these models may outperform standard baseline alternatives that rely on text data alone. This may be seen as a first step towards more sophisticated architectures that combine textual and non-textual information.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 The self-report statement itself is not included in the corpus data, as this could render the classification task trivial.
2 As discussed in Section 3.3, our current dataset keeps a 7:1 Control-to-Diagnosed class distribution.
3 Code is available at https://github.com/rlagedo/ExtraLinguistic_SetembroBR