ABSTRACT
This research introduces a method for segmenting customers using Structural Topic Modelling (STM), a text analysis tool that incorporates metadata and captures differences in topical content and topical prevalence across customers. This approach is particularly suitable for contexts in which textual data is either a critical component of the data available for segmentation or the only such data. Because STM can incorporate metadata, it provides better clustering solutions and supports richer segment profiles than typical topic modelling approaches. We empirically illustrate the application of this method in two contexts: 1) a context in which related metadata is readily available; and 2) a context in which metadata is virtually non-existent. The second context exemplifies how ad-hoc generated metadata can increase the utility of the method for identifying distinct segments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. Note that even if an individual-level response is not needed, such data may still be valuable for ‘representative’ segment identification – e.g., characterising and understanding the proportion of the market talking about X; identifying topics that are commonly discussed, etc.
2. The data in both of our empirical analyses, Empirical Example 1 and Empirical Example 2, in fact contain many correlated topics.
3. We also tested the complete linkage clustering method. The CPCC value was almost identical to that of Ward’s method, 0.66.
4. STM commonly employs ‘k’ to refer to the total number of topics, while clustering methods also employ ‘k’ to refer to the total number of clusters. To avoid any confusion, this study uses ‘k’ (lower case) to refer to the total number of topics and ‘K’ (upper case) to refer to the total number of clusters.
5. Note that showing the topic words for all the clusters would consume too much space.
6. Retweets were not removed, and tweets from the same individual were not combined; instead, this example segments individual Twitter entries. If desired, researchers could easily remove repeated tweets from the same individual or aggregate tweets at the individual level using the Twitter user’s screen name.
7. Note that standard text data pre-processing dropped two documents, each of which contained only a link to a website.
8. The pattern of increasing CPCC values using the complete linkage method was consistent with that found using Ward’s method.
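As an illustration of the comparison mentioned in Notes 3 and 8, the following minimal Python sketch computes the cophenetic correlation coefficient (CPCC) for Ward's method and for complete linkage using SciPy. It is not the authors' code: the document-by-topic matrix `theta` is synthetic, and its size (200 documents, 10 topics), the Dirichlet parameter, and the Euclidean distance are all assumptions made only for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic stand-in for a document-by-topic proportion matrix
# (assumption: 200 documents, 10 topics). A real analysis would use
# the topic proportions estimated by the fitted topic model.
rng = np.random.default_rng(0)
theta = rng.dirichlet(alpha=np.full(10, 0.5), size=200)

# Pairwise Euclidean distances between documents in topic space
# (the distance measure is an assumption of this sketch).
distances = pdist(theta, metric="euclidean")


def cpcc(method):
    """Return the cophenetic correlation coefficient for one linkage method."""
    tree = linkage(theta, method=method)  # Euclidean metric by default
    coefficient, _ = cophenet(tree, distances)
    return float(coefficient)


# Compare the two linkage methods named in the notes.
cpcc_by_method = {method: cpcc(method) for method in ("ward", "complete")}
print(cpcc_by_method)
```

A CPCC closer to 1 indicates that the dendrogram preserves the original pairwise distances more faithfully, so similar values across the two methods would suggest that the choice of linkage matters little for that data set.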
Additional information
Notes on contributors
Jorge E. Fresneda
Jorge E. Fresneda is an Assistant Professor of Digital Marketing and Marketing Analytics in the Martin Tuchman School of Management at New Jersey Institute of Technology. Dr Fresneda holds an MS in Applied Statistics from UNED, an MA in Marketing and Sales Management from EAE Business School, and a PhD in Marketing from Drexel University. His research explores several areas of digital marketing, such as the role of information in influencing online consumers, social media marketing, and online accessibility. An important part of his research includes the development of methods to analyse unstructured data. His research has been published in the Journal of Consumer Affairs, Information & Management, Decision Support Systems, and Frontiers in Psychology. Dr Fresneda has also authored two chapters of the book “Practical Text Analytics: Maximizing the Value of Text Data”.
Thomas A. Burnham
Thomas A. Burnham is an Assistant Professor of Marketing in the College of Business at the University of Nevada, Reno. He holds a BA in Managerial Studies from Rice University and a PhD in Marketing from the University of Texas at Austin. His research investigating consumer switching costs won the best article of the year award in the Journal of the Academy of Marketing Science. His current research explores firm learning from customer feedback and customer feedback metrics, motivations and types, with an emphasis on consumer suggestion sharing.
Chelsey H. Hill
Chelsey H. Hill is an Assistant Clinical Professor in the Decision Sciences and MIS Department of the LeBow College of Business at Drexel University. She holds a BA in Political Science from the College of New Jersey, an MS in Business Intelligence from Saint Joseph’s University, and a PhD in Business Administration with a concentration in Decision Sciences from Drexel University. Her research interests include consumer product recalls, online consumer reviews, safety and security, public policy, and humanitarian operations. Her research has been published in the Journal of Informetrics and the International Journal of Business Intelligence Research. Dr Hill is the co-author of the book “Practical Text Analytics: Maximizing the Value of Text Data”.