General Articles

Structural topic modelling segmentation: a segmentation method combining latent content and customer context

Jorge E. Fresneda, Thomas A. Burnham & Chelsey H. Hill
Pages 792-812 | Received 26 Mar 2020, Accepted 07 Dec 2020, Published online: 11 Feb 2021
 

ABSTRACT

This research introduces a method for segmenting customers using Structural Topic Modelling (STM), a text analysis tool capable of capturing topical content and topical prevalence differences across customers while incorporating metadata. This approach is particularly suitable for contexts in which textual data is either a critical component or the only data available for segmentation. The ability to incorporate metadata by using STM provides better clustering solutions and supports richer segment profiles than can be produced with typical topic modelling approaches. We empirically illustrate the application of this method in two contexts: (1) a context in which related metadata is readily available; and (2) a context in which metadata is virtually non-existent. The second context exemplifies how ad-hoc generated metadata can increase the utility of the method for identifying distinct segments.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. Note that even if an individual-level response is not needed, such data may still be valuable for ‘representative’ segment identification – e.g., characterising and understanding the proportion of the market talking about X; identifying topics that are commonly discussed, etc.

2. The data in our two empirical analysis examples, Empirical Example 1 and Empirical Example 2, contain, in fact, many topics that are correlated.

3. We also tested the complete linkage clustering method. The resulting CPCC value was almost identical to that obtained with Ward’s method, 0.66.

4. STM commonly employs ‘k’ to refer to the total number of topics, while clustering methods also employ ‘k’ to refer to the total number of clusters. To avoid any confusion, this study uses ‘k’ (lower case) to refer to the total number of topics and ‘K’ (upper case) to refer to the total number of clusters.

5. Note that showing the topic words for all the clusters would consume too much space.

6. Retweets were not removed and tweets from unique individuals were not combined. Instead, in this example we segmented Twitter entries. If desired, researchers could easily remove tweets from duplicated individuals or aggregate tweets at the individual level by employing the Twitter user’s screenname.

7. Note that standard text data pre-processing dropped two documents, each of which contained only a link to a website.

8. The pattern of increasing CPCC values using the complete linkage method was consistent with that found using Ward’s method.
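Notes 3 and 8 compare Ward’s method and complete linkage by their CPCC (cophenetic correlation coefficient) values. The following is a minimal sketch of how such a comparison might be run in Python with SciPy. It is not the authors' code: the function name `cpcc_by_method`, the use of Euclidean distances, and the synthetic document-by-topic matrix `theta` (drawn from a Dirichlet distribution as a stand-in for STM topic proportions) are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def cpcc_by_method(theta, methods=("ward", "complete")):
    """Return {linkage method: CPCC} for a document-by-topic matrix.

    theta: array of shape (n_documents, n_topics); each row is assumed
    to hold one document's topic proportions.
    """
    # Pairwise Euclidean distances between documents (condensed form).
    distances = pdist(theta, metric="euclidean")
    scores = {}
    for method in methods:
        # Build the dendrogram for this linkage method.
        tree = linkage(distances, method=method)
        # CPCC: correlation between the original distances and the
        # cophenetic distances implied by the dendrogram.
        coefficient, _ = cophenet(tree, distances)
        scores[method] = float(coefficient)
    return scores


# Hypothetical data: 60 documents, 5 topics (not the paper's data).
rng = np.random.default_rng(0)
theta = rng.dirichlet(alpha=np.full(5, 0.3), size=60)

scores = cpcc_by_method(theta)
for method, value in scores.items():
    print(f"{method}: CPCC = {value:.2f}")
```

Values closer to 1 indicate that the dendrogram preserves the original pairwise distances more faithfully, which is the sense in which the two linkage methods are compared in the notes.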

Additional information

Funding

No funding has been received.

Notes on contributors

Jorge E. Fresneda

Jorge E. Fresneda is an Assistant Professor of Digital Marketing and Marketing Analytics in the Martin Tuchman School of Management at New Jersey Institute of Technology. Dr Fresneda holds an MS in Applied Statistics from UNED, an MA in Marketing and Sales Management from EAE Business School, and a PhD in Marketing from Drexel University. His research explores several areas of digital marketing, such as the role of information influencing online consumers, social media marketing, and online accessibility. An important part of his research includes the development of methods to analyse unstructured data. His research has been published in the Journal of Consumer Affairs, Information and Management, Decision Support Systems, and Frontiers in Psychology. Dr Fresneda has also authored two chapters of the book “Practical Text Analytics: Maximizing the Value of Text Data”.

Thomas A. Burnham

Thomas A. Burnham is an Assistant Professor of Marketing in the College of Business at the University of Nevada, Reno. He holds a BA in Managerial Studies from Rice University and a PhD in Marketing from the University of Texas at Austin. His research investigating consumer switching costs won the best article of the year award in the Journal of the Academy of Marketing Science. His current research explores firm learning from customer feedback and customer feedback metrics, motivations and types, with an emphasis on consumer suggestion sharing.

Chelsey H. Hill

Chelsey H. Hill is an Assistant Clinical Professor in the Decision Sciences and MIS Department of the LeBow College of Business at Drexel University. She holds a BA in Political Science from the College of New Jersey, an MS in Business Intelligence from Saint Joseph’s University, and a PhD in Business Administration with a concentration in Decision Sciences from Drexel University. Her research interests include consumer product recalls, online consumer reviews, safety and security, public policy, and humanitarian operations. Her research has been published in the Journal of Informetrics and the International Journal of Business Intelligence Research. Dr Hill is the co-author of the book “Practical Text Analytics: Maximizing the Value of Text Data”.
