
Journal of Business Analytics Volume 3, 2020 - Issue 2


Original Article

Assessing text mining algorithm outcomes

Triss Ashton, Department of Management, Tarleton State University, Stephenville, TX, USA. Correspondence: [email protected]

https://orcid.org/0000-0002-5473-1461

Nicholas Evangelopoulos, University of North Texas, Denton, TX, USA

Audhesh Paswan, University of North Texas, Denton, TX, USA

Victor R. Prybutok, University of North Texas, Denton, TX, USA

https://orcid.org/0000-0003-3810-9039

Robert Pavur, University of North Texas, Denton, TX, USA

Pages 107-121 | Received 17 Mar 2020, Accepted 16 Jun 2020, Published online: 25 Jun 2020

https://doi.org/10.1080/2573234X.2020.1785342

ABSTRACT

There is a surge in the development of decision-oriented analysis tools intended to extract actionable information from text. These tools integrate various text-mining methods whose performance tests were often biased toward the new system. Those tests primarily utilised descriptive measurement criteria and test datasets that are inconsistent with most business corpora. We propose and test a user-oriented judgment approach that allows testing under controlled customer-oriented corpora and generates effect size measures. To illustrate the approach, customer relations data were analysed by latent semantic analysis and latent Dirichlet allocation, with results evaluated by prospective business analysts. Reporting includes comparisons of results with published literature. While the research centres on the context-region text-mining systems, literature comparisons include word-embedding methods. The analysis concludes that none of the systems reviewed possesses a repeatable statistical advantage over the others. Instead, distribution attributes, algorithm configuration, and the evaluation task drive results.

KEYWORDS:

Text mining
algorithm testing
model development
latent semantic analysis
latent Dirichlet allocation

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. The Kakkonen et al. (2005, 2006) data had 2 large and 11 medium effect sizes that are suspect because the samples were exceptionally small (n < 150).

2. Hofmann (2001) reports a version of Hofmann (1999a).

3. Hofmann (1999a) also reports and compares to cos+tf and cos+tfidf baselines, which are not included here. Further, comparisons among the pLSA variants are possible but were not computed.

4. Xu et al. (2003) also report mutual information.

5. Note that while we apply (1) and report p-values, those results alone are inappropriate because the sample sizes are 7,803 and 9,494. If the sample is reduced to 750, and assuming the same accuracy scores, the number of significant p-values is reduced to 39.

6. Effect size computations are not affected by sample size.
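
The contrast drawn in notes 5 and 6 can be illustrated with a minimal sketch. It is not the article's equation (1), which is not reproduced here; it assumes a pooled two-proportion z-test for the p-value and Cohen's h for the effect size, and the two accuracy scores (0.82 and 0.80) are hypothetical values chosen only for illustration. The reduced sample of 750 is assumed to apply to both groups.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions.

    The formula takes no sample size, so h is the same at any n.
    """
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value from a pooled two-proportion z-test (an assumed test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical accuracy scores for two algorithms (not taken from the article)
acc_a, acc_b = 0.82, 0.80

# Sample sizes quoted in note 5, then the reduced sample (assumed for both groups)
p_large = two_proportion_p_value(acc_a, 7803, acc_b, 9494)
p_small = two_proportion_p_value(acc_a, 750, acc_b, 750)
h = cohens_h(acc_a, acc_b)

print(f"p-value at n = 7,803 and 9,494: {p_large:.4f}")
print(f"p-value at n = 750 and 750:     {p_small:.4f}")
print(f"Cohen's h at either sample size: {h:.3f}")
```

With the same pair of accuracy scores, only the p-value responds to the change in sample size; the effect size depends on the scores alone.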

Kakkonen, T., Myller, N., Timonen, J., & Sutinen, E. (2005). Automatic essay grading with probabilistic latent semantic analysis. Proceedings of the 2nd workshop on building educational applications using NLP (pp. 29–36). Ann Arbor, MI.

Kakkonen, T., Myller, N., & Sutinen, E. (2006). Applying latent Dirichlet allocation to automatic essay grading. Proceedings of the 5th international conference on natural language processing (pp. 110–120). Turku, Finland.

Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1/2), 177–196. https://doi.org/10.1023/A:1007617005950

Hofmann, T. (1999a). Probabilistic latent semantic indexing. Proceedings of the twenty-second annual international SIGIR conference (pp. 50–57). New York, NY.

Xu, W., Liu, X., & Gong, Y. (2003). Document clustering based on non-negative matrix factorization. Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval (pp. 267–273). Toronto, Canada.
