Original Article

Assessing text mining algorithm outcomes

Triss Ashton (a), Department of Management, Tarleton State University, Stephenville, TX, USA. Correspondence: [email protected]

https://orcid.org/0000-0002-5473-1461

Nicholas Evangelopoulos (b), University of North Texas, Denton, TX, USA

Audhesh Paswan (b), University of North Texas, Denton, TX, USA

Victor R. Prybutok (b), University of North Texas, Denton, TX, USA

https://orcid.org/0000-0003-3810-9039

Robert Pavur (b), University of North Texas, Denton, TX, USA

ABSTRACT

There is a surge in the development of decision-oriented analysis tools intended to extract actionable information from text. These tools integrate various text-mining methods whose performance tests were often biased toward the new system. Those tests primarily utilised descriptive measurement criteria and test datasets that are inconsistent with most business corpora. We propose and test a user-oriented judgment approach that allows testing under controlled customer-oriented corpora and generates effect size measures. To illustrate the approach, customer relations data were analysed by latent semantic analysis and latent Dirichlet allocation, with results evaluated by prospective business analysts. Reporting includes comparisons of results with published literature. While the research centres on context-region text-mining systems, the literature comparisons include word-embedding methods. The analysis concludes that none of the systems reviewed possesses a repeatable statistical advantage over the others. Instead, distribution attributes, algorithm configuration, and the evaluation task drive results.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. The Kakkonen et al. (2005, 2006) data had 2 large and 11 medium effect sizes that are suspect because the samples were exceptionally small (n < 150).

2. Hofmann (2001) reports a version of Hofmann (1999a).

3. Hofmann (1999a) also reports and compares against cos+tf and cos+tfidf baselines, which are not included here. Further, comparisons among the pLSA variants are possible but were not computed.

4. Xu et al. (2003) also report mutual information.

5. Note that while we apply (1) and report p-values, those results alone are inappropriate because the sample sizes are large (7,803 and 9,494). If the sample is reduced to 750, assuming the same accuracy scores, the number of significant p-values falls to 39.

6. Effect size computations are not affected by sample size.
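
The point made in notes 5 and 6 can be illustrated with a minimal sketch. The test referred to as (1) is not reproduced on this page, so the sketch assumes a pooled two-proportion z-test for the p-value and Cohen's h as the effect size; both choices, the function names, the equal group sizes, and the accuracy scores 0.82 and 0.80 are hypothetical and are not taken from the article.

```python
import math


def two_proportion_z(p1, p2, n1, n2):
    """Pooled two-proportion z statistic and two-sided p-value (assumed test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions (no n involved)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical accuracy scores for two systems; not values from the article.
acc_a, acc_b = 0.82, 0.80

# Same accuracies, evaluated at a large and a reduced sample size per group.
for n in (7803, 750):
    z, p = two_proportion_z(acc_a, acc_b, n, n)
    h = cohens_h(acc_a, acc_b)
    print(f"n={n}: z={z:.2f}, p={p:.4f}, h={h:.3f}")
```

Under these assumptions, the same two-point accuracy gap is statistically significant at the larger sample size and not at the smaller one, while h is identical in both cases because the sample size never enters its formula.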
