Research Article

WordPPR: A Researcher-Driven Computational Keyword Selection Method for Text Data Retrieval from Digital Media

Figures & data

Figure 1. The workflow of the WordPPR method.

Table 1. The WordPPR Algorithm.

Figure 2. Accuracy of the frequency and exclusivity measures of the WordPPR algorithm for five document sample sizes (“m,” x-axis), three document lengths (“lambda,” colors and shapes), and three teleportation constants (“tau,” panel columns). Each point represents the average accuracy of including the top 25 words. The error bars indicate two times the standard deviation of accuracy across 30 repeated experiments.

Figure 3. Comparison of different keyword ranking methods (TF, TFIDF, TextRank, entropy, DiscPower, and WordPPR) based on the ROC curve.

Table 2. Comparison of different keyword ranking methods based on the AUC measure (area under the ROC curve).

Table 3. Top 100 hashtags and bigrams related to #MeToo using the WordPPR algorithm: round one of data collection.

Table 4. Top 100 hashtags and bigrams related to #MeToo using the WordPPR algorithm: round two of data collection.

Data availability statement

The method and application code files as well as the supplementary materials are available at https://osf.io/pcybz/.