Theory and Methods

Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling

Pages 1192-1204 | Received 31 Aug 2021, Accepted 02 Feb 2023, Published online: 16 Mar 2023
 

Abstract

Crowdsourcing has emerged as an alternative solution for collecting labels at large scale. However, most recruited workers are not domain experts, so the labels they contribute can be noisy. In this article, we propose a two-stage model to predict the true labels for multicategory classification tasks in crowdsourcing. In the first stage, we fit the observed labels with a latent factor model and incorporate subgroup structures for both tasks and workers through a multi-centroid grouping penalty. Group-specific rotations are introduced to align workers with different task categories, which allows the model to handle multicategory crowdsourcing tasks. In the second stage, we propose a concordance-based approach to identify high-quality worker subgroups, which are then relied upon to assign labels to tasks. In theory, we establish the estimation consistency of the latent factors and the prediction consistency of the proposed method. Simulation studies show that the proposed method outperforms existing competitive methods when subgroup structures are present within tasks and workers. We also apply the proposed method to real-world problems and demonstrate its superiority. Supplementary materials for this article are available online.
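As a rough illustration of the two-stage idea only, the sketch below substitutes a plain majority vote for the first-stage latent factor model and uses simple agreement with that vote as a stand-in for the concordance measure. It is not the method proposed in the article: the multi-centroid grouping penalty, the group-specific rotations, and the subgroup structure are all omitted, and the function names, the `threshold` parameter, and the toy data are hypothetical.

```python
from collections import Counter


def majority_vote(labels_by_task, workers=None):
    """Most common label per task, using only `workers` (all workers if None)."""
    result = {}
    for task, votes in labels_by_task.items():
        kept = [lab for w, lab in votes.items() if workers is None or w in workers]
        if kept:
            result[task] = Counter(kept).most_common(1)[0][0]
    return result


def worker_concordance(labels_by_task, reference):
    """Fraction of each worker's labels that agree with the reference labels."""
    agree, total = Counter(), Counter()
    for task, votes in labels_by_task.items():
        if task not in reference:
            continue
        for w, lab in votes.items():
            total[w] += 1
            agree[w] += int(lab == reference[task])
    return {w: agree[w] / total[w] for w in total}


def two_stage_labels(labels_by_task, threshold=0.7):
    """Stage 1: initial labels (majority vote here, an assumed stand-in).
    Stage 2: keep workers whose concordance meets the threshold and re-vote."""
    initial = majority_vote(labels_by_task)
    concordance = worker_concordance(labels_by_task, initial)
    trusted = {w for w, c in concordance.items() if c >= threshold}
    final = majority_vote(labels_by_task, trusted)
    # Fall back to the initial label when no trusted worker labeled a task.
    labels = {t: final.get(t, initial[t]) for t in initial}
    return labels, trusted


# Toy data (hypothetical): three consistent workers and two noisy ones.
votes = {
    "t1": {"w1": "A", "w2": "A", "w3": "A", "w4": "B", "w5": "C"},
    "t2": {"w1": "B", "w2": "B", "w3": "B", "w4": "B", "w5": "A"},
    "t3": {"w1": "C", "w2": "C", "w3": "C", "w4": "A", "w5": "A"},
    "t4": {"w1": "A", "w2": "A", "w3": "A", "w4": "C", "w5": "B"},
}
labels, trusted = two_stage_labels(votes)
```

In this toy setting the second stage changes which workers are relied upon rather than the labels themselves. The article's approach differs in that worker quality is assessed at the subgroup level from the fitted latent factors.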

Supplementary Materials

The supplementary materials provide the methodology for binary crowdsourcing, simulation results for binary crowdsourcing, and proofs of theorems and corollaries.

Acknowledgments

The authors thank the Editor, Associate Editor, and the anonymous reviewers for their insightful suggestions and helpful feedback, which improved the article significantly.

Disclosure Statement

The authors declare no financial or nonfinancial interest that has arisen from the direct applications of this research.

Additional information

Funding

This work is supported by NSF grants DMS 2210640, DMS 1952406, HK RGC grants GRF-11304520, GRF-11301521, GRF-11311022, and CUHK Startup grant 4937091.
