Identifying Promising Items: The Use of Crowdsourcing in the Development of Assessment Instruments

References

  • Armstrong, A. W., Harskamp, C. T., Cheeney, S., Wu, J., & Schupp, C. W. (2012). Power of crowdsourcing: Novel methods of data collection in psoriasis and psoriatic arthritis. Journal of the American Academy of Dermatology, 67, 1273–1281.
  • Azzam, T., & Jacobson, M. R. (2013). Finding a comparison group: Is online crowdsourcing a viable option? American Journal of Evaluation, 34, 372–384.
  • Baker, F. B. (1985). The basics of item response theory. Portsmouth, NH: Heinemann.
  • Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12, 387–415.
  • Barger, P., Behrend, T. S., Sharek, D. J., & Sinar, E. F. (2011). IO and the crowd: Frequently asked questions about using Mechanical Turk for research. The Industrial-Organizational Psychologist, 49, 11–17.
  • Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43, 800–813.
  • Birch, K. E., & Heffernan, K. J. (2014). Crowdsourcing for clinical research: An evaluation of maturity. In Proceedings of the Seventh Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2014) (pp. 3–11).
  • Brandt, M. J. (2011). Sexism and gender inequality across 57 societies. Psychological Science, 22, 1413–1418.
  • Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.
  • Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61, 309–329.
  • Chandler, J., Mueller, P., & Paolacci, G. (2013). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 1–19.
  • Cole, F., Sanik, K., DeCarlo, D., Finkelstein, A., Funkhouser, T., Rusinkiewicz, S., & Singh, M. (2009). How well do line drawings depict shape? ACM Transactions on Graphics, 28(3), Article 28. New York, NY: ACM.
  • DuVernet, A. M., Wright, N. A., Meade, A. W., Coughlin, C., & Kantrowitz, T. M. (2014). General mental ability as a source of differential functioning in personality scales. Organizational Research Methods, 17, 299–323.
  • Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
  • Gierl, M. J., & Ackerman, T. (1996). Software review: XCALIBRE™ Marginal Maximum-Likelihood Estimation Program, Windows™ Version 1.10. Applied Psychological Measurement, 20, 303–307.
  • Halloun, I. A., & Hestenes, D. (1985). The initial knowledge state of college physics students. American Journal of Physics, 53, 1043–1055.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
  • Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.
  • Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30, 141–158.
  • Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression. New York, NY: Wiley & Sons.
  • Hossain, M. (2012, May). Users' motivation to participate in online crowdsourcing platforms. In 2012 International Conference on Innovation Management and Technology Research (ICIMTR) (pp. 310–315). Washington, DC: IEEE.
  • Jesior, J. C., Filhol, A., & Tranqui, D. (1994). FOLDIT (LIGHT): An interactive program for Macintosh computers to analyze and display Protein Data Bank coordinate files. Journal of Applied Crystallography, 27, 1075.
  • Kazai, G., Kamps, J., & Milic-Frayling, N. (2013). An analysis of human factors and label accuracy in crowdsourcing relevance judgments. Information Retrieval, 16, 138–178.
  • Keating, M., Rhodes, B., & Richards, A. (2013). Crowdsourcing: A flexible method for innovation, data collection, and analysis in social science research. In J. Murphy, C. A. Hill, & E. Dean (Eds.), Social Media, Sociality, and Survey Research (pp. 179–201). Hoboken, NJ: Wiley.
  • Kim, A. E., Lieberman, A. J., & Dench, D. (2014). Crowdsourcing data collection of the retail tobacco environment: Case study comparing data from crowdsourced workers to trained data collectors. Tobacco Control, 24, e6–e9. doi:10.1136/tobaccocontrol-2013-051298.
  • Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: Report of Board of Scientific Affairs' Advisory Group on the Conduct of Research on the Internet. American Psychologist, 59, 105.
  • Lintott, C. J., Schawinski, K., Slosar, A., Land, K., Bamford, S., Thomas, D.,…Vandenberg, J. (2008). Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society, 389, 1179–1189.
  • Lord, F. M., Novick, M. R., & Birnbaum, A. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  • Lowry, C. S., & Fienen, M. N. (2013). CrowdHydrology: Crowdsourcing hydrologic data and engaging citizen scientists. Groundwater, 51, 151–156.
  • Muthén, B., Kao, C.-F., & Burstein, L. (1991). Instructional sensitivity in mathematics achievement test items: Applications of a new IRT-based detection technique. Journal of Educational Measurement, 28, 1–22.
  • National Research Council. (Ed.). (1996). National Science Education Standards. Washington, DC: National Academy Press.
  • Ogan-Bekiroglu, F. (2009). Assessing assessment: Examination of pre-service physics teachers' attitudes towards assessment and factors affecting their attitudes. International Journal of Science Education, 31, 1–39.
  • Oh, J., & Wang, G. (2012). Evaluating crowdsourcing through Amazon Mechanical Turk as a technique for conducting music perception experiments. In Proceedings of the 12th International Conference on Music Perception and Cognition.
  • Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X²: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27, 289–298.
  • Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning (2nd ed., Monograph 161: Quantitative Applications in the Social Sciences). Thousand Oaks, CA: Sage.
  • Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.
  • Peer, E., Vosgerau, J., & Acquisti, A. (2013). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46, 1023–1031.
  • Peterson, W. W., Birdsall, T. G., & Fox, W. (1954). The theory of signal detectability. Transactions of the IRE Professional Group on Information Theory, 4, 171–212.
  • Polikoff, M. S. (2010). Instructional sensitivity as a psychometric property of assessments. Educational Measurement: Issues and Practice, 29, 3–14.
  • Rand, D. G., Dreber, A., Ellingsen, T., Fudenberg, D., & Nowak, M. A. (2009). Positive interactions promote public cooperation. Science, 325, 1272–1275.
  • Rebello, N. S., & Zollman, D. A. (2004). The effect of distracters on student performance on the force concept inventory. American Journal of Physics, 72, 116–125.
  • Reckase, M. D. (1979). Unifactor latent trait models applied to multi-factor tests: Results and implications. Journal of Educational Statistics, 4, 207–230.
  • Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A.,…Cella, D. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care, 45(5), S22–S31.
  • Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S. (2002). On the evaluation of systemic education reform: Searching for instructional sensitivity. Journal of Research in Science Teaching, 39, 369–393.
  • Sabou, M., Bontcheva, K., & Scharl, A. (2012, September). Crowdsourcing research opportunities: Lessons from natural language processing. In Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies (p. 17). New York, NY: ACM.
  • Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265–296.
  • Sadler, P., Coyle, H., Cook-Smith, N., Miller, J., Mintzes, J., Tanner, K., & Murray, J. (2013). Assessing the life science knowledge of students and teachers represented by the K-8 National Science Standards. CBE-Life Sciences Education, 12, 553–575.
  • Sadler, P., Coyle, H., Miller, J., Cook-Smith, N., Dussault, M., & Gould, R. (2009). The Astronomy and Space Science Concept Inventory: Development and validation of an assessment instrument aligned with the National Standards. Astronomy Education Review, 8, 1–26.
  • Saunders, D. R., Bex, P. J., & Woods, R. L. (2013). Crowdsourcing a normative natural language dataset: A comparison of Amazon Mechanical Turk and in-lab data collection. Journal of Medical Internet Research, 15(5), e100.
  • Scagnelli, J. M. (2013). What the crowd yields: Considerations when crowdsourcing. Survey Practice, 6(3). Retrieved from http://surveypractice.org/index.php/SurveyPractice/article/view/240.
  • Smucker, M. D., & Jethani, C. P. (2011, July). The crowd vs. the lab: A comparison of crowdsourced and university laboratory participant behavior. In Proceedings of the SIGIR 2011 Workshop on Crowdsourcing for Information Retrieval, Beijing (Vol. 194). Retrieved from http://www.mansci.uwaterloo.ca/~msmucker/publications/smucker-sigir-cir2011-crowd-vs-lab.pdf.
  • Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
  • Trabin, T. E., & Weiss, D. J. (2014). The person response curve: Fit of individuals to item response theory models. In D. J. Weiss (Ed.), New horizons in testing (pp. 83–108). New York, NY: Academic Press.
  • Truell, A. D., Bartlett, J. E., & Alexander, M. W. (2002). Response rate, speed, and completeness: A comparison of Internet-based and mail surveys. Behavior Research Methods, Instruments, & Computers, 34, 46–49.
  • van den Berg, S. M., Glas, C. A., & Boomsma, D. I. (2007). Variance decomposition using an IRT measurement model. Behavior Genetics, 37, 604–616.
  • Wood, C., Sullivan, B., Iliff, M., Fink, D., & Kelling, S. (2011). eBird: Engaging birders in science and conservation. PLoS Biology, 9(12), e1001220.
  • Woods, C. M., Cai, L., & Wang, M. (2013). The Langer-Improved Wald Test for DIF testing with multiple groups evaluation and comparison to two-group IRT. Educational and Psychological Measurement, 73, 532–547.
  • Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223–233.
  • Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39, 561–577.
