Victims & Offenders
An International Journal of Evidence-based Research, Policy, and Practice
Volume 16, 2021 - Issue 4
Original Articles

Forecasting Identity Theft Victims: Analyzing Characteristics and Preventive Actions through Machine Learning Approaches

Pages 465-494 | Published online: 17 Aug 2020
 

ABSTRACT

Researchers in criminology and criminal justice have been making increasing use of machine learning approaches to investigate questions involving large amounts of digital data. We use survey data on over 220,000 respondents drawn from three waves of the National Crime Victimization Survey Identity Theft Supplement (NCVS-ITS), conducted by the Bureau of Justice Statistics (BJS) in 2012, 2014, and 2016. We analyze these data with three distinct machine learning algorithms: (1) logistic regression, (2) decision tree, and (3) random forest. We assess the efficacy of these approaches against four evaluative criteria: the overall percentage of correct classification, the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and feature criticality. Our findings indicate that the logistic regression algorithm performs best in predicting overall identity theft victimization, misuse of credit cards, misuse of other types of financial accounts, and the opening of new accounts, whereas the random forest algorithm performs best in predicting misuse of checking/savings accounts. Our findings also suggest that respondents’ age, educational level, and online shopping frequency are significantly related to identity theft victimization. Additionally, frequently checking credit reports and changing the passwords of financial accounts are strong predictors of identity theft victimization. We draw out the implications of our work for our collective understanding of identity theft and for judging the potential utility of machine learning approaches in criminology and criminal justice.
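The ROC and AUC criteria named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' code: the labels and scores below are hypothetical, standing in for observed victimization status (1 = victim) and a model's predicted probabilities, and the function names are our own. `roc_points` traces the ROC curve by sweeping a decision threshold, and `roc_auc` computes AUC through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen victim is scored above a randomly chosen non-victim, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at every distinct score threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Classify as positive every case scored at or above the threshold t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]                  # hypothetical outcomes
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]      # hypothetical predicted probabilities
print(round(roc_auc(labels, scores), 3))                      # 0.889
print(round(trapezoid_area(roc_points(labels, scores)), 3))   # 0.889
```

The pairwise count and the trapezoidal area under the traced curve agree (8/9 for the toy data), which is a useful consistency check when comparing several classifiers on the same test set. The pairwise loop is quadratic in sample size, so for data on the scale of the NCVS-ITS a sort-based implementation or an established library would be preferable.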

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. Notably, for many outcomes in the current study the logistic regression algorithm yields the highest predictive accuracy, even though it is not completely free from the distributional and covariance assumptions of a regression model.
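To make the regression-style structure of logistic regression concrete, here is a minimal sketch of a binary logistic model fitted by batch gradient descent on the mean log-loss. It is illustrative only: the toy data, learning rate, epoch count, and function names (`fit_logistic`, `predict_proba`) are our assumptions rather than the study's setup, and applied work would use an established statistical or machine learning library. The comment in `predict_proba` marks the model's core structural assumption, that the log-odds of the outcome are a linear function of the predictors, which tree-based methods do not impose.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    # Model form: log(p / (1 - p)) = w[0] + w[1]*x[0] + ... + w[k]*x[k-1]
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; returns [bias, w1, ..., wk]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the linear predictor
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical toy data: one predictor, binary outcome.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w = fit_logistic(X, y)
print([round(predict_proba(w, x), 2) for x in X])
```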

2. Notably, decision tree and random forest models are usually described as classifiers, yet both can be used whether the dependent variable is categorical (classification) or continuous (regression).
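A single tree-growing procedure covers both cases because only the impurity measure used to score candidate splits changes. The sketch below is a simplified, single-feature version of CART-style splitting with hypothetical data and our own function names; it scores splits with Gini impurity for a categorical target and with variance (mean squared error) for a continuous one. A random forest repeats this inside many trees grown on bootstrap samples and random feature subsets, then combines them by majority vote (classification) or averaging (regression).

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple


def gini(labels: Sequence) -> float:
    """Gini impurity for a categorical target: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def variance(values: Sequence[float]) -> float:
    """Mean squared deviation from the mean, used for a continuous target."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def best_split(x: Sequence[float], y: Sequence,
               impurity: Callable[[Sequence], float]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted child impurity) of the best split x <= t."""
    n = len(x)
    best: Tuple[Optional[float], float] = (None, impurity(y))  # no split yet
    for t in sorted(set(x))[:-1]:  # skip the maximum so the right child is never empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


x = [1, 2, 3, 4, 5, 6]                              # hypothetical predictor
y_cat = ["no", "no", "no", "yes", "yes", "yes"]     # categorical target
y_num = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]              # continuous target
print(best_split(x, y_cat, gini))                   # (3, 0.0)
print(best_split(x, y_num, variance)[0])            # 3
```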

3. In the field of machine learning, the overall percentage of correct classification is derived from a confusion matrix. Also known as an error matrix, the confusion matrix is a table that visualizes algorithm performance (Stehman, 1997): each row holds the counts for a predicted label and each column the counts for a true label, or vice versa (Powers, 2011). For example, Figure 1 shows that in the logistic regression model the number of correct classifications is 41,006 (40,989 + 17) and the number of wrongful classifications is 3,904 (3,891 + 13). The overall percentage of correct classification is therefore 41,006 / (41,006 + 3,904) = 41,006 / 44,910 ≈ 91.31%.
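The arithmetic in this note can be reproduced with a short sketch. Only the four cell counts come from the note; the function name is ours, and because Figure 1 is not reproduced here, the placement of the two off-diagonal counts (3,891 and 13) is an assumption. Overall accuracy depends only on the diagonal and the grand total, so that choice does not affect the result.

```python
from typing import Sequence


def overall_accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Share of all cases that fall on the main diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Counts for the logistic regression model as given in Note 3.
# Diagonal (correct): 40,989 and 17; off-diagonal (wrong): 3,891 and 13.
# ASSUMPTION: which off-diagonal cell holds which count; accuracy is unaffected.
logit_matrix = [
    [40_989, 3_891],
    [13, 17],
]

correct = sum(logit_matrix[i][i] for i in range(len(logit_matrix)))
wrong = sum(map(sum, logit_matrix)) - correct
print(correct, wrong)                              # 41006 3904
print(f"{overall_accuracy(logit_matrix):.2%}")     # 91.31%
```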
