U-statistics for estimating performance metrics in forensic handwriting analysis

Pages 1082-1117 | Received 17 Apr 2019, Accepted 09 Jan 2020, Published online: 24 Feb 2020
 

Abstract

A class of computationally efficient approximations to a set of natural U-statistics and related U-processes that arise in forensic and biometric comparisons has been developed. This paper details the asymptotic characterization of the natural U-statistics, the development of computationally efficient approximations, and the expected error bounds of those approximations. The statistical methods are presented in the context of forensic handwriting comparisons and are used to estimate the rate at which the random match probability (RMP) decreases as a function of the amount of writing available in the comparison samples. Although presented in a forensic handwriting comparison context, similar problems arise in machine learning and pattern recognition, where the task is to classify a batch of exchangeable objects that arise from the same class. In that setting, the methods can be used to estimate the rate at which the error rate decreases as the number of objects to be classified increases.

Acknowledgements

The authors would like to acknowledge Sciometrics for computational support; the FBI Laboratory for supplying the set of handwritten documents; and Dr. Danica Ommen of Iowa State University for reviews and comments. The authors would like to thank the Journal of Forensic Document Examination and Dr. Carolyne Bird for permission to use and replicate their graph. This is publication number 19-03 of the Laboratory Division of the FBI. Names of commercial products are provided for identification purposes only, and inclusion does not imply endorsement of the manufacturer, or its products or services by the FBI. The views expressed are those of the authors and do not necessarily reflect the official policy or position of the FBI or the U.S. Government. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the US Department of Justice.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 This includes natural writing from a writer, e.g. notes, letters, grocery lists or diary entries.

2 This includes everything an individual has ever written and could possibly write.

3 Throughout this paper we use writing samples and documents interchangeably.

4 Throughout this paper we use similarity scores, where the closer the score is to 1, the more similar the two writing samples are; the closer the score is to 0, the less similar the two writing samples are. A dissimilarity score can be used in place of the similarity score, with a few modifications.

5 These are analogous to false positives and false negatives, respectively.

6 This is the simplest form of DNA type evidence. For more complex forms and issues relating to drop-out and drop-in, see Balding [Citation10].

7 Generally refers to the relevant population the samples were collected from, where one would want to know the commonalities and errors of that specific population.

8 These are letters (upper and lowercase), numbers, and characters such as $ and &.

9 For a discussion on the meaning of common source and on the difference between common source and specific source in forensic evidence interpretation, see Ommen and Saunders [Citation26].

10 We assume that the population of writers is so large that we can treat the sampled writers as a simple random sample with respect to some distribution on the relevant population of writers, i.e. there is no need for a finite population correction.

11 In other words, each writing sample is assumed to be randomly generated from that writer's writing profile.

12 For notational convention, we will use hats above complete U-statistics, and tildes above incomplete U-statistics.

13 Note that the first term in (2) involving $\sigma_c^2$ dominates $\mathrm{Var}(\hat{\theta}_N)$ for large values of $N$, as the second term quickly converges to zero.

14 An incomplete U-statistic is one that does not calculate over all possible combinations of samples. A complete U-statistic does calculate over all possible combinations of samples.

15 The distribution of $K\tilde{\theta}_N^{(K)}(n)$ given $\{D_i\}_{i=1}^{N}$ is binomial with probability of success $\hat{\theta}_N(n)$ and $K$ trials.

16 K = 9 was chosen to restrict computational time down to three days.

17 In this setting, the ECDF is a function that gives the proportion of score values (in the set of observed scores) that are less than or equal to some threshold.

18 Using a threshold of 0.01 is conservative in this application, particularly for writing samples with fewer words.

19 This threshold corresponds to a rate of non-match errors of at most 1%, which was suggested by the results in the previous sub-section.

20 It is not possible to accurately estimate very small RMP with K = 9, and so we considered common lengths of writing samples to be at most 100. Larger lengths of writing samples will result in very few or zero matches. K can be increased to accommodate larger lengths of writing samples, given computational time and resources are available.

21 As is clear from the form of the standard error of the U-statistic, the variance of the conditional match probability is proportional to the standard error of the estimated RMP.

22 This is not suggesting that the best estimator of the standard error of this data is the combination of fitted logistic models for the RMP and TMP. The estimator discussed here is to show one approach to estimating the standard error. See 1.3.4 for a more detailed discussion of this issue.

23 Monte Carlo simulation generates samples to estimate properties of a distribution.

24 This is to prevent the exact same content being compared, as that would induce bias and increase the chance of RMP.

25 The values of K and the choices of writing sample length are the same as the RMP application in 2.1.2.

26 600 was chosen because that is approximately how many calculations could be performed in two days using available computational resources.

27 Some documents may contain fewer than 128 words due to writer mistakes or processing errors; these documents (and the corresponding comparisons) would be skipped when included in a comparison for which they did not possess enough words.
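
As an illustration of notes 14 and 15, the following is a minimal sketch of the difference between a complete and an incomplete U-statistic for a match-indicator kernel. It is not the authors' implementation: the scoring function `toy_similarity`, the single numeric feature per writer, the threshold of 0.9, and the choice to draw K pairs uniformly with replacement are all assumptions made only for this example.

```python
import itertools
import random


def toy_similarity(a, b):
    """Hypothetical similarity score in [0, 1]; stands in for a real scorer."""
    return 1.0 - abs(a - b)


def match_kernel(a, b, threshold):
    """Indicator kernel: 1 if the pair is declared a match, else 0."""
    return 1 if toy_similarity(a, b) >= threshold else 0


def complete_u_statistic(samples, threshold):
    """Average the kernel over all unordered pairs of samples."""
    pairs = list(itertools.combinations(samples, 2))
    return sum(match_kernel(a, b, threshold) for a, b in pairs) / len(pairs)


def incomplete_u_statistic(samples, threshold, k, rng):
    """Average the kernel over k pairs drawn uniformly with replacement."""
    pairs = list(itertools.combinations(samples, 2))
    drawn = [rng.choice(pairs) for _ in range(k)]
    return sum(match_kernel(a, b, threshold) for a, b in drawn) / k


if __name__ == "__main__":
    rng = random.Random(0)
    # One toy feature per writer (assumed data, not from the paper).
    samples = [rng.random() for _ in range(30)]
    theta_hat = complete_u_statistic(samples, threshold=0.9)
    theta_tilde = incomplete_u_statistic(samples, threshold=0.9, k=200, rng=rng)
    print("complete:", theta_hat, "incomplete:", theta_tilde)
```

Under this sampling scheme each drawn pair is a match with probability equal to the complete statistic, so, conditional on the data, K times the incomplete statistic counts successes in K independent trials. That is the kind of binomial structure described in note 15, although the paper's own sub-sampling design may differ from this toy version.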

Additional information

Funding

The preliminary work was supported in part under a Contract Award from the Counterterrorism and Forensic Science Research Unit of the Federal Bureau of Investigation Laboratory Division. Preliminary aspects of this research were supported in part by Awards No. 2009-DN-BX-K234 and 2014-IJ-CX-K088 awarded by the National Institute of Justice, Office of Justice Programs, US Department of Justice.
