
Investigating lay evaluations of models

Pages 569-604 | Received 09 Sep 2020, Accepted 23 Oct 2021, Published online: 09 Nov 2021

Abstract

Many important decisions depend on unknown states of the world. Society increasingly relies on statistical predictive models to make decisions in these cases. While predictive models are useful, previous research has documented that (a) individual decision makers distrust models and (b) people’s predictions are often worse than those of models. These findings suggest a limited awareness of how to evaluate predictions in general, including concepts such as the loss function used to aggregate errors and the distinction between training error and generalisation error. To address this gap, we present three studies testing how lay people visually evaluate the predictive accuracy of models. We found that (a) participants’ judgements of prediction errors were more similar to absolute error than squared error (Study 1), (b) we did not detect a difference in participants’ reactions to training error versus generalisation error (Study 2), and (c) participants rated complex models as more accurate when comparing two models, but rated simple models as more accurate when shown single models in isolation (Study 3). When communicating about models, researchers should be aware that the public’s visual evaluation of models may disagree with the researchers’ method of measuring errors and that many people may fail to recognise overfitting.
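To make the abstract’s two technical contrasts concrete, here is a minimal, self-contained Python sketch. It is not from the paper; the toy data and model choices are illustrative assumptions. It aggregates the same prediction errors with an absolute loss (MAE) and a squared loss (MSE), and compares training error to generalisation error for a simple and a deliberately overfit model.

```python
# A minimal sketch (not from the paper) of the two ideas the abstract
# contrasts: how a loss function aggregates prediction errors, and how
# training error can diverge from generalisation error when a model
# overfits. All data and model choices are illustrative assumptions.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Toy forecasting task: one cue variable, linear truth plus noise.
x_train = rng.uniform(0, 10, 30)
y_train = 2.0 * x_train + rng.normal(0, 4, 30)
x_test = rng.uniform(0, 10, 30)
y_test = 2.0 * x_test + rng.normal(0, 4, 30)

def mae(y, y_hat):
    """Absolute loss: every unit of error counts equally."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Squared loss: large errors are penalised disproportionately."""
    return np.mean((y - y_hat) ** 2)

# A simple (degree-1) and a complex (degree-9) model fit to the same data.
for degree in (1, 9):
    fit = Polynomial.fit(x_train, y_train, deg=degree)
    print(f"degree {degree}: "
          f"train MAE={mae(y_train, fit(x_train)):.2f}  "
          f"train MSE={mse(y_train, fit(x_train)):.2f}  "
          f"test MAE={mae(y_test, fit(x_test)):.2f}")
```

On a typical run the complex model posts the lower training error but the higher test (generalisation) error, which is the overfitting pattern at issue in the studies.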

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 In the domain of predicting a company’s financial health, Libby (1976) claimed to find that people actually outperformed models. However, Goldberg (1976) reanalysed the same data and, rather convincingly, came to the opposite conclusion.

2 Forecasting often involves multiple cue variables, but for the sake of simplicity we refer only to the single-cue case throughout.

3 Unless the cost of an error of a particular magnitude is externally dictated, as when misdiagnosing a patient leads to well-defined monetary costs in the form of liability or expensive tests.

4 Our preregistered analysis did not include the participant random effect. Adding it in the mixed-effects model slightly changes the coefficients; in the model without the random effect, the interaction term was significant (p = 0.03).

5 Our preregistered analysis did not include the participant random effects. Adding the participant random effects alters the coefficients, generally by increasing their magnitude; the only change in statistical significance is in one of the question intercepts. A schematic example of this kind of mixed-effects model is sketched after these notes.

6 This study was not preregistered, as we followed the well-known joint-separate evaluation paradigm.

7 Note that this analysis was conducted at the level of individual ratings rather than participants, meaning that each participant in the joint evaluation condition contributed two data points.
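For readers unfamiliar with the participant random effects mentioned in Notes 4 and 5, the following is a hedged sketch of that style of analysis using statsmodels. The column names and the simulated data are illustrative assumptions, not the paper’s materials.

```python
# A schematic example of a mixed-effects model with a per-participant
# random intercept, in the spirit of Notes 4-5. The column names
# ("rating", "condition", "participant") and the simulated data are
# illustrative assumptions, not the paper's actual variables.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_participants, n_trials = 20, 4

participant = np.repeat(np.arange(n_participants), n_trials)
condition = np.tile([0, 1, 0, 1], n_participants)

# Each participant gets their own baseline (the random intercept).
baseline = rng.normal(0.0, 1.0, n_participants)[participant]
rating = 3.0 + 0.5 * condition + baseline + rng.normal(0.0, 0.5, participant.size)

df = pd.DataFrame({"participant": participant,
                   "condition": condition,
                   "rating": rating})

# Fixed effect of condition; random intercept grouped by participant.
model = smf.mixedlm("rating ~ condition", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

Dropping the `groups` structure and fitting an ordinary regression instead corresponds to the preregistered analyses the notes describe, which is why the coefficients shift between the two specifications.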

