Investigating lay evaluations of models

Pages 569-604 | Received 09 Sep 2020, Accepted 23 Oct 2021, Published online: 09 Nov 2021
 

Abstract

Many important decisions depend on unknown states of the world. Society is increasingly relying on statistical predictive models to make decisions in these cases. While predictive models are useful, previous research has documented that (a) individual decision makers distrust models and (b) people’s predictions are often worse than those of models. These findings suggest a lack of awareness of how to evaluate predictions, including concepts such as the loss function used to aggregate errors and the distinction between training error and generalisation error. To address this gap, we present three studies testing how lay people visually evaluate the predictive accuracy of models. We found that (a) participant judgements of prediction errors were more similar to absolute error than squared error (Study 1), (b) we did not detect a difference in participant reactions to training error versus generalisation error (Study 2), and (c) participants rated complex models as more accurate when comparing two models, but rated simple models as more accurate when shown single models in isolation (Study 3). When communicating about models, researchers should be aware that the public’s visual evaluation of models may disagree with their method of measuring errors and that many may fail to recognise overfitting.
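As a concrete illustration of the concepts the abstract draws on (this is not code or data from the paper; the setup below is invented for the example), the following sketch contrasts absolute error with squared error as two ways of aggregating the same residuals, and training error with generalisation error for a simple versus a complex model:

```python
# Illustrative sketch only: loss functions and training vs. generalisation
# error on invented data (not the stimuli or analyses from the paper).
import numpy as np

rng = np.random.default_rng(0)

# A single cue variable x and a noisy outcome y.
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(scale=0.3, size=x.size)

# Random split into training and held-out data.
idx = rng.permutation(x.size)
train_idx, test_idx = idx[:20], idx[20:]

# The same training residuals aggregated with two different loss functions.
coefs = np.polyfit(x[train_idx], y[train_idx], 1)
resid = y[train_idx] - np.polyval(coefs, x[train_idx])
mae = np.abs(resid).mean()   # absolute error
mse = (resid ** 2).mean()    # squared error: penalises large misses more heavily
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")

# Overfitting: a complex model can show lower training error yet higher
# generalisation error than a simple one.
for degree in (1, 10):
    coefs = np.polyfit(x[train_idx], y[train_idx], degree)
    pred = np.polyval(coefs, x)
    mae_train = np.abs(y[train_idx] - pred[train_idx]).mean()
    mae_test = np.abs(y[test_idx] - pred[test_idx]).mean()
    print(f"degree {degree:2d}: training MAE = {mae_train:.3f}, "
          f"generalisation MAE = {mae_test:.3f}")
```

Whether people weight misses more like the absolute-error line or the squared-error line, and whether the high-degree fit looks "more accurate" despite its worse held-out error, are the kinds of lay-evaluation questions the three studies probe.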

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 In the domain of predicting a company’s financial health, Libby (1976) claimed to find that people actually outperformed models. However, Goldberg (1976) reanalyzed the same data, and rather convincingly came to the opposite conclusion.

2 Forecasting often involves multiple cue variables, but for the sake of simplicity we refer only to the single cue variable case throughout.

3 Unless the cost of an error of a particular magnitude is externally dictated, as when misdiagnosing a patient leads to well-defined monetary costs in the form of liability or expensive tests.

4 Our preregistered analysis did not include the participant random effect. Adding it slightly changes the coefficients; in the model without the random effect, the interaction term was significant (p = 0.03).

5 Our preregistered analysis did not include the participant random effects. Adding the participant random effects alters the coefficients, generally by increasing their magnitude. The only change in statistical significance is for one of the question intercepts; see the illustrative sketch after these notes.

6 This study was not preregistered, as we followed the well-known joint-separate evaluation paradigm.

7 Note that this analysis was conducted at the level of individual ratings rather than at the participant level, meaning each participant in the joint evaluation condition contributed two data points.
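Notes 4 and 5 refer to adding a participant random effect to the ratings model. The following is a minimal, hypothetical sketch of what such a specification could look like; the data, variable names, and formula are invented for illustration and are not the authors' actual model:

```python
# Hypothetical sketch only: a ratings model with a per-participant random
# intercept, in the spirit of notes 4-5. All names and data are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for participant in range(40):
    participant_shift = rng.normal(scale=0.5)  # the "participant random effect"
    for question in range(4):
        error_type = rng.choice(["training", "generalisation"])
        rating = (3.0
                  + 0.4 * (error_type == "generalisation")
                  + participant_shift
                  + rng.normal(scale=0.8))
        rows.append({"participant": participant, "question": question,
                     "error_type": error_type, "rating": rating})
df = pd.DataFrame(rows)

# Fixed effects for error type and question intercepts,
# random intercept grouped by participant.
model = smf.mixedlm("rating ~ error_type + C(question)", df, groups=df["participant"])
print(model.fit().summary())
```

Dropping the `groups` term would correspond to the preregistered specification without the participant random effect described in notes 4 and 5.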
