ABSTRACT
Point estimates from Automated Valuation Models (AVMs) represent the most likely value from a distribution of possible values. The uncertainty in the point estimate – the width of the range of possible values at a given level of confidence – is a critical piece of the AVM output, especially in collateral and transactional situations. Estimating AVM uncertainty, however, remains highly unstandardised in both terminology and methods. In this paper, we present and compare two of the most common approaches to estimating AVM uncertainty – model-based and error-based prediction intervals. We also present a uniform language and framework for evaluating the calibration and efficiency of uncertainty estimates. Based on empirical tests on a large, longitudinal dataset of home sales, we show that model-based approaches outperform error-based ones in all cases except those with the very highest confidence-level requirements. The differences between the two methods are conditioned on model class, geographic data partitions and data filtering conditions.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. As opposed to uncertainty around measurement (International Organization for Standardization [ISO], Citation2008).
2. See Appendix A for the complete list.
3. CoreLogic (Citation2011, Citation2017) presents a possible third approach, one based on information quality, noting that their uncertainty measure is [a] ‘range of estimates based on consistency of information’. As the construction of this metric is proprietary, we do not consider this approach in this paper.
4. We use a robust linear model to create a house price index and then adjust all sales to a single point in time (the most recent month in the data). We use the hpiR R package to create the house price index (Krause, Citation2020).
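The time adjustment described in this note amounts to rescaling each sale price by the ratio of the index value in the most recent month to the index value in the sale month. A minimal sketch (the index values and month keys below are hypothetical; the paper derives its index from a robust linear model via the hpiR R package):

```python
# Hypothetical monthly house price index values (arbitrary base).
# In the paper, these would come from a robust-linear-model index.
hpi = {"2020-01": 100.0, "2020-02": 102.0, "2020-03": 105.0}
LATEST = "2020-03"  # most recent month in the data


def time_adjust(price, sale_month, index=hpi, target=LATEST):
    """Adjust a sale price to the target month using index ratios."""
    return price * index[target] / index[sale_month]


# A home that sold for $500,000 in 2020-01 is adjusted upward by 5%.
print(time_adjust(500_000, "2020-01"))  # 525000.0
```

All sales are thus expressed in the price level of a single month, removing market-trend variation before modelling.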
5. It should be noted that this approach to prediction intervals is considerably more complex than the standard linear model extension common in textbooks, in which the distributions of the parameter estimates and the model standard error are combined to produce prediction intervals.
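For reference, the textbook linear-model prediction interval this note contrasts against can be sketched as follows. The data here are simulated, and this is the standard OLS construction, not the more complex approach used in the paper:

```python
import numpy as np
from scipy import stats

# Simulated data for an OLS regression with an intercept and two covariates.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.0, size=n)

# OLS fit: coefficient estimates and residual variance.
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
df = n - X.shape[1]
s2 = resid @ resid / df  # estimated error variance

# 95% prediction interval at a new design point x0: the "1 +" term
# combines parameter-estimate uncertainty with the model standard error.
x0 = np.array([1.0, 0.5, -1.0])
se_pred = np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))
t_crit = stats.t.ppf(0.975, df=df)
yhat = x0 @ b
lower, upper = yhat - t_crit * se_pred, yhat + t_crit * se_pred
print(f"prediction: {yhat:.2f}, 95% PI: ({lower:.2f}, {upper:.2f})")
```

The interval width here is driven almost entirely by the residual standard error; model-based approaches for non-linear learners must obtain an analogous spread by other means.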
6. See https://github.com/andykrause/kingCoData for instructions on how to access this data as well as details on its construction.
Additional information
Notes on contributors
A. Krause
A. Krause is the Manager of Applied Science for Zillow’s Home Valuations (Zestimate) team, under the larger Artificial Intelligence unit. He has a PhD in Urban Planning from the University of Washington and has worked in industry and as a lecturer at the University of Melbourne in the Master of Property program.
A. Martin
A. Martin is the Senior Manager of Automated Valuation and Machine Learning for Zillow Offers. He has a Master of Science in Statistics from Stanford University. Andrew has been working at the intersection of machine learning and home valuation since 2014.
M. Fix
M. Fix is a Principal Applied Scientist on the Zillow Offers Home Valuation team. He has a Master of Science in Statistics from the University of Washington. Matthew has worked at Zillow Group since 2013, where he focuses on building new models aimed at improving pricing accuracy.