
Using shrinkage for data-driven automated valuation model specification – a case study from Berlin

Pages 130-153 | Received 03 Mar 2020, Accepted 16 Mar 2021, Published online: 13 Apr 2021

ABSTRACT

We study whether a data-driven automated valuation model (AVM) specification that combines a flexible-yet-simple regression model with shrinkage estimators considerably improves upon the prediction accuracy of a conventional hedonic model. A rolling-window prediction comparison based on all condominium sales in Berlin, Germany, between 1996 and 2013 delivered the following results. The highly parameterised model, which employs roughly 3,800 variables, can produce extreme errors when estimated by OLS, and even when shrinkage is applied via Ridge regression. Once the most extreme errors are disregarded, however, Ridge regression emerges as the clear winner of the prediction comparison: it is the only procedure that delivers a considerable reduction in the root mean squared prediction error relative to a parsimonious benchmark model estimated via OLS. Of the two procedures with variable-selection capability, Elastic Net delivers slightly better prediction performance than Lasso. Lasso, on the other hand, acts considerably more as a selector and typically sets the bulk of the several thousand coefficients to zero. Both procedures largely agree on which characteristics they frequently select: core characteristics of hedonic pricing such as floor space, building age and location dummies.
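To fix ideas, the following is a minimal sketch (in Python, with scikit-learn) of a rolling-window prediction comparison of this kind. The column names, window length and the fixed penalty weights are illustrative assumptions, not the paper's actual design; in the paper, the penalty parameter is tuned by cross-validation within each window (see note 13).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

def rolling_rmse(df: pd.DataFrame, feature_cols: list, window: int = 20) -> dict:
    """Fit each model on a trailing window of quarters, predict the next one."""
    models = {
        "OLS": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),          # alpha values are placeholders;
        "Lasso": Lasso(alpha=1.0),          # the paper tunes the penalty by
        "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),  # 5-fold CV
    }
    quarters = sorted(df["quarter"].unique())
    errors = {name: [] for name in models}
    for t in range(window, len(quarters)):
        train = df[df["quarter"].isin(quarters[t - window:t])]
        test = df[df["quarter"] == quarters[t]]
        for name, model in models.items():
            model.fit(train[feature_cols], train["price"])
            pred = model.predict(test[feature_cols])
            errors[name].extend((test["price"].to_numpy() - pred).tolist())
    # Root mean squared prediction error per model across all windows
    return {name: float(np.sqrt(np.mean(np.square(e)))) for name, e in errors.items()}
```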

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

2. By contrast, the model in Schulz et al. (2014) had fewer than 30 coefficients.

3. Even if Lasso is not considered suitable in the context of prediction, it clearly has the potential to help find a decent specification in the modelling stage.

4. We use the sales price as the dependent variable. Hedonic regressions are often fit to log prices because these tend to be roughly normally distributed and homoscedastic, allowing standard inference procedures to be applied. The focus of this paper, however, is prediction rather than statistical inference about the effects of property characteristics. Moreover, for prediction, fitting the regression models to log prices entails an additional re-transformation step from the log scale to the original scale, with its own intricacies. We estimated all our models using both prices and log prices and found no qualitative differences in the results. Hence, we present only the results for sales prices.
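To illustrate one of the intricacies alluded to here: naively exponentiating predictions from a log-price regression understates the conditional mean price, and corrections such as Duan's smearing estimator are commonly applied. The sketch below (with assumed variable names; the paper sidesteps the issue by modelling prices directly) shows the idea.

```python
import numpy as np

def smearing_predict(model, X_train, y_log_train, X_new):
    """Price-scale predictions from a model fit to log prices.

    Simply returning np.exp(model.predict(X_new)) targets the conditional
    median rather than the mean; Duan's (1983) smearing factor corrects
    the re-transformation bias without assuming normal errors.
    """
    residuals = y_log_train - model.predict(X_train)
    smearing_factor = np.mean(np.exp(residuals))
    return np.exp(model.predict(X_new)) * smearing_factor
```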

5. Like other count variables in the data, the number of bedrooms can of course take only integer values and is therefore merely ‘quasi-continuous’.

6. The models underlying other machine learning methods not considered in this paper, such as a one-layer Neural Network or Regression Trees, can also be written in the form of a coefficient-weighted sum of transformed characteristics.

7. See Box and Cox (1964).

8. The remaining six continuous features of the building or complex (number of all units, number of all residential units, number of commercial units, size of all units, size of all commercial units, number of auxiliary areas) are entered linearly only and are not interacted with any other right-hand side variable to ease the computational burden.

9. By contrast, $t_{\theta_1=0.5}(X_k)\,t_{\theta_2=0.5}(X_k) = \text{space}^{0.5} \times \text{space}^{0.5} = \text{space}^{1}$ is not a proper product, as both factors use the same transformation parameter, i.e. $\theta_1 = \theta_2 = 0.5$.
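A small sketch of this construction (the parameter grid is an assumption for illustration): each continuous characteristic enters via power transforms $t_\theta(X_k) = X_k^{\theta}$, and a product of two transforms counts as a proper interaction only when the parameters differ.

```python
import itertools
import numpy as np

def power_terms(x: np.ndarray, thetas) -> dict:
    """Power transforms t_theta(x) = x**theta of one characteristic."""
    return {theta: np.power(x, theta) for theta in thetas}

def proper_products(x: np.ndarray, thetas) -> dict:
    """Products of two transforms with distinct parameters only.

    Pairs with theta1 == theta2 are excluded: as in note 9,
    x**0.5 * x**0.5 merely reproduces x**1.0.
    """
    terms = power_terms(x, thetas)
    return {(t1, t2): terms[t1] * terms[t2]
            for t1, t2 in itertools.combinations(thetas, 2)}

# Example with an assumed parameter grid:
# proper_products(floor_space, thetas=[0.5, 1.0, 2.0])
```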

10. The dummy variable for the reference district Mitte is left out.

11. This, again, is an upper limit: because certain characteristics never occur simultaneously in the data, some of these binary interactions are zero for all observations and are consequently dropped in the estimation stage.

12. Centring the variables also allows the intercept to be estimated separately by the sample average of the prices in an observation window. See Hastie et al. (2009), p. 64. $\beta_0$ is therefore omitted in the following discussion of the estimation procedures.
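To spell out why the intercept decouples (a one-line derivation in our own notation): with centred regressors $\tilde{x}_{ik} = x_{ik} - \bar{x}_k$, the penalised least-squares problem is

$$\min_{\beta_0,\,\beta}\;\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{k}\beta_k\,\tilde{x}_{ik}\Big)^2 + \lambda\,P(\beta),$$

and since the penalty $P(\beta)$ does not involve $\beta_0$ while $\sum_i \tilde{x}_{ik} = 0$, the first-order condition for $\beta_0$ reduces to $\hat{\beta}_0 = \bar{y}$, the sample average of the prices.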

13. To find the optimal λ, we employ the random search algorithm proposed by Bergstra and Bengio (2012). Their approach to fitting hyperparameters turns out to be superior to classical grid search algorithms in terms of efficiency. Unlike grid search, random search draws a set of trials $\{\lambda_1, \ldots, \lambda_S\}$ from a uniform density that has the same configuration space as a regular grid would have. For details, see either Bergstra and Bengio (2012) or Python's scikit-learn documentation (Pedregosa et al., 2011). The optimal λ is then chosen by 5-fold cross-validation.
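A minimal sketch of such a search with scikit-learn's RandomizedSearchCV (note that scikit-learn calls the penalty weight alpha rather than λ; the search bounds and number of trials below are illustrative assumptions):

```python
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": uniform(loc=0.01, scale=100.0)},  # uniform density
    n_iter=50,  # number of random trials drawn from the density
    cv=5,       # 5-fold cross-validation, as in the note
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
# search.fit(X_window, y_window)             # fit within an observation window
# best_lambda = search.best_params_["alpha"]
```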

14. We thereby assume that, because of the rather protracted process of finalising a property transaction, typically no data is available on transactions in the current quarter in which valuations are formed. The assumed one-quarter delay in data processing is, if anything, too optimistic for the administrative data collection procedure behind our data, where contracts are entered into the transaction database with an average delay of two quarters.

15. See, for instance, Mayer et al. (2019) and Bogin and Shui (2020).

16. See Hastie et al. (2009), p. 140, and Wood (2017), p. 162.

17. It should be pointed out, though, that even in the parsimonious hedonic benchmark model without polynomials some large errors occur (as shown by the reported minimum and maximum in the last row of the corresponding table). This may not be exclusively driven by the simplicity of this benchmark model but may also partially stem from the heterogeneity of apartments in Berlin, with its turbulent history, making mass appraisal a difficult task.

18. Such procedures raise the intricate issue of knot selection and positioning. See, for instance, Wood (2017).

19. They clearly stand out and exceed the ninth-largest absolute error by a factor of at least 1.4. These cases can easily be detected after predictions have been calculated, because they result in extremely large or small predictions. It is, however, not so easy to spot them a priori, i.e. just on the basis of their characteristics. We calculated their Mahalanobis distance from the centre of the data with respect to the eight continuous characteristics we entered as polynomials (using a robust estimator of the variance-covariance matrix). Their Mahalanobis distances from the centre of these eight characteristics vary between 5.4 and 16.9. These are relatively large distances, but all lie within the 90th percentile of the sample distribution of all Mahalanobis distances.
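A sketch of this diagnostic in scikit-learn (MinCovDet is one robust covariance estimator; the note does not specify which estimator was used, so this choice is an assumption):

```python
import numpy as np
from sklearn.covariance import MinCovDet

def robust_mahalanobis(X: np.ndarray, X_query: np.ndarray) -> np.ndarray:
    """Distances of query rows from the robust centre of X.

    X holds the eight continuous characteristics for all observations;
    X_query holds the rows of the suspect cases.
    """
    mcd = MinCovDet(random_state=0).fit(X)
    return np.sqrt(mcd.mahalanobis(X_query))  # method returns squared distances
```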

20. As pointed out above, the 4,090 parameters mentioned in the presentation of the flexible-yet-linear model are a theoretical upper limit on the number of parameters. Even the flexible OLS model typically estimates only about 3,800 coefficients in each window.

21. Because of the interaction terms, there is ‘double counting’ and the sum of the heights of the bars exceeds the actual total number of nonzero coefficients.

Additional information

Funding

This work was supported by the Deutsche Forschungsgemeinschaft [Research Unit 2569].

Notes on contributors

Nils Hinrichs

Nils Hinrichs is a research associate at the Institute for Computer-assisted Cardiovascular Medicine at the Charité–Universitätsmedizin Berlin. He received his Master's degree in Statistics from the Humboldt University of Berlin in 2018 and worked as a Data Scientist at Delivery Hero SE until 2020. His main area of scientific interest is applied Machine Learning. He is currently pursuing a PhD in medical Data Science, working on a data-driven decision support system for the telemedical care of heart failure patients.

Jens Kolbe

Jens Kolbe is a Research Associate at the Chair of Business Statistics and Econometrics at the Technische Universität Berlin. His research interests are housing markets and related fields of urban economics. He received his PhD in July 2017 from Technische Universität Berlin. In his dissertation, he worked empirically on various topics in urban economics, focusing on the integration of spatial data into econometric modelling. He has published in Empirical Economics, Landscape and Urban Planning and Ecological Economics.

Axel Werwatz

Axel Werwatz is Professor of Business Statistics and Econometrics at Technische Universität Berlin. He previously held a similar position at the University of Potsdam and was head of a research department at the German Institute for Economic Research (DIW Berlin), where he is still a Research Affiliate. He received his habilitation in econometrics from Humboldt Universität zu Berlin and a Ph.D. in economics from the University of Iowa. He has published in the Journal of Labor Economics, the Journal of Urban Economics, Nature Communications and the American Journal of Epidemiology.
