Abstract
The hedonic price model is a popular method for estimating the implicit prices of a property's observed attributes. However, the model's inputs are limited to numerically quantified information. This study quantifies the unstructured qualitative statements contained in the written descriptions from Multiple Listing Service (MLS) data. These descriptions detail the features and setting of the house, providing important but typically unused qualitative information. Our approach is unique in that we classify the words in these descriptions into eight groups that reflect previously unmeasured housing quality. The purpose of the study is to test whether these previously unmeasured attributes affect the selling price of the property and its time on the market. The dataset consists of 5,160 home sales in Ames, Iowa between the second quarter of 2003 and the second quarter of 2015. Our findings show that the role of unstructured qualitative text varies: some word groups are redundant to the quantitative information already in the models and have no effect, while others, particularly those reflecting the quality of the structure, represent unique information and are important predictors of housing prices and time on the market.
Notes
1 The assumption allows us to decompose a review into (E, O) word pairs. First, we identify an E word in the review and match it to a nearby O word. In this framework, every review can be separated into a set of bi-term word pairs. Within each word pair, the E word is used for scoring and the O word is used to identify topics.
2 This allocation can be relaxed to incorporate a supervised learning procedure in which we first provide some seed bi-terms and then match the remaining bi-terms to the seeds using a trained model or an online dictionary. This is explored in Im et al. (2019). In this paper, however, the focus is on the utility of the bi-term construction and its interpretation. Manual classification was feasible here given the size of the market studied.
3 The absorption rate is calculated by Flexmls, which covers the entirety of Story County, Iowa.
4 Since Ames is a college town, this may reflect the high number of Bachelor’s degree holders (graduate students) in lower-priced neighborhoods dominated by students.
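The bi-term decomposition described in note 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `extract_biterms`, the `window` parameter, the nearest-word matching rule, and the two small vocabularies are all assumptions made for illustration. The sketch only shows the general idea of pairing each E word with a nearby O word.

```python
def extract_biterms(tokens, e_words, o_words, window=3):
    """Pair each E word with the nearest O word within `window` tokens.

    Hypothetical helper: the matching rule (nearest O word, ties going
    to the earlier position) is an assumption, not the paper's rule.
    """
    pairs = []
    # Positions of all O words (the words used to identify topics).
    o_positions = [i for i, tok in enumerate(tokens) if tok in o_words]
    for i, tok in enumerate(tokens):
        if tok not in e_words:
            continue
        # O words close enough to this E word to be considered a match.
        candidates = [j for j in o_positions if abs(j - i) <= window]
        if candidates:
            nearest = min(candidates, key=lambda j: abs(j - i))
            pairs.append((tok, tokens[nearest]))
    return pairs


# Hypothetical vocabularies, chosen only for this example.
E_WORDS = {"beautiful", "spacious", "updated"}
O_WORDS = {"floors", "yard", "kitchen"}

text = "beautiful hardwood floors and a spacious fenced yard with updated kitchen"
pairs = extract_biterms(text.split(), E_WORDS, O_WORDS)
print(pairs)
```

Under these assumptions, each resulting pair carries an E word for scoring and an O word for topic assignment, as the note describes. An E word with no O word inside the window is simply dropped.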