Construction Management

Classifying apartment defect repair tasks in South Korea: a machine learning approach

Pages 2503-2510 | Received 23 Mar 2021, Accepted 18 Aug 2021, Published online: 19 Oct 2021

ABSTRACT

Managing building defects in the residential environment is an important social issue in South Korea. Therefore, most South Korean construction companies devote a large amount of human resources and economic costs to managing such defects. This paper proposes a machine learning approach for investigating whether a specific defect can be autonomously categorized into one of the categories of repair tasks. To this end, we employed a dataset of 310,044 defect cases (from 656,266 validated cases of 717,550 total collected cases). Three machine learning classifiers (support vector machine, random forest, and logistic regression) with three word embedding methods (bag-of-words, term frequency-inverse document frequency, and Word2Vec) were employed for the classification tasks. The best results, obtained by the random forest classifier with the Word2Vec embedding, exceeded 99% in accuracy, precision, recall, and F1-score. Finally, based on these findings, the implications and limitations of this study are discussed. In particular, the findings of this research can improve the defect management effectiveness of the apartment construction industry in South Korea. Moreover, to contribute to future research, we have made the dataset publicly available.

1. Introduction

With the exponential growth of the population in urban areas, new residential trends have rapidly developed (Raslanas, Alchimovienė, and Banaitienė 2011). Such trends include apartments, which are buildings with five or more floors that accommodate multiple households in a single residential structure. Apartments are now considered to be one of the primary residential types in South Korea (Watt 2009; The Housing Policy Division in South Korea 2018). In 2018, the Ministry of Land, Infrastructure, and Transport in South Korea showed that apartments accounted for more than 50% of the residential types in use (Figure 1; Kang 2020).

Figure 1. Residential types in South Korea, 2019.

Because of the increasing demand for apartments in several provinces, a pre-sale system has been used, which allows potential residents to make reservations for housing accommodation via a specialized contract before the apartments are built (Shi et al. 2015). Although there are several benefits of the pre-sale system for apartment stakeholders, there can also be a number of drawbacks and conflicts for residents (Tabatabai 2017). The number of conflict cases addressed by a government organization increased from 69 in 2010 to 4,290 in 2019 (Figure 2; The Housing Construction Supply Division in South Korea 2013).

Figure 2. Number of apartment defect conflicts registered.

Because of the rapidly increasing number of conflicts, the South Korean government has enacted several specialized laws to address these conflicts by considering residents’ rights (The Housing Construction Supply Division in South Korea 2018). For instance, the Multi-Family Housing Management Act aims to provide various functions for “multi-family housing environments with transparency, safety, and efficiency by prescribing matters on the multi-family housing managements” (The Housing Construction Supply Division in South Korea 2018).

Most conflicts originate from the work related to addressing apartment defects (The Housing Construction Supply Division in South Korea 2018). Considering these acts, the government has administratively categorized such defect-related work into 19 subjective repair tasks, including finishing work, outdoor water supply, woodwork, and window work (The Housing Construction Supply Division in South Korea 2018).

Repair tasks are regarded as the final tasks for completing apartment construction for the provision of pre-arranged residential environments for potential residents (The Housing Construction Supply Division in South Korea 2018).

Several attempts have been made to reduce and optimize procedures by considering both technical and heuristic approaches. For instance, Kim et al. (2011) collected a 10-year dataset of apartment defect repair cases and employed multiple regression analyses with discount rates to propose predictive models of apartment defect repair costs. Despite several findings on this topic, only a few studies have attempted to explore apartment defect repair tasks from the perspective of construction companies (Kwon et al. 2021). Furthermore, datasets of apartment defect repair tasks have received little attention from either academic researchers or practitioners. Therefore, this study investigates whether a specific defect can be autonomously categorized as one of the specified defect repair tasks. To this end, the study employs machine learning techniques as a promising tool.

The remainder of this paper is organized as follows. An overview of the literature is presented; subsequently, the study methodologies are described, and the results of the machine learning approaches are presented. Finally, both the implications and limitations of this study are discussed.

2. Literature review

2.1. Apartment defects and repair tasks

The widely employed definition of a defect in construction areas is “a flaw that causes problems associated with safety, function, and aesthetics of a building or facilities, which are presented by construction errors and mistakes” (Watt 2009). Because defects are critical issues in the field of construction, several prior studies have explored strategies to effectively manage defects using systematic approaches (Milion, Da Cl Alves, and Paliari 2017; Park et al. 2013). Park et al. (2013) suggested a conceptual systematic framework for addressing construction defects. This framework employed a number of related information sets and building information modeling (BIM) contexts, as well as ontological and augmented reality applications. The framework was organized using three proactive data management components: data collection templates, domain ontology, and automatic inspection functions. Milion, Da Cl Alves, and Paliari (2017) collected data from construction companies to examine the correlation between defects and apartment resident satisfaction. They found that 304 customers showed 88% satisfaction based on the features addressed in the survey (e.g., design, building quality, and technical assistance service). Moreover, they argued that companies should make significant efforts to effectively address apartment defects and improve the satisfaction of apartment residents.

Considering the findings of prior studies, which showed a strong relationship between apartment resident satisfaction and defects, several studies have attempted to provide efficient solutions for classifying each defect case into one of the subjective repair tasks (Kwon, Park, and Lim 2014; Lin, Chang, and Su 2016). Georgiou (2010) proposed a defect classification system to verify and validate the defects of buildings in Australia by considering 12 defect types (e.g., damp, cracking, and workmanship). Based on the regulations governing the quality of buildings in Australia, several building defects were examined for building residents.

Macarulla et al. (2013) proposed a defect classification system with 15 defect categories (e.g., affected functionality, inappropriate installation, and soiled) for the Spanish housing sector and helped construction stakeholders improve their productivity by reducing potential defects, which were strongly related to economic costs and time wastage.

Classifying the types of repair tasks is an important aspect to consider from a managerial viewpoint because it is directly associated with both the economic and social burdens of construction companies as well as their stakeholders. Lee, Lee, and Kim (2018) employed a loss distribution approach to estimate the potential risks of eight building defects. Seven types of repair tasks were then presented to reduce the levels of risk: reinforced concrete (RC); masonry; finishing; mechanical, electrical, and plumbing (MEP); doors and windows; furniture; and miscellaneous.

Although several studies have presented notable findings for defect management and repair tasks, it is still necessary to examine this issue from the perspective of construction companies. South Korea is one of the few nations to operate defect management systems for apartments; therefore, the current status of these systems in South Korea should be understood.

2.2. Defect management system in South Korea

In South Korea, a significant number of complaints have been presented by residents regarding the gap between their expectations and the actual apartments (The Housing Construction Supply Division in South Korea 2013). Each apartment construction company has built a unique defect management system (Kim et al. 2008). Figure 3 shows a representative work procedure. After a resident finds a specific defect in their apartment, they report the specifications of the defect case to a customer service center, which is operated by the apartment construction company. A staff member at the service center documents the specifications using the in-house data management system, which includes a set of basic defect information (e.g., received date, defect location, buildings, defect properties, estimated cause of the defect, estimated repair date, and actual repair date). The staff member then analyzes and assigns each defect case to one of the subjective repair task categories. After these steps, each case is delivered to one of the experts who specialize in each type of subjective repair task. Thus, the overall procedure requires a large amount of staff work and a long period of time.

Figure 3. Work process of the current defect management systems.

Considering the current procedures of the defect management systems, several studies have analyzed high-cost repair tasks. For instance, Kim, Ahn, and Lee (2019) analyzed the defect patterns of work types, defect types, and locations during the warranty period of residential buildings in South Korea. In addition, Park and Seo (2017) collected data of South Korean apartment complexes and confirmed a significant association between the number of reported defects and the frequency of finishing work. In regard to finishing work, Park, Ahn, and Lee (2018) constructively investigated the pattern of the maintenance plans of each component of the finishing work based on an analysis of annual maintenance frequency data.

Based on the findings of prior studies, investigating, presenting, and classifying the types of defect repair tasks into one of the specific categories can effectively and significantly reduce both economic costs and human resource utilization (Brodetskaia, Sacks, and Shapira 2011).

Previous studies have only suggested methods for effectively classifying and managing defect repair tasks. Thus, this study further proposes machine learning techniques to automatically classify each defect case into one of the repair tasks. The techniques can lead to quicker and more efficient defect management systems.

3. Method

3.1. Data collection

We collected data on residents’ complaints regarding their apartment defects, which were submitted to a private construction company in South Korea. The company had provided apartments to more than 25,000 households in 2019. We gathered 717,550 defect cases from 19 apartment complex sites that were completed in 2017 and 2018 (Table 1), which were categorized into 17 repair tasks (Figure 4 and Table 2). We excluded 61,284 (8.54%) cases that did not present defect descriptions or repair work types. The remaining 656,266 cases with defect descriptions were considered as the dataset used in this study. The overall procedure is shown in Figure 5.

Figure 4. Defect frequency and rate of each work type.

Figure 5. Research methodology and procedures.

Table 1. Overview of data collection.

Table 2. Defect frequency and ratio of each work type.

3.2. Pre-processing

The following steps were conducted as part of the data pre-processing procedures.

  • First, we chose to use text descriptions containing over 30 words.

  • We then removed punctuation marks, extra space characters, and single consonants or vowels.

  • We tokenized all defect descriptions.

  • We selected only nouns by using the Okt module from the konlpy library (Park and Cho 2014).

  • Finally, we removed missing values by using the Python data analysis library Pandas. We thereby obtained 310,044 defect descriptions.
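
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `keep_token` hook, and the sample strings are hypothetical. The noun filter is left as a pluggable predicate because the study's noun selection relied on the Okt tagger from konlpy, which is not reproduced here, and missing values are skipped directly instead of being dropped with Pandas.

```python
import re

PUNCT = re.compile(r"[^\w\s]|_")        # punctuation marks
JAMO = re.compile(r"[\u3131-\u3163]")   # standalone Hangul consonants and vowels
SPACES = re.compile(r"\s+")             # runs of whitespace


def clean(text):
    """Remove punctuation, single consonants/vowels, and extra spaces."""
    text = PUNCT.sub(" ", text)
    text = JAMO.sub(" ", text)
    return SPACES.sub(" ", text).strip()


def tokenize(text):
    """Whitespace tokenization of the cleaned text (a simplification)."""
    return clean(text).split()


def preprocess(descriptions, min_words=30, keep_token=lambda tok: True):
    """Return token lists for descriptions that pass every filter.

    min_words:  keep only descriptions with more than this many words.
    keep_token: hypothetical stand-in for a noun filter (e.g. a POS tagger).
    """
    result = []
    for text in descriptions:
        if not text:                          # missing value: None or empty
            continue
        if len(text.split()) <= min_words:    # too short
            continue
        tokens = [tok for tok in tokenize(text) if keep_token(tok)]
        if tokens:
            result.append(tokens)
    return result


# Toy input; min_words is lowered so the short sample survives the filter.
samples = ["욕실  타일 깨짐!! ㅠㅠ", None, "문 틀"]
print(preprocess(samples, min_words=2))
```

Keeping the noun filter as a parameter means a real part-of-speech tagger can be substituted later without changing the rest of the pipeline.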

3.3. Word embedding

To effectively train the machine learning classifiers, we employed three word embedding methods to convert the pre-processed text into a matrix: bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), and Word2Vec.

BoW considers only the occurrence of each token and does not consider the order of, or relationships between, the tokens in the text data. Here, BoW was calculated by converting each token in the documents into its frequency value. We generated the BoW using the CountVectorizer module in the Python library scikit-learn. The vocabulary size was set to 5,000.
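
The counting idea behind BoW can be sketched with the standard library alone. This is an illustrative sketch rather than the scikit-learn implementation: `build_vocab`, `bow_matrix`, and the toy documents are hypothetical names, and capping the vocabulary at the most frequent terms is assumed here to mirror the 5,000-term limit.

```python
from collections import Counter


def build_vocab(docs, max_features=5000):
    """Map the max_features most frequent tokens to column indices."""
    totals = Counter(tok for doc in docs for tok in doc)
    kept = [tok for tok, _ in totals.most_common(max_features)]
    return {tok: idx for idx, tok in enumerate(sorted(kept))}


def bow_matrix(docs, vocab):
    """Return one row of token counts per document; word order is ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            if tok in vocab:
                row[vocab[tok]] = count
        rows.append(row)
    return rows


# Toy tokenized descriptions (placeholders, not taken from the dataset).
docs = [["tile", "crack", "tile"], ["door", "crack"]]
vocab = build_vocab(docs)
print(vocab)
print(bow_matrix(docs, vocab))
```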

TF-IDF is a feature representation method that weights a word by how frequently it appears in a particular document relative to how common it is across the whole collection of documents (Joachims 1997). The TF-IDF was calculated by multiplying the term frequency (TF), which indicates how often a particular word occurs in a document, by the inverse document frequency (IDF), which decreases as the number of documents containing the word increases (in its basic form, the reciprocal of that document frequency, usually log-scaled). Feature representation using TF-IDF was obtained through the scikit-learn package with a vocabulary of 5,000 words.
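
As a worked illustration of the TF × IDF product, the sketch below computes the weights with the standard library. It assumes the smoothed variant idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalization of each row, which is the default behavior documented for scikit-learn's TfidfVectorizer; whether the study kept those defaults is not stated, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (rows, idf): per-document {token: weight} dicts and IDF values."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1.0 for tok, d in df.items()}

    rows = []
    for doc in docs:
        weights = {tok: tf * idf[tok] for tok, tf in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({tok: w / norm for tok, w in weights.items()} if norm else {})
    return rows, idf


docs = [["tile", "crack", "tile"], ["door", "crack"]]
rows, idf = tfidf_rows(docs)
print(idf)
print(rows[0])
```

In the toy example, "crack" occurs in every document and therefore receives the lowest IDF, whereas "tile" is both frequent in the first document and rare across the collection, so it dominates that row.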

Word2Vec is an embedding method that transforms words into n-dimensional dense vectors to represent similarities among words (Mikolov et al. 2013). We used the Word2Vec module from the Gensim library. The dimensions of the word vector, window size, and minimum frequency for each word were set to 200, 10, and 10, respectively. We trained the Word2Vec model using a continuous bag of words (CBOW) algorithm. After training the model, we represented each defect description as a single vector by averaging the word vectors.
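
The final averaging step can be illustrated without training a model. In the sketch below, a small dictionary stands in for the trained word vectors; the function name, the two-dimensional toy vectors, and the choice to skip out-of-vocabulary tokens (returning a zero vector when none are known) are assumptions made for illustration.

```python
def average_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return [0.0] * dim          # no known token: fall back to zeros
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional vectors standing in for 200-dimensional trained ones.
# With Gensim 4.x, a comparable model might be configured as
#   Word2Vec(sentences, vector_size=200, window=10, min_count=10, sg=0)
# where sg=0 selects CBOW; this call is shown for orientation only.
toy_vectors = {"tile": [1.0, 0.0], "crack": [0.0, 1.0]}
print(average_vector(["tile", "crack", "unknown"], toy_vectors, dim=2))
```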

3.4. Classification model

To detect and classify the defect description for each repair task effectively, we employed three classification models: support vector machine (SVM), logistic regression (LR), and random forest (RF). To implement these models, we utilized the LinearSVC, LogisticRegression, and RandomForestClassifier modules from the scikit-learn library.

The SVM classification model separates data from different classes by finding a hyperplane, the optimal linear decision boundary, with the largest margin. We trained the model to minimize the squared hinge loss and applied the L2 penalty (Cortes, Mohri, and Rostamizadeh 2009) to prevent overfitting. In addition, the parameter C was set to 1.0.
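
For reference, a linear SVM with an L2 penalty and squared hinge loss is commonly written as the following optimization problem. This is the standard binary formulation with labels $y_i \in \{-1, +1\}$, feature vectors $x_i$, weight vector $w$, and bias $b$; these symbols are introduced here for illustration, and the multiclass case is assumed to be handled by one-vs-rest.

```latex
\min_{w,\,b} \;\; \frac{1}{2}\lVert w \rVert_2^2
  \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w^\top x_i + b)\bigr)^2
```

The first term is the L2 penalty that favors a large margin, and C (1.0 in this study) controls how strongly margin violations are penalized relative to it.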

LR is a statistical model that calculates the probability of the data belonging to a particular class. In our study, binary classification was performed by setting a threshold value of 0.5. Moreover, we set the parameter C to 1.0. The SAGA solver was used, and the L2 penalty was used to prevent overfitting.

The RF model is an ensemble model that utilizes multiple decision trees as a classification model. We trained the model to reduce the Gini impurity using a bootstrapping technique with 100 decision trees.
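
The Gini impurity that each decision tree tries to reduce can be computed as one minus the sum of squared class proportions. The sketch below is a minimal standard-library illustration; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


mixed = ["finishing", "finishing", "window", "window"]
print(gini_impurity(mixed))                   # evenly mixed node
print(split_impurity(mixed[:2], mixed[2:]))   # split that separates the classes
```

A split is preferred when its weighted impurity is lower than that of the parent node, which is the sense in which the trees are trained to reduce the Gini impurity.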

In addition, to prevent overfitting to the training samples, we set the maximum depth of each tree to 3. We validated the three classifiers using five-fold cross-validation procedures. In each fold, approximately 248,036 descriptions were used for training and 62,008 for testing.
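
One possible way to configure the three classifiers and the five-fold validation in scikit-learn is sketched below. It is not the authors' script: the synthetic data from `make_classification` merely stands in for the embedded defect descriptions, and `random_state`, `max_iter`, and fold shuffling are assumptions added so that the example runs reproducibly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for the feature matrix and repair-task labels (assumption).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    # Squared hinge loss with an L2 penalty and C = 1.0.
    "SVM": LinearSVC(penalty="l2", loss="squared_hinge", C=1.0, random_state=0),
    # SAGA solver with C = 1.0; the L2 penalty is scikit-learn's default.
    "LR": LogisticRegression(C=1.0, solver="saga", max_iter=2000, random_state=0),
    # 100 bootstrapped trees, Gini criterion, depth limited to 3.
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 max_depth=3, bootstrap=True, random_state=0),
}

# Five folds: each model is trained on four fifths and tested on one fifth.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy {fold_scores.mean():.3f}")
```

Accuracy on the synthetic data says nothing about the reported results; the sketch only shows how the stated hyperparameters map onto scikit-learn arguments.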

4. Results

We performed multiclass classification and evaluated the results. The numbers of true positives, false negatives, true negatives, and false positives are represented as TP, FN, TN, and FP, respectively. We validated the employed machine learning classifiers using four evaluation metrics: accuracy (Equation (1)), precision (Equation (2)), recall (Equation (3)), and F1-score (Equation (4)).

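\[ \text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{1} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]
\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]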
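
The four metrics can be computed directly from the confusion counts, as in the short sketch below. The function name and the example counts are hypothetical, and the function assumes non-zero denominators (at least one predicted positive and one actual positive).

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, precision, recall, and F1-score from TP, FN, TN, FP."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for a single class treated as "positive" against the rest.
print(classification_metrics(tp=8, fn=2, tn=85, fp=5))
```

In a multiclass setting such as this one, the counts would typically be obtained per class (one class against the rest) and the resulting scores averaged; the averaging scheme used in the study is not stated.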

Figure 6 and Table 3 summarize the model performances. The RF classifier with the Word2Vec feature representation performed the best in terms of accuracy (99.13%), precision (99.28%), recall (99.28%), and F1-score (99.27%).

Figure 6. Results of classification models.

Table 3. Performance metrics of classification models.

5. Discussion

The rapid increase in the number of customer conflict cases for apartments is highly related to defects (Consumer News 2021). Construction companies bear potential risks and require time-consuming procedures for classifying a specific defect into one of the repair tasks. Moreover, in the case of South Korea, the warranty period for defects in residential buildings is up to 10 years, as specified in the Multi-Family Housing Management Act.

Previous studies (Park et al. 2013) have highlighted the complex procedures involved in managing defects. We now suggest a promising approach for classifying defect descriptions into repair task types. Because the suggested approaches and machine learning models showed high accuracy in the classification task, the models can be utilized in a defect management system to reduce both economic costs and human resource utilization in construction companies. Moreover, these companies can employ the proposed approaches as supplementary tools for accurately and efficiently assigning each task to experts in one of the repair task domains.

6. Conclusions

We employed machine learning approaches to examine whether a text-based defect description, written in Korean, can be autonomously classified into specific repair work types. To address this issue, we collected defect description data from one of the largest South Korean construction companies using an in-house data management system. To extract numerical features from Korean texts, we employed three different word-embedding techniques: BoW, TF-IDF, and Word2Vec. When using the Word2Vec embedding with the RF classifier, the highest accuracy (99.13%) was achieved with five-fold validation procedures. As we employed machine learning models to classify the repair tasks, this study provides several theoretical implications as follows.

  • We showed the potential of a data-driven approach based on machine learning classifiers in the field of construction engineering.

  • The classifiers of this study can be applied to unrefined texts, which are typically short and consist of abbreviations.

In addition to the theoretical implications, we provide valuable insights to both practitioners and researchers in related fields. The high-performing classifier can be applied to defect management systems to significantly reduce time, cost, and resources. For instance, both speech recognition techniques and classifiers can be used by construction engineers and customer service centers to effectively analyze and classify high-cost repair tasks.

6.1. Limitations and future studies

Although this study has several practical and theoretical implications, it has a few limitations. In this study, we used machine-learning techniques on repair tasks. This means that the approaches employed in this study can be improved to address multiple repair tasks (e.g., water facility construction, window work, woodwork, and HVAC equipment construction). Moreover, because there are other significant tasks that are categorized as finishing work (e.g., plastering, painting, papering, tiling, indoor masonry, indoor furniture work, kitchen utility work, and home appliance installation work), deeper natural language processing techniques and feature representation techniques that include location and materials should also be considered. Thus, future research should address these limitations and employ the findings of the current study.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the ICT Challenge and Advanced Network of HRD (ICAN) program [IITP-2020-0-01816] supervised by the Institute of Information & Communications Technology Planning & Evaluation (IITP). This work is also supported by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport [Grant No. 21ATOG-C161932-01].

Notes on contributors

Eunhye Kim

Eunhye Kim (BSc, Incheon National University) is a master's student in the Department of Applied Data Science and Data eXperience Lab at Sungkyunkwan University. She is also affiliated with Daewoo Engineering and Construction. Her research interests include industrial digital transformation and data-driven approaches in the construction industry.

Honggeun Ji

Honggeun Ji (BSc, Korea Polytechnic University) is a master's student in the Department of Applied Artificial Intelligence and Data eXperience Lab at Sungkyunkwan University. He is also an artificial intelligence engineer at Raon Data. His research interests include deep neural networks for multimedia, as well as industrial artificial intelligence.

Jina Kim

Jina Kim (MSc, Sungkyunkwan University) is a researcher in the College of Computing at Sungkyunkwan University. She is also affiliated with Raon Data. Her research interests lie in solving real-world problems by finding valuable cues in online services or social media with the support of NLP techniques.

Eunil Park

Eunil Park (PhD, KAIST) is an Assistant Professor in the College of Computing, Sungkyunkwan University. His research interests include data science, HCI, and user behavior. He is the inaugural recipient of the NRF-Elsevier Young Researcher Award in Korea (Interdisciplinary Studies). As one of the interdisciplinary scientists in the field, his research results have been published in numerous international social science journals as well as scientific journals.

References

  • Brodetskaia, I., R. Sacks, and A. Shapira. 2011. “A Workflow Model for Systems and Interior Finishing Works in Building Construction.” Construction Management and Economics 29: 1209–1227. doi:10.1080/01446193.2011.647829.
  • Consumer News. 2021. “[Consumer Complaint Assessment - Construction] A Number of Complaints Concerning Contracts, Defects, and Services.” [Online]. Accessed 10 June 2021. http://www.consumernews.co.kr/news/articleView.html?idxno=627935
  • Cortes, C., M. Mohri, and A. Rostamizadeh. 2009. “L2 Regularization for Learning Kernels.” In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 109–116. Montreal, QC, Canada.
  • Georgiou, J. 2010. “Verification of a Building Defect Classification System for Housing.” Structural Survey 28: 370–383. doi:10.1108/02630801011089164.
  • The Housing Construction Supply Division in South Korea. 2013. “Apartment Defect Dispute Mediation Committee.” [Online]. Accessed 18 November 2020. http://www.adc.go.kr/
  • The Housing Construction Supply Division in South Korea. 2018. “Multi-family Housing Management Act.” [Online]. Accessed 18 November 2020. https://www.law.go.kr/LSW/eng/engLsSc.do?menuId=2&section=lawNm&query=15454
  • The Housing Policy Division in South Korea. 2018. “Housing Act.” [Online]. Accessed 18 November 2020. https://www.law.go.kr/LSW/eng/engLsSc.do?menuId=2&section=lawNm&query=16006
  • Joachims, T. 1997. “A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization.” In Proceedings of the Fourteenth International Conference on Machine Learning, 143–151. Pittsburgh, PA: Morgan Kaufmann Publishers.
  • Kang, M. 2020. “Housing Survey Statistical Report (2019).” [Online]. Accessed 18 November 2020. http://stat.molit.go.kr/portal/cate/statFileView.do?hRsId=327&hFormId=
  • Kim, B., Y. Ahn, and S. Lee. 2019. “LDA-Based Model for Defect Management in Residential Buildings.” Sustainability 11: 7201. doi:10.3390/su11247201.
  • Kim, B. O., Y. D. Je, H. S. Song, and S. B. Lee. 2011. “Prediction Model Development of Defect Repair Cost for Apartment House according to Performance Data.” Journal of the Korea Institute of Building Construction 11: 459–467. doi:10.5345/JKIBC.2011.11.5.459.
  • Kim, Y. S., S. W. Oh, Y. K. Cho, and J. W. Seo. 2008. “A PDA and Wireless Web-Integrated System for Quality Inspection and Defect Management of Apartment Housing Projects.” Automation in Construction 17: 163–179. doi:10.1016/j.autcon.2007.03.006.
  • Kwon, N., Y. Ahn, B. S. Son, and H. Moon. 2021. “Developing a Machine Learning-Based Building Repair Time Estimation Model considering Weight Assigning Methods.” Journal of Building Engineering 43: 102627. doi:10.1016/j.jobe.2021.102627.
  • Kwon, O. S., C. S. Park, and C. R. Lim. 2014. “A Defect Management System for Reinforced Concrete Work Utilizing BIM, Image-Matching and Augmented Reality.” Automation in Construction 46: 74–81. doi:10.1016/j.autcon.2014.05.005.
  • Lee, S., S. Lee, and J. Kim. 2018. “Evaluating the Impact of Defect Risks in Residential Buildings at the Occupancy Phase.” Sustainability 10: 4466. doi:10.3390/su10124466.
  • Lin, Y. C., J. X. Chang, and Y. C. Su. 2016. “Developing Construction Defect Management System Using BIM Technology in Quality Inspection.” Journal of Civil Engineering and Management 22: 903–914. doi:10.3846/13923730.2014.928362.
  • Macarulla, M., N. Forcada, M. Casals, M. Gangolells, A. Fuertes, and X. Roca. 2013. “Standardizing Housing Defects: Classification, Validation, and Benefits.” Journal of Construction Engineering and Management 139: 968–976. doi:10.1061/(ASCE)CO.1943-7862.0000669.
  • Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” [Online]. Accessed 18 November 2020. https://arxiv.org/abs/1301.3781
  • Milion, R. N., T. Da Cl Alves, and J. C. Paliari. 2017. “Impacts of Residential Construction Defects on Customer Satisfaction.” International Journal of Building Pathology and Adaptation 35: 218–232. doi:10.1108/IJBPA-12-2016-0033.
  • Park, C. S., D. Y. Lee, O. S. Kwon, and X. Wang. 2013. “A Framework for Proactive Construction Defect Management Using BIM, Augmented Reality and Ontology-Based Data Collection Template.” Automation in Construction 33: 61–71. doi:10.1016/j.autcon.2012.09.010.
  • Park, E. L., and S. Cho. 2014. “KoNLPy: Korean Natural Language Processing in Python.” In Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, 133–136. Chuncheon, Korea.
  • Park, J., and D. Seo. 2017. “Basic Study on Term of Warranty Liability for Finish Work Defect in Apartment Building.” Water Supply 6: 14.
  • Park, S., Y. Ahn, and S. Lee. 2018. “Analyzing the Finishing Works Service Life Pattern of Public Housing in South Korea by Probabilistic Approach.” Sustainability 10: 4469. doi:10.3390/su10124469.
  • Raslanas, S., J. Alchimovienė, and N. Banaitienė. 2011. “Residential Areas with Apartment Houses: Analysis of the Condition of Buildings, Planning Issues, Retrofit Strategies and Scenarios.” International Journal of Strategic Property Management 15: 152–172. doi:10.3846/1648715X.2011.586531.
  • Shi, S., Z. Yang, D. Tripe, and H. Zhang. 2015. “Uncertainty and New Apartment Price Setting: A Real Options Approach.” Pacific-Basin Finance Journal 35: 574–591. doi:10.1016/j.pacfin.2015.10.004.
  • Tabatabai, S. J. 2017. “The Nature, Terms and Legal Effects of Presale or Pre-Construction Contracts of Building (Apartment).” Journal of Policy & Law 10: 228.
  • Watt, D. S. 2009. Building Pathology: Principles and Practice. Oxford, UK: Blackwell Publishing.