Research Article

Picture This: A Deep Learning Model for Operational Real Estate Emissions

Article: 2251982 | Received 25 Apr 2023, Accepted 22 Aug 2023, Published online: 21 Sep 2023

Abstract

We present a deep learning model for estimating carbon dioxide equivalent (CO2e) emissions in the real estate sector. The model, which utilizes convolutional neural networks (CNNs) and image classification techniques, is designed to estimate CO2e emissions based on publicly available images of buildings and their corresponding emissions data. Our findings show that the model can provide reasonably accurate estimations of CO2e emissions using images as the sole input. Notably, incorporating primary energy sources as additional input further improves the accuracy up to 75%. Such a model is particularly important in the fight against climate change, as it allows for transparency and fast identification of buildings that contribute significantly to CO2e emissions in the building sector. Currently, information on emission intensity in the real estate sector is scarce, with only a few countries collecting and providing the required data. Our model can help reduce this gap and provide valuable insights into the carbon footprint of the real estate sector.

Introduction

The release of carbon dioxide (CO2) from the building stock constitutes a significant portion of worldwide greenhouse gas (GHG) emissions (UNEP, Citation2022). In the European Union, buildings are accountable for approximately 40% of final energy consumption (European Commission, Citation2019). The reduction of greenhouse gas emissions in real estate holds significant potential for mitigating global warming and achieving the imperative objective of keeping the global temperature increase below 2°C (IPCC, Citation2022). Notably, approximately 85% of the building stock in the European Union predates 2001 (European Commission, Citation2020), a period characterized by less strict energy performance requirements or more relaxed standards than those in place currently (Joint Research Centre (European Commission), Citation2019). Based on the report presented by the European Commission (European Commission, Citation2020), it becomes clear that the stipulated climate targets aiming at reducing energy consumption by a minimum of 60% will remain unfulfilled despite the proposed weighted renovation quotas of 1% and 0.2% for deep renovations. There is an apparent discrepancy between actions required and actions taken. This discrepancy, also referred to as the decarbonization gap, has increased in the EU in recent years, with the only exception being 2021 due to the COVID-19 pandemic (Buildings Performance Institute Europe (BPIE), Citation2022).

While reducing the CO2e emissions of the real estate sector is one of the pillars to limit global warming to below 2°C (Directorate-General for Energy, Citation2012; IPCC, Citation2022), complete nation-wide emission data at the asset level is scarce. To tackle this problem, the EU introduced the Energy Performance of Buildings Directive (EPBD) recast in 2010, making it mandatory to provide potential buyers and tenants of a property with an energy performance certificate (EPC) (European Parliament, Citation2010). After its introduction, the EPC turned into one of the most important sources of information on the energy performance of buildings (Arcipowska et al., Citation2014). However, there is criticism of how EPCs are calculated (Coyne & Denny, Citation2021; Majcen et al., Citation2013; Pasichnyi et al., Citation2019), as well as of the fact that EPCs are not publicly available in many countries (Arcipowska et al., Citation2014; United Nations Environment Programme, Citation2021). The European Union's response to this critique is encapsulated within the revised EPBD. Nevertheless, the timely development of relevant databases, serving as a dependable foundation for forthcoming sustainable real estate policies or renovation strategies, remains an open question. Furthermore, the creation of EPCs is currently only mandatory in the EU when a property is sold or let to a new tenant. Even with the creation of publicly available EPC databases, the energy performance of owner-occupied real estate and long-term rented objects will therefore remain opaque for the time being.

To reduce this information barrier, a new approach requiring few resources to estimate the GHG emissions of buildings has to be developed. As the determination of energy performance is based on many visual features such as the roof, windows, building age, walls, building size, and materials used (BRE, Citation2022), a deep learning model could possibly provide an approximation of the energy performance and CO2e emissions. In other fields, deep learning has proven capable of producing cost-effective and efficient tools using only little input data (Jenkins & Burton, Citation2008; Liermann et al., Citation2019). Nonetheless, despite its existing utilization within the real estate sector, there remains considerable scope for further exploration of its potential applications (Koch et al., Citation2019).

In this paper, we present an approach, to our knowledge novel, to estimating the CO2e emissions of residential buildings using a pre-trained Convolutional Neural Network (CNN) model. We train our model using EPC data from the United Kingdom and corresponding Google Street View (GSV) images. The remainder of the paper is structured as follows: In Section “Literature Background and Context,” we provide an overview of the literature relevant to our study. In Section “Methodology,” we describe the methodology used to build our model and the database; in Section “Results,” we describe the results of our model; and in Section “Discussion and Limitations,” we discuss limitations and further research potential.

Literature Background and Context

With burgeoning computing capacity, Artificial Intelligence (AI) and especially Machine Learning (ML) systems, due to their potential to automate time-consuming processes, have become increasingly popular for practical applications such as computer vision, speech recognition, natural language processing, robot control, and others (Jordan & Mitchell, Citation2015). With rising interest, the real estate sector is adopting these new methodologies in various ways.

Deep Learning, a subset of Machine Learning, particularly Convolutional Neural Networks (CNNs), has demonstrated significant utility in image analysis and computer vision for real estate applications, owing to its ability to accurately forecast a diverse range of variables. Koch et al. (Citation2019) provide a literature overview of the image analysis done in real estate research with CNNs. The authors classify the models into distinct categories based on the input data employed, comprising models predicated on aerial and satellite images, exterior front-view images, floor plans, and interior images. Due to the high accessibility of data, many papers utilize satellite images as their training and validation input. Earlier papers on real estate related image analysis predominantly work on land-use segmentation or classification (Fröhlich et al., Citation2013; Marmanis et al., Citation2015; Yang & Newsam, Citation2010) and building footprint detection (Cohen et al., Citation2016; Ok, Citation2013; Raikar & Hanji, Citation2016). Accelerating progress in image classification through improved deep learning algorithms, especially the introduction of CNNs, and the increasing power of graphics processing units (Gu et al., Citation2018) has opened the possibility to take on more complex tasks and achieve better results.

Jean et al. (Citation2016), for example, predict the economic well-being of five different African countries using a CNN, explaining up to 75% of the variability in both asset wealth and consumption expenditure. The researchers construct their model by leveraging publicly accessible data, thereby endowing it with the potential for scalability to different geographical regions. Through the utilization of satellite image data, the presented model allows for the extraction of valuable information, thus contributing to the reduction of a data transparency gap. We share a similar motivation: extracting information from publicly available data with a CNN to build a model that could reduce a data gap. Koch et al. (Citation2021) present a CNN to predict, at a micro level, where university graduates live in a city, using different density classes. After training, the presented model is able to derive this information using only satellite images. They find that their model has trouble differentiating neighbouring classes and thus allow a deviation of one class to determine their overall model fit. With this tolerance, the accuracy of the model increases by 37.8% compared to true class prediction accuracy.

Exterior front-view images provide an alternative basis for image analysis in real estate, presenting features not included in satellite images. With the increase in their availability through web-based services such as GSV or Apple’s Look Around, new areas of application have opened up (Koch et al., Citation2019). Paired with the increase in the efficiency of CNNs, several use cases with high accuracy have been presented. Schmitz and Mayer (Citation2016) use a CNN for the semantic interpretation of facade images, achieving an accuracy of 85% in differentiating facades, doors, windows, and other building features. Furthermore, they conclude that CNNs can be trained sufficiently even with relatively small training datasets, if the available data is augmented and an existing pre-trained model is fine-tuned. Obeso et al. (Citation2017) train their model to classify street-view images of buildings in Mexico into four architectural classes (pre-Hispanic, colonial, modern, and other). The model is not only capable of identifying buildings correctly, but also of deriving distinct building features from street-view images linked to a particular architectural style. They suggest that CNNs are able to identify the architectural characteristics of a building and assign it to the fitting class with relatively high accuracy (88.01%). Further applications utilizing exterior street-view images to classify buildings include the prediction of building age (Zeppelzauer et al., Citation2018) and building condition (Koch et al., Citation2018). Considering that both the age and condition of a building contain valuable insights into its CO2e emissions, it is reasonable to contemplate developing a comprehensive model that extracts pertinent building characteristics for emission prediction from street-view images using a CNN.

Despotovic et al. (Citation2019) build on the previous models and simultaneously predict the construction age and the heating demand, assuming that both can be predicted solely based on visual exterior features of a building. The presented model is able to predict both variables with similar precision, achieving an accuracy of 62% in differentiating between five demand classes and demonstrating that exterior images provide enough information to approximately determine a building's energy demand.

Utilizing CNNs in real estate image classification tasks has been successful in the past, with all of the presented models achieving accuracies significantly above the random probability threshold. Even complex matters, such as architectural style or heating demand, usually determined by professionals, can be approximated using CNNs, as the models are able to extract enough of the required information from the input images. The outcome of this literature review indicates that image analysis utilizing CNNs holds promise for mitigating the information gap associated with the estimation of CO2e emissions attributed to the building stock.

Methodology

Over the past few years, the use of CNNs as a visual-based algorithm has become widely accepted as the standard in the scientific community (Shi et al., Citation2018; Wu et al., Citation2019; Zhou et al., Citation2018). CNNs are multilayer neural networks, taking their name from the convolutional layers that are part of their architecture (Albawi et al., Citation2017). The models are able to learn and recognize visual patterns from the input database, making them very effective for image classification tasks (Sarker, Citation2021). Various models have emerged since the advent of the AlexNet model in 2012, originating from different annotated datasets and varying in architecture, primarily in the number of layers in a network (depth), the number of kernels per layer (width), and the dataset used (Krizhevsky et al., Citation2017; Wu et al., Citation2019). Investigations have shown that the depth of the model is particularly important for the success of neural networks, especially with regard to accuracy and the identification of complex features. However, training networks with substantial depth is challenging, requiring high computational power and offering diminishing returns after a certain stage (Srivastava et al., Citation2015). To address this, a variety of architectures have been developed that utilize identity shortcut connections between layers, allowing for improved information flow between different parts of the network and resulting in more efficient and accurate computations. This has led to the general concept of Residual Neural Networks (“ResNets”) (He et al., Citation2016; Targ et al., Citation2016), of which the approach implemented by He et al. (Citation2016) constitutes the fundamental architecture of our model. In our baseline model, we employ a 34-layer ResNet, which processes the image input from Google Street View.

The foundation for the training and validation data consists of two components, EPCs and GSV images. While European energy certificates are subject to the EPBD, there are still differences between the individual member states (Semple & Jenkins, Citation2020). Thus, in order to obtain reliable information regarding the CO2e emissions of buildings, an extensive and quality-controlled database is necessary. After a thorough comparison of European energy performance certificate databases, the Energy Performance of Buildings Register of the Department for Levelling Up, Housing & Communities England & Wales was determined to be particularly suitable as a data source for the required emissions data. In particular, UK EPCs offer three important advantages: (1) the database is freely available and, therefore, particularly suitable for research purposes; (2) the CO2e emission data does not require any conversion, since it is already displayed in the EPCs; and (3) EPCs are only issued by accredited assessors, which should ensure uniform quality. Additionally, there is already extensive literature on the UK EPCs, which can help account for potential information asymmetries and specific issues in our research, as well as identify weaknesses of the EPCs (e.g. as shown by Crawley et al. (Citation2019); Fuerst et al. (Citation2015); Taylor et al. (Citation2019)).

Operational emissions (in CO2e) of buildings are calculated from two components: firstly, the existing energy source, and secondly, the energy intensity of the building. The emissions are derived by multiplying the conversion coefficient of the specific energy source with the energy intensity of the building. Consequently, in practice, it is possible for older, less modernized buildings to emit low CO2e values due to their utilization of renewable energies, resulting in a low emission factor despite high energy intensity. Thus, the trained model faces a high degree of complexity, as the overall energy consumption of the house alone is insufficient to deduce the emissions. Instead, a detailed examination of externally visible characteristics, typical for each energy source, must take place. In this context, indicators such as existing chimneys or renewable energy systems can provide insight into the energy source. If none of these characteristics are externally visible, the model may estimate the energy source based on the observable condition or year of construction. The current Energy Efficiency Report (Office for National Statistics, Citation2022) suggests that the energy source correlates with the respective construction years, revealing trends specific to each period. Consequently, a subsequent heating system modernization may go unidentified by the model, potentially leading to inaccuracies.
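The calculation described above can be made concrete with a short worked example. The emission factors below are invented placeholders for illustration only, not official SAP Table 12 values; they merely show why a high-intensity building on a low-carbon source can undercut an efficient building on a high-carbon source.

```python
# emissions (kg CO2e/m2/yr) = energy intensity (kWh/m2/yr) x factor (kg CO2e/kWh)
# The factors below are placeholder values, NOT official SAP coefficients.
EMISSION_FACTORS = {"gas": 0.21, "electricity": 0.14, "biomass": 0.03}

def operational_emissions(intensity_kwh_m2: float, source: str) -> float:
    """Operational emissions in kg CO2e per m2 per year."""
    return intensity_kwh_m2 * EMISSION_FACTORS[source]

# An old, energy-intensive building on a low-carbon source can still emit
# less than a modern, efficient building heated with gas:
old_biomass = operational_emissions(250.0, "biomass")  # 250 * 0.03 = 7.5
new_gas = operational_emissions(90.0, "gas")           # 90 * 0.21 = 18.9
```

This is exactly the ambiguity the model must resolve from visual cues alone: energy intensity by itself does not determine the emission class.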

The calculation and procedures for issuing EPCs for residential buildings in the United Kingdom are generally based on the Standard Assessment Procedure (SAP), which can be applied in a reduced form, the Reduced Data SAP (RdSAP), for older buildings constructed before 2008 (BRE, Citation2022). The latest version, SAP 10.2, was introduced in England in June 2022 and in Wales in November 2022. The SAP calculates (only) the operational CO2e emissions based on the emission factors specified in Table 12 of the SAP annex, given in kg CO2e per kWh. These emission factors depend on the energy source present in the building. In addition to the conventional Energy Efficiency Rating of A–G, the so-called Environmental Impact Rating constructed from these emissions represents a measure of notional CO2e emissions. As with the Energy Efficiency Rating, the higher the rating, the lower the expected emissions, with A equating to the lowest emissions and G to the highest (Department for Business, Energy & Industrial Strategy, Citation2022). For the presentation in our training and validation split, the relative indication of CO2e emissions per m2 per year is used.

In December 2022, the England EPC database held a total of 24,015,615 records. However, considering the ongoing process of EPC updates and their limited validity period, a portion of the available data is outdated and therefore unsuitable for our dataset. If, for example, an address has two EPCs in the database, we eliminate the EPC not depicted in the GSV image. This is accomplished by cross-referencing the date the GSV image was captured with the date of issuance of the EPC: if the EPC was issued after the image was taken, we eliminate the data point. Further, we filter out addresses for which no GSV image is available and which are therefore unsuitable for our purpose. In order to maintain balance within our database, we restrict the number of data points for each class to match the lowest available data count among all classes. For a CNN model, a balanced and sizeable quantity of images in each class constitutes an indispensable prerequisite for achieving model precision (Luo et al., Citation2018). Predominantly, buildings of interest in the database are within the range of 0–140 kg CO2e per m2 per year. EPCs that exceed or fall below this range exist but lack the required frequency and timeliness. The classification of our CO2e emission categories is based on the established Environmental Impact Rating, which designates seven classes (Department for Business, Energy & Industrial Strategy, Citation2022). As depicted in , each classification corresponds to a unique range of 20 kg CO2e per m2 per year.
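The two filtering steps above — discarding EPCs that postdate the Street View capture and downsampling every class to the smallest class count — can be sketched as follows. The record fields (`co2_class`, the date arguments) are our assumptions for illustration, not the actual database schema.

```python
import random
from collections import defaultdict
from datetime import date
from typing import Optional

def keep_epc(epc_issued: date, gsv_captured: Optional[date]) -> bool:
    """Keep an EPC only if a GSV image exists and was captured on or after
    the certificate's issue date, so the image depicts the certified state."""
    return gsv_captured is not None and epc_issued <= gsv_captured

def balance_classes(records):
    """Downsample every CO2e class to the size of the smallest class."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec["co2_class"]].append(rec)
    n = min(len(group) for group in by_class.values())
    return [rec for group in by_class.values() for rec in random.sample(group, n)]
```

Balancing by the smallest class trades dataset size for class balance, which is why the scarce high-emission classes end up capping the per-class sample at 3,000 images.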

Table 1. CO2e emission classifications.

After all expired, duplicate, and outdated EPCs, as well as EPCs for buildings outside the time periods available on Google Street View, were excluded from the database, a random sample of 3,000 images was drawn for each classification. In particular, the limited availability of data in the high emission classes after the filtering process restricts us to this number. This results in a dataset of 21,000 EPCs distributed throughout England. The descriptive statistics for the entire dataset are shown in .

Table 2. Descriptive statistics of the EPC sample.

The descriptive statistics demonstrate a balanced sample across the classifications. The Energy Efficiency Score, which indicates the calculated energy efficiency class, differs from our classification, as it focuses on energy intensity rather than CO2e emissions (Department for Business, Energy & Industrial Strategy, Citation2022). The analysis of the energy sources’ descriptive statistics (refer to in the Appendix) demonstrates a notable prevalence of gas utilization in the low-emission categories, with more than 90% of cases in the first three classes attributed to gas heating. In the higher classes, this observation shifts towards electricity as the primary energy source. As previously explained, the CO2e emissions of buildings are fundamentally calculated based on the primary energy source and energy intensity. In this context, a positive correlation between energy intensity and emission values can be assumed. Therefore, for the validity of the sample, it appears crucial to have sufficient variability between energy sources and energy intensity. Regarding the seven CO2e classifications, a differentiated picture emerges ( in the Appendix). For energy-efficient buildings, there appears to be a strong correlation between energy consumption and CO2e emissions. However, this strong correlation gradually weakens, until in the category 120–140 kg CO2e/m2/p.a. there is only a correlation coefficient of approximately 0.3.

The Google Street View images were retrieved using the address information provided in the EPC database. The dataset comprises numerous images in which buildings are obscured by obstacles such as cars, vegetation, or other buildings, or are not visible at all. To train the model effectively and reduce the error rate, these images must be removed from the dataset. Examples of high and low information content images are provided in . Images with high information content are characterized by their substantial explanatory value for the model, as they provide a comprehensive view of the building, thus serving as a solid foundation for feature extraction. Conversely, images where buildings are obscured by other objects or captured from unfavorable angles offer limited explanatory value for the model.

Figure 1. High information content (top) and low information content (bottom) images.

Following the manual filtration of images with low information content, a total of 17,922 images remained, resulting in the removal of approximately 14.66% of all images from the dataset. Across all classifications, the dataset remains balanced, as illustrated in . Achieving balance between classes is of paramount importance for machine learning models, as imbalanced training data has the potential to significantly impair the performance of convolutional neural networks (Johnson & Khoshgoftaar, Citation2019).

Figure 2. Count of images in each classification.

As previously mentioned, we use the pretrained ResNet34 model developed by He et al. (Citation2016) as our baseline model. ResNet34 is an image classification model with 34 weighted layers. The utilized model undergoes pretraining on the ImageNet dataset, where it initially learns from a large dataset for a different task. The resulting parameters from this pretraining are then fine-tuned and adjusted for our specific use case. Pretraining can significantly increase accuracy and is found to be particularly important for use cases with relatively small datasets (Li, Singh et al., Citation2019). To validate our results and identify the appropriate architecture and depth for our use case, we test the performance of several other models based on the ResNet architecture.

We have selected the following comparative models: a deeper and more complex ResNet101 (He et al., Citation2016); a NoisyStudent EfficientNet of size B3, a less deep and complex CNN that achieves higher efficiency (Tan & Le, Citation2019); a ResNext101 model that incorporates a cardinality building block into the ResNet architecture, reducing the number of hyperparameters required (Xie et al., Citation2017); and finally, a modified SE ResNet101 that assigns weights to different channels based on their importance for prediction (Hu et al., Citation2019). The chosen models either exhibit greater complexity than our baseline model (ResNet101, ResNext101, SE ResNet101) or are less complex (NoisyStudent EfficientNet). We employ this comparison to assess the suitability of our architecture for our specific use case. Deeper models can potentially offer improved performance depending on the task, but also carry the risk of overfitting the data (Ying, Citation2019). For our training, the images are input as 3-channel RGB, resized to 224 × 224 pixels. The 80%/20% training and validation split is performed using a deterministic learner and a stratified k-fold split to ensure comparability of the splits across all model comparisons. In our baseline model, we use simple image augmentation techniques such as random flipping, rotation, zoom warp, and lighting transform. Additionally, we normalize the images. Hyperparameter tuning of the model is performed using the Adam optimization algorithm (Kingma & Ba, Citation2017), which optimizes the parameters during training. One of the most important hyperparameters is the learning rate, which determines the step size at which the model adjusts its internal parameters during the training process and varies based on the specific use case (Smith, Citation2017).
We control the learning rate using the two-stage transfer learning protocol of Smith (Citation2017), in which the first half of the epochs updates the head of the pre-trained model with a learning rate of 1e-2, and the second half fine-tunes the entire model with the knowledge gained. In stage 2, we gradually increase the learning rate and then gradually decrease it, starting from 1e-5 and ending at 1e-4. This allows the neural network to learn more detailed information from new data and thereby improve performance (Ng et al., Citation2015). To evaluate the results, we employ the conventional accuracy measure shown in Equation 1, where accuracy is defined as the ratio of the number of correctly classified buildings to the total number of buildings in the sample, multiplied by 100, and the F1-Score (Equation 2), the harmonic mean of precision and recall. The F1-Score is a popular metric for models with unevenly distributed classification classes, as it accounts for the different class sizes. While our classification classes are evenly balanced, we still provide F1-Scores for better comparability with other models. Furthermore, training and validation loss are compared between the models.

Accuracy = (Number of correctly classified buildings / Total number of buildings in the sample) × 100 (1)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall) (2)
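Both evaluation metrics translate directly into code; the functions below implement Equations 1 and 2 as stated in the text (the example inputs are arbitrary).

```python
def accuracy(n_correct: int, n_total: int) -> float:
    """Equation 1: share of correctly classified buildings, in percent."""
    return n_correct / n_total * 100

def f1_score(precision: float, recall: float) -> float:
    """Equation 2: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

acc = accuracy(87, 200)   # 43.5 percent
f1 = f1_score(0.5, 0.4)   # harmonic mean lies below the arithmetic mean of 0.45
```

Because the harmonic mean punishes imbalance between precision and recall, the F1-Score stays informative even when one of the two quantities collapses for a class.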

Augmentation and Energy Source

A possible measure to counter overfitting of CNN models is (further) augmenting the input data or increasing the input sample size (Perez & Wang, Citation2017). For every photo of the input sample, we randomly choose three of the following augmentation methods: image rotation (45°, 90°, 135°), image transposition (left to right, top to bottom), image resizing (dividing or multiplying height or width by a factor of 2), and image cropping (reducing size by a factor of 2). With the overarching objective of enhancing market transparency, we also opted to add additional input to our training data.

As the calculated CO2e emissions are significantly influenced by the primary energy source used (additional elaboration on the CO2e calculation is provided in Section “Methodology”), we decided to incorporate an additional layer alongside the RGB channels into the image matching process, allowing for the inclusion of both the CO2e emissions and the primary energy source of the building. The necessary information regarding the primary energy source is readily obtainable from the corresponding EPCs. The same methodology utilized in the base model was applied to the new model, which thus includes heavy image augmentation and the incorporation of primary energy source data.
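One possible reading of this additional layer — an assumption on our part, not the authors' exact encoding — is to append the primary energy source as a constant-valued fourth channel to each 3 × 224 × 224 RGB tensor. The source-to-value coding below is invented for illustration.

```python
import torch

# Assumed coding of energy sources onto a scalar channel value in [0, 1].
ENERGY_SOURCE_IDS = {"gas": 0, "electricity": 1, "oil": 2, "biomass": 3}

def add_energy_channel(rgb: torch.Tensor, source: str) -> torch.Tensor:
    """Stack a constant fourth channel encoding the energy source onto an
    image tensor of shape (3, H, W), yielding shape (4, H, W)."""
    value = ENERGY_SOURCE_IDS[source] / (len(ENERGY_SOURCE_IDS) - 1)
    extra = torch.full((1, rgb.shape[1], rgb.shape[2]), value)
    return torch.cat([rgb, extra], dim=0)

x4 = add_energy_channel(torch.rand(3, 224, 224), "electricity")
```

Note that under this encoding the network's first convolution must be widened to accept four input channels instead of three; the pretrained RGB filters can be kept and the fourth-channel filters initialized separately.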

Grad-CAM Approach

In addition to the numerical metrics, we employ the Gradient-weighted Class Activation Mapping (Grad-CAM) method developed by Selvaraju et al. (Citation2017) to achieve a visual validation of the model’s performance. This technique employs a heatmap overlaid on the image to highlight regions of particular importance to the model’s predictions. The Grad-CAM method leverages the model’s gradients, which represent the direction for adjusting the CNN weights during iterative training to minimize error. By utilizing these gradients, relevant activation regions within the image are identified, providing insights into the weighting of activations corresponding to specific CO2e classes. This approach significantly improves interpretability and instills confidence. However, it is important to note that the evaluation of the model still relies on established performance metrics such as accuracy, F1-Score, training, and validation loss, which serve as the primary indicators of its overall performance.

Results

In the following, we first present the results of our baseline model. We then compare the results of the baseline model with alternative CNN models. Finally, we evaluate a modified version of the base model.

Baseline Model

We find that our baseline model achieves an accuracy of 43.38%, already outperforming the random probability of 14.29% (). In , it is apparent that around epoch 34 the model starts overfitting the training data, causing the validation loss to increase while the training loss continues to decline.

Figure 3. Baseline accuracy.

Figure 4. Baseline train/validation loss.

After this epoch, the algorithm detects characteristics in the training set that are helpful in predicting the training data but do not improve accuracy in the validation phase, due to the inability to generalize these characteristics to the overall population (Mutasa et al., Citation2020).

Model Comparison

To test the assumption that deeper models might overfit and to benchmark our baseline model, we utilize four additional state-of-the-art models. As expected, the results suggest that the more complex models have a lower training loss, whereas the validation loss increases, in some cases significantly. Our training set does not seem to be suited to training the deeper and more complex models, encouraging overfitting (). While the computing time of the deeper models significantly increases, none of the tested models outperforms our baseline model in terms of accuracy. We suspect that due to the relatively small sample size of the dataset, the deeper models are not able to generalize the features learned in the training data (Zhang et al., Citation2021). The only model outperforming the baseline model in computing time is the less complex EfficientNet model; however, it also achieves a significantly lower accuracy.

Table 3. Benchmarking the baseline model.

Augmentation and Energy Source

The results of the image augmentation indicate an increase in accuracy of approximately 4 percentage points, with the new model achieving an accuracy of approximately 47.59% (). Although the new model exhibits superior performance compared to the baseline model, the modest increase in accuracy is somewhat unexpected. Upon scrutinizing the dataset, we observed that a mere 33.4% of the buildings featured a primary energy source other than gas. If we further assume that the energy source can partly be determined from features of the house, such as appearance, roof, or chimneys, it becomes clearer why the addition of the primary energy source had only little effect on the prediction results, despite being a significant factor influencing the CO2e emission intensity. This contradicts our initial assumption that the model might only estimate energy intensity without drawing conclusions about the energy source or conversion coefficient; in that case, one would expect a higher increase in accuracy, since one of the two variables used to calculate CO2e values is then effectively included in the dataset, greatly reducing the complexity of the task. It is important to note that the results do not provide a direct conclusion as to whether the baseline version of the model derives the energy source directly (through building characteristics) or indirectly (via, for example, typical trends based on construction year). To test whether the addition of the primary energy source altered the model comparison, we run a 50-layer SE ResNet to once again benchmark the results of the ResNet34 model. Similar to the previous outcome, the SE ResNet50 model achieves a lower accuracy of 46.89%, while requiring roughly six times the computing time of the ResNet34 model.

Grad-CAM Approach

Although it is apparent that our model significantly outperforms random probability, it remains unclear which specific visual features the model employs to approximate CO2e emissions. To further validate our model, we use the Grad-CAM approach (Selvaraju et al., Citation2017). As shown in Figure 6, the model appears to identify the relevant buildings in every emission class. With the main focus on the property, important features for determining CO2e emission intensity, such as the windows, the walls, and the roof, among other components, might be used for the estimation. Other parts of the image not related to emission intensity, such as the sky, the street, or neighboring buildings, are identified as negligible by the model. Only the front yard in some images seems to help the model with the prediction, which is not surprising, as the look and size of a yard may help identify the building age of a property and, therefore, its energy efficiency (Aksoezen et al., Citation2015). In the overall context, however, the front yard does not appear to have any prevailing significance.

Figure 5. Accuracy of ResNet34 model with supplemented primary energy sources.


Figure 6. Random heat maps produced by Grad-CAM for different emission classes.

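The Grad-CAM heat maps themselves follow a simple recipe: the class-score gradients are global-average-pooled into per-channel weights, and the weighted sum of the last convolutional feature maps is passed through a ReLU. A minimal numpy sketch of that formula, operating on pre-computed tensors (the shapes are illustrative):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM (Selvaraju et al., 2017) from pre-computed tensors.
    activations: (K, H, W) feature maps of the last conv layer.
    gradients:   (K, H, W) gradients of the class score w.r.t. them.
    """
    weights = gradients.mean(axis=(1, 2))  # alpha_k: pooled gradients
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()                   # normalize to [0, 1] for display
    return cam

K, H, W = 4, 7, 7
rng = np.random.default_rng(1)
cam = grad_cam(rng.normal(size=(K, H, W)), rng.normal(size=(K, H, W)))
print(cam.shape)
```

The resulting low-resolution map is upsampled to the input image size and overlaid as a heat map, which is how figures like the ones above are produced; in a real pipeline the activations and gradients come from framework hooks rather than random arrays.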

Accuracy

Given that our model demonstrates superior performance compared to random probability and is visually validated through the Grad-CAM approach, the question arises as to which factors hinder its ability to accurately predict the emission class. Analyzing the confusion matrix in Figure 7, we conclude that the model is able to predict every class above random probability, with the lowest CO2e emission class being predicted exceptionally well with an F1-score of 80% (Table 4). With an accuracy rate of close to 50%, the model demonstrates reasonable precision across most of the other emission classes, with 20–40 kg CO2e/m2 (37%) and 80–100 kg CO2e/m2 (27%) being the least accurate. Furthermore, the actual classes are often confused with neighboring classes. The class with the lowest predictive accuracy, 80–100 kg CO2e/m2, is frequently confused with its two neighboring classes, 60–80 kg CO2e/m2 and the next-higher class.

Figure 7. Confusion matrix.


Table 4. Classification report.
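The per-class F1-scores reported in the classification report derive directly from the confusion matrix. A minimal sketch of that computation — the 3×3 matrix here uses illustrative toy numbers, not the paper's data:

```python
import numpy as np

def f1_per_class(cm):
    """Per-class F1 scores from a confusion matrix whose rows are
    true classes and columns are predicted classes."""
    tp = np.diag(cm).astype(float)
    precision = np.where(cm.sum(axis=0) > 0, tp / cm.sum(axis=0), 0.0)
    recall    = np.where(cm.sum(axis=1) > 0, tp / cm.sum(axis=1), 0.0)
    denom = precision + recall
    return np.where(denom > 0, 2 * precision * recall / denom, 0.0)

# Toy 3-class confusion matrix (values illustrative only).
cm = np.array([[8, 2, 0],
               [1, 6, 3],
               [0, 3, 7]])
print(f1_per_class(cm).round(2))
```

Because F1 balances precision and recall, it exposes classes that the raw accuracy figure hides — exactly the per-class asymmetry discussed above.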

A similar trend can be observed across all other emission classes, except for the lowest emission class, which was identified with high accuracy from the outset. The model appears to encounter challenges in correctly classifying images in the mid-range of the emission spectrum. Given the fixed boundaries of the predetermined emission classes, a house with an emission of, for instance, 41 kg CO2e/m2 may be erroneously classified as belonging to the lower class, owing to its emission value lying in close proximity to the upper boundary of that class. We test this assumption by plotting a histogram of the model’s deviation over all classes (Figure 8). The histogram reveals that the majority of incorrect predictions are off by a single emission class, with only a minimal number of predictions diverging by more than one class. When allowing a deviation of one emission class, the model achieves an accuracy of approximately 75.34%. This improvement highlights the model’s performance compared to random probability and underscores its ability to achieve accurate predictions when inaccuracies stemming from the fixed emission class boundaries are disregarded.

Figure 8. Deviation Histogram.

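The tolerance-based accuracy used above generalizes naturally as "share of predictions within k classes of the truth". A short sketch under that reading, with hypothetical class labels:

```python
import numpy as np

def accuracy_within(true_cls, pred_cls, tolerance=0):
    """Share of predictions whose class index lies within `tolerance`
    of the true class: tolerance=0 is standard accuracy, tolerance=1
    treats neighboring-class confusions as correct, mirroring the
    deviation histogram above."""
    true_cls, pred_cls = np.asarray(true_cls), np.asarray(pred_cls)
    return float(np.mean(np.abs(true_cls - pred_cls) <= tolerance))

# Hypothetical class indices (0 = lowest emission class).
true = [0, 1, 2, 3, 4, 2, 1, 0]
pred = [0, 2, 2, 2, 4, 0, 1, 1]
print(accuracy_within(true, pred, 0))  # 0.5
print(accuracy_within(true, pred, 1))  # 0.875
```

On this toy data, six of the eight predictions land in the true class or an adjacent one, so the tolerant metric jumps well above the exact-match accuracy, just as 75.34% exceeds 47.59% in the paper.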

Discussion and Limitations

To the best of our knowledge, the discussed application of image classification to CO2e emissions in real estate represents a novel approach that yields promising results and may play a pivotal role in fostering global transparency in the ecology of real estate. While the simplicity and swiftness of the evaluated model are counterbalanced by certain limitations, the advantages predominate. In the following, we discuss the advantages and restrictions of our model based on a practical application in a suburban street in Manchester, UK. Utilizing the EPC data of the houses and the corresponding Google Street View images, it becomes apparent, as shown in Figure 9, that there is significant heterogeneity among the emissions of individual residential buildings in the street.

Figure 9. Illustration of sample EPCs in Manchester, UK. Green = low CO2e emissions; Orange = average CO2e emissions; Red = high CO2e emissions. Satellite image acquired from Google Maps.


If the primary objective is to achieve a maximum reduction in emissions, it appears that the buildings located in the middle and lower section of the street would benefit most from retrofits, as they are currently emitting high levels of CO2e. Based on front view images, the presented CNN model could potentially identify and designate these areas in a city or on a street without the need for EPC data.

As shown in Figure 10, a large portion of the buildings is correctly classified by the model. In this example, the emission-intensive street section in the middle-lower area of the image is also correctly predicted. Although this example cannot be equated with an extensive test dataset, the results visualize the potential of the model. Given sufficient computing power, the model could also be applied at the city or national level. At the same time, the weaknesses and limitations of the model become apparent, requiring further research. Analyzing the buildings that were not correctly classified reveals the following implications:

Figure 10. Predictions based on the ResNet34 model [σ = 1]. Satellite image acquired from Google Maps.


The image quality can significantly affect the ability to identify the correct class. Upon examining the faulty classifications, images taken from unfavorable angles or obstructed by objects such as cars or greenery in the field of view appear to be a weakness (e.g., the incorrectly classified object in Figure 11). This result is consistent with Koch et al. (Citation2019), who suggest that pre-processing may be necessary to reduce noise for further use. Automated procedures for quality assurance of input data (e.g., prior object detection based on annotated images) might thus further increase accuracy. Furthermore, certain aspects remain unclear, such as the extent to which land characteristics not directly tied to the main structure, such as the front yard, influence the accuracy of the assessments. Further research, potentially in conjunction with object detection techniques, could offer solutions to this matter (Zhao et al., Citation2022).

Figure 11. Wrong angle image.


Google Street View data is not always up to date, limiting the possibilities for obtaining information on the current building stock at the macro level or for specifically identifying regular changes in the building stock, such as renovation rates. In the Manchester example, the images were taken in August 2021 and had not been updated by the time of publication. The periods of capture may also differ even within one city; a comparison of the capture periods between the training and validation datasets is therefore essential. Figure 12 provides an example of an image of a plot of land that was taken in 2008 but has since been developed and is now responsible for emissions as a building.

Figure 12. Empty plot of land.


In addition to technical limitations due to data quality, the data foundation is subject to limitations arising from the interplay between the CO2e values indicated in the energy certificate, the energy source, and the energy intensity. As previously explained, the CO2e emission values correlate strongly and positively with energy intensity, particularly in the more emission-efficient classes. However, this correlation weakens in the less energy-efficient classes, while the F1-Scores remain relatively constant. Despite the overall high accuracy, it remains uncertain to what extent the model may be biased, potentially generalizing the energy source based on external construction quality, rather than identifying it based on specific details such as the presence of a chimney. Further research could explore the combination of two dedicated models. One model could be trained to identify energy intensity—as demonstrated, for example, by Despotovic et al. (Citation2019)—while another model could identify the energy source based on various external details, using techniques such as object detection, to provide additional insights.

One additional factor that potentially limits the data foundation of the model is the quality and meaningfulness of the obtained Energy Performance Certificates. In scientific discourse, the meaningfulness of EPCs has been subject to intense discussion. At its core, the debate centers around the difference between consumption and demand certificates (also referred to as the “Performance Gap”) (among others, Coyne and Denny (Citation2021); Cozza et al. (Citation2020); Herrando et al. (Citation2016)), the quality of data acquisition methods (among others, Hardy and Glew (Citation2019); Hårsman et al. (Citation2016); Li, Kubicki et al. (Citation2019); Pasichnyi et al. (Citation2019)), the overall meaningfulness with regard to retrofitting (among others, Christensen et al. (Citation2014); Cozza et al. (Citation2020); Gouveia and Palma (Citation2019)), and potential price premiums (among others, Fuerst and McAllister (Citation2011); Olaussen et al. (Citation2017)).

In the context of our research, the Performance Gap and potential quality constraints in the data acquisition methodology are particularly relevant. Concerning the Performance Gap, it can be observed that actual consumption values are not incorporated into the model, leading to a divergence between the estimated and actual CO2e emissions. However, incorporating consumption values would not be advantageous for the model, as individual consumption information is not contained in the front-view images, in contrast to the building’s physical properties. As Cozza et al. (Citation2021) pointed out, in addition to imprecise input data and assumptions, individual occupant behavior and varying climate data are factors contributing to the Performance Gap. These individual characteristics of an inhabited property that are difficult to capture through a front-view image play a crucial role in the calculation of EPCs. However, they are not methodologically relevant to the CNN model. On the other hand, quality constraints in the input data and assumptions, similar to inaccurate or incorrectly oriented Google Street View images, can have a direct impact on the accuracy. In this context, quality assurance of the EPCs, as suggested by Hardy and Glew (Citation2019) through machine learning, can further improve the quality of the model.

In conclusion, the input data, as previously elaborated, is one of the most important factors for a successful CNN model. Although EPCs are not perfect, they currently offer the only comprehensive means of analyzing the building’s energy-related properties and thus serve as a good foundation for feature extraction in ResNet models.

Concluding Remarks

We presented a novel approach for estimating CO2e emissions of residential buildings. Our baseline model achieves an accuracy of 43.38% using a ResNet34 architecture. After adding image augmentation and primary energy sources to the training data, the accuracy of the model increased to 47.59%. Disregarding the rigid borders of the emission classes and allowing a deviation of one class in the prediction results, accuracy increased significantly, to 75.34%.

Evaluating the achieved results, the model can help increase transparency regarding CO2e emissions in the real estate market and thereby contribute to the containment of global warming. Potential applications include supporting ESG due diligence, policy making, and identifying retrofit needs. However, there remains potential for improvement. In particular, increasing the quality and sample size of the input data could benefit prediction accuracy. Moreover, the model could be extended with additional input information not restricted to visuals, as already demonstrated with the addition of primary energy sources. With increasing computing power, more sophisticated models, and images collected by autonomous vehicles, even a model tracking daily progress could be conceivable (Gebru et al., Citation2017). Owing to the model’s capability to enhance emission transparency, it could be connected with other research domains in the real estate industry, such as price or rent prediction, which often overlooks emission data due to its unavailability. Overall, the presented model delivered promising results and might increase transparency of operational emissions in the residential housing market.

References

  • Aksoezen, M., Daniel, M., Hassler, U., & Kohler, N. (2015). Building age as an indicator for energy consumption. Energy and Buildings, 87, 74–86. https://doi.org/10.1016/j.enbuild.2014.10.074
  • Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network [Paper presentation]. 2017 International Conference on Engineering and Technology (ICET) (pp. 1–6), Antalya, Turkey. https://doi.org/10.1109/ICEngTechnol.2017.8308186
  • Arcipowska, A., Anagnostopoulos, F., Mariottini, F., & Kunkel, S. (2014). Energy performance certificates across the EU (Technical Report). Buildings Performance Institute Europe (BPIE).
  • BRE. (2022). The government’s standard assessment procedure for energy rating of dwellings. Version 10.2 (21-04-2022). https://bregroup.com/sap/sap10/
  • Buildings Performance Institute Europe (BPIE). (2022). EU buildings climate tracker: Methodology and introduction of building decarbonisation indicators and their results. https://www.bpie.eu/publication/eu-buildings-tracker-methodology-and-results-for-building-decarbonisation-indicators/
  • Christensen, T. H., Gram-Hanssen, K., de Best-Waldhober, M., & Adjei, A. (2014). Energy retrofits of Danish homes: Is the energy performance certificate useful? Building Research & Information, 42(4), 489–500. https://doi.org/10.1080/09613218.2014.908265
  • Cohen, J. P., Ding, W., Kuhlman, C., Chen, A., & Di, L. (2016). Rapid building detection using machine learning. Applied Intelligence, 45, 443–457. https://doi.org/10.1007/s10489-016-0762-6
  • Coyne, B., & Denny, E. (2021). Mind the energy performance gap: Testing the accuracy of building energy performance certificates in Ireland. Energy Efficiency, 14(6), 57. https://doi.org/10.1007/s12053-021-09960-1
  • Cozza, S., Chambers, J., Brambilla, A., & Patel, M. K. (2021). In search of optimal consumption: A review of causes and solutions to the energy performance gap in residential buildings. Energy and Buildings, 249, 111253. https://doi.org/10.1016/j.enbuild.2021.111253
  • Cozza, S., Chambers, J., Deb, C., Scartezzini, J.-L., Schlüter, A., & Patel, M. K. (2020). Do energy performance certificates allow reliable predictions of actual energy consumption and savings? learning from the Swiss national database. Energy and Buildings, 224, 110235. https://doi.org/10.1016/j.enbuild.2020.110235
  • Crawley, J., Biddulph, P., Northrop, P. J., Wingfield, J., Oreszczyn, T., & Elwell, C. (2019). Quantifying the measurement error on England and Wales EPC ratings. Energies, 12(18), 3523. https://doi.org/10.3390/en12183523
  • Despotovic, M., Koch, D., Leiber, S., Doeller, M., Sakeena, M., & Zeppelzauer, M. (2019). Prediction and analysis of heating energy demand for detached houses by computer vision. Energy and Buildings, 193, 29–35. https://doi.org/10.1016/j.enbuild.2019.03.036
  • Directorate-General for Energy. (2012). Energy: Roadmap 2050. Publications Office. https://data.europa.eu/doi/10.2833/10759.
  • European Commission. (2019). Commission Recommendation (EU) 2019/786 of 8 May 2019 on building renovation (notified under document C(2019) 3352) (Text with EEA relevance). Official Journal, L 127, 34–79. http://data.europa.eu/eli/reco/2019/786/oj
  • European Commission. (2020). Communication from the commission to the European parliament, the council, the European economic and social committee and the committee of the regions – A renovation wave for Europe – Greening our buildings, creating jobs, improving lives. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52020DC0662
  • European Parliament. (2010). Directive 2010/31/EU of the European Parliament and of the council of 19 may 2010 on the energy performance of buildings. Official Journal of the European Union. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32010L0031.
  • Fröhlich, B., Bach, E., Walde, I., Hese, S., Schmullius, C., & Denzler, J. (2013). Land cover classification of satellite images using contextual information. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2, 1–6. https://doi.org/10.5194/isprsannals-II-3-W1-1-2013
  • Fuerst, F., & McAllister, P. (2011). The impact of energy performance certificates on the rental and capital values of commercial property assets. Energy Policy, 39(10), 6608–6614. https://doi.org/10.1016/j.enpol.2011.08.005
  • Fuerst, F., McAllister, P., Nanda, A., & Wyatt, P. (2015). Does energy efficiency matter to home-buyers? An investigation of EPC ratings and transaction prices in England. Energy Economics, 48, 145–156. https://doi.org/10.1016/j.eneco.2014.12.012
  • Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Aiden, E. L., & Fei-Fei, L. (2017). Using deep learning and google street view to estimate the demographic makeup of neighborhoods across the United States. Proceedings of the National Academy of Sciences of the United States of America, 114(50), 13108–13113. https://doi.org/10.1073/pnas.1700035114
  • Gouveia, J. P., & Palma, P. (2019). Harvesting big data from residential building energy performance certificates: Retrofitting and climate change mitigation insights at a regional scale. Environmental Research Letters, 14(9), 095007. https://doi.org/10.1088/1748-9326/ab3781
  • Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. https://doi.org/10.1016/j.patcog.2017.10.013
  • Hardy, A., & Glew, D. (2019). An analysis of errors in the energy performance certificate database. Energy Policy, 129, 1168–1178. https://doi.org/10.1016/j.enpol.2019.03.022
  • Hårsman, B., Daghbashyan, Z., & Chaudhary, P. (2016). On the quality and impact of residential energy performance certificates. Energy and Buildings, 133, 711–723. https://doi.org/10.1016/j.enbuild.2016.10.033
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition [Paper presentation]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778), Las Vegas, NV, USA. https://doi.org/10.1109/CVPR.2016.90
  • Herrando, M., Cambra, D., Navarro, M., de la Cruz, L., Millán, G., & Zabalza, I. (2016). Energy performance certification of faculty buildings in Spain: The gap between estimated and real energy consumption. Energy Conversion and Management, 125, 141–153. https://doi.org/10.1016/j.enconman.2016.04.037
  • Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. (2020). Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8), 2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372
  • IPCC. (2022). Summary for policymakers. In P. Shukla (Eds.), Climate change 2022: Mitigation of climate change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press.
  • Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894
  • Jenkins, R., & Burton, A. M. (2008). 100% accuracy in automatic face recognition. Science, 319(5862), 435. https://doi.org/10.1126/science.1149656
  • Joint Research Centre (European Commission). (2019). Achieving the cost-effective energy transformation of Europe’s buildings energy renovations via combinations of insulation and heating & cooling technologies: Methods and data. Publications Office. https://data.europa.eu/doi/10.2760/278207.
  • Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415
  • Kingma, D. P., & Ba, J. (2017). Adam: A method for stochastic optimization. arXiv Preprint. https://doi.org/10.48550/arXiv.1412.6980
  • Koch, D., Despotovic, M., Leiber, S., Sakeena, M., Döller, M., & Zeppelzauer, M. (2019). Real estate image analysis: A literature review. Journal of Real Estate Literature, 27(2), 269–300. https://doi.org/10.22300/0927-7544.27.2.269
  • Koch, D., Despotovic, M., Sakeena, M., Döller, M., & Zeppelzauer, M. (2018). Visual estimation of building condition with patch-level convnets [Paper presentation]. Proceedings of the 2018 ACM Workshop on Multimedia for Real Estate Tech (pp. 12–17), Yokohama, Japan. https://doi.org/10.1145/3210499.3210526
  • Koch, D., Despotovic, M., Thaler, S., & Zeppelzauer, M. (2021). Where do university graduates live?–a computer vision approach using satellite images. Applied Intelligence, 51(11), 8088–8105. https://doi.org/10.1007/s10489-021-02268-8
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
  • Li, H., Singh, B., Najibi, M., Wu, Z., & Davis, L. S. (2019). An analysis of pre-training on object detection. arXiv preprint arXiv:1904.05871. https://doi.org/10.48550/arXiv.1904.05871
  • Li, Y., Kubicki, S., Guerriero, A., & Rezgui, Y. (2019). Review of building energy performance certification schemes towards future improvement. Renewable and Sustainable Energy Reviews, 113, 109244. https://doi.org/10.1016/j.rser.2019.109244
  • Liermann, V., Li, S., & Schaudinnus, N. (2019). Deep learning: An Introduction. In: Liermann, V., Stegmann, C. (Eds), The impact of digital transformation and FinTech on the finance professional. Palgrave Macmillan. https://doi.org/10.1007/978-3-030-23719-6_17
  • Luo, C., Li, X., Wang, L., He, J., Li, D., & Zhou, J. (2018). How does the data set affect CNN-based image classification performance? [Paper presentation]. 2018 5th international conference on systems and informatics (ICSAI) (pp. 361–366), Nanjing, China. https://doi.org/10.1109/ICSAI.2018.8599448
  • Majcen, D., Itard, L., & Visscher, H. (2013). Theoretical vs. actual energy consumption of labelled dwellings in the Netherlands: Discrepancies and policy implications. Energy Policy, 54, 125–136. https://doi.org/10.1016/j.enpol.2012.11.008
  • Marmanis, D., Datcu, M., Esch, T., & Stilla, U. (2015). Deep learning earth observation classification using imagenet pretrained networks. IEEE Geoscience and Remote Sensing Letters, 13(1), 105–109. https://doi.org/10.1109/LGRS.2015.2499239
  • Johnson, J. M., & Khoshgoftaar, T. M. (2019). Survey on deep learning with class imbalance. Journal of Big Data, 6(27). https://doi.org/10.1186/s40537-019-0192-5
  • Mutasa, S., Sun, S., & Ha, R. (2020). Understanding artificial intelligence based radiology studies: What is overfitting? Clinical Imaging, 65, 96–99. https://doi.org/10.1016/j.clinimag.2020.04.025
  • Ng, H.-W., Nguyen, V. D., Vonikakis, V., & Winkler, S. (2015). Deep learning for emotion recognition on small datasets using transfer learning [Paper presentation]. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 443–449), Seattle, WA, USA. https://doi.org/10.1145/2818346.2830593
  • Obeso, A. M., Benois-Pineau, J., Acosta, A. Á. R., & Vázquez, M. S. G. (2017). Architectural style classification of Mexican historical buildings using deep convolutional neural networks and sparse features. Journal of Electronic Imaging, 26(1), 011016–011016. https://doi.org/10.1117/1.JEI.26.1.011016
  • Office for National Statistics. (2022, October 25). Energy efficiency of housing in England and Wales: 2022. ONS website. https://www.ons.gov.uk/releases/energyefficiencyofhousinginenglandandwales2022
  • Ok, A. O. (2013). Automated extraction of buildings and roads in a graph partitioning framework. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, II-3/W3, 79–84. https://doi.org/10.5194/isprsannals-II-3-W3-79-2013
  • Olaussen, J. O., Oust, A., & Solstad, J. T. (2017). Energy performance certificates–informing the informed or the indifferent? Energy Policy, 111, 246–254. https://doi.org/10.1016/j.enpol.2017.09.029
  • Pasichnyi, O., Wallin, J., Levihn, F., Shahrokni, H., & Kordas, O. (2019). Energy performance certificates—new opportunities for data-enabled urban energy policy instruments? Energy Policy, 127, 486–499. https://doi.org/10.1016/j.enpol.2018.11.051
  • Perez, L., & Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv Preprint. https://doi.org/10.48550/arXiv.1712.04621
  • Raikar, A., & Hanji, G. (2016). Automatic building detection from satellite images using internal gray variance and digital surface model. International Journal of Computer Applications, 9, 25–33. https://doi.org/10.5120/ijca2016910418
  • Sarker, I. H. (2021). Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science, 2(6), 420. https://doi.org/10.1007/s42979-021-00815-1
  • Schmitz, M., & Mayer, H. (2016). A convolutional network for semantic facade segmentation and interpretation. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 41, 709. https://doi.org/10.5194/isprs-archives-XLI-B3-709-2016
  • Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization [Paper presentation]. IEEE international conference on computer vision (ICCV) (pp. 618–626), Venice, Italy. https://doi.org/10.1109/ICCV.2017.74
  • Semple, S., & Jenkins, D. (2020). Variation of energy performance certificate assessments in the European Union. Energy Policy, 137, 111127. https://doi.org/10.1016/j.enpol.2019.111127
  • Shi, X., Sapkota, M., Xing, F., Liu, F., Cui, L., & Yang, L. (2018). Pairwise based deep ranking hashing for histopathology image classification and retrieval. Pattern Recognition, 81, 14–22. https://doi.org/10.1016/j.patcog.2018.03.015
  • Smith, L. N. (2017). Cyclical learning rates for training neural networks [Paper presentation]. 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 464–472), Santa Rosa, CA, USA. https://doi.org/10.1109/WACV.2017.58
  • Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 28). Curran Associates, Inc.
  • Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks [Paper presentation]. International conference on machine learning (pp. 6105–6114). https://doi.org/10.48550/arXiv.1905.11946
  • Targ, S., Almeida, D., & Lyman, K. (2016). Resnet in resnet: Generalizing residual architectures. arXiv Preprint. https://doi.org/10.48550/arXiv.1603.08029.
  • Taylor, J., Shrubsole, C., Symonds, P., Mackenzie, I., & Davies, M. (2019). Application of an indoor air pollution metamodel to a spatially-distributed housing stock. The Science of the Total Environment, 667, 390–399. https://doi.org/10.1016/j.scitotenv.2019.02.341
  • United Nations Environment Programme. (2022). Global status report for buildings and construction: Towards a zero-emission, efficient and resilient buildings and construction sector. https://www.unep.org/resources/publication/2022-global-status-report-buildings-and-construction
  • United Nations Environment Programme. (2021). 2021 Global status report for buildings and construction: Towards a zero-emission, efficient and resilient buildings and construction sector. Nairobi. https://www.unep.org/resources/report/2021-global-status-report-buildings-and-construction
  • Wu, Z., Shen, C., & Van Den Hengel, A. (2019). Wider or deeper: Revisiting the Resnet model for visual recognition. Pattern Recognition, 90, 119–133. https://doi.org/10.1016/j.patcog.2019.01.006
  • Xie, S., Girshick, R., Dollar, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks [Paper presentation]. IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5987–5995), Honolulu, HI, USA. https://doi.org/10.1109/CVPR.2017.634
  • Yang, Y., & Newsam, S. (2010). Bag-of-visual-words and spatial extensions for land-use classification [Paper presentation]. Proceedings of the 18th Sigspatial International Conference on Advances in Geographic Information Systems (pp. 270–279), San Jose, California. https://doi.org/10.1145/1869790.1869829
  • Ying, X. (2019). An overview of overfitting and its solutions. Journal of Physics: Conference Series, 1168, 022022. https://doi.org/10.1088/1742-6596/1168/2/022022
  • Zeppelzauer, M., Despotovic, M., Sakeena, M., Koch, D., & Döller, M. (2018). Automatic prediction of building age from photographs [Paper presentation]. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (pp. 126–134), Yokohama, Japan. https://doi.org/10.1145/3206025.3206060
  • Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107–115. https://doi.org/10.1145/3446776
  • Zhao, K., Liu, Y., Hao, S., Lu, S., Liu, H., & Zhou, L. (2022). Bounding boxes are all we need: Street view image classification via context encoding of detected buildings. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–17. https://doi.org/10.1109/TGRS.2021.3064316
  • Zhou, Y., Hu, Q., & Wang, Y. (2018). Deep super-class learning for long-tail distributed image classification. Pattern Recognition, 80, 118–128. https://doi.org/10.1016/j.patcog.2018.03.003

Appendix A

Figure A1. Scatter plots for energy consumption and CO2e emissions.


Table A1. Main energy sources per classification.