Nutrition

Machine learning models using non-linear techniques improve the prediction of resting energy expenditure in individuals receiving hemodialysis

Article: 2238182 | Received 12 Jan 2023, Accepted 14 Jul 2023, Published online: 28 Jul 2023

Abstract

Purpose

Approximately 700,000 people in the USA have chronic kidney disease requiring dialysis. Protein-energy wasting (PEW), a condition of advanced catabolism, contributes to three-year survival rates of 50%. PEW occurs at all levels of Body Mass Index (BMI) but is devastating for those people at the extremes. Treatment for PEW depends on an accurate understanding of energy expenditure. Previous research established that current methods of identifying PEW and assessing adequate treatments are imprecise. This includes disease-specific equations for estimated resting energy expenditure (eREE). In this study, we applied machine learning (ML) modelling techniques to a clinical database of dialysis patients. We assessed the precision of the ML algorithms relative to the best-performing traditional equation, the MHDE.

Methods

This was a secondary analysis of the Rutgers Nutrition and Kidney Database. To build the ML models, we divided the population into training and validation sets. Eleven ML models were run and optimized, with the best three selected by the lowest root mean squared error (RMSE) from measured REE. Values for eREE were generated for each ML model and for the MHDE. We compared precision using Bland-Altman plots.

Results

Individuals were 41.4% female and 82.0% African American. The mean age was 56.4 ± 11.1 years, and the median BMI was 28.8 (IQR = 24.8 − 34.0) kg/m2. The best ML models were SVR, Linear Regression and Elastic net with RMSE of 103.6 kcal, 119.0 kcal and 121.1 kcal respectively. The SVR demonstrated the greatest precision, with 91.2% of values falling within acceptable limits. This compared to 47.1% for the MHDE. The models using non-linear techniques were precise across extremes of BMI.

Conclusion

ML improves precision in calculating eREE for dialysis patients, including those most vulnerable for PEW. Further development for clinical use is a priority.

KEY MESSAGES

  • Our continuing goal, which could affect millions of patients worldwide, is to understand energy expenditure (EE) across the spectrum of CKD (stages 1–5) in adults and children being treated with dialysis or transplantation, with the intent of providing tools for the health professional that will improve the delivery of quality care.

  • In past research, we have identified and focused on disease-specific variables which account for 60% of the variance in predicting EE in individuals receiving dialysis, but many questions remain unanswered.

  • Our hypotheses are that (1) there are determinants of EE specific to CKD and, (2) predicting EE for individuals may be greatly advanced using sophisticated models that combine these determinants. In this study, we applied machine learning (ML) with linear and non-linear techniques to our existing dataset. The best models demonstrated improved precision in predicting EE for all individuals in the validation group.

Introduction

There are close to 700,000 individuals in the United States with stage 5 chronic kidney disease (CKD) receiving dialysis treatment [Citation1]. Despite ongoing improvements, three-year survival rates for people receiving dialysis are approximately only 50% [Citation1,Citation2]. There are many contributing factors to such poor health outcomes. As the disease progresses, suboptimal nutritional status, inflammation, concomitant hyper-catabolism and metabolic aberrations lead to protein-energy wasting (PEW), which independently exacerbates adverse health sequelae [Citation3–9].

Estimating an individual’s resting energy expenditure (REE) with precision is vital for promoting energy balance, especially among those diagnosed with PEW. It could also provide a cost-effective survival benefit for many people worldwide by targeting the appropriate nutritional and medical interventions [Citation6–12]. However, current clinical methods to assess REE for these individuals are either too expensive, such as measured REE (mREE) by indirect calorimetry (IC), or insufficiently precise to account for specific disease and metabolic circumstances [Citation13,Citation14]. The latter include existing predictive energy equations (EEs), such as the Harris-Benedict [Citation15] or Mifflin St Jeor [Citation16] equations, which use linear regression algorithms and basic clinical variables such as weight and height [Citation15–27].

Previous investigators have shown that disease-specific variables, such as inflammation, blood glucose levels and renal biomarkers are predictive of REE in individuals receiving dialysis [Citation21,Citation23].

To date, dialysis-specific EEs have improved predictive accuracy by achieving around 60% precision (± 10% of zero difference from mREE) when tested in samples of people with similar characteristics to the development group [Citation20–25]. However, our group has demonstrated that disease-specific EEs may not perform well when transferred to different geographical or demographic samples, or to those people with outlying characteristics, such as very low or high body mass index (BMI) [Citation27]. Arguably, such individuals may be the most vulnerable.

In recent years, more sophisticated machine learning (ML) techniques have been applied to medical diagnostics for hemodialysis patients. These methodologies can improve the prediction of clinical outcomes (such as disease progression and mortality) using common clinical factors [Citation28–32]. ML takes a very different approach to traditional linear regression [Citation33,Citation34]. It engages the variables without preconception to determine algorithmic rules from inherent patterns in that data [Citation34]. This is achieved by randomizing large sets of variables into combinations of linear and non-linear models and can select features that yield complex and nuanced interactions [Citation34,Citation35]. In addition, increases in computational power have tremendously improved the researcher’s ability to process large samples of data in vastly more complex and accurate ways [Citation34,Citation35].

It is our hypothesis that such techniques may also be applied to clinical nutrition. For example, using 114 individuals, Ponce et al. applied ML techniques to predict REE in patients with acute kidney injury (AKI) receiving dialysis [Citation28]. This group used a combination of linear and non-linear regression models including Linear Regression with Stepwise Selection, Linear Regression with Regularization, RPART, Support Vector Machine with Radial Kernel, Generalized Boosting Machine, Extreme Gradient Boosting and Random Forest [Citation28]. The best model (Random Forest) predicted REE with 69% accuracy compared to 24% for the Harris-Benedict Equation [Citation28].

The Rutgers Nutrition and Kidney Database (RNKD) is a large renal database in the USA, containing over 600 clinical and demographic variables gathered for 210 hemodialysis patients over 4 separate studies undertaken between 2012 and 2018 [Citation27]. Using this database, the primary objective of this pilot study was to ascertain whether a machine learning approach alone can generate a more precise estimation of REE than previous statistical methods using linear and logistic regression only [Citation20–25]. A secondary purpose was to consider, from a clinical perspective, the features selected by the best-performing ML models, to guide the direction of future research.

Methods

Study design

This was a secondary analysis of an existing database, the RNKD. The RNKD is an amalgam of four existing studies conducted from 2012 to 2018 [Citation36–39]. The studies were all undertaken in the Northeast and Midwest regions of the United States and only included people receiving maintenance hemodialysis (MHD), 3 times a week for at least 3 months. Sampling for enrollment was conducted by convenience, meaning that individuals in dialysis clinics were asked to volunteer.

Inclusion and exclusion criteria were similar in every study and are described in detail by Byham-Gray et al. [Citation21]. Participants were women and men older than 18 years with stage 5 CKD receiving MHD (conventional method, in-center at a for-profit dialysis unit) 3 times per week for at least 3 months. Exclusion criteria included infective complications or poorly healing wounds, surgical procedures or cardiovascular events within 30 days of enrollment, recreational pharmaceutical usage, frequent ingestion of dietary supplements, and a previous diagnosis of heart failure, hepatic disease, or cancer.

Data collection

All the studies in the RNKD used similar data-gathering protocols. The individuals and their medical records provided demographic data. Anthropometric and clinical data were gathered on a day free of dialysis. REE was determined by indirect calorimetry using a metabolic cart (Cosmed Quark RMR®, Rome, Italy). Participants were asked to refrain from vigorous exercise and to fast for 12 h before the assessment. If a 12 h fast was not possible, a 4 h fast was requested. The fast was introduced to reduce fluid accumulation and its impact on body weight and composition. IC took place before 12 pm. Participants lay still and awake for at least twenty minutes. The measurement protocol was adopted as previously defined by Olejnik et al. [Citation36] and is fully described in previous studies published by this group [Citation21,Citation23,Citation27].

Data mining

The RNKD is stored as a password-protected SPSS file on the Rutgers Box platform. After approval from the Rutgers University Institutional Review Board (Protocol number: Pro2020001656), the dataset was extracted and delivered as a separate SPSS file. All the data were de-identified before further analysis. They were stored on a password-protected laptop and shared only with co-investigators over encrypted email or cloud. As this was a secondary analysis, no further permissions were required.

Measurement of estimated resting energy expenditure via machine learning models

In this study, we provided the largest possible group of variables to the ML models and allowed the models to choose the predictive features for themselves. The RNKD was screened and cleaned for inaccuracies and assessed by two renal dietitians. Variables were excluded if found to be clinically irrelevant to human metabolism, statistically insignificant, or where insufficient values were available to maintain the integrity of the dataset. For the construction of the models, individuals were split between a training set (80% of cases) and a validation set (20% of cases). Although the case selection was random, each set aimed to maintain the BMI distribution of the entire sample as far as the data permitted, as this was pertinent to our ultimate assessment of precision. Preprocessing procedures included assessment of multicollinearity, Box-Cox transformations, centering and scaling of predictors, and creation of dummy variables constructed from linear and non-linear combinations of existing variables to assess such combined effects. Where it was deemed appropriate to impute missing values, the mean of the variable’s observed values was used.
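The split and preprocessing steps described above can be illustrated with a minimal sketch in scikit-learn. This is not the authors' code: the data are synthetic, the BMI cut-points are our reading of the subgroups used later in the paper, and the choice of `SimpleImputer` and `PowerTransformer` is an assumption about how mean imputation, Box-Cox transformation, centering and scaling might be chained.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer


def bmi_category(bmi):
    """Map BMI (kg/m2) to three subgroups (cut-points are an assumption)."""
    if bmi < 25.0:
        return "under/normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Synthetic stand-in data (hypothetical; NOT values from the RNKD).
rng = np.random.default_rng(0)
n = 167
bmi = rng.uniform(17.0, 45.0, size=n)
X = np.column_stack([
    bmi,
    rng.uniform(40.0, 130.0, size=n),  # body weight, kg
    rng.lognormal(1.0, 0.8, size=n),   # a skewed, strictly positive biomarker
])
y = 500.0 + 12.0 * X[:, 1] + rng.normal(0.0, 100.0, size=n)  # mock mREE, kcal/d
X[rng.random(n) < 0.1, 2] = np.nan     # simulate missing biomarker values

# Random 80/20 split that preserves the BMI subgroup proportions.
strata = np.array([bmi_category(b) for b in bmi])
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=strata, random_state=0
)

# Mean imputation, then Box-Cox (needs strictly positive inputs),
# then centering and scaling (standardize=True).
preprocess = make_pipeline(
    SimpleImputer(strategy="mean"),
    PowerTransformer(method="box-cox", standardize=True),
)
X_train_p = preprocess.fit_transform(X_train)  # fit on training data only
X_val_p = preprocess.transform(X_val)          # reuse training parameters

print(X_train_p.shape, X_val_p.shape)
```

Fitting the preprocessing steps on the training set alone and then applying them to the validation set keeps information from the validation cases out of model building.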

In total, eleven ML models were developed in the training set using linear and non-linear regression, both internally in the ML regression techniques and in combining variables in each model. The selection included Bayesian Ridge, Elastic Net, Gradient Boosting Regressor, Lasso, Linear Regression, Linear SVR (support vector regression), MLP (multi-layer perceptron) Regressor, Random Forest, Ridge Regression, SGD (stochastic gradient descent) Regressor and SVR. Given the large number of features present in the dataset (435) relative to the number of patients, we developed a feature selection technique that optimized the target root mean squared error (RMSE) and R squared (R2) by progressively eliminating the least impactful feature. Variables were excluded from each model one by one if this lowered the RMSE. The breakpoint was reached when a further exclusion increased RMSE, at which point the number of features was fixed. The performance of each model was assessed in the validation set by comparing R2 (higher is better) and RMSE (lower is better). We selected the best three models prioritizing RMSE, as this reflects the lowest average divergence from mREE in kcal. We then used the best three models to generate estimates of REE in the validation set.
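The elimination procedure described above can be sketched as a greedy backward search. The paper does not state how RMSE was scored during elimination, so the use of 5-fold cross-validated RMSE here is an assumption, as are the function names `cv_rmse` and `backward_eliminate` and the synthetic data. Any scikit-learn regressor could be passed in place of `LinearRegression`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict


def cv_rmse(model, X, y, cols, cv=5):
    """Cross-validated RMSE of `model` using only the columns in `cols`."""
    pred = cross_val_predict(clone(model), X[:, cols], y, cv=cv)
    return float(np.sqrt(mean_squared_error(y, pred)))


def backward_eliminate(model, X, y, cv=5):
    """Drop one feature at a time while doing so lowers the RMSE."""
    kept = list(range(X.shape[1]))
    best = cv_rmse(model, X, y, kept, cv)
    while len(kept) > 1:
        # Score every single-feature removal and take the most helpful one.
        trials = [
            (cv_rmse(model, X, y, [c for c in kept if c != drop], cv), drop)
            for drop in kept
        ]
        trial_rmse, drop = min(trials)
        if trial_rmse >= best:  # breakpoint: removal no longer helps
            break
        kept.remove(drop)
        best = trial_rmse
    return kept, best


# Synthetic demonstration (hypothetical data): 2 informative + 4 noise features.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
y = 1500.0 + 200.0 * X[:, 0] + 100.0 * X[:, 1] + rng.normal(0.0, 50.0, size=120)

model = LinearRegression()
full_rmse = cv_rmse(model, X, y, list(range(X.shape[1])))
kept, best_rmse = backward_eliminate(model, X, y)
print(f"all features: {full_rmse:.1f} kcal; kept {kept}: {best_rmse:.1f} kcal")
```

Because the search stops at the first step where no removal lowers the RMSE, the final feature count differs from model to model, which is consistent with the range of feature counts reported in the Results.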

Measuring estimated resting energy expenditure via predictive equation

For this study, the best model of the Maintenance Hemodialysis Equation (including c-reactive protein {CRP}) was used (MHDE-CRP) [Citation27]. The variables used to create values for eREE were age, sex, weight, and CRP. Although not all individuals in the validation group had values for CRP, we assessed that sufficient real values existed, and that imputation of the rest would not substantially alter the explanation of variance. Missing values were imputed using the median CRP for the entire sample (training + validation sets), which reflected the distribution of values within our dialysis population at large.

Statistical and graphical analyses

No power analysis was undertaken in this study as it was previously included when the MHDE was constructed by this research group [Citation21]. At that time, n = 60 was adequate for equation building and n = 95 was adequate for validation of the equation. Our latest study included 167 individuals from the same dataset in our ML analysis, hence further investigations on sample size were deemed unnecessary. Furthermore, statistical significance was demonstrated for the relevant findings, indicating that the study utilized a sufficient sample size.

ML models were developed, and analyses performed, using Python (version 8.4.0) and the scikit-learn package (version 1.1.1). Statistical analyses were performed using Statistical Package for Social Sciences (SPSS, IBM Corp., version 27, Armonk NY). If values were found to be normally distributed on visual inspection, they were expressed as mean and standard deviation (SD). If not, values were stated as median with 25th and 75th percentiles. An intraclass correlation coefficient (ICC) was calculated to analyze the reliability of each equation using a model with a single rater, 2-way mixed effects and absolute agreement [Citation37]. The a priori alpha level was set at 0.05.
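For readers unfamiliar with the ICC form named above, the following sketch computes a two-way, absolute-agreement, single-measurement ICC from the standard ANOVA mean squares. It is an illustration rather than the SPSS procedure used in the study; the function name `icc_a1` and the example values are hypothetical.

```python
import numpy as np


def icc_a1(ratings):
    """Two-way, absolute-agreement, single-measurement ICC.

    `ratings` has one row per individual and one column per method
    (for example, column 0 = mREE and column 1 = eREE).
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)  # between individuals
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)  # between methods
    ss_err = np.sum((Y - grand) ** 2) - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + (k / n) * (ms_c - ms_e))


# Hypothetical values: an estimate that is always 100 kcal/d too high.
measured = np.array([1400.0, 1500.0, 1600.0, 1700.0])
same = np.column_stack([measured, measured])
offset = np.column_stack([measured, measured + 100.0])

print(round(icc_a1(same), 3))    # perfect agreement
print(round(icc_a1(offset), 3))  # penalized for the systematic offset
```

The second example shows why absolute agreement matters here: a constant bias leaves the ranking of individuals intact but still lowers the coefficient, because the between-method mean square enters the denominator.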

We used a modified Bland–Altman plot to measure the levels of agreement between mREE and eREE from each model [Citation38]. The original Bland-Altman plot graphically assesses agreement between two methods of measurement by plotting the difference between them on the Y-axis against either the true measure on the X-axis or the mean of both measures if the criterion is not known [Citation38]. In this case, we plotted residual values expressed as a percentage on the Y-axis against mREE (the criterion measure) on the X-axis. A full description of the method was previously published by this group [Citation27]. Limits of agreement for predictive equations have been established at ± 10% from zero difference from mREE in the nutrition literature [Citation38]. Those limits have been used for validation by Byham-Gray et al. [Citation21,Citation23], Morrow et al. [Citation25] and Bailey et al. [Citation27] when assessing equations for people receiving dialysis. This graphical analysis was applied to each of the best models (and the MHDE) across the complete validation sample for which REE was generated. The analysis was subsequently repeated with the validation set divided into subgroups of BMI. Individuals with a BMI less than 24.9 kg/m2, 25–29.9 kg/m2, or ≥ 30 kg/m2 were categorized as underweight/normal weight, overweight, or obese, respectively.
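The agreement measure behind the modified plot reduces to a simple calculation, sketched below. The sign convention (eREE minus mREE, so positive values are overestimates), the treatment of exactly ± 10% as within limits, and the function names are assumptions; the input values are invented for illustration.

```python
from collections import defaultdict


def percent_difference(eree, mree):
    """Residual as a percentage of the criterion measure (mREE)."""
    return 100.0 * (eree - mree) / mree


def agreement_summary(eree_values, mree_values, limit=10.0):
    """Share of estimates within, above and below the +/- limit (%)."""
    diffs = [percent_difference(e, m) for e, m in zip(eree_values, mree_values)]
    n = len(diffs)
    return {
        "within_pct": 100.0 * sum(abs(d) <= limit for d in diffs) / n,
        "over_pct": 100.0 * sum(d > limit for d in diffs) / n,
        "under_pct": 100.0 * sum(d < -limit for d in diffs) / n,
    }


def agreement_by_group(eree_values, mree_values, groups, limit=10.0):
    """Repeat the summary within each subgroup label (e.g. BMI category)."""
    pairs = defaultdict(lambda: ([], []))
    for e, m, g in zip(eree_values, mree_values, groups):
        pairs[g][0].append(e)
        pairs[g][1].append(m)
    return {g: agreement_summary(e, m, limit) for g, (e, m) in pairs.items()}


# Hypothetical values in kcal/d (not study data).
mree = [1500.0, 1500.0, 2000.0, 1200.0]
eree = [1560.0, 1300.0, 2250.0, 1200.0]
bmi_group = ["obese", "obese", "overweight", "overweight"]

overall = agreement_summary(eree, mree)
by_group = agreement_by_group(eree, mree, bmi_group)
print(overall)
print(by_group)
```

In a validation set of 34 individuals, each individual contributes roughly 2.9 percentage points, so a reported figure such as 91.2% corresponds to 31 of 34 estimates within limits.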

Clinical and narrative analysis

We categorized the features selected by the best models into groupings to consider their clinical significance. These groupings were demographic, anthropometric, disease-related, dynamic/clinical, patient-reported and provider-assessed. We then assessed the distribution of features amongst the groups to narratively identify trends that may assist future researchers.

Results

In total, 167 of the individuals retained sufficient variables for this study. The population was 58.7% male, 82% African American, and 80.2% Non-Hispanic. Ages ranged between 21.5 and 80.7 years. The mean age was 56.4 ± 11.4 years. The median BMI for the group was 28.8 (IQR = 25.8–34.0) kg/m2. Of these individuals, 25.7% were categorized as underweight or normal weight, 35.3% as overweight, and 39.0% as obese. The sample was randomly split into an 80% training sample and a 20% validation sample while maintaining the BMI stratification constraint. There was no statistically significant difference in the frequencies of sex, race, ethnicity and BMI between the total and the validation samples.

Table 1. Frequency of clinical and demographic characteristics of individuals in the Rutgers Nutrition and Kidney Database (N = 167).

Table 2. Demographic and clinical characteristics among individuals in the Rutgers Nutrition and Kidney Database (N = 167).

Selecting the most accurate machine learning models to predict energy requirements

Eleven ML models were run and optimized within the training set (N = 133) to predict REE. Of the full dataset, 43 subjects and 171 variables were omitted due to significant missing data. A further 188 variables were excluded as they were not deemed clinically relevant, and 11 variables were omitted from modelling as they were not statistically relevant. In total, the optimized models selected 55 features, with individual models using between 8 and 41 features.

Table 3. Eleven machine learning models for estimating resting energy expenditure ranked by the lowest root mean squared error.

The three best models were selected because they exhibited the lowest RMSE (kcal) from mREE. These models were SVR (103.6 kcal), Linear Regression (119.0 kcal) and Elastic Net (121.1 kcal). We then used these models to generate eREE values within the validation set (N = 34).

Measured and predicted energy requirements

The median mREE was 1512.9 kcal/d and ranged from 1079.5 to 2528.1 kcal/d. The median mREE for women (1304.6 kcal/d) was lower than for men (1588.8 kcal/d). The SVR REE had the lowest average prediction of energy requirements, with a median eREE of 1472.8 kcal/d. The Linear Regression REE had the highest average prediction, with a median eREE of 1527.1 kcal/d.

Table 4. Measured and estimated resting energy expenditure among individuals in the validation set of the Rutgers Nutrition and Kidney Database (N = 34).

Levels of agreement

The greatest level of agreement occurred between mREE and the SVR REE, with 91.2% of values within ± 10% from mREE. The other two ML models also performed within acceptable limits. However, in this sample, the MHDE REE only predicted 47.1% of values within acceptable limits. ICC analysis demonstrated excellent reliability for the SVR REE, good reliability for the Linear REE and Elastic Net REE, and moderate reliability for the MHDE REE. Bland-Altman plots demonstrated that eREE values within acceptable limits were evenly distributed on either side of zero difference for the MHDE REE and for all three ML models.

Figure 1. Modified Bland Altman Plot of the Percentage Difference between the MHDE REE and mREE. The black line represents zero difference from mREE. The upper red line represents 10% difference from mREE. The lower red line represents −10% difference from mREE.

Figure 2. Modified Bland Altman Plot of the Percentage Difference between the SVR REE and mREE. The black line represents zero difference from mREE. The upper red line represents 10% difference from mREE. The lower red line represents −10% difference from mREE.

Figure 3. Modified Bland Altman Plot of the Percentage Difference between the linear REE and mREE. The black line represents zero difference from mREE. The upper red line represents 10% difference from mREE. The lower red line represents −10% difference from mREE.

Figure 4. Modified Bland Altman Plot of the Percentage Difference between the Elastic Net REE and mREE. The black line represents zero difference from mREE. The upper red line represents 10% difference from mREE. The lower red line represents −10% difference from mREE.

Table 5. Levels of agreement in resting energy expenditure as derived by indirect calorimetry, compared to one predictive energy equation and three machine learning models for individuals receiving maintenance hemodialysis (N = 34).

Variability of agreement in different categories of BMI

For participants with obesity, the SVR REE and Linear REE showed the same level of accuracy (84.6% within limits), and the Elastic Net REE predicted 76.9% of estimates within acceptable limits. For all the ML models, the values outside the limits were underestimates. The MHDE REE demonstrated only 46.2% accuracy for obese persons, with inaccurate estimates split evenly between over- and underestimation.

Figure 5. (a–d) Percentage Difference between Four different models of eREE and mREE in people receiving MHD categorized as obese. The black lines represent zero difference from mREE. The upper red lines represent a 10% difference from mREE. The lower red lines represent −10% difference from mREE.

Table 6. Levels of agreement in resting energy expenditure as derived by indirect calorimetry compared to one predictive energy equation and three machine learning models, stratified by body mass index (N = 34).

For participants who were overweight, accuracy was highest for the Linear Regression REE (100.0% within limits), closely followed by the SVR REE and Elastic Net REE (both 92.3% within limits). In this subgroup, the MHDE REE performed with greater accuracy than for the total group. Again, the ML models tended to underestimate when inaccurate, and the MHDE REE tended to overestimate eREE where values were outside the limits of agreement.

Figure 6. (a–d) Percentage Difference between Four different models of eREE and mREE in people receiving MHD categorized as overweight. The black lines represent zero difference from mREE. The upper red lines represent a 10% difference from mREE. The lower red lines represent −10% difference from mREE.

For individuals who were of normal weight or underweight, none of the values for the MHDE REE reached the threshold for agreement and 87% of the values underestimated energy expenditure. The Linear Regression REE achieved 50% of values and the Elastic Net REE achieved 62.5% of values within acceptable limits. The SVR REE performed best, with 100% of estimates within acceptable limits.

Figure 7. (a–d) Percentage Difference between Four different models of eREE and mREE in people receiving MHD categorized as underweight and normal weight. The black lines represent zero difference from mREE. The upper red lines represent a 10% difference from mREE. The lower red lines represent −10% difference from mREE.

Table 7. Features selected commonly and individually by the best three machine learning models.

Feature selection

The ML models selected 55 specific features with a very wide range of correlation to mREE (r = 0.005 to 0.77). In general, anthropometric features tended to show the highest association with mREE. The top 5 correlated features were lean body mass, weight in kg, dry weight 6 months previously, intradialytic weight gain and height in cm. As the modeling technique eliminated features deemed to be collinear, only two of the top features were employed by all three of the best models (lean body mass and intradialytic weight gain). The best models selected 48 features in total, of which 16 were common to all three, 14 were common to two models and 18 were selected by only one model. The best model (SVR) had the most individually selected features. The SVR also demonstrated the most even split of features among the defined demographic, anthropometric and clinical categories.
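The overlap counts reported above (features shared by all three models, by two, or used by one only) can be tallied as in the following sketch. Only lean body mass and intradialytic weight gain are taken from the text as common to all three models; the remaining feature names are placeholders, and the function name `overlap_counts` is hypothetical.

```python
from collections import Counter


def overlap_counts(selected):
    """Map 'number of models using a feature' to 'number of such features'."""
    per_feature = Counter(
        feature for features in selected.values() for feature in set(features)
    )
    return dict(Counter(per_feature.values())), len(per_feature)


# Illustrative selections; placeholder_* names are NOT features from the study.
selected = {
    "SVR": {"lean body mass", "intradialytic weight gain", "placeholder_x"},
    "Linear Regression": {"lean body mass", "intradialytic weight gain",
                          "placeholder_y"},
    "Elastic Net": {"lean body mass", "intradialytic weight gain",
                    "placeholder_x"},
}

counts, total = overlap_counts(selected)
print(counts, total)
```

Applied to the real selections, the same tally would be expected to give 16, 14 and 18 features for three, two and one model respectively, summing to the 48 features reported.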

Figure 8. Features selected by the SVR Machine Learning model presented by the Categories: Demographic, anthropometric, disease related, clinical/dynamic, Patient-Reported and Provider-Assessed. The numerals show the actual numbers of features used.

Discussion

Our previous research demonstrated that traditional linear regression methods for predicting REE in patients with CKD are insufficiently accurate [Citation27]. This is especially true for more vulnerable individuals, as demonstrated by extremes of BMI [Citation27]. In this pilot study, we evaluated the efficacy of ML models to better predict REE using sophisticated techniques. Although the validation sample was small, all three of the best models achieved improved precision over the best predictive equation in this population, the MHDE-CRP. Moreover, two of the ML models, SVR and Elastic Net, demonstrated markedly better predictive ability across the three subgroups of BMI (underweight/normal weight, overweight, obese).

Protein energy wasting, a clinical challenge

It is estimated that up to 75% of patients treated with dialysis suffer from PEW, a unique nutritional condition that is a separate and strong risk factor for poor health sequelae and mortality [Citation10,Citation40]. However, identifying people with PEW can be difficult. For example, PEW can commonly occur at all levels of BMI, including in obesity, where it can be difficult for providers to pinpoint symptoms of muscle catabolism [Citation41]. Furthermore, traditional methods for assessing malnutrition, such as the Subjective Global Assessment, perform sub-optimally in specifically highlighting PEW in this population [Citation40]. The ability to accurately identify energy expenditure in all individuals treated with dialysis may address diagnostic failings and provide a critical first step in selecting an appropriate care plan.

In addition to a restrictive diet, often disrupted by dialysis treatment, several factors contribute to PEW [Citation10]. Metabolic derangements such as acidosis subdue the anabolic action of insulin and can promote the oxidation of amino acids [Citation42,Citation43]. Pro-inflammatory drivers from disease, dialysis treatment, access and the poor biocompatibility of dialysis methods can have a deleterious impact on appetite and directly exacerbate muscle catabolism [Citation10,Citation44–47]. Furthermore, hormonal changes resulting from comorbidities such as diabetes contribute to the loss of lean body mass [Citation48]. The process of PEW is both complex and dynamic, requiring multiple strategies of nutritional, medical and lifestyle intervention [Citation10,Citation45–47]. As improvements in dialysis treatment (such as improved filtration and biocompatibility of medical materials) continue to evolve, dynamic methods to assess their application are increasingly appropriate [Citation44–47]. The ability of providers to accurately assess REE and its changes over time would provide a powerful tool in monitoring the ongoing need for, and effectiveness of, such interventional strategies.

Demographic features

All existing equations predicting REE incorporate age and sex, as these are highly correlated with metabolic output [Citation15,Citation16,Citation20–23]. Interestingly, the best three models selected these features inconsistently and neither was included by the SVR. This can be partially explained by the inclusion of LBM and FFM as variables in the dataset, which build sex into the Hume and Deurenberg Equations [Citation49,Citation50]. The omission of age by all but one of the best models is, however, a finding that requires further explanation. This may be consistent with our hypothesis that interactions of features are as important as strong correlations with mREE. For example, it may be that the SVR was able to establish the impact of age from its effects on other features. In our previous analysis of predictive equations from different geographical samples, we hypothesized that racial differences in model building have an impact on estimated REE [Citation27]. In this study, both non-linear models selected race as a feature, and one also selected ethnicity.

Disease-related features

Previous authors have found conflicting evidence regarding the impacts of clinical and disease factors on REE for dialysis patients [Citation20–23]. Byham-Gray et al. established the importance of clinical biomarkers of inflammation (CRP), diabetes (hemoglobin A1c) and muscle catabolism (serum creatinine) in equation building [Citation21]. Fernandes et al. also found a correlation between inflammation and mREE but did not determine that CRP explained REE variance in their sample [Citation20]. In the best three ML models, disease-related features, clinical biomarkers and vital signs constituted the largest grouping of individual features. All three of the best models selected length of dialysis treatment, type of dialysis access, diabetes medication (insulin or oral), anti-inflammatory medication, etiology of hypertension, intradialytic weight gain and heart rate. These factors include a wide range of variables across the spectrum of CKD complications. Of note, many of the compound issues discussed in the PEW literature are represented in the list, including markers of diabetes, inflammation and type of dialysis access [Citation10]. Indeed, it makes intuitive sense that if up to 75% of the dialysis population may be suffering from some degree of PEW, then the ML models will select PEW-relevant features in determining the metabolic drivers of REE. Our findings also agree with Ponce et al., who discovered that several disease-related, medical and dynamic factors (airway pressure, minute volume) were strong predictors in the best model for critically ill patients with AKI [Citation28]. Although CKD and critical AKI seem to lie at opposite ends of the kidney-disease spectrum, they share the symptoms of profound metabolic alteration due to extensive medical complications.

Patient-reported features

Another finding was the abundance of patient-reported variables related to appetite. The best models all selected ‘enjoy mealtimes’, ‘weekly appetite rating’, ‘appetite rating’ and ‘daily appetite’. Additionally, they variably selected another eight items related to appetite, intake and mealtime enjoyment, on or after dialysis treatment. From a clinical perspective, appetite loss has long been established as a common occurrence in patients receiving dialysis, where the treatment burden leads to substantial fatigue [Citation51]. Moreover, poor appetite is associated with elevated levels of inflammatory cytokines and is a reliable marker of the proinflammatory state [Citation52].

Provider-assessed features

We also observed that elements of malnutrition screening were commonly selected by the best three models, including three sections of the Subjective Global Assessment [Citation53]. These included the physical examination, functional abilities and gastrointestinal symptoms. All these variables give information about an individual’s functional outputs in terms of symptoms of frailty, energy utilization and the altered faculty to eat [Citation53]. Although the SGA has been demonstrated to be a poor diagnostic for PEW directly, it may provide valuable information as part of a more comprehensive investigation [Citation39].

Assessing the impact of non-linear modelling techniques

We hypothesized that by using non-linear modelling techniques, greater precision would be achieved in predicting REE across categories of BMI. The relationships among several factors influencing REE are known to be non-linear, the most fundamental being the interaction between height and weight [Citation20–23,Citation27,Citation28]. In our previous research, we demonstrated that equations using linear techniques will perform with less precision at extremes of BMI as they fail to account for the changing relationship of height and weight along the correlation curve [Citation27]. Ponce et al. observed that when ranking several ML models for the prediction of REE in patients with AKI, the non-linear models performed with greater accuracy than the linear models [Citation28]. This study confirms these findings. Of our best three models, two were non-linear. Additionally, when the best models were assessed in subgroups of BMI, the non-linear models performed equally well for those individuals with the highest and lowest BMI. For example, the SVR predicted 100% of REE values within acceptable limits for the lowest subgroup of BMI and 85% of values within limits at the highest levels of BMI. In the past, where regression analysis has been used to create predictive equations, it has been argued that simplicity in the algorithm is a major consideration for use in clinical practice [Citation20–23]. This has led authors to reject non-linear methods due to the difficulty of manual calculation. As almost all medical calculations are available online, or as apps on mobile devices, we argue that ‘use with a pocket calculator’ is an issue now largely redundant. Furthermore, the ability to obtain a large number of biomarkers from patients’ electronic health records paves the way for embedding ML-based techniques for the prediction of REE.
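
The subgroup comparison described above can be sketched as follows. This is an illustrative outline only, not the analysis code used in this study: the tolerance of ±10% of measured REE, the BMI cut-offs, the function names and the example records are all assumptions chosen for demonstration.

```python
from math import sqrt


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted REE (kcal)."""
    squared = [(p - m) ** 2 for m, p in zip(measured, predicted)]
    return sqrt(sum(squared) / len(squared))


def percent_within_limits(measured, predicted, tolerance=0.10):
    """Share of predictions within +/- tolerance of measured REE (assumed limit)."""
    hits = sum(1 for m, p in zip(measured, predicted) if abs(p - m) <= tolerance * m)
    return 100.0 * hits / len(measured)


def bmi_group(bmi):
    """Assign a BMI subgroup using hypothetical cut-offs (kg/m^2)."""
    if bmi < 25.0:
        return "lower"
    if bmi < 35.0:
        return "middle"
    return "higher"


def precision_by_bmi(records, tolerance=0.10):
    """Percent of predictions within limits, reported per BMI subgroup."""
    grouped = {}
    for bmi, measured, predicted in records:
        ms, ps = grouped.setdefault(bmi_group(bmi), ([], []))
        ms.append(measured)
        ps.append(predicted)
    return {g: percent_within_limits(ms, ps, tolerance) for g, (ms, ps) in grouped.items()}


# Synthetic records for illustration: (BMI, measured REE, predicted REE).
records = [
    (21.0, 1400.0, 1450.0),
    (23.5, 1500.0, 1700.0),
    (29.0, 1650.0, 1600.0),
    (38.0, 1900.0, 1950.0),
    (41.0, 2100.0, 2400.0),
]
summary = precision_by_bmi(records)
print(summary)
```

Reporting the percentage within limits separately for each subgroup, rather than a single pooled RMSE, makes it visible when a model is precise on average but degrades at the extremes of BMI.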

Limitations of the study

The original studies in the RNKD were convenience-sampled in the Northeast and Midwest regions of the USA and hence the population was not as diverse as the national average. Additionally, those studies imposed strict medical criteria which resulted in the omission of sicker individuals. Many key variables (anthropometric and IC) were gathered on a non-dialysis day. This could affect post-dialysis weight and BMI, dependent on an individual’s fluid intake and residual renal excretion. Only conventional hemodialysis was undertaken in the original studies. This gives limited insight into the clinical feature differences that may be attributable to peritoneal dialysis or more advanced techniques (such as hemodiafiltration or expanded hemodialysis). Future research should undertake a more comprehensive review of dialysis procedures. For the purpose of this study, certain variables were omitted from the ML dataset to preserve the number of subjects available for training and validation. This includes key clinical markers such as CRP, hemoglobin A1c and serum creatinine, which have been previously shown to correlate with mREE. Notwithstanding the omission of variables, the validation set comprised only 34 individuals, which represents a small sample size. Finally, the best model (SVR) gave substantially improved precision and a glimpse into the features that may contribute. However, the model does not generate an equation and is, therefore, less interpretable as to the direction of effect.

Implications for practice and research

The application of a precise algorithm for predicting REE in patients receiving dialysis could present a powerful tool for providers to implement and monitor nutritional, medical and physical interventions to mitigate PEW. Current predictive equations provide inadequate precision and lack the scope to model clinical changes in metabolic needs. This is the first study to use ML to predict REE in this patient population with results that suggest a potential step change. However, this was a pilot study utilizing an existing database. To preserve the maximum sample size, some key clinical and functional biomarkers were not presented to the ML models. A research priority would be to expand the dataset to include the missing data and further explore interactions. Although our best model demonstrated high precision, it used many esoteric variables to generate accuracy. This necessarily limits the direct applicability to the clinical setting. A necessary next step is to identify the best dialysis biomarkers that could approximate the precision, and which patient-focused questions may help fill in the gaps. Thereafter the analysis should include people from different geographical locations.

ML is a process best applied to clinical spaces rich in data. A further application in the clinical nutrition field is in critical care, where data points are gathered throughout the day, precise calculation of REE is vital and REE is labile depending on the patient’s medical progress. Another related field is exercise physiology where the measured inputs of athlete nutrition, lifestyle and training schedules may shed light on the metabolic outputs of lean body mass, REE and performance.

Conclusion

Machine learning models using non-linear techniques potentially provide a step-change in predicting REE in individuals with CKD. Feature selection by the ML models suggests that many contributing medical factors of PEW explain the variability of REE. Such information could reveal cost-effective strategies to benefit millions of people worldwide. Further research in this area is a priority.

Author contribution

Alainn Bailey and Laura Byham-Gray initially conceived and designed this study. Suril Gohel and Mohamed Eltawil selected and built the machine-learning models. Alainn Bailey, Laura Byham-Gray, Mohamed Eltawil and Suril Gohel contributed to the appropriate selection of data, statistical analysis and interpretation of the results. Alainn Bailey produced the draft paper, and it was revised for intellectual content by Alainn Bailey, Laura Byham-Gray, Suril Gohel and Mohamed Eltawil. All the authors agree to be accountable for all aspects of the work.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data that support the findings of this study are available on request from the corresponding author, Laura Byham-Gray, and require a data-sharing agreement. The data are not publicly available due to restrictions, i.e. the data may contain information that could compromise the privacy of research participants.

Additional information

Funding

This study was supported by funding from the National Institutes of Health, mechanisms [1R15DK090593 01A1; 3R15DK090593 02; 6R15DK09059302], AHRQ mechanism [1K8HS023434 01A1], and by funding from the Academy of Nutrition and Dietetics and the Rutgers Intramural School of Health Professions Grant Program.

References

  • United States Renal Data System. Annual data report, executive summary. 2019 [cited 2023 May 20]. Available from: https://www.usrds.org/media/2371/2019-executive-summary.pdf
  • National Kidney Foundation. KDOQI clinical practice guidelines for chronic kidney disease: evaluation, classification and stratification. 2002 [cited 2023 May 20]. Available from: https://www.kidney.org/sites/default/files/docs/ckd_evaluation_classification_stratification.pdf
  • Caron N, Peyrot N, Caderby T, et al. Energy expenditure in people with diabetes mellitus: a review. Front Nutr. 2016;3:1. doi: 10.3389/fnut.2016.00056.
  • Fouque D, Kalantar-Zadeh K, Kopple J, et al. A proposed nomenclature and diagnostic criteria for protein-energy wasting in acute and chronic kidney disease. Kidney Int. 2008;73(4):391–398. doi: 10.1038/sj.ki.5002585.
  • Carrero JJ, Stenvinkel P, Cuppari L, et al. Etiology of the protein-energy wasting syndrome in chronic kidney disease: a consensus statement from the international society of renal nutrition and metabolism (ISRNM). J Ren Nutr. 2013;23(2):77–90. doi: 10.1053/j.jrn.2013.01.001.
  • Herselman M, Moosa MR, Kotze TJ, et al. Protein-energy malnutrition as a risk factor for increased morbidity in long-term hemodialysis patients. J Ren Nutr. 2000;10(1):7–15. doi: 10.1016/s1051-2276(00)90017-7.
  • de Mutsert R, Grootendorst DC, Axelsson J, et al. Excess mortality due to interaction between protein-energy wasting, inflammation and cardiovascular disease in chronic dialysis patients. Nephrol Dial Transplant. 2008;23(9):2957–2964. doi: 10.1093/ndt/gfn167.
  • Avesani CM, Draibe SA, Kamimura MA, et al. Resting energy expenditure of chronic kidney disease patients: influence of renal function and subclinical inflammation. Am J Kidney Dis. 2004;44(6):1008–1016. doi: 10.1053/j.ajkd.2004.08.023.
  • Kamimura MA, Draibe SA, Dalboni MA, et al. Serum and cellular interleukin-6 in haemodialysis patients: relationship with energy expenditure. Nephrol Dial Transplant. 2007;22(3):839–844. doi: 10.1093/ndt/gfl705.
  • Ikizler TA, Cano NJ, Franch H, et al. Prevention and treatment of protein energy wasting in chronic kidney disease patients: a consensus statement by the international society of renal nutrition and metabolism. Kidney Int. 2013;84(6):1096–1107. doi: 10.1038/ki.2013.147.
  • Singer MA. Of mice and men and elephants: metabolic rate sets glomerular filtration rate. Am J Kidney Dis. 2001;37(1):164–178. doi: 10.1016/S0272-6386(01)80073-1.
  • Morton AR, Singer MA. The problem with Kt/V: dialysis dose should be normalized to metabolic rate not volume. Semin Dial. 2007;20(1):12–15. doi: 10.1111/j.1525-139X.2007.00232.x.
  • Schadewaldt P, Nowotny B, Strassburger K, et al. Indirect calorimetry in humans: a postcalorimetric evaluation procedure for correction of metabolic monitor variability. Am J Clin Nutr. 2013;97(4):763–773. doi: 10.3945/ajcn.112.035014.
  • Oshima T, Berger MM, De Waele E, et al. Indirect calorimetry in nutritional therapy. A position paper by the ICALIC study group. Clin Nutr. 2017;36(3):651–662. doi: 10.1016/j.clnu.2016.06.010.
  • Harris JA, Benedict FG. A biometric study of human basal metabolism. Proc Natl Acad Sci U S A. 1918;4(12):370–373. doi: 10.1073/pnas.4.12.370.
  • Mifflin MD, St Jeor ST, Hill LA, et al. A new predictive equation for resting energy expenditure in healthy individuals. Am J Clin Nutr. 1990;51(2):241–247. doi: 10.1093/ajcn/51.2.241.
  • Kamimura MA, Avesani CM, Bazanelli AP, et al. Are predictive equations reliable for estimating resting energy expenditure in chronic kidney disease patients? Nephrol Dial Transplant. 2011;26(2):544–550. doi: 10.1093/ndt/gfq452.
  • Dias Rodrigues JC, Lamarca F, Lacroix de Oliveira C, et al. Agreement between prediction equations and indirect calorimetry to estimate energy expenditure in elderly patients on hemodialysis. Espen J. 2014;9(2):e91–e96. doi: 10.1016/j.clnme.2013.12.002.
  • Wu PY, Chen YT, Wong TC, et al. Energy requirement of patients undergoing hemodialysis: a Cross-Sectional study in multiple centers. Biochem Res Int. 2020;2020:2054265. doi: 10.1155/2020/2054265.
  • Fernandes T, Avesani C, Kamimura M, et al. Estimating resting energy expenditure of patients on dialysis: development and validation of a predictive equation. Nutrition. 2019;67–68:110527. doi: 10.1016/j.nut.2019.06.008.
  • Byham-Gray LD, Parrott JS, Peters EN, et al. Modeling a predictive energy equation specific for maintenance hemodialysis. JPEN J Parenter Enteral Nutr. 2018;42(3):587–596.
  • Vilar E, Machado A, Garrett A, et al. Disease-specific predictive formulas for energy expenditure in the dialysis population. J Ren Nutr. 2014;24(4):243–251. doi: 10.1053/j.jrn.2014.03.001.
  • Byham-Gray L, Parrott JS, Ho WY, et al. Development of a predictive energy equation for maintenance hemodialysis patients: a pilot study. J Ren Nutr. 2014;24(1):32–41. doi: 10.1053/j.jrn.2013.10.005.
  • Oliveira B, Sridharan S, Farrington K, et al. Comparison of resting energy equations and total energy expenditure in haemodialysis patients and body composition measured by multi-frequency bioimpedance. Nephrology. 2018;23(8):748–754. doi: 10.1111/nep.13112.
  • Morrow EA, Marcus A, Byham-Gray L. Comparison of a handheld indirect calorimetry device and predictive energy equations among individuals on maintenance hemodialysis. J Ren Nutr. 2017;27(6):402–411. doi: 10.1053/j.jrn.2017.06.011.
  • Hung R, Sridharan S, Farrington K, et al. Comparison of estimates of resting energy expenditure equations in haemodialysis patients. Int J Artif Organs. 2017;40(3):96–101. doi: 10.5301/ijao.5000575.
  • Bailey A, Brody R, Sackey J, et al. Current methods for developing predictive energy equations in maintenance dialysis are imprecise. Ann Med. 2022;54(1):909–920. doi: 10.1080/07853890.2022.2057581.
  • Ponce D, de Goes CR, de Andrade LGM. Proposal of a new equation for estimating resting energy expenditure of acute kidney injury patients on dialysis: a machine learning approach. Nutr Metab. 2020;17(1):96. doi: 10.1186/s12986-020-00519-y.
  • Siga MM, Ducher M, Florens N, et al. Prediction of all-cause mortality in haemodialysis patients using a bayesian network. Nephrol Dial Transplant. 2020;35(8):1420–1425. doi: 10.1093/ndt/gfz295.
  • Xiong CZ, Su M, Jiang Z, et al. Prediction of hemodialysis timing based on LVW feature selection and ensemble learning. J Med Syst. 2018;43(1):18. doi: 10.1007/s10916-018-1136-x.
  • Sheng K, Zhang P, Yao X, et al. Prognostic machine learning models for first-year mortality in incident hemodialysis patients: development and validation study. JMIR Med Inform. 2020;8(10):e20578. doi: 10.2196/20578.
  • Mezzatesta S, Torino C, Meo P, et al. A machine learning-based approach for predicting the outbreak of cardiovascular diseases in patients on dialysis. Comput Methods Programs Biomed. 2019;177:9–15. doi: 10.1016/j.cmpb.2019.05.005.
  • Avesani CM, Kamimura MA, Cuppari L. Energy expenditure in chronic kidney disease patients. J Ren Nutr. 2011;21(1):27–30. doi: 10.1053/j.jrn.2010.10.013.
  • Mullainathan S, Spiess J. Machine learning: an applied econometric approach. J Econ Perspect. 2017;31(2):87–106. doi: 10.1257/jep.31.2.87.
  • Obermeyer Z, Emanuel EJ. Predicting the future – big data, machine learning, and clinical medicine. N Engl J Med. 2016;375(13):1216–1219. doi: 10.1056/NEJMp1606181.
  • Olejnik LA, Peters EN, Parrott JS, et al. Abbreviated steady state intervals for measuring resting energy expenditure in patients on maintenance hemodialysis. JPEN J Parenter Enteral Nutr. 2017;41(8):1348–1355. doi: 10.1177/0148607116660981.
  • Koo T, Li M. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–163. doi: 10.1016/j.jcm.2016.02.012.
  • Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician. 1983;32(3):307–317. doi: 10.2307/2987937.
  • Frankenfield D, Roth-Yousey L, Compher C. Comparison of predictive equations for resting metabolic rate in healthy nonobese and obese adults: a systematic review. J Am Diet Assoc. 2005;105(5):775–789. doi: 10.1016/j.jada.2005.02.005.
  • Essadik R, Msaad R, Lebrazi H, et al. Assessing the prevalence of protein-energy wasting in haemodialysis patients: a cross-sectional monocentric study. Nephrol Ther. 2017;13(7):537–543. doi: 10.1016/j.nephro.2017.02.013.
  • Koefoed M, Kromann CB, Juliussen SR, et al. Nutritional status of maintenance dialysis patients: low lean body mass index and obesity are common, protein-energy wasting is uncommon. PLOS One. 2016;11(2):e0150012. doi: 10.1371/journal.pone.0150012.
  • Bailey JL, Wang X, England BK, et al. The acidosis of chronic renal failure activates muscle proteolysis in rats by augmenting transcription of genes encoding proteins of the ATP-dependent ubiquitin-proteasome pathway. J Clin Invest. 1996;97(6):1447–1453. doi: 10.1172/JCI118566.
  • Graham KA, Reaich D, Channon SM, et al. Correction of acidosis in hemodialysis decreases whole body protein degradation. J Am Soc Nephrol. 1997;8(4):632–637. doi: 10.1681/ASN.V84632.
  • Goldstein SL, Ikizler TA, Zappitelli M, et al. Non-infected hemodialysis catheters are associated with increased inflammation compared to arteriovenous fistulas. Kidney Int. 2009;76(10):1063–1069. doi: 10.1038/ki.2009.303.
  • Lacquaniti A, Campo S, Falliti G, et al. Free light chains, high mobility group box 1, and mortality in hemodialysis patients. J Clin Med. 2022;11(23):6904. doi: 10.3390/jcm11236904.
  • Campo S, Lacquaniti A, Trombetta D, et al. Immune system dysfunction and inflammation in hemodialysis patients: two sides of the same coin. J Clin Med. 2022;11(13):3759. doi: 10.3390/jcm11133759.
  • Monardo P, Lacquaniti A, Campo S, et al. Updates on hemodialysis techniques with a common denominator: the personalization of the dialytic therapy. Semin Dial. 2021;34(3):183–195. doi: 10.1111/sdi.12956.
  • Deger SM, Sundell MB, Siew ED, et al. Insulin resistance and protein metabolism in chronic hemodialysis patients. J Ren Nutr. 2013;23(3):e59–e66. doi: 10.1053/j.jrn.2012.08.013.
  • Hume R. Prediction of lean body mass from height and weight. J Clin Pathol. 1966;19(4):389–391. doi: 10.1136/jcp.19.4.389.
  • Deurenberg P, Weststrate JA, Hautvast JG. Changes in fat-free mass during weight loss measured by bioelectrical impedance and by densitometry. Am J Clin Nutr. 1989;49(1):33–36. doi: 10.1093/ajcn/49.1.33.
  • Salazar-Robles E, Lerma A, Calderón-Juárez M, et al. Assessment of factors related to diminished appetite in hemodialysis patients with a new adapted and validated questionnaire. Nutrients. 2021;13(4):1371. doi: 10.3390/nu13041371.
  • Kalantar-Zadeh K, Block G, McAllister CJ, et al. Appetite and inflammation, nutrition, anemia, and clinical outcome in hemodialysis patients. Am J Clin Nutr. 2004;80(2):299–307. doi: 10.1093/ajcn/80.2.299.
  • Detsky AS, McLaughlin JR, Baker JP, et al. What is subjective global assessment of nutritional status? JPEN J Parenter Enteral Nutr. 1987;11(1):8–13. doi: 10.1177/014860718701100108.