Research Article

Comparison of different predicting models to assist the diagnosis of spinal lesions


ABSTRACT

In neurosurgical or orthopedic clinics, the differential diagnosis of lower back pain is often time-consuming and costly. This is especially true when several candidate diagnoses share similar symptoms that can confuse clinic physicians. Methods for efficient differential diagnosis can therefore help physicians to implement the most appropriate treatment and achieve the goal of pain reduction for their patients.

In this study, we applied data-mining techniques from artificial intelligence to implement a computer-aided auxiliary differential diagnosis for a herniated intervertebral disc, spondylolisthesis, and spinal stenosis. We collected questionnaires from 361 patients and analyzed the resulting data by using linear discriminant analysis, clustering, and artificial neural network techniques to construct a classification model and to compare the accuracy and implementation efficiency of the different methods.

Our results indicate that a linear discriminant analysis has obvious advantages for classification and diagnosis, in terms of accuracy.

We concluded that the judgment results from artificial intelligence can be used as a reference for medical personnel in their clinical diagnoses. Our method is expected to facilitate the early detection of symptoms and early treatment, so as to reduce the social resource costs and the huge burden of medical expenses, and to increase the quality of medical care.

Introduction

With the continuing advances in medical technology, the average life expectancy of Taiwanese residents is continually increasing. As a result, Taiwan has been officially classified as an aging society since 1993. Since then, the proportion of people over the age of 65 in Taiwan has increased even further, pushing Taiwan toward becoming a super-aged society.Citation1 At the same time, elderly people tend to have various spinal degeneration and chronic diseases that force them to live in pain. Moreover, the lives of people in modern society often include many bad habits and a disorganized daily schedule. They are often too busy to take care of their own health, and they do not know much about disease prevention and health promotion.Citation2 Therefore, more and more young people are also suffering from spinal degenerative diseases. An analysis of the behavior of those who are currently seeking medical care shows that most people seek medical help only after they become aware of their symptoms. It is therefore essential for doctors to make an accurate differential diagnosis, based on a patient’s signs and symptoms, during a quick outpatient visit.Citation3 Making an accurate diagnosis in busy outpatient clinics is also a potentially excellent application of computer-aided decision-making. Once an accurate diagnosis has been made, advanced treatment or minimally invasive surgery can be implemented to reduce the delay in disease management, to speed up the patient’s recovery process, and to improve his or her quality of life.

Literature review

Spinal disease

Lower back pain is a common clinical symptom that accounts for 20% of the musculo-skeletal symptoms seen in outpatient clinics.Citation4 The primary objective of this study was to discover how to evaluate patients efficiently and make an accurate differential diagnosis.Citation5 The common causes of chronic lower back pain are osteoarthritis (OA), a herniated intervertebral disc (HIVD), spondylolisthesis, myofascial pain syndrome, a compression fracture, seronegative spondyloarthropathy, osteoporosis, referred pain, and a tumor.Citation6 Because the differential diagnosis of lower back pain often takes a long time, doctors should first rule out conditions that require urgent management (e.g., an infection, nerve compression, or a tumor) on the patient’s first visit.Citation4,Citation7 If patients do not have any of these urgent conditions, it is not necessary to arrange excessive examinations in order to reach an immediate, clear diagnosis. Primary care physicians (PCPs) should first provide appropriate explanations to a patient to establish a good doctor-patient relationship. It should be emphasized that a detailed medical history and complete physical examination, combined with a basic X-ray examination, are sufficient for diagnosing 90% of patients.Citation7,Citation8 A detailed medical history should include at least the following items: the location, characteristics, duration, radiation, associated symptoms, aggravating factors, illness, trauma, past history, and psychosocial status. Based on the patient’s medical history, lower back pain can be simply divided into two categories: pain stemming from mechanical factors and pain stemming from inflammatory factors. Other important conditions that patients must be asked about are morning stiffness, pain severity during rest, pain severity after exercise, nerve compression, blood inflammatory markers, and other diseases.Citation9,Citation10

The primary motivation for conducting this study on computer-aided diagnosis technologies is to shorten the often time-consuming process of a complete assessment, including X-ray and laboratory tests, in order to reduce the unnecessary pain and burden of patients and to reduce medical costs.Citation11,Citation12 Figure 1 illustrates three examples of spinal diseases that were the classification targets of our study.

Figure 1. Three types of spinal disease with similar symptoms. An X-ray of each is shown.


Data mining technology

Due to the application of information technologies in medicine in recent years, medical informatics has been developed.Citation12 The purpose of medical informatics is to use information technologies, combined with a patient- and medical problem-oriented diagnosis model, to gain medical knowledge and find guidelines for the medical treatment of various diseases.Citation13 Therefore, if information technologies can be efficiently applied as reference information for the diagnosis of a disease, it can be of great help in the treatment and prevention of diseases. A data mining technique refers to the process of extracting hidden, previously unappreciated, and potentially useful information from a database, in order to learn about, and identify, important factors.Citation14 Data mining techniques can be used to discover useful information from a large set of data and to provide a reference for policy makers. The entire process of data mining includes data selection, pre-processing, conversion, data mining, interpretation, and evaluation. Currently, data mining techniques have been widely applied in many areas of medical diagnostics. 
For example, by applying tools such as data mining techniques and Bayesian networks to the historical data in a medical database, regularities in the data can be extracted to characterize doctors’ medical behavior for the same type of disease and to serve as a reference for constructing a clinical pathway.Citation15 In addition, Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Artificial Neural Networks (ANNs) have been used to support the machine-learning-based diagnosis of glaucoma.Citation16 A model based on association rules has been applied for analyzing associated information in the medical charts of dental clinics, in order to construct a knowledge bank for a dental decision support system that serves as a reference for dentists when making their diagnoses.Citation17 A Multi-group Discriminant Analysis (MDA) was combined with serum testing and radioactive treatment to produce a set of linear functions.Citation18 Receiver Operating Characteristic (ROC) curves were further used to analyze the explanatory power of new variables for predicting whether Hepatitis C will develop into liver cirrhosis.Citation19 The FP-Growth association rule algorithm has been used to quickly discover the hidden relationships between diseases in the National Health Insurance database.Citation20 The discovered rules include the conditional probability of contracting other diseases after having had a certain disease. The results of data mining can effectively assist doctors and medical researchers in conducting their medical research. Bayesian networks, decision trees, and Back Propagation Neural network (BPN) algorithms have been applied to breast cancer, to Chinese medicine tongue diagnosis images, and to the management of the health records of patients with diabetes, respectively.Citation21 A C4.5 decision tree analysis and BPNs have been applied in the Therapeutic Drug Monitoring (TDM) of blood vancomycin.
Based on the monitoring of historical cases by medical institutions, a classification model has been constructed that can be used to predict the effects of vancomycin in patients, to efficiently assist medical personnel to accurately monitor its treatment effectiveness, and to reduce the potential waste of medical resources.Citation21 The aforementioned related studies all attest to the feasibility of applying information technology techniques for assisting, or even predicting, the results of medical diagnoses.Citation22 In this study, we focused primarily on the classification-based analysis methods. We used a cluster analysis to categorize patients and to consider all their characteristics. Neural networks were further used to construct a prediction model with a high predictive ability. Artificial Neural Networks (ANNs) are computational information processing models that are based on the structure of the human brain. The primary purpose of their initial development was to simulate how neurons work in the human brain. Through the parallel computation of these artificial neurons, large data-sets can be handled. ANNs have already been applied in clinical diagnosis and prediction. For example, Baxt applied ANNs in diagnosing Acute Myocardial Infarction (AMI). After the ANNs were developed, 331 patients were used to verify their accuracy. The sensitivity of the ANNs was 92% and their specificity was 96%, which was a good result, in comparison to a 77.7% sensitivity and 84.7% specificity of the physician AMI diagnoses.Citation23 Pablo Lapuerta et al. used ANNs to predict if patients with alcoholism have severe liver diseases. Their research population consisted of 144 patients suffering from alcoholism. They compared the effectiveness of the predictions by their ANNs by using the Maddrey discriminant function and logistic regression. The ROC values were 81.5% for their ANNs, 73.8% for the Maddrey score, and 78.2% for logistic regression. 
There was thus a statistically significant difference between the ROC values of the ANNs and the Maddrey score, but no significant difference between the ANN and logistic regression values.Citation24 Buzatu et al. used ANNs to predict the survival of patients who undergo cardiac surgery. They compared the prediction effectiveness of ANNs with the new and old Denver models. Their study included 1875 patients. The results showed that the ANNs had a 14% error rate in predicting patient survival and a 31% error rate in predicting patient mortality. Compared to the error rates of 15% and 31%, respectively, for the old Denver model, and 18% and 31%, respectively, for the new Denver model, the ANN model constructed by the researchers was not only more accurate than the Denver models, but it was also easier to modify.Citation25 As for research published in Taiwan, Chen Di-Xiang investigated the relationship between various diseases (i.e., the incidence of contracting diseases through the application of association rules) by using data-mining techniques to provide references for future disease prevention and treatment.Citation26 Wu Su-Ying constructed a knowledge management system for disease classification in hospitals by applying data mining techniques.Citation27 Tang Shou-Sheng applied data mining techniques (e.g., regression analysis and ANNs) in the medical prediction of patients with tuberculosis (TB) because increasing the TB cure rate is an important objective of the WHO in the fight against TB.
That study found that the ANN model generally has better predictive power, and that its mean squared error (MSE) is also lower than that of the regression analysis.Citation28 Huang Sheng-Chong applied association rules and classification analysis to find the relationship between patients’ symptoms and their diseases, in order to guide patients in seeking medical help and to serve as a reference for the medical staff in the diagnosis.Citation29 Pan Ya-Shey used data mining technologies to construct diagnostic assessments for psychiatric diseases, diabetes, and renal diseases.Citation30

Research method and procedures

Research protocol

The model construction in this study followed these steps, based on the Cross-industry Standard Process for Data Mining (CRISP-DM): (1) Collecting information from the patients’ charts: the primary information source in this study was the charts of outpatients; (2) Preprocessing the information: screening, deleting and regularizing the information used in this study to reduce noise and incomplete data that may affect the accuracy of the prediction model; (3) Filtering the research variables: investigating factors that may affect the targeted spinal diseases, based on the relevant literature, followed by the selection of the primary discriminating factors to be integrated as input variables for our study; (4) Splitting the data into training and testing samples: the K-fold cross-validation technique was applied to divide the samples into K subsets. K-1 subsets were used as the training group and the remaining subset was used as the testing group to test the validity once. Then another subset was chosen as the testing group, with the remaining K-1 subsets as the training group. The accuracy of the analysis model was averaged over ten such tests; (5) Constructing a classification model: a prediction model was constructed with each technique; (6) Selecting the best model: the best model was chosen from the various trial models by comparing the model accuracy; and (7) Discussing the results and suggested improvements: the results of the analysis in this study were integrated in order to propose conclusions and suggested improvements.
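The train/test rotation in step (4) can be sketched in a few lines. This is a minimal illustration only, assuming K = 10 (consistent with the ten tests mentioned above); the helper names `k_fold_indices` and `cross_validate`, the toy labels, and the majority-class "model" are hypothetical and are not part of the SPSS workflow used in the study.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, k, fit, score, seed=0):
    """Train on k-1 folds, test on the remaining one, and average over all k rotations."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit(train_idx)
        scores.append(score(model, test_idx))
    return sum(scores) / k

# Hypothetical toy data: 200 samples with three class labels (0, 1, 2).
labels = [i % 3 for i in range(200)]

def fit_majority(train_idx):
    """Placeholder model: always predict the most frequent training label."""
    counts = {}
    for j in train_idx:
        counts[labels[j]] = counts.get(labels[j], 0) + 1
    return max(counts, key=counts.get)

def accuracy(model, test_idx):
    return sum(labels[j] == model for j in test_idx) / len(test_idx)

mean_acc = cross_validate(len(labels), 10, fit_majority, accuracy)
print(f"mean accuracy over 10 folds: {mean_acc:.3f}")
```

Any classifier can be substituted for the placeholder by changing the `fit` and `score` callables; the rotation logic stays the same.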

In this study, we used IBM SPSS Statistics and IBM SPSS Modeler as the analysis tools for data mining. These tools combine analysis techniques with Graphical User Interfaces (GUIs) and provide many algorithms, such as association analysis, the C5.0 decision tree, ANNs, classification, clustering, regression, prediction models, and sequence pattern analysis, so that the result outputs are easy to interpret.

Data collection

The data collection period was from March 2015 to December 2017. The data were collected at the neurosurgical and orthopedic clinic in a medical center in Northern Taiwan. Patients with spinal diseases who visited the clinic for the first time were screened by physicians as candidate research subjects. After a full explanation of the purposes and methods of the study, and after acquiring the patients’ consent, the candidate subjects filled in the questionnaires. The test subjects were between 20 and 85 years old and met the screening criteria for spinal diseases. A total of 208 questionnaires were collected. After excluding eight questionnaires with incomplete answers, 200 valid questionnaires were included. This sample size was considered sufficient for the analysis results to be reliable and to serve as a reference. The questionnaire information from each patient was keyed into the SPSS software for further analysis, using 46 fields: basic information (coding, age, gender, marital status, occupational type, and work styles), symptoms (21 fields in total), and test items (19 fields in total).

Research method

The primary data mining techniques used in this study were LDA, clustering, and ANNs.

Linear Discriminant Analysis (LDA)

LDA is a statistical analysis method that is used when a classification scheme already exists. Discriminant criteria are derived from the known groups and are then used to decide how new samples should be placed into the existing categories. LDA linearly combines multiple independent variables to predict a categorical dependent variable. There should be at least two classification categories, and each category should contain at least a minimum number of observed samples. There are three LDA methods, namely: (1) the Mahalanobis distance: the standardized distance between the new sample and each group is compared, and the new sample is classified into the group with the smallest distance, without considering the prior probability of each group; (2) Fisher’s linear discriminant function: multiple variables are first reduced to a single variable by a linear combination, and the membership of the new sample is then judged without considering the distribution type and prior probability of the groups; (3) Bayes’ discriminant analysis: the conditional probability of the new sample’s membership in each group is calculated, and the new sample is classified into the group with the highest conditional probability. The probability distributions of the groups and their prior probabilities need to be considered in this method.
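The difference between the Mahalanobis rule and the Bayes rule described above can be made concrete with a small sketch. It is illustrative only and rests on several assumptions that are not taken from the study: two made-up features per patient, a pooled (shared) within-group covariance matrix, Gaussian groups for the Bayes rule, and hypothetical helper names and toy data.

```python
import math

def mean_vec(rows):
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(2)]

def pooled_cov(groups, means):
    """Pooled within-group 2x2 covariance (assumes all groups share one covariance)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n_total = 0
    for g, rows in groups.items():
        for r in rows:
            d = [r[0] - means[g][0], r[1] - means[g][1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        n_total += len(rows)
    dof = n_total - len(groups)
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def mahalanobis_sq(x, mu, cov_inv):
    d = [x[0] - mu[0], x[1] - mu[1]]
    return sum(d[a] * cov_inv[a][b] * d[b] for a in range(2) for b in range(2))

def classify_mahalanobis(x, means, cov_inv):
    """Pick the group with the smallest standardized distance; priors are ignored."""
    return min(means, key=lambda g: mahalanobis_sq(x, means[g], cov_inv))

def classify_bayes(x, means, cov_inv, priors):
    """Pick the group with the highest posterior: log prior minus half the squared distance."""
    scores = {g: math.log(priors[g]) - 0.5 * mahalanobis_sq(x, means[g], cov_inv)
              for g in means}
    return max(scores, key=scores.get)

# Hypothetical two-feature toy data, not the study's questionnaire fields.
groups = {
    "HIVD": [(1.0, 2.0), (2.0, 1.0), (1.5, 1.8), (2.2, 2.4)],
    "spondylosis": [(6.0, 5.0), (7.0, 6.5), (6.5, 5.2), (7.2, 6.0)],
    "stenosis": [(1.0, 8.0), (2.0, 9.0), (1.2, 8.7), (2.4, 8.1)],
}
means = {g: mean_vec(rows) for g, rows in groups.items()}
cov_inv = inv2(pooled_cov(groups, means))
equal_priors = {g: 1 / 3 for g in groups}

new_patient = (1.6, 1.9)
print(classify_mahalanobis(new_patient, means, cov_inv))
print(classify_bayes(new_patient, means, cov_inv, equal_priors))
```

With equal priors the two rules agree; they can differ only when the priors are unequal, which is exactly the distinction drawn between methods (1) and (3).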

In this study, we used the discriminant analysis function in the SPSS Statistics software to analyze the data from the samples. Diagnosis is a categorical variable with three possible values (i.e., HIVD, spondylosis and spinal stenosis). Cases 1, 2, and 3 were assigned to represent these three diagnostic classes. The same three cases were also analyzed with a discriminant analysis algorithm implemented in MATLAB, based on the calculation of the Mahalanobis and Bayes distances. A scatterplot of the combined groups was further used to observe the boundary regions of the samples and to judge whether there were any misclassifications.

Clustering

Data clustering distinguishes data by grouping similar records into clusters. The clustering method aims to create clusters with the smallest difference and highest similarity within each cluster, and with the maximal difference and lowest similarity between clusters. In short, the clustering technique is used to reduce the complexity of the data and to find the common characteristics of the members of each cluster. Currently, the two most commonly used families of clustering techniques are the partitional (split) and hierarchical clustering algorithms. In this study, the number of clusters is known in advance, and therefore, the partitional clustering algorithm is more appropriate.
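As an illustration of partitional clustering with a known number of clusters, the following sketch runs k-means with k = 3. The text does not name the specific algorithm used in SPSS Modeler, so k-means, the farthest-first initialization, the function names, and the toy points are all assumptions made for this example.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centers(points, k):
    """Deterministic initialization: repeatedly pick the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def k_means(points, k, max_iter=100):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    centers = farthest_first_centers(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the partition has converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

# Hypothetical toy points forming three well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.8)]
assignment, centers = k_means(points, k=3)
print(assignment)
```

Because the cluster labels produced this way are arbitrary, comparing them with the three diagnoses requires mapping each cluster to its most frequent diagnosis first.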

Neural networks

A Neural Network (NN) is a type of artificial intelligence technique that draws on research in medicine, mathematics, information engineering and electrical engineering. An Artificial Neural Network (ANN) simulates the operation of cerebral neuronal cells and is composed of highly connected processing units (called nodes or neurons) that form a dynamic operational system. ANNs can use a sample set (i.e., one that is composed of system input and output information) to construct a system model (i.e., the relationship between the inputs and outputs). Such a system model can be used for estimating, predicting, decision-making and diagnosing. The commonly used statistical regression analysis is another example of such a model, and ANNs can therefore be considered a specific form of statistical technique.

In this study, the ANN was a Back Propagation Neural network (BPN) with a single hidden layer, because a single hidden layer can provide sufficient accuracy. Although the input layer consisted of a considerable number of neurons, we set two neurons in the hidden layer for testing. The output layer consisted of three neurons: HIVD, spondylosis, and stenosis. As for the network parameter settings, we set the learning rate at 0.2, because a lower learning rate generally leads to a better learning result and we could not obtain convergent results with a learning rate larger than 0.3. Training was terminated when the Root Mean Squared Error (RMSE) was lower than, or equal to, 0.0001, or after at most 1000 training iterations. The network structure with the smallest RMSE on the test data was selected as the final network structure.
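The settings described above (one hidden layer with two neurons, three output neurons, a learning rate of 0.2, and termination at RMSE ≤ 0.0001 or after 1000 iterations) can be sketched as a small back-propagation network. This is not the SPSS Modeler implementation: the sigmoid activations, per-sample weight updates, weight initialization range, one-hot targets, two-feature toy inputs, and all function names are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward pass; the last weight in every row is the bias."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, y

def rmse(data, w_hidden, w_out):
    total, count = 0.0, 0
    for x, t in data:
        _, y = forward(x, w_hidden, w_out)
        total += sum((ti - yi) ** 2 for ti, yi in zip(t, y))
        count += len(t)
    return math.sqrt(total / count)

def train_bpn(data, n_hidden=2, n_out=3, lr=0.2, max_epochs=1000, tol=1e-4, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    history = [rmse(data, w_hidden, w_out)]
    for _ in range(max_epochs):
        for x, t in data:
            h, y = forward(x, w_hidden, w_out)
            # Output deltas for squared error with sigmoid units.
            d_out = [(yi - ti) * yi * (1 - yi) for yi, ti in zip(y, t)]
            # Hidden deltas: propagate the output deltas back through w_out.
            d_hid = [hj * (1 - hj) * sum(d_out[k] * w_out[k][j] for k in range(n_out))
                     for j, hj in enumerate(h)]
            for k in range(n_out):
                for j, v in enumerate(h + [1.0]):
                    w_out[k][j] -= lr * d_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * d_hid[j] * v
        history.append(rmse(data, w_hidden, w_out))
        if history[-1] <= tol:
            break  # termination condition: RMSE <= 0.0001
    return w_hidden, w_out, history

# Hypothetical toy data: two input features, one-hot targets for the three classes.
data = [([0.0, 0.0], [1, 0, 0]), ([0.1, 0.2], [1, 0, 0]),
        ([1.0, 0.0], [0, 1, 0]), ([0.9, 0.1], [0, 1, 0]),
        ([0.0, 1.0], [0, 0, 1]), ([0.2, 0.9], [0, 0, 1])]
w_hidden, w_out, history = train_bpn(data)
print(f"RMSE before training: {history[0]:.4f}, after: {history[-1]:.4f}")
```

In practice the RMSE threshold of 0.0001 is rarely reached, so the cap of 1000 iterations is usually the condition that ends training.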

Clinical trial for validating the predictive model

This study is a prospective study that has been reviewed and approved by the “Institutional Review Board Ethics Committee.” The data of 200 cases from between 2015 and 2017 were collected to build a predictive model, and the clinical testing of the predictive model was conducted in the orthopedics clinic of a hospital in northern Taiwan between January and June 2018. We used the free MIT App Inventor 2 platform to design a simple app to assist doctors in diagnosing spinal diseases. Before using the app, all doctors received three days of training on how to use it. The app was built on the data of the 200 cases, from which a keyword and diagnosis database for spinal diseases and physical examinations was constructed. We then provided the prediction results of the two algorithms for the doctors’ reference. A total of 27 cases were collected for the experimental group, and finally, the accuracy of the predictive model was evaluated against the doctors’ actual diagnoses on the data of all 227 cases.

Results

Descriptive analysis

As can be seen from Table 1, the average age of the 227 cases in this study was 55.89 years. Female cases were in the majority (64.3%), and the most common symptoms were pain (94.7%) and anesthesia (52.2%). The main method of seeking treatment was the use of drugs (63.0%), followed by rehabilitation and alternative therapies. In terms of the etiological analysis, the most common cause was related to work, for all types of reasons, and the common working postures were split about equally between standing and sitting. As for the severity of pain measured on the standard pain scale, the average pain score was 6.5, which is moderate.

Table 1. Demographic data analysis (N = 227)

Based on the weighted factor analysis in the SPSS Modeler, the physical examination (bodily assessment) has an impact factor of 81.566, which demonstrates that it is the most important factor influencing the dependent variable. The important factors include soreness, swelling, numbness, tingling, night pain, and discomfort; the time of symptom occurrence, the duration of the symptoms, pain severity, the influence of the symptoms, how the symptoms affect walking and activities, and the frequency of symptom occurrence; the history of seeking treatment, surgery, rehabilitation, massages, cold and hot compresses, medication use, and dressings; the causes of the symptoms and current work; and physical examinations, such as the Spurling sign, the Lhermitte sign, the Hoffmann sign, the Right Side Straight Leg Raising Test (SLRTright), the Left Side Straight Leg Raising Test (SLRTleft), big toe flexion, big toe extension, sign reflex, sex, and type (Figure 2).

Figure 2. Results of the analysis of important factors.


Linear discriminant analysis

In this study, we used the Linear Discriminant Analysis (LDA) function in the SPSS Statistics package to analyze the data sample. Diagnosis is a categorical variable with three possible values: HIVD, spondylosis and stenosis. Cases 1, 2, and 3 were assigned to represent these three diagnostic classes. From the confusion matrix, we found that the Case 1 classification has the poorest accuracy (Figure 3). The overall discrimination accuracy is as high as 90.5%. The same three cases were also analyzed by calculating the Mahalanobis and Bayes distances with a discriminant analysis algorithm implemented in MATLAB. The results showed that Fisher’s discriminant analysis has a higher accuracy. From the scatterplot of the combined groups, we can observe that some samples lie in the boundary region between the group centroids, which caused misclassifications. This is most obvious for Cases 1 and 2.
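For readers less familiar with how per-class and overall accuracy are read off a confusion matrix, a minimal sketch follows. The labels below are hypothetical toy values chosen for illustration; they are not the study's data and do not reproduce its reported figures, and the function names are assumptions.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m

def overall_accuracy(m):
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def sensitivity_specificity(m, i):
    """One-vs-rest sensitivity and specificity for class index i."""
    tp = m[i][i]
    fn = sum(m[i]) - tp
    fp = sum(row[i] for row in m) - tp
    tn = sum(sum(row) for row in m) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels for Cases 1, 2, and 3.
classes = [1, 2, 3]
actual = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
predicted = [1, 1, 2, 1, 2, 2, 2, 3, 3, 1]

cm = confusion_matrix(actual, predicted, classes)
acc = overall_accuracy(cm)
sens_1, spec_1 = sensitivity_specificity(cm, 0)
print(cm, acc, sens_1, spec_1)
```

Off-diagonal entries show which pairs of classes are confused with each other, which is the information the scatterplot conveys graphically.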

Figure 3. Results of discriminant analysis.


Artificial Neural Network (ANN) analysis

As described in the research method, the ANN was a BPN with a single hidden layer, which can provide sufficient accuracy. Although the input layer consisted of a considerable number of neurons, we set two neurons in the hidden layer for testing. The output layer consisted of three neurons: HIVD, spondylosis and stenosis. The learning rate was set at 0.2, because a lower learning rate generally leads to a better learning result and we could not obtain convergent results with a learning rate larger than 0.3. Training was terminated when the Root Mean Squared Error (RMSE) was lower than, or equal to, 0.0001, or after at most 1000 training iterations. The network structure with the smallest RMSE on the test data was selected as the final network structure. The overall classification accuracy is 81.6%, with a higher accuracy for Cases 1 and 2 (Figure 4). The accuracy is lower for Case 3, which therefore cannot be classified reliably. The classification accuracy rates of each technique, together with those of the clustering algorithm from the Modeler, are summarized in Figure 5.

Figure 4. Results of artificial neural network analysis.


Figure 5. Summary of analytic results.

(1: MATLAB implementation; 2, 4: SPSS Statistics function; 3: SPSS Modeler function)

Conclusion

All models performed well in discriminating among the different spinal diseases (the area under the ROC curve ranged from 0.88 to 0.91). The discriminant analysis is statistically better than the neural network models and clustering, and provides a well-calibrated model. Among the discriminant methods, the Fisher discriminant analysis uses projection for dimensionality reduction. After the dimension is reduced, the deviation within each group is calculated (which is comparable to the random error in the analysis of variance), and the deviation between the groups is calculated (which is analogous to the between-level deviation of a factor in the analysis of variance). A convex optimization method is then used to find a straight line, or hyperplane, that minimizes the deviation within the groups and maximizes the deviation between the groups, in order to separate the different categories. Since the classes in the training data set differ greatly from one another and there are only subtle differences within each class, the projection method that maximizes the class spacing can obtain good results.
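The projection idea can be illustrated for the simplest case of two classes and two features, where the direction that maximizes the ratio of between-group to within-group deviation has a well-known closed form, w = Sw^-1 (mu_a - mu_b), with Sw the within-group scatter matrix. The sketch below is a simplified illustration under those assumptions; the study itself involves three classes and many more variables, and the function names and toy data are hypothetical.

```python
def mean2(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def scatter2(rows, mu):
    """2x2 scatter matrix: sum of outer products of deviations from the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - mu[0], r[1] - mu[1])
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_direction(class_a, class_b):
    """Two-class Fisher direction w = Sw^-1 (mu_a - mu_b)."""
    mu_a, mu_b = mean2(class_a), mean2(class_b)
    sa, sb = scatter2(class_a, mu_a), scatter2(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = (mu_a[0] - mu_b[0], mu_a[1] - mu_b[1])
    # Multiply the inverse of the 2x2 within-group scatter matrix by the mean difference.
    return ((sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det)

def project(rows, w):
    """Reduce each two-feature sample to a single discriminant score."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]

# Hypothetical toy data for two classes.
class_a = [(4.0, 2.0), (5.0, 3.0), (4.5, 2.2), (5.5, 3.1)]
class_b = [(1.0, 2.0), (2.0, 3.0), (1.5, 2.1), (2.2, 3.2)]

w = fisher_direction(class_a, class_b)
proj_a, proj_b = project(class_a, w), project(class_b, w)
print(min(proj_a), max(proj_b))
```

After projection, the two groups occupy disjoint ranges on a single axis, so a single threshold between them is enough to separate the classes.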

At the ideal threshold, the sensitivity, specificity and accuracy of all models are greater than 80%. We excluded the records of missing or unknown data during the modeling process. Other methods, such as the substitution of the mean or mode values, could have been used to prepare the data-sets for modeling; however, we chose a more conservative data exclusion method, given the uncertainties involved in the substitution methods. We acknowledge that this elimination may have introduced certain biases and that future studies should be conducted to validate our results.

In addition to the prediction results, the discriminant analysis model also provides a means of explaining its predictions. By comparing the magnitude of the coefficients of the discriminant equation, one can infer the contribution of each variable to the model. The clustering method cannot provide clear conditional rules, and the neural network likewise does not provide a means of explaining its predictions. It is also worth mentioning other methods that could be applied to the disease classification problem. For example, the ensemble method builds a set of base classifiers from the training data, and then classifies new samples by weighting the prediction of each base classifier. Combining the discriminant model with the neural network may further improve the accuracy and generalization ability of the model, but the actual effect still needs to be evaluated. Other machine learning methods, such as support vector machines (SVMs) or random forests, are also possible candidate strategies. To implement the model in a hospital, the information department can encapsulate the model as a web service, or as an internal module that can be called from other server-side scripts, or it can gradually update the model by dynamically updating the underlying data-set. Medical units can make practical decisions based on cost and business needs.
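The weighting idea behind such an ensemble can be sketched as a weighted average of the class probabilities produced by each base classifier. This combination was not evaluated in the study; the model names, probabilities, weights, and the `weighted_vote` helper below are all hypothetical and serve only to show the mechanics.

```python
def weighted_vote(prob_by_model, weights):
    """Combine per-class probabilities from several models by a weighted average."""
    classes = list(next(iter(prob_by_model.values())).keys())
    total_w = sum(weights[m] for m in prob_by_model)
    combined = {
        c: sum(weights[m] * prob_by_model[m][c] for m in prob_by_model) / total_w
        for c in classes
    }
    # The predicted class is the one with the highest combined probability.
    return max(combined, key=combined.get), combined

# Hypothetical outputs of two base classifiers for one patient.
probs = {
    "lda": {"HIVD": 0.6, "spondylosis": 0.3, "stenosis": 0.1},
    "ann": {"HIVD": 0.2, "spondylosis": 0.5, "stenosis": 0.3},
}
# Hypothetical weights, e.g. proportional to each model's validation accuracy.
weights = {"lda": 0.7, "ann": 0.3}

label, combined = weighted_vote(probs, weights)
print(label, combined)
```

Here the two models disagree, and the higher weight on the discriminant model decides the outcome; with equal weights the result would depend only on the averaged probabilities.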

The purpose of this study is to compare the performance of the models, and not to provide a detailed analysis of each model. All models perform well in predicting the possible categories of different spinal diseases. However, the discriminant analysis is statistically better than the neural network models and clustering. We concluded that the discriminant analysis model is the method of choice for predicting spinal disease in this data-set, given its good classification performance, its calibration, and its potential insight into variable relevance. Its popularity among healthcare researchers is also a major advantage. This model can provide an effective guide for differentiating spinal pathologies with similar symptoms, assisting physicians in their diagnosis of diseases and reducing the influence of subjective cognition and judgment. This model can also reduce unnecessary examinations, which will shorten the diagnostic process, help to improve the accuracy of medical diagnosis, and reduce medical expenses and unnecessary waste. In the future, we will continue to collect data and expand the classification types, in order to investigate the efficiency of other classification methods. We expect to provide practical help for clinical medical care.

Disclosure statement

All authors declare that they have no conflict of interest situations (financial or personal interests) that may affect or appear to affect the impartiality and the integrity of the peer review process of the

Additional information

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References