Real-time bearing fault classification of induction motor using enhanced inception ResNet-V2


ABSTRACT

The rolling bearing is a vital part used in different rotating electrical devices. Detecting defects in bearings is crucial for the safe operation of these machines. However, it is challenging to use Deep Learning techniques to identify bearing defects when the machine is not under load. To resolve this issue, this paper presents the Constant-Q Non-stationary Gabor Transform with enhanced Inception ResNet-V2, proposed for the early-stage classification of ball bearing faults in induction motors. The proposed model obtained the vibration images, i.e. time-frequency images of unfiltered vibration signals from the laboratory experimental setup. These images are applied to the proposed model, which classifies the ball bearing faults under various load conditions while adjusting its hyperparameter values instead of employing default ones. Furthermore, the model underwent training using k-fold method to assess its resilience with the use of optimal values obtained from hyperparameter tuning. The model is evaluated by performance metrics like F1-score, Recall, Precision, Confusion Matrix and Training time. The proposed model accomplished an average classification accuracy of 99.84% in low load and full load conditions within a few epochs. Ultimately, when compared to Inception-V4 and ResNet-50, which achieved 91.41% and 91.65%, respectively, the experimental findings unambiguously demonstrate the superior performance of the proposed model over both models.

Introduction

Rotating electrical machines are an indispensable part of day-to-day life, since we depend on their continuous operation. It is vital to detect machine faults accurately and in a timely manner. Bearing faults, which arise from inner race, outer race and ball defects, account for 41% of motor faults and are the most common fault type. They create mechanical noise and reduce the life of the motor (Amar, Gondal, and Wilson 2014). When these anomalies go unnoticed at an early stage, they may lead to unscheduled downtime or even catastrophic failures and financial losses. To prevent such situations, it is necessary to diagnose and avert faults in the different parts of the machine (Gangsar and Tiwari 2020).

The fault detection intends to examine the acquired data precisely and estimate the state of the interior parts, to take action on its maintenance, more importantly for induction motor (IM) due to their high significance in most of the industries (Gundewar and Kane Citation2022). Existing methodologies mainly focus on model-based and data-driven techniques. Model-based techniques are used to investigate the features of the signals in various domains, such as the time domain, frequency domain and time-frequency domain. These are very helpful to detect the single fault under steady states in the IM.

Combination of model-based and data-driven algorithms are accomplished by Machine Learning (ML) methods, such as artificial neural network, support vector machine (Agrawal and Jayaswal Citation2020), Nearest Neighbour (Glowacz et al. Citation2018), decision tree, random forest and fuzzy logic (Rodríguez and Arkkio Citation2008). The aforementioned methods mainly depend on manual feature extraction, which requires human expertise. Deep Learning (DL) is an alternative approach to overcome the problems of ML. DL patterns can learn the features automatically from the sensor data (Xie, Huang, and Choi Citation2021), which does not require handcrafted features. Advanced direct learning features from the raw signal, frequency data and time-frequency spectrum by using Convolutional Neural Network (CNN) (Jing et al. Citation2017) was discussed for discriminating the faults and accomplish higher classification accuracy. Effective data preprocessing is used to convert the signal into 2-D images. A new CNN pattern that learned the features from the images and classifies the faults is proposed (Wen et al. Citation2017). A unique transfer CNN (Wen, Li, and Gao Citation2020) learned the features from the images which are converted from the raw signals and detect the fault diagnosis. A Deep Convolutional Transfer Learning Network that comprises fault detection and domain adjustment part (Guo et al. Citation2018) was designed for bearing fault classification. In the fault detection part, a 1-D CNN learns the useful features automatically from one machine and uses the domain adjustment to understand the unlabeled data from the new machine. A time-frequency spectrum is used to convert the images from the vibration and current signals and is given to Deep CNN (Shao et al. Citation2019) for classifying the motor faults. 
A unique information fusion method (Hoang and Kang Citation2019), was used to extract useful information from current signals and fed to CNN for diagnosing the external bearing defects. However, the motor current signal approach effectiveness is notably inferior to that of vibration signals, particularly for bearing faults, and it also necessitates signal preprocessing. A novel approach for intelligent fault diagnosis of rolling bearings is introduced, combining the methods of Short-Time Fourier Transform (STFT) and CNN (Zhang and Deng Citation2023). Continuous Wavelet Transform (CWT) converts the vibration data into time-frequency images, which are given as input to Self-attention lightweight SqueezeNet algorithm, classifying the faults with higher accuracy. In (Grover and Turk Citation2022), four pre-trained models, AlexNet, GoogleNet, VGG-19 and ResNet-50 are able to rapidly learn the features in the bi-spectrum images, which are transformed from the vibration signals and discriminate the faults with higher accuracy. Despite the benefits of the transfer learning model, it has some drawbacks, such as being computationally expensive due to the pretrained network and fine-tuning process. The Wavelet Packet Transform (WPT) is applied to convert non-stationary vibration signals into images, which are then inputted into a novel residual network (ResNet) (Yu et al. Citation2022) for the purpose of diagnosing bearing faults. Time-frequency images obtained from raw vibration signals using CWT and fed into the Transformer classifier to classify known faults (Wu, Triebe, and Sutherland Citation2023). The classifier’s ability to extract advanced features is utilized, and a technique based on Mahalanobis distance is used to detect previously unseen fault conditions effectively. A novel local binary temporal convolution neural network (Xue et al. Citation2023) is used to extract features from the scalogram images obtained from vibration signals. 
These features are then used to classify different types of bearing faults. An innovative method for accurately detecting compound faults in ball bearings (Suthar et al. 2022) utilizes Multiscale-SinGAN, Heat Transfer Search optimization and an Extreme Learning Machine. The aforementioned studies indicate that CNN-based fault classification achieves greater accuracy but has the following drawbacks.

(i) Traditional deep learning models struggle to accurately classify low-level bearing faults under low load conditions because they extract insufficient features from the images. These models rely on automatically learning features from raw data, which makes it difficult to distinguish normal from faulty conditions.
(ii) Deep learning models require large amounts of data for tasks such as image recognition or classification. Training involves iteratively adjusting parameters to minimize the difference between predicted and actual outputs. Working with large datasets is more complex and time-consuming: as the dataset size increases, the number of epochs required can increase, leading to longer training times.

The major contribution of the work is summarized as follows:

Constant-Q Nonstationary Gabor Transform (CQ-NSGT) is developed to obtain the vibration images from the raw data.
The Enhanced Inception ResNet-V2 algorithm is applied for classifying ball-bearing faults in induction motors. Inception ResNet-V2 combines residual connections with inception modules, allowing for both depth and breadth in feature extraction. To enhance its effectiveness, the algorithm has been internally improved by expanding the Inception ResNet Blocks A, B, and C, so that the deeper architecture extracts more comprehensive and pertinent information from the intricate CQ-NSGT images.
The Enhanced Inception ResNet-V2 identified bearing faults in the early stages with higher accuracy under various load conditions, and it is compared with Inception-V4 and ResNet-50.

From here, section 2 discusses the background work, section 3 gives a detailed description of the Enhanced Inception ResNet-V2. The experimental setup and the proposed method-based bearing defect classification and analysis are shown and examined in sections 4, 5 and 6. Section 7 ends with the conclusion and future work.

Background work

Time-Frequency tool (T-F)

The T-F tool is employed to depict the relationship between the time and frequency domains of a signal. Several methods can create T-F images from non-stationary signals, such as the Short-Time Fourier Transform (STFT), Continuous Wavelet Transform (CWT), Constant-Q Transform (CQT) and Non-Stationary Gabor Transform (NSGT). The STFT offers a fixed T-F resolution determined by the size of the window used. However, when dealing with non-stationary vibration signals, it is necessary to have a representation that combines time and frequency information in a single graph, with a varying frequency resolution. This is crucial for accurately tracking frequency fluctuations over time and ensuring proper categorization of faults. The CWT provides variable time-frequency resolution, but it is computationally complex and lacks invertibility, resulting in an inaccurate representation of vibration frequency (Choudhary et al. 2023). The NSGT (Dörfler and Matusiak 2015) does not allow for perfect reconstruction because it lacks a complete lattice structure, making it challenging to construct an exact dual frame. This limitation results in an inaccurate representation of the frequency content of the vibration signals. The original Constant-Q Transform (Liu, Yan, and Zhang 2016) lacks invertibility and has a greater number of bins (frequency channels) per octave, resulting in potential loss of instantaneous frequency component changes within the vibration signal and increased computational time. To overcome these problems, an approach is needed that can analyze non-stationary vibration signals and extract valuable features from them while minimizing the computational time. The Constant-Q Nonstationary Gabor Transform (CQ-NSGT) is a technique used to analyze non-stationary signals, such as vibration data. It is based on a time-frequency representation that provides a comprehensive view of the signal’s characteristics. The CQ-NSGT employs a Q-factor, which represents the ratio of the center frequency to the bandwidth, and this ratio remains constant across all windows, as shown in Equations (1) to (3).

This approach improves the frequency resolution for lower frequencies and enhances the time resolution as the frequency increases. One advantage of the CQ-NSGT is that it requires less computational time. In this work, CQ-NSGT is employed to convert non-stationary vibration signals into T-F images. Moreover, the variable frequency resolution of CQ-NSGT makes it particularly suitable for analyzing and detecting non-stationary machine vibration signatures.

$y_{x}(n) = h_{x}(n)\, e^{i 2 \pi n f_{x} / f_{m}}, \quad n \in \mathbb{Z}$ (1)

$Q_{factor} = f_{x} / f_{m}$ (2)

$\mathrm{CQ\text{-}NSGT}(k, l) = \sum_{n = 0}^{N} g(j)\, y_{x}^{*}(n - k)$ (3)

Where $h_{x} (n)$ denotes the center window function, $f_{x}$ represents center frequency and $f_{m}$ denotes the sampling frequency. Once the conjugate of $y_{x} (n)$ is computed, the Constant-Q Transform (CQT) is then evaluated using the specified summation block (Deveci et al. Citation2023).
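The constant-Q property described above (a fixed ratio of center frequency to bandwidth) can be illustrated with a minimal sketch. It does not implement the CQ-NSGT itself; it only lays out geometrically spaced center frequencies and their bandwidths. The function name `constant_q_bins`, the lowest frequency of 10 Hz and the choice of 12 bins per octave are illustrative assumptions; only the 6.25 kHz sampling rate comes from the experimental setup.

```python
import numpy as np

def constant_q_bins(f_min, f_max, bins_per_octave):
    """Geometrically spaced center frequencies sharing one Q-factor."""
    n_bins = int(np.floor(bins_per_octave * np.log2(f_max / f_min))) + 1
    k = np.arange(n_bins)
    centre = f_min * 2.0 ** (k / bins_per_octave)
    # Q = center frequency / bandwidth, identical for every bin
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bandwidth = centre / q
    return centre, bandwidth, q

fs = 6250.0  # sampling rate in Hz, as stated in the experimental setup
# f_min and bins_per_octave are assumed values, chosen only for illustration
centre, bandwidth, q = constant_q_bins(f_min=10.0, f_max=fs / 2, bins_per_octave=12)
```

Because the bandwidth grows in proportion to the center frequency, low-frequency bins are narrow (fine frequency resolution) and high-frequency bins are wide (fine time resolution), which is the trade-off the text attributes to CQ-NSGT.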

CNN model

CNNs play an important role in image classification because of their feature extraction ability and good prediction accuracy. Each stage has a convolution layer followed by a pooling layer. A typical CNN is built by assembling one or more such two-layer stages combined with a fully connected layer. The input data can be current or vibration signals, temperature images or audio signals from a motor. The activation function is applied to the convolution output to obtain the output of the layer, as shown in Equation (4), where $y_{k}^{j}$ is the $k$-th feature map of the $j$-th layer, $z$ is the activation function, $N_{j}$ denotes the set of input feature maps, $*$ represents the convolution operation, $d$ is the kernel filter, and $b$ denotes the bias (Chen et al. 2022; Li et al. 2020).

$y_{k}^{j} = z\left(\sum_{y \in N_{j}} y_{k}^{j - 1} * d_{i k}^{j} + b_{i}^{j}\right)$ (4)

Common activation functions include tanh, sigmoid, ReLU, LeakyReLU, SeLU, and swish. In this article, ReLU is employed due to its better performance (Xing, Ma, and Yang 2016); its function is shown in Equation (5).

$z_{jki} = \max(0, x_{jki})$ (5)
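Equations (4) and (5) can be traced with a small NumPy sketch: each input feature map is convolved with its kernel, the results are summed, a bias is added, and ReLU is applied. The helper names and the toy inputs are assumptions made for illustration, and the naive loop is written for readability rather than speed.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 'valid' 2-D sliding-window product, as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Equation (5): element-wise max(0, x)."""
    return np.maximum(0.0, x)

def feature_map(inputs, kernels, bias):
    """Equation (4): sum over input maps, add the bias, apply the activation."""
    total = sum(conv2d_valid(x, k) for x, k in zip(inputs, kernels))
    return relu(total + bias)

# Toy data (hypothetical): two 4 x 4 input maps and two 2 x 2 kernels
inputs = [np.arange(16.0).reshape(4, 4), -np.ones((4, 4))]
kernels = [np.ones((2, 2)), np.ones((2, 2))]
out = feature_map(inputs, kernels, bias=-20.0)
```

With these toy values the pre-activation entries in the first row are negative, so ReLU sets them to zero while the larger entries pass through unchanged.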

Inception module

Traditional CNN models use deeper networks to obtain different feature maps, which leads to too many parameters. Such networks are difficult to train and increase time and memory consumption. To reduce this complexity, the inception block was introduced (Szegedy et al. 2015); it comprises multiple parallel convolution layers with various kernel sizes that operate in parallel to extract more features. Figure 1 illustrates the inception module, which has four branches: a 1 × 1 convolution (convolution with kernel size 1 × 1), a 3 × 3 convolution, a 5 × 5 convolution and a max pooling with kernel dimension 3 × 3. These four branches extract features from the images and then combine them into a single feature map output (filter concatenation). GoogleNet, Inception V3 and Inception V4 were developed using this module. Inception V4 has higher accuracy and a faster classification response time than GoogleNet and Inception V3 because it uses a greater number of inception blocks. This design addresses the drawback of traditional CNNs, since the parallel structure reduces the vanishing gradient during backpropagation (Zhu et al. 2019).

Figure 1. Inception module.
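The four parallel branches and the filter concatenation can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the weights are random, activations are omitted, the 1 × 1 dimension-reduction convolutions used in practical inception blocks are left out, and the function names are hypothetical.

```python
import numpy as np

def conv_same(x, w):
    """Zero-padded 'same' convolution. x: (H, W, C_in), w: (k, k, C_in, C_out)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3]))
    for r in range(h):
        for c in range(wd):
            patch = xp[r:r + k, c:c + k, :]
            out[r, c] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def maxpool_same(x, k=3):
    """k x k max pooling with stride 1 that keeps the spatial size."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, wd = x.shape[:2]
    out = np.empty_like(x)
    for r in range(h):
        for c in range(wd):
            out[r, c] = xp[r:r + k, c:c + k, :].max(axis=(0, 1))
    return out

def inception_module(x, w1, w3, w5):
    """Run the four branches in parallel and concatenate along channels."""
    branches = [conv_same(x, w1), conv_same(x, w3), conv_same(x, w5), maxpool_same(x)]
    return np.concatenate(branches, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))        # toy 8 x 8 input with 3 channels
w1 = rng.standard_normal((1, 1, 3, 4))    # 1 x 1 branch, 4 filters (assumed)
w3 = rng.standard_normal((3, 3, 3, 6))    # 3 x 3 branch, 6 filters (assumed)
w5 = rng.standard_normal((5, 5, 3, 2))    # 5 x 5 branch, 2 filters (assumed)
y = inception_module(x, w1, w3, w5)
```

Every branch preserves the 8 × 8 spatial size, so the outputs can be stacked along the channel axis: 4 + 6 + 2 convolution channels plus the 3 pooled input channels.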

Residual architecture

The Residual Network (ResNet) presents an optimized concept for training deeper networks and is viewed as an extension of deeper networks. The residual block is a significant part of ResNet-50 (Wen, Li, and Gao 2020) and consists of convolution and batch normalization layers and the ReLU activation function. Figure 2 depicts two types of residual block structures, residual block 1 and residual block 2. The foundation of residual blocks is the idea of employing shortcut connections to skip whole blocks of convolutional layers, as expressed in Equation (6). These shortcuts avert the vanishing gradient problem and support the optimization of training parameters during error backpropagation. This improves the overall performance of fault detection using deeper CNN structures.

Figure 2. (a) Residual block 1 and (b) residual block 2.

$Y = G(x_{i}) + x_{i}$ (6)

$Y = G(x_{i}) + Z(x_{i})$ (7)

Both residual blocks have three convolution and batch normalization layers. In the first residual block, $G (x)$ is the transformed signal and $x$ represents the input. The input is added to $G (x)$ through a shortcut connection, as formulated in Equation (6). In Equation (7), $Z (x)$ represents the shortcut path that contains one convolution and batch normalization layer. ReLU is used in the ResNet architecture to avert the exploding gradient problem and reduce the complexity caused by the convolution layers (Chen et al. 2022).
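The two shortcut types in Equations (6) and (7) can be sketched with dense matrices standing in for the convolution and batch normalization stacks. This substitution is an assumption made to keep the example short; the function names and the random weights are likewise illustrative.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def make_branch(w1, w2):
    """Return G(x) = w2 @ relu(w1 @ x), a dense stand-in for the conv/BN layers."""
    return lambda x: w2 @ relu(w1 @ x)

def residual_block_identity(x, g):
    """Equation (6): Y = G(x) + x, with the input passed through unchanged."""
    return g(x) + x

def residual_block_projection(x, g, w_shortcut):
    """Equation (7): Y = G(x) + Z(x), where Z is a learned shortcut mapping."""
    return g(x) + w_shortcut @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Identity shortcut: G must return the same shape as x
g_same = make_branch(rng.standard_normal((8, 4)), rng.standard_normal((4, 8)))
y1 = residual_block_identity(x, g_same)

# Projection shortcut: Z maps x to the new width so the sum is well defined
g_wide = make_branch(rng.standard_normal((8, 4)), rng.standard_normal((6, 8)))
w_s = rng.standard_normal((6, 4))
y2 = residual_block_projection(x, g_wide, w_s)
```

If the branch $G$ outputs zeros, the identity block returns its input unchanged, which is why stacking such blocks does not make an already good representation harder to preserve.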

Enhanced inception ResNet – V2

The inception module and the residual architecture are combined to form Inception residual networks (Inception ResNet), which are similar to the Inception V4 network in performance. Using inception networks with residual connections allows thorough feature extraction from time-frequency images in terms of both depth and breadth. This significantly speeds up training, improves recognition accuracy, and provides further evidence that inception residual networks outperform higher-version inception networks (Deveci et al. 2023). The fundamental components of the enhanced Inception ResNet-V2 model make up all layers before the final layers. It comprises Inception-ResNet-A, Inception-ResNet-B, Inception-ResNet-C, Reduction A and Reduction B modules (Szegedy et al. 2017). The enhanced version still consists of several convolution layers, but with an increased number of Inception-ResNet-A, Inception-ResNet-B, and Inception-ResNet-C blocks compared to the basic Inception ResNet-V2: 10 Inception-ResNet-A blocks, 20 Inception-ResNet-B blocks, and 10 Inception-ResNet-C blocks. The main advantage of the proposed model comes from this expansion, which is aimed at enhancing the model’s performance, particularly when working with smaller datasets. With these modifications, the model has the potential to improve accuracy, adapt better to datasets with limited samples, and reduce training time. In the initial stage of the Enhanced Inception-ResNet V2, the stem component performs a series of operations on the image, as shown in Figure 3. Initially, a 3 × 3 convolution layer with 32 filters is applied, which halves the size of the image while generating 32 feature maps measuring 112 × 112 pixels each. 
Following this, another 3 × 3 convolution layer with 32 filters processes the feature maps, further reducing the spatial dimensions and producing 32 feature maps of 56 × 56 pixels. Finally, a 3 × 3 convolution layer with 64 filters is applied. This operation does not affect the spatial dimensions, but it increases the number of feature maps to 64. As a result, the output of the stem comprises 64 feature maps measuring 56 × 56 pixels each. These feature maps serve as the foundation for extracting low-level features from the input image and enable the network to learn the higher-level features that are critical for accurate predictions. As Figure 4 depicts, the Inception-ResNet A, B, and C blocks are all based on the Inception module, but they differ in the convolution filters they use. The Inception-ResNet A block employs two 3 × 3 convolution filters, making it the simplest and smallest of the three blocks. The Inception-ResNet B block combines one 1 × 7 filter and one 7 × 1 filter, providing a larger receptive field to capture longer-range dependencies in the input image. The Inception-ResNet C block combines one 1 × 3 filter and one 3 × 1 filter, resulting in a smaller receptive field but higher computational efficiency. These blocks are interconnected through residual connections, which enable the network to learn more complex features without significantly increasing its depth and which are instrumental in enhancing the accuracy and efficiency of the Inception-ResNet V2 model. As Figure 5 illustrates, the Reduction A and Reduction B blocks in the Inception-ResNet V2 model are responsible for down-sampling the spatial dimensions of the feature maps. 
This down-sampling is achieved through a combination of convolution and pooling operations. The Reduction A block reduces the spatial dimensions by half through pooling with a stride of 2, while the Reduction B block maintains the spatial dimensions while reducing the number of output feature maps by a factor of four using a combination of 1 × 1 and 3 × 3 convolution filters. The inclusion of the Reduction blocks in the Inception-ResNet V2 model serves two purposes: reducing the computational cost of the network and improving its invariance to image transformations like translation and scaling (Chen et al. Citation2022; Szegedy et al. Citation2017). An average pooling layer follows to reduce the size of the feature map and then the dropout function of ratio 0.8 is used to lessen overfitting and increase the accuracy. Finally, the softmax layer classifies the faults based on the probabilities of the features.
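The feature map sizes quoted for the stem can be checked with a short size calculation. The sketch below rests on several assumptions that the text does not state: strides of 2, 2 and 1 for the three stem convolutions, 'same' padding, and a 224 × 224 input. The helper names are hypothetical.

```python
import math

def conv_output_size(size, kernel, stride, padding="same"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def stem_shapes(input_size):
    """Return (height, width, channels) after each stem convolution."""
    # (filters, kernel, stride) per layer; the strides are assumed
    layers = [(32, 3, 2), (32, 3, 2), (64, 3, 1)]
    shapes, size = [], input_size
    for filters, kernel, stride in layers:
        size = conv_output_size(size, kernel, stride)
        shapes.append((size, size, filters))
    return shapes

shapes = stem_shapes(224)  # assumed input size
```

Under these assumptions the three layers yield 112 × 112 × 32, 56 × 56 × 32 and 56 × 56 × 64, matching the sizes described for the stem.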

Figure 3. Enhanced inception ResNet – V2 (stem block).

Figure 4. Inception ResNet A, B and C modules.

Figure 5. Reduction A and B modules.

Experimental setup

The experimental setup comprises a 5-HP three-phase IM as the prime mover, coupled with a separately excited DC generator that is connected to a 5 kW three-phase resistive load, as shown in Figure 6. The inner race and outer race faults of the bearing were created by electric discharge machining. A real-time ball looseness fault in the bearing is also considered. Thus, along with the healthy (N) bearing, a ball looseness fault (BF) and inner race (IRF) and outer race (ORF) faults of different sizes are used in this work, as shown in Figure 7. The motor and bearing specifications are given in Table 1. Accelerometers (Model: CTC AC102-1A) are mounted in the horizontal and vertical positions at the non-drive end of the motor housing. Vibration data were captured by the accelerometers for 60 s for each fault (IRF, ORF and BF), each in a different bearing, along with the healthy bearing, at a 6.25 kHz sampling rate. The vibration signals from the accelerometers are converted into current signals by a vibration transmitter (VAD 300 IP), which feeds the current module (NI 9246) attached to the NI CRIO 9067 in LabVIEW. Figure 8 depicts the time-domain vibration data gathered for normal and defective bearings, along with the application of various T-F techniques such as CQ-NSGT, STFT, CWT, CQT, and NSGT.

Figure 6. Enhanced inception ResNet-V2 based vibration methodology for IM bearing defect classification.

Figure 7. Various fault condition of bearings.

Figure 8. Time domain vibration signals and various T-F vibration images.

Table 1. Motor and bearing specifications.


Enhanced inception ResNet-V2 based bearing defect classification

The proposed model classifies bearing defects under variable load conditions using CQ-NSGT with the enhanced Inception-ResNet-V2 architecture. Figure 6 depicts the overall process of the proposed method, which comprises three parts. In the first part, CQ-NSGT is used to convert unfiltered vibration data into images in the T-F domain, as shown in Figure 8. Subsequently, the image dataset was augmented with CQ-NSGT image samples of various bearing fault conditions alongside healthy conditions. The augmentation involved rotation, horizontal flip, and vertical flip, followed by resizing the images to 299 × 299. The resulting images were then fed into the Enhanced Inception-ResNet-V2 architecture. The basic Inception-ResNet-V2 network has 1,000 output classes, whereas only 4 classes (N, IRF, ORF and BF) are necessary for bearing fault classification; hence, the final layer has only 4 outputs. During the training phase, the model receives the training images in batches of 32 over 100 epochs. Batch training reduces the memory needed during training and accelerates training. The dropout layer prevents overfitting and helps the trained model generalize. In the last step, bearing defects are detected and classified by the softmax layer.
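The augmentation and resizing steps can be sketched with NumPy alone. The 90° rotation angle and the nearest-neighbour resizing are assumptions (the text specifies neither the angle nor the interpolation), and the function names and the random test image are hypothetical.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]

def augment(img, size=299):
    """Return the original plus rotated and flipped copies, all resized."""
    variants = [
        img,
        np.rot90(img),    # rotation by 90 degrees (assumed angle)
        np.fliplr(img),   # horizontal flip
        np.flipud(img),   # vertical flip
    ]
    return [resize_nearest(v, size) for v in variants]

rng = np.random.default_rng(0)
image = rng.random((120, 160, 3))  # stand-in for one CQ-NSGT image
batch = augment(image)
```

Each call turns one time-frequency image into four 299 × 299 × 3 arrays, the input size used for the Inception-ResNet-V2 architecture in this work.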

Result and analysis

In this work, 2400 vibration image samples from four bearing states (N, IRF, ORF, BF) under variable load conditions are used. The IM is loaded at 20%, 40%, 60%, 80% and 100% of its full load while the vibration data are captured. Three DL networks, namely Inception-V4, ResNet-50 and the proposed model, are used to demonstrate the work. All models are written as Python scripts using the TensorFlow framework and run on a Google Colab Pro T4 (High-RAM) GPU. For each model, 1920 vibration images were used for training and 480 images for validation. The methodology involved two stages: one with default hyperparameter values and the other with tuned hyperparameter values. In the first stage, the Adamax optimizer is used with a default learning rate of 0.01, and each model is trained with a default batch size of 32 images over 50 epochs. The goal was to assess which hyperparameters have a greater impact on the performance of the classification model. To achieve this, we employed the grid-search method (Zhang et al. 2021), selecting various hyperparameter values and testing all potential configurations to identify the values that enhance model performance. In the second stage, we excluded optimizers such as Adam and Stochastic Gradient Descent, whose performance fell below the average. We then examined how the performance of the models was affected by different learning rate and batch size values. We set the initial learning rate to 0.01 and evaluated its impact with the Adamax optimizer using batch sizes of 32 and 64 for all models. Subsequently, we repeated the process with a reduced initial learning rate of 0.002. To assess the effectiveness of the proposed model and prevent overfitting, K-fold cross-validation (Oyedele and Dutta 2023) with K = 5 is used. This approach divides the dataset into five equal parts, providing thorough evaluation across various data subsets. 
Additionally, to counter overfitting, an early stopping technique is applied. This method monitors the model’s performance on both the training and validation datasets during training. It stops training when the performance on the validation dataset starts to decline while the model continues to improve on the training dataset, which signifies reduced generalization ability. By combining K-fold cross-validation with early stopping, the model’s dependability and capacity for generalization are improved, ensuring precise identification of faults in induction motor ball bearings while mitigating the risk of overfitting. Recall, precision and F1-score, defined in Equations (8) to (10), were used to evaluate the proposed algorithm, where TP is True Positive, TN is True Negative, FP is False Positive and FN is False Negative. All of these metrics are presented as percentages. The accuracy vs. epochs and loss vs. epochs graphs for training and validation are shown for all the DL models used in this work with the optimal hyperparameters: the Adamax optimizer, a learning rate of 0.002 and a batch size of 32.

$Recall = \frac{TP}{TP + FN}$ (8)

$Precision = \frac{TP}{TP + FP}$ (9)

$F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$ (10)
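Equations (8) to (10) can be evaluated per class directly from a confusion matrix whose rows are actual labels and whose columns are predicted labels. The matrix below is hypothetical and is not taken from the experiments; the function name is also illustrative. The sketch assumes every class occurs at least once among both the actual and the predicted labels, so no denominator is zero.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix.

    Rows are actual labels, columns are predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # actual class samples predicted as another class
    fp = cm.sum(axis=0) - tp   # other-class samples predicted as this class
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for the classes N, IRF, ORF, BF (illustration only)
cm = [
    [10, 0, 0, 0],
    [0, 8, 2, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]
precision, recall, f1 = per_class_metrics(cm)
```

In this toy matrix two IRF samples are predicted as ORF, so the IRF recall drops to 0.8 while the ORF precision drops to 9/11; multiplying by 100 gives the percentages used in the text.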

After completing the experiments, sufficient evidence was gathered to validate our hypothesis that default hyperparameter values may not always yield optimal performance for a specific classification problem. In some instances individual hyperparameters affect classifier performance independently, while in others performance gains result from the interaction between multiple hyperparameters. Table 2 displays the outcomes of evaluations using default values over 50 epochs, whereas Table 3 shows the results attained after tuning the hyperparameters with the Adamax optimizer. These experiments confirm that hyperparameter tuning plays a pivotal role in addressing the bearing fault classification problem effectively. The best-performing configuration yielded an accuracy of 99.84% for Enhanced Inception ResNet-V2, 91.41% for Inception-V4, and 91.65% for ResNet-50 when using vibration images with the optimal hyperparameters: the Adamax optimizer, a learning rate of 0.002, and a batch size of 32.
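The grid-search procedure over optimizer, learning rate and batch size can be sketched as an exhaustive loop over all combinations. The scoring function `toy_evaluate` is an arbitrary placeholder, not a measured accuracy; in practice it would train and validate a model for each configuration. The grid values mirror those mentioned in the text, while the function names are assumptions.

```python
from itertools import product

def grid_search(grid, evaluate):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

grid = {
    "optimizer": ["adamax", "adam", "sgd"],
    "learning_rate": [0.01, 0.002],
    "batch_size": [32, 64],
}

def toy_evaluate(cfg):
    """Arbitrary stand-in score, NOT a measured accuracy."""
    score = 1.0 if cfg["optimizer"] == "adamax" else 0.0
    score -= abs(cfg["learning_rate"] - 0.002)
    score -= cfg["batch_size"] / 1000.0
    return score

n_configs = len(list(product(*grid.values())))
best_cfg, best_score = grid_search(grid, toy_evaluate)
```

The number of configurations grows multiplicatively with each added hyperparameter (here 3 × 2 × 2 = 12), which is one practical reason to drop poorly performing optimizers before varying the remaining values.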

Table 2. Comparison of the enhanced inception ResNet-V2 with CQ-NSGT to the existing techniques using default hyperparameters.


Table 3. Comparison of the enhanced inception ResNet-V2 with CQ-NSGT to the existing techniques after tuning the hyperparameters.


Figure 12 illustrates the accuracies obtained from the independent folds in K-fold cross-validation. The cross-validation was performed across all models using the optimal hyperparameters. For folds 1 to 5, the proposed model achieved accuracies of 98.75%, 99.37%, 98.75%, 100%, and 100%, respectively, giving an average validation accuracy of 99.37%. All values are close to the highest accuracy achieved by the optimal proposed model, with no significant drops in validation accuracy across folds. This consistency suggests that the proposed model is likely to maintain high accuracy even under varied training conditions using the same dataset. In contrast, Inception-V4 and ResNet-50 achieved lower average accuracies of 92.83% and 94.91%, respectively. This K-fold experiment demonstrates the superior performance of the proposed model over Inception-V4 and ResNet-50.
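The 5-fold split and the averaging of fold accuracies can be sketched as follows. The index generator is a minimal NumPy stand-in for a library K-fold utility, and the shuffling seed is an assumption; the sample count of 2400 and the five fold accuracies are the values reported in the text.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(k_fold_indices(2400, 5))

# Fold accuracies of the proposed model as reported in the text (percent)
fold_accuracies = [98.75, 99.37, 98.75, 100.0, 100.0]
mean_accuracy = float(np.mean(fold_accuracies))
```

With 2400 images and K = 5, each fold trains on 1920 images and validates on 480, and the five reported accuracies average to 99.374%, which rounds to the 99.37% stated above.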

Figure 9. Results obtained from using different time frequency techniques across all models.

The average accuracy results obtained with various T-F methods, including STFT, CWT, NSGT, CQT, and CQ-NSGT, across all models are illustrated in Figure 9. The proposed model with CQ-NSGT achieved a significantly higher average accuracy of 99.84%, surpassing the average accuracies of 90.52% for STFT, 92.19% for CWT, and 89.54% for NSGT. This comparison indicates that the proposed model with CQ-NSGT outperforms the models employing the other T-F methods. Comparison experiments were also conducted under varying load conditions to evaluate the predictive accuracy of the proposed model, Inception-V4, and ResNet-50 for different bearing fault scenarios. The proposed model achieved high classification accuracy of 99.80% under low load and 99.88% under full load, whereas Inception-V4 and ResNet-50 achieved lower accuracy values of around 91% under both conditions. Figure 13 compares the proposed model with the other two models using confusion matrices to evaluate their classification performance under various load conditions. The rows of each confusion matrix represent the actual label, while the columns represent the predicted label for the various bearing conditions. The proposed model achieved 100% accuracy for the different bearing conditions.

Figure 10. Accuracy vs epochs and loss vs epochs (a) inception-V4 (b) ResNet-50 and (c) proposed method (enhanced inception ResNet-V2).

Figure 11. Accuracy vs Epochs and Loss vs Epochs (a) Inception-V4 (b) ResNet-50 and (c) Proposed Method (Enhanced Inception ResNet-V2).

Figure 12. Comparison of the accuracy achieved through 5-fold cross-validation between enhanced inception ResNet-V2, inception-V4, and ResNet-50 using CQ-NSGT.

Figure 13. Confusion matrix (a) inception-V4 (b) ResNet-50 and (c) proposed method (enhanced inception ResNet-V2).

For comparison purposes, the confusion matrices of the best-performing Inception-V4 and ResNet-50 models are shown in Figure 13(a) and (b). Both models obtained 100% accuracy for N and BF but misclassified IRF and ORF under various load conditions. These results indicate that the Enhanced Inception ResNet-V2 model outperforms the other two models in diagnosing bearing faults.

Table 4 shows that the proposed model achieved 100% classification accuracy for all bearing faults, while the Inception-V4 and ResNet-50 models achieved 100% accuracy for BF and N, 74.8% and 75% for IRF, and 90.86% and 91.46% for ORF, respectively. The corresponding misclassification rates were 25.2% for IRF and 9.14% for ORF with Inception-V4, and 25% for IRF and 8.54% for ORF with ResNet-50.

Table 5 presents the performance metrics obtained for Inception-V4, ResNet-50, and the proposed model. The proposed model achieved 100% precision, F1 score, and recall, while Inception-V4 and ResNet-50 obtained lower precision, F1 score, and recall percentages of 89.07%, 93.75%, 91.41%, and 92.9%, 90.4%, 91.65%, respectively. Table 5 also lists the training durations: the proposed model trained in 21 min and 1 s, while Inception-V4 and ResNet-50 required 23 min and 17 s, and 31 min and 10 s, respectively. Consequently, the proposed model surpasses both models in terms of precision, F1 score, recall, and training efficiency.

The proposed Enhanced Inception ResNet-V2 based vibration methodology is compared with other existing approaches in Table 6, based on fault type, acquired signature, and applied methods. In contrast to the methods discussed in the literature, the proposed method performed well across different bearing fault conditions and varying speed conditions for diagnosing bearing faults. The proposed approach surpassed both Inception-V4 and ResNet-50 by combining the strengths of the Inception and ResNet architectures, resulting in more extensive and deeper feature extraction from the T-F images. Additionally, hyperparameter tuning improved the model's stability for classifying bearing faults, leading to higher accuracy in diagnosing bearing defects, with potential application to real-world industrial machinery such as pumps, compressors, wind turbines, and generators.
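
For readers who want to see how precision, recall, and F1 score follow from a confusion matrix, here is a minimal sketch. It assumes macro averaging (an unweighted mean over classes), which the article does not specify, and the matrix counts are hypothetical rather than taken from Table 4 or Figure 13.

```python
def precision_recall_f1(matrix):
    """Per-class (precision, recall, F1); rows = actual, columns = predicted."""
    n = len(matrix)
    scores = []
    for i in range(n):
        true_positive = matrix[i][i]
        predicted_as_i = sum(matrix[r][i] for r in range(n))  # column sum
        actually_i = sum(matrix[i])                           # row sum
        precision = true_positive / predicted_as_i if predicted_as_i else 0.0
        recall = true_positive / actually_i if actually_i else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_average(scores):
    """Unweighted mean of each metric over classes (assumed averaging scheme)."""
    n = len(scores)
    return tuple(sum(s[j] for s in scores) / n for j in range(3))


# Hypothetical counts for classes N, BF, IRF, ORF (illustration only).
matrix = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 8, 2],
    [0, 0, 1, 9],
]

scores = precision_recall_f1(matrix)
macro_precision, macro_recall, macro_f1 = macro_average(scores)

print(f"macro precision={macro_precision:.4f}")
print(f"macro recall={macro_recall:.4f}")
print(f"macro F1={macro_f1:.4f}")
```

Because F1 is the harmonic mean of precision and recall, the per-class F1 always lies between those two values.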

Table 4. CQ-NSGT images used for training and validation and each class classification accuracy.

Table 5. Performance metrics and training time.

Table 6. Comparison of the proposed technique to the existing techniques.

Conclusion

In this article, the CQ-NSGT is used with an improved Inception ResNet-V2 model for real-time fault classification of ball bearings in an IM. The unfiltered vibration data were collected from the laboratory experimental setup and transformed into images using CQ-NSGT, which were then applied to the DL algorithms. With the help of data augmentation, the proposed approach classified the bearing faults under varying load conditions using a reduced number of image samples, with its hyperparameter values tuned rather than left at their defaults. Additionally, the model was trained using k-fold cross-validation to evaluate its robustness with the optimal values derived from hyperparameter tuning. The proposed model achieved 100% precision, F1 score, and recall, whereas Inception-V4 and ResNet-50 achieved lower percentages of precision, F1 score, and recall at 89.07%, 93.75%, 91.41%, and 92.9%, 90.4%, 91.65%, respectively. Similarly, the proposed model required a shorter training duration of 21 min and 1 s, compared to Inception-V4 and ResNet-50, which required 23 min and 17 s, and 31 min and 10 s, respectively. The proposed model also achieved a higher average accuracy of 99.84% under both low load and full load conditions than the other models. From the results, it can be concluded that the enhanced Inception ResNet-V2 model outperforms the other models. In future work, the severity of the bearing fault could be estimated using a DL model under different load conditions.

Future scope

Utilizing DL models on edge computing devices, such as low-cost microcontrollers equipped with vibration and acoustic sensors, enables real-time diagnosis of bearing faults. This approach minimizes downtime and enables predictive maintenance.
The DL models need to be designed to diagnose bearing faults, considering a range of environmental and operational conditions, along with diverse industrial machinery such as pumps, compressors, wind turbines, and generators.
DL models trained on a specific type of machinery data may not effectively apply to other types of machinery. Hence, domain adaptation methods are necessary to allow the models to adjust to new machine types with minimal retraining, facilitating proactive maintenance.

Supplemental material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/08839514.2024.2378270

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data sets generated during and analyzed during the current study are available from the corresponding author on reasonable request.

Additional information

Funding

This work was supported by the Vellore Institute of Technology SEED Grant - RGEMS Fund [Grant Number SG20220084].

References

Agrawal, P., and P. Jayaswal. 2020. Diagnosis and classifications of bearing faults using artificial neural network and support vector machine. Journal of the Institution of Engineers (India): Series C 101 (1):61–23. doi:10.1007/s40032-019-00519-9.
Alexakos, C. T., Y. L. Karnavas, M. Drakaki, and I. A. Tziafettas. 2021. A combined short time Fourier transform and image classification transformer model for rolling element bearings fault diagnosis in electric motors. Machine Learning and Knowledge Extraction 3 (1):228–42. doi:10.3390/make3010011.
Amar, M., I. Gondal, and C. Wilson. 2014. Vibration spectrum imaging: A novel bearing fault classification approach. IEEE Transactions on Industrial Electronics 62 (1):494–502. doi:10.1109/TIE.2014.2327555.
Chen, Y., Y. Lin, X. Xu, J. Ding, C. Li, Y. Zeng, W. Liu, W. Xie, and J. Huang. 2022. Classification of lungs infected COVID-19 images based on inception-ResNet. Computer Methods and Programs in Biomedicine 225:107053. doi:10.1016/j.cmpb.2022.107053.
Choudhary, A., R. K. Mishra, S. Fatima, and B. K. Panigrahi. 2023. Multi-input CNN based vibro-acoustic fusion for accurate fault diagnosis of induction motor. Engineering Applications of Artificial Intelligence 120:105872. doi:10.1016/j.engappai.2023.105872.
Deveci, B. U., M. Celtikoglu, O. Albayrak, P. Unal, and P. Kirci. 2023. Transfer learning enabled bearing fault detection methods based on image representations of single-dimensional signals. Information Systems Frontiers 1–53. doi:10.1007/s10796-023-10371-z.
Dörfler, M., and E. Matusiak. 2015. Nonstationary Gabor frames-approximately dual frames and reconstruction errors. Advances in Computational Mathematics 41 (2):293–316. doi:10.1007/s10444-014-9358-z.
Gangsar, P., and R. Tiwari. 2020. Signal based condition monitoring techniques for fault detection and diagnosis of induction motors: A state-of-the-art review. Mechanical Systems and Signal Processing 144:106908. doi:10.1016/j.ymssp.2020.106908.
Glowacz, A., W. Glowacz, Z. Glowacz, and J. Kozik. 2018. Early fault diagnosis of bearing and stator faults of the single-phase induction motor using acoustic signals. Measurement 113:1–9. doi:10.1016/j.measurement.2017.08.036.
Grover, C., and N. Turk. 2022. A novel fault diagnostic system for rolling element bearings using deep transfer learning on bispectrum contour maps. Engineering Science and Technology, an International Journal 31:101049. doi:10.1016/j.jestch.2021.08.006.
Gundewar, S. K., and P. V. Kane. 2022. Bearing fault diagnosis using time segmented Fourier synchrosqueezed transform images and convolution neural network. Measurement 203:111855. doi:10.1016/j.measurement.2022.111855.
Guo, L., Y. Lei, S. Xing, T. Yan, and N. Li. 2018. Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data. IEEE Transactions on Industrial Electronics 66 (9):7316–25. doi:10.1109/TIE.2018.2877090.
Hoang, D. T., and H. J. Kang. 2019. A motor current signal-based bearing fault diagnosis using deep learning and information fusion. IEEE Transactions on Instrumentation and Measurement 69 (6):3325–33. doi:10.1109/TIM.2019.2933119.
Jing, L., M. Zhao, P. Li, and X. Xu. 2017. A convolutional neural network based feature learning and fault diagnosis method for the condition monitoring of gearbox. Measurement 111:1–10. doi:10.1016/j.measurement.2017.07.017.
Li, H., J. Huang, X. Yang, J. Luo, L. Zhang, and Y. Pang. 2020. Fault diagnosis for rotating machinery using multiscale permutation entropy and convolutional neural networks. Entropy 22 (8):851. doi:10.3390/e22080851.
Liu, T., S. Yan, and W. Zhang. 2016. Time–frequency analysis of nonstationary vibration signals for deployable structures by using the constant-Q nonstationary Gabor transform. Mechanical Systems and Signal Processing 75:228–44. doi:10.1016/j.ymssp.2015.12.015.
Oyedele, O., and H. Dutta. 2023. Determining the optimal number of folds to use in a K-fold cross-validation: A neural network classification experiment. Research in Mathematics 10 (1):2201015. doi:10.1080/27684830.2023.2201015.
Rodríguez, P. V. J., and A. Arkkio. 2008. Detection of stator winding fault in induction motor using fuzzy logic. Applied Soft Computing 8 (2):1112–20. doi:10.1016/j.asoc.2007.05.016.
Shao, S., R. Yan, Y. Lu, P. Wang, and R. X. Gao. 2019. DCNN-based multi-signal induction motor fault diagnosis. IEEE Transactions on Instrumentation and Measurement 69 (6):2658–69. doi:10.1109/TIM.2019.2925247.
Suthar, V., V. Vakharia, V. K. Patel, and M. Shah. 2022. Detection of compound faults in ball bearings using multiscale-SinGAN, heat transfer search optimization, and extreme learning machine. Machines 11 (1):29. doi:10.3390/machines11010029.
Szegedy, C., S. Ioffe, V. Vanhoucke, and A. Alemi. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence, San Francisco, California, USA. 1. Vol. 31. doi:10.1609/aaai.v31i1.11231.
Szegedy, C., W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, USA. 1–9. doi:10.1109/CVPR.2015.7298594.
Wen, L., X. Li, and L. Gao. 2020. A transfer convolutional neural network for fault diagnosis based on ResNet-50. Neural Computing & Applications 32 (10):6111–24. doi:10.1007/s00521-019-04097-w.
Wen, L., X. Li, L. Gao, and Y. Zhang. 2017. A new convolutional neural network-based data-driven fault diagnosis method. IEEE Transactions on Industrial Electronics 65 (7):5990–98. doi:10.1109/TIE.2017.2774777.
Wu, H., M. J. Triebe, and J. W. Sutherland. 2023. A transformer-based approach for novel fault detection and fault classification/diagnosis in manufacturing: A rotary system application. Journal of Manufacturing Systems 67:439–52. doi:10.1016/j.jmsy.2023.02.018.
Xie, T., X. Huang, and S.-K. Choi. 2021. Intelligent mechanical fault diagnosis using multisensor fusion and convolution neural network. IEEE Transactions on Industrial Informatics 18 (5):3213–23. doi:10.1109/TII.2021.3102017.
Xing, C., L. Ma, and X. Yang. 2016. Stacked denoise autoencoder based feature extraction and classification for hyperspectral images. Journal of Sensors 2016:1–10. doi:10.1155/2016/3632943.
Xiong, J., C. Li, C.-D. Wang, J. Cen, Q. Wang, and S. Wang. 2021. Application of convolutional neural network and data preprocessing by mutual dimensionless and similar gram matrix in fault diagnosis. IEEE Transactions on Industrial Informatics 18 (2):1061–71. doi:10.1109/TII.2021.3073755.
Xue, Y., R. Yang, X. Chen, Z. Tian, and Z. Wang. 2023. A novel local binary temporal convolutional neural network for bearing fault diagnosis. IEEE Transactions on Instrumentation and Measurement 72:1–13. doi:10.1109/TIM.2023.3298653.
Yu, X., Z. Liang, Y. Wang, H. Yin, X. Liu, W. Yu, and Y. Huang. 2022. A wavelet packet transform-based deep feature transfer learning method for bearing fault diagnosis under different working conditions. Measurement 201:111597. doi:10.1016/j.measurement.2022.111597.
Zhang, M., H. Li, S. Pan, J. Lyu, S. Ling, and S. Su. 2021. Convolutional neural networks-based lung nodule classification: A surrogate-assisted evolutionary algorithm for hyperparameter optimization. IEEE Transactions on Evolutionary Computation 25 (5):869–82. doi:10.1109/TEVC.2021.3060833.
Zhang, Q., and L. Deng. 2023. An intelligent fault diagnosis method of rolling bearings based on short-time Fourier transform and convolutional neural network. Journal of Failure Analysis & Prevention 23 (2):1–17. doi:10.1007/s11668-023-01616-9.
Zhu, Z., G. Peng, Y. Chen, and H. Gao. 2019. A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis. Neurocomputing 323:62–75. doi:10.1016/j.neucom.2018.09.050.
