Real-time reading system for pointer meter based on YolactEdge


Abstract

Despite the extensive deployment of digital instruments, their stability is difficult to maintain under adverse conditions such as extreme temperature, high pressure, or strong electromagnetic radiation. Analog meters, owing to their mechanical robustness and resistance to electromagnetic interference, remain in use across nuclear power plants and the petroleum and chemical industries. Under these harsh conditions, however, reading the instruments manually can be difficult and dangerous, and it cannot meet the requirements of real-time monitoring. In recent years, several machine vision-based meter reading systems have been proposed; however, achieving high accuracy with camera-based methods under varying viewing angles and lighting conditions remains challenging. Cloud deployment may compromise plant privacy, while edge computing struggles to read meters in real time because of its limited computing power. To address these issues, we propose a real-time reading system for single-pointer analog meters based on the YolactEdge instance segmentation framework. Our system is more accurate than previous studies and is implemented and deployed on the Jetson Xavier NX edge computing device. Our performance evaluation shows that our model outperforms other baselines, with average reference and relative errors as low as 0.0237% and 0.0300%, respectively, and an average inference speed of 10.26 FPS with INT8 acceleration on the Nvidia Jetson Xavier NX.

Keywords:

1. Introduction

Advances in technology have led to the replacement of analog instruments with digital ones in many industries. However, as noted by Donnelly et al. (2006), digital instruments cannot be used reliably in some hazardous environments. Analog meters therefore remain widely used in the petroleum, electricity, and chemical industries because of their mechanical stability and resistance to electromagnetic interference, and we present a high-availability analog meter reading system for these settings. Traditional manual meter reading is inefficient and inaccurate, and it cannot provide real-time data. Meter reading systems based on cloud computing have appeared, but they risk privacy violations because meter data contain commercial information about factories and enterprises (Long et al., 2022; Zhang et al., 2023). In addition, a cloud-based approach requires appropriate network resource allocation strategies (Chen et al., 2023a) and hardware computing power allocation strategies (Liang et al., 2021) to perform well. We therefore aim to build a lightweight system that runs offline and in real time on edge computing devices, remains usable under various network attacks (Fan et al., 2023), and does not need to synchronise data with the Industrial Internet of Things (IIoT) in real time (Liang et al., 2021).

To state the design goals clearly, we summarise the performance requirements of a high-availability analog meter reading system in the following four aspects:

Accuracy. Accuracy is the most important criterion for evaluating a meter reading system: the readings must faithfully reflect the actual state of the monitored system. Many studies have addressed meter reading with traditional machine vision, but engineering practice shows that traditional vision cannot overcome environmental interference well enough to achieve good accuracy in real application scenarios, even when image enhancement algorithms (Liang et al., 2021; Xiao et al., 2022) are used, e.g. to detect meter pointers and scales against complex backgrounds. Compared with the low robustness of traditional machine vision in such scenarios, deep vision methods can effectively separate the meter from its natural surroundings. In particular, instance segmentation can do so under complex lighting conditions and can more accurately divide the dial, pointer, and background into separate instances.
Real-time. Meters are scattered throughout the plant, and their readings change continuously. As part of the overall project, a meter reading system that does not read and report data in real time cannot accurately reflect the state of the overall system or provide timely early warning under stress conditions (Chen et al., 2023b). Traditional methods usually run very fast because they need little computing power, but traditional object detection and segmentation algorithms are less accurate and robust and are sensitive to lighting. Instance segmentation, by contrast, identifies meter data with high accuracy in natural environments, but it is computationally demanding and hard to run in real time, especially on resource-constrained edge computing devices (Diao et al., 2022; Hu et al., 2022).
Stability. Stability depends on both hardware and software design. Because this paper does not involve hardware design, only software stability is discussed. Software stability is reflected primarily in whether the visual recognition algorithm can adapt to the various interference factors of the natural environment and locate the key points of the meter stably and accurately.
Compatibility. Analog meters come in many types and specifications, and a system that is not compatible with them is difficult to adopt in practical applications. Meter reading systems based on instance segmentation (Zuo et al., 2020) generalise better because of their larger parameter space, but labelling their datasets takes more time (Lauridsen et al., 2019; W. Liu et al., 2016), and they are much more computationally expensive than keypoint detection-based approaches. The ability to adapt quickly, through transfer learning, to the features of newly added meters (Li et al., 2022; Liang et al., 2022) also improves the compatibility of a meter reading system.

Current meter reading systems consist of four stages. Stage 1 locates and identifies the meter dial, and Stage 2 corrects the camera angle to reduce errors in the angle calculation. Stage 3 identifies key points such as the pointer, the dial centre, and the scale marks in the corrected image. Stage 4 computes the angle between the pointer tip, the dial centre, and the starting scale mark to obtain the meter reading. Although each stage has recently improved, the overall recognition framework remains inadequate. A few end-to-end systems exist, such as that of Charles et al. (2021), but this framework also requires improvement. For example, Mask R-CNN instance segmentation (He et al., 2017) has been used to extract the dial and pointer from images, and Zuo et al. (2020) locate the pointer's centreline by binarising the pointer mask; key points are still needed to identify the pointer tip and hence distinguish the pointer's direction.
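To make Stage 4 concrete, the sketch below computes a reading with the conventional angle method. It assumes that Stage 3 has already located the dial centre, the minimum and maximum scale marks, and the pointer tip in image pixel coordinates (y pointing down), and that the scale is linear and increases clockwise. The function names are hypothetical; this illustrates the general technique, not the code of any cited system.

```python
import math

def clockwise_angle(centre, start, point):
    """Angle in [0, 2*pi) swept clockwise on screen from `start` to `point` about `centre`.

    Image coordinates are assumed (x to the right, y downward), so atan2(dy, dx)
    increases in the clockwise direction as seen on screen.
    """
    a_start = math.atan2(start[1] - centre[1], start[0] - centre[0])
    a_point = math.atan2(point[1] - centre[1], point[0] - centre[0])
    return (a_point - a_start) % (2 * math.pi)

def angle_reading(centre, min_mark, max_mark, tip, r_min, r_max):
    """Interpolate the reading linearly from the pointer's sweep angle."""
    total_sweep = clockwise_angle(centre, min_mark, max_mark)
    if total_sweep == 0:
        raise ValueError("minimum and maximum scale marks coincide in angle")
    tip_sweep = clockwise_angle(centre, min_mark, tip)
    return r_min + (r_max - r_min) * tip_sweep / total_sweep

# A 0-6 bar dial with a 270-degree scale from bottom-left to bottom-right;
# the pointer points to the right of the centre (225 of 270 degrees along the scale).
print(angle_reading((0, 0), (-1, 1), (1, 1), (1, 0), 0.0, 6.0))  # approximately 5.0
```

Every error in the located centre, tip, or scale marks propagates directly into the two sweep angles, which is why the accumulated errors of Stages 2 and 3 matter for this approach.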

Current meter reading systems also have further limitations. One is the difficulty of obtaining a large dataset covering many meter types, because instrumentation data is often kept confidential. Others lie in Stages 2 and 4 of the framework. Stage 2 applies affine or perspective transformations and therefore requires a suitable transformation reference point in the meter image; choosing this point incorrectly can increase skew or deform and distort the image. Stage 4 fits the centreline of the meter pointer with the Hough transform (Illingworth & Kittler, 1987) or a binarised pointer mask, and both approaches suffer from low robustness and accuracy: the Hough transform is easily affected by noise and lighting, and the accuracy of the binarised-mask approach depends on the width of the pointer.
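As an illustration of the binarised-mask approach, the sketch below estimates the pointer centreline as the principal axis of the mask pixels (a PCA fit): the line passes through the mask centroid along the eigenvector with the largest eigenvalue of the pixel-coordinate covariance. It also shows why a separate cue is needed for the pointer's direction, since the eigenvector's sign is arbitrary. The function name is hypothetical, and this is a generic technique rather than the exact procedure of any cited work.

```python
import numpy as np

def pointer_axis(mask):
    """Fit a centreline to a binary pointer mask by PCA.

    Returns (centroid, direction): centroid is (x, y) in pixels and direction is a unit
    vector (dx, dy). The sign of `direction` is arbitrary, so tip and tail are ambiguous.
    """
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("mask needs at least two foreground pixels")
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)          # 2x2 covariance of x and y
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    direction = eigvecs[:, np.argmax(eigvals)]
    return centroid, direction

# A thin diagonal pointer from (5, 5) to (40, 40).
mask = np.zeros((50, 50), dtype=bool)
idx = np.arange(5, 41)
mask[idx, idx] = True
centroid, direction = pointer_axis(mask)
angle_deg = np.degrees(np.arctan2(direction[1], direction[0])) % 180  # axis angle, sign-free
```

Because the centroid and covariance are computed over the whole mask, non-symmetric mask shapes, such as a pointer with a counterweight tail or a partially segmented pointer, shift the fitted line.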

This paper addresses the real-world requirements of meter reading by improving the accuracy, speed, stability, and compatibility of the meter reading system. Our main contributions are as follows:

We argue that, when meters are read from the appearance of their scales and pointers, the separate steps of locating the scale and locating the pointer not only accumulate errors but also reduce the generalisation ability of the model. We therefore redefine the steps of the reading system and improve its compatibility and accuracy by having two models read the meter cooperatively: a small model (SSD; W. Liu et al., 2016) and a large model (YolactEdge; H. Liu et al., 2021). The framework is shown in Figure 1. On a public dataset, our method improves on state-of-the-art methods, reducing the average reference error by a factor of 90.08 and the average relative error by a factor of 5.69. Moreover, when new meters are added, retraining the SSD model takes only 43 minutes (on an Nvidia RTX 3090).
We also argue that the camera angle correction stage not only consumes significant computing power but also accumulates errors. We instead propose a method that calculates readings accurately from sector masks, eliminating the camera angle correction step. Our system performs real-time inference at 41.7 FPS on 720p video on an Nvidia RTX 3090 and at 9.77 FPS on a Jetson Xavier NX with TensorRT acceleration. Furthermore, when the angle between the camera and the dial lies between 60 and 120 degrees, the system keeps the average reference error below 0.16%.

Figure 1. The general framework of the system.

The remainder of this paper is organised as follows. Section 2 reviews related work, and Section 3 outlines the problem and gives an overview of our system. Section 4 describes the key technologies used in the system, and Section 5 details the experimental design and testing environment. Section 6 discusses the experimental results, and Section 7 acknowledges the projects on which this work is based and the financial support received.

2. Related work

Current research focuses mostly on the first three stages of the meter reading pipeline, namely dial identification and positioning, camera angle correction, and key point identification. Various recognition and correction methods have been proposed to handle the tilt, rotation, occlusion, distortion, and uneven illumination of meter photos collected in the field. These methods fall into two categories: those based on traditional computer vision and those based on deep vision. Instance segmentation-based methods are gaining popularity because of their greater robustness, but their slow training and detection continue to hinder wider adoption.

2.1. Traditional computer vision-based methods

In conventional machine vision, the Hough transform is commonly used to detect circular edges and scale marks (Illingworth & Kittler, 1987). For instance, Sablatnig and Kropatsch (1994) used the Hough transform to detect the instrument contour and SIFT to correct camera deflection, and Mai et al. (2018) designed a substation meter reading system based on this method. Chi et al. (2015) used region growing to locate the dial region and its centre, improving the accuracy of circular edge and scale mark detection. Zheng et al. (2016) applied Multi-Scale Retinex in meter reading systems to improve dial recognition under varying lighting conditions. Similarly, J. Wang et al. (2018) designed an automatic meter reading system for analog meters in power plants using the Hough transform. Bao et al. (2019) proposed a computer vision measurement method based on inverse perspective mapping to reduce errors caused by the perspective transformation of dials. Lauridsen et al. (2019) binarised the filtered image with a Gaussian mean adaptive threshold, which is robust to lighting variations, and applied K-means and PCA for edge detection and pointer angle determination, improving the accuracy of the meter reading system.
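A Gaussian adaptive threshold compares each pixel with a Gaussian-weighted mean of its neighbourhood minus a constant, so a slow illumination gradient moves the threshold along with the image. The numpy-only sketch below follows the same rule as OpenCV's `ADAPTIVE_THRESH_GAUSSIAN_C` mode, but it is not claimed to be bit-identical to OpenCV or to reproduce the parameters used by Lauridsen et al. (2019); `block_size=11` and `c=2.0` are assumed defaults and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Normalised 1-D Gaussian kernel of odd length `size`."""
    half = size // 2
    x = np.arange(-half, half + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_adaptive_threshold(gray, block_size=11, c=2.0, sigma=None):
    """Binarise a grayscale image against a Gaussian-weighted local mean minus `c`.

    Pixels brighter than (local mean - c) become 255 and all others 0, so dark
    pointers and tick marks map to 0 even under a slow illumination gradient.
    """
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be an odd integer >= 3")
    if sigma is None:
        # Default-sigma formula documented for OpenCV's getGaussianKernel.
        sigma = 0.3 * ((block_size - 1) * 0.5 - 1) + 0.8
    k = gaussian_kernel_1d(block_size, sigma)
    half = block_size // 2
    img = gray.astype(float)
    padded = np.pad(img, half, mode="reflect")
    # Separable Gaussian filter: filter every row, then every column ("valid" drops the padding).
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    local_mean = np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, rows)
    return np.where(img > local_mean - c, 255, 0).astype(np.uint8)

# A dark vertical "pointer" (value 40) on a uniform background (value 120).
gray = np.full((21, 21), 120, dtype=np.uint8)
gray[:, 10] = 40
binary = gaussian_adaptive_threshold(gray)
```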

Although traditional computer vision-based methods tend to run faster, they are less accurate and robust and are susceptible to illumination changes and background clutter. Despite efforts to improve their accuracy in natural environments, they still struggle with variations in lighting and background, and their weak generalisation limits them to specific types of meters.

2.2. Deep vision methods

Although much previous work still uses traditional methods, there is a growing trend toward applying deep learning to meter reading. For example, Y. Liu et al. (2020) used a region-based convolutional network (Faster R-CNN) to detect the position of the target meter, which is faster and handles images of multiple sizes; they then applied a perspective transformation based on matched feature pairs, detected the pointer position with the Hough transform, and obtained the reading from the pointer angle. This system is mounted on a meter reading robot and can adjust the camera position according to the detected meter position. In the same year, Zhuo et al. (2020) used the fact that the circular tick marks on the dial point towards its centre to calculate the pointer's centre of rotation, but this works only when the camera squarely faces the dial. Zuo et al. (2020) replaced ROI-Align with PrROIPooling to improve the accuracy of Mask R-CNN for dial and pointer positioning, corrected the binary masks of the dial and pointer with a perspective transformation, and finally determined the reading from the slope of the pointer; however, the authors do not appear to account for in-plane rotation of the dial. These works often use deep learning only for identification and localisation, possibly because of a serious lack of suitable datasets: factory instrumentation data is often confidential, and among these deep vision-based studies only (Lauridsen et al., 2019; W. Liu et al., 2016) provide datasets. Howells et al. (2021) also provided unlabelled meter videos. Ma and Jiang (2018) proposed a semi-synthetic dataset in which 3D modelling software generated a large number of gauges and synthetic background environments, but it covers only a very small range of meter types. Charles et al. (2021) designed a meter reading system that runs on a mobile phone and is trained on fully synthesised data, with each image labelled by the meter's final reading; it performed well in real-world tests, but a separate CNN model had to be trained for each meter. The CNN-based meter reading system developed by Howells et al. (2021) also runs on mobile phones, with an angle error of $< 1^{\circ}$ and speeds of up to 25 FPS on mobile devices. However, such CNN-trained models handle shadows poorly, especially for small targets such as pointers, and struggle to distinguish a target from its shadow with high precision.

2.3. Real-time instance segmentation

Balancing compatibility and accuracy is a challenge when using an end-to-end model for meter reading. To address this, we propose using a lightweight model that allows new gauge types to be added quickly, together with a highly accurate instance segmentation model for reading the gauges. To ensure stability, we deployed our system on the NVIDIA Jetson Xavier NX, a low-power edge computing device, and optimised it with NVIDIA TensorRT (NVIDIA).

Several methods have been proposed to speed up instance segmentation. Yolact (Bolya et al., 2019) achieves real-time speed by generating prototype masks and predicting a set of mask coefficients for each instance. Yolact++ (Bolya et al., 2020) introduced deformable convolution and optimised the prediction head to achieve faster speeds. PolarMask (Xie et al., 2020) recasts pixel-wise segmentation as a prediction problem in polar coordinates, which reduces the computational load but cannot handle non-convex contours or holes.
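Yolact's prototype idea comes down to one matrix product: the network predicts k prototype masks P (H × W × k) for the whole image and k coefficients per instance C (n × k), and the instance masks are σ(PCᵀ). The numpy sketch below shows only this linear combination with toy shapes; Yolact additionally crops each mask with its predicted box before thresholding, which is omitted here, and the function name is hypothetical.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """Combine prototypes (H, W, k) with per-instance coefficients (n, k).

    Returns soft masks of shape (n, H, W), i.e. sigmoid(P @ C.T) arranged per instance.
    """
    logits = np.einsum("hwk,nk->nhw", prototypes, coefficients)
    return 1.0 / (1.0 + np.exp(-logits))

# Two toy prototypes on a 4x6 grid: one "left half", one "right half".
protos = np.zeros((4, 6, 2))
protos[:, :3, 0] = 6.0
protos[:, 3:, 0] = -6.0
protos[:, :3, 1] = -6.0
protos[:, 3:, 1] = 6.0
coeffs = np.array([[1.0, 0.0],    # instance 0 uses the left-half prototype
                   [0.0, 1.0]])   # instance 1 uses the right-half prototype
masks = assemble_masks(protos, coeffs) > 0.5   # binarise at 0.5
```

Because the prototypes are shared, the per-instance cost is only k multiply-adds per pixel, which is what makes the approach fast.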

Of these methods, YolactEdge is the only one that has been tested on edge computing devices, reaching 30 FPS on a Jetson AGX Xavier and 170 FPS on an RTX 2080 Ti. We therefore expect that the Jetson Xavier NX, which has slightly less computing power than the Jetson AGX Xavier, can also meet the requirements for real-time meter reading, especially since our task involves far fewer instances than the YouTube VIS dataset used by the YolactEdge authors. However, YolactEdge has the drawback of randomly selecting keyframes every 4 frames to maintain accuracy. In addition, the instance segmentation model takes a long time to train and needs data covering most analog meter types to generalise well, although it does not need frequent retraining as the variety of meters increases.

Current methods suffer from poor robustness and accuracy in natural environments, and building a lightweight, real-time, deep learning-based meter reading system that covers all types of meters is further hampered by the lack of sufficient datasets.

3. Overview of the system framework

Compared with previous solutions, our meter reading system is simpler because it omits the camera angle correction step. The reading process consists of four steps, as illustrated in Figure 1.

Step 1 resizes the input image to 300 × 300. In Step 2, an SSD network locates the meter in real time with a bounding box and also outputs the meter type and detection confidence; the scale range (minimum to maximum) and the unit follow from the recognised meter type. When processing a video stream, the frame with the highest confidence in every five frames is marked as a keyframe to improve the accuracy of YolactEdge during video inference, as explained in Subsection 4.2. Step 3 obtains a clearer dial image by cropping the dial area from the input image using the bounding box coordinates from Step 2. In the final step, the dial image is fed into the YolactEdge network, which segments it into four instances, including the dial region without scale marks, the region from the minimum scale to the pointer, and the region from the pointer to the maximum scale.
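Step 3 can be sketched as follows: the SSD box is predicted on the 300 × 300 resized input, so its coordinates are rescaled to the original frame before cropping, which keeps the full resolution of the dial. The box format `(x_min, y_min, x_max, y_max)` in resized-image pixels and the function name `crop_dial` are assumptions for illustration.

```python
import math
import numpy as np

def crop_dial(frame, box, det_size=300):
    """Crop the dial from the full-resolution frame using a box predicted on the resized input.

    frame: (H, W, ...) image array at original resolution.
    box:   (x_min, y_min, x_max, y_max) in pixels of the det_size x det_size SSD input (assumed).
    """
    h, w = frame.shape[:2]
    # Rescale each coordinate from the square detector input back to the original frame,
    # rounding outwards and clamping to the frame borders.
    x0 = max(0, math.floor(box[0] * w / det_size))
    y0 = max(0, math.floor(box[1] * h / det_size))
    x1 = min(w, math.ceil(box[2] * w / det_size))
    y1 = min(h, math.ceil(box[3] * h / det_size))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the frame")
    return frame[y0:y1, x0:x1]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # a 720p frame
dial = crop_dial(frame, (75, 75, 150, 150))
print(dial.shape)  # (180, 320, 3)
```

Note that the x and y scale factors differ for a 16:9 frame, because the SSD input is square.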

Given the meter type, scale range, and unit obtained in Step 2, the meter reading is calculated from the ratio of the area from the minimum scale to the pointer to the area from the minimum to the maximum scale, as shown in Figure 2. Let $S_{eff}$ denote the area from the minimum scale to the pointer, $S_{max}$ the area from the pointer to the maximum scale, and $R_{min}$ and $R_{max}$ the minimum and maximum scale values, respectively. The meter reading is then: $R = R_{min} + (R_{max} - R_{min}) \times \frac{S_{eff}}{S_{max} + S_{eff}},$ (1) so that a pointer at the minimum scale ($S_{eff} = 0$) reads $R_{min}$ and a pointer at the maximum scale ($S_{max} = 0$) reads $R_{max}$.
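The area-ratio reading needs only two pixel counts. The sketch below assumes the segmentation model's two sector masks are available as boolean arrays (a hypothetical interface), uses their pixel counts as $S_{eff}$ and $S_{max}$, and interpolates linearly from $R_{min}$ (when $S_{eff} = 0$) to $R_{max}$ (when $S_{max} = 0$).

```python
import numpy as np

def reading_from_masks(mask_eff, mask_rest, r_min, r_max):
    """Meter reading from the two sector masks via the area ratio.

    mask_eff:  boolean mask of the sector from the minimum scale to the pointer (S_eff).
    mask_rest: boolean mask of the sector from the pointer to the maximum scale (S_max).
    Areas are pixel counts; the result goes linearly from r_min (S_eff = 0) to r_max (S_max = 0).
    """
    s_eff = int(np.count_nonzero(mask_eff))
    s_max = int(np.count_nonzero(mask_rest))
    if s_eff + s_max == 0:
        raise ValueError("both sector masks are empty")
    return r_min + (r_max - r_min) * s_eff / (s_eff + s_max)

# Toy masks on a 10x10 grid: 25 pixels before the pointer, 75 after it.
eff = np.arange(100).reshape(10, 10) < 25
rest = ~eff
print(reading_from_masks(eff, rest, 0.0, 4.0))  # 1.0
```

Because both areas come from the same image, any common scale factor (such as the one introduced by an oblique view, see Section 4.3) cancels in the ratio.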

Figure 2. The reading is obtained by the area ratio.

4. Efficient meter reading technique

This section describes the three techniques that make meter reading efficient. The first, SSD-based meter positioning, uses the SSD (Single Shot MultiBox Detector), a single-stage object detection algorithm, to classify and locate the meter; SSD uses VGG16 as its backbone network and takes images resized to 300 × 300 × 3 as input. The second, dial plate instance segmentation based on YolactEdge, reuses features from the previous keyframe to reduce the computational load of processing nonkeyframes in a video stream. The third, meter reading based on elliptic affine transformations, determines the reading from the ratio between sector areas, which remains valid without correcting the deformation of the pointer and the scale. Together, these techniques simplify the meter reading process and address specific challenges in it.

4.1. SSD-based meter positioning

SSD, a single-stage object detection algorithm, is used to classify and locate the meter. It uses VGG16 as the backbone network, removes the Dropout layers and the FC8 layer of VGG16, converts FC6 and FC7 into convolutional layers, and appends four additional convolutional layers. Each image or video frame is preprocessed to $300 \times 300 \times 3$ before being passed to the SSD network; feature maps from different levels are combined, the category and confidence of each default bounding box are calculated, and the detection results are obtained by non-maximum suppression. Frames whose detection confidence is below 98% are then marked as nonkeyframes, and the dial region is cropped from the original high-resolution image (before preprocessing) by rescaling the bounding box coordinates, yielding a high-resolution dial image. The unit and maximum scale value of the meter are then obtained from the recognised meter type. The objective function of the SSD network is: $L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right),$ (2) where N is the number of matched default boxes; c denotes the class confidences; x indicates whether a default bounding box matches a ground-truth bounding box; l and g are the predicted and ground-truth bounding boxes, respectively; the localisation loss $L_{loc}$ between l and g is computed with the smooth $L_{1}$ function; the confidence loss $L_{conf}$ is computed with Softmax; and α is a weight parameter.
The localisation loss $L_{loc}$ and the confidence loss $L_{conf}$ are defined as: $L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}\left(l_{i}^{m} - \hat{g}_{j}^{m}\right),$ (3) $L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_{i}^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right), \quad \hat{c}_{i}^{p} = \frac{\exp(c_{i}^{p})}{\sum_{p} \exp(c_{i}^{p})}.$ (4) In Equation (3), the ground-truth box is encoded relative to the default box: $\hat{g}_{j}^{cx} = \frac{g_{j}^{cx} - d_{i}^{cx}}{d_{i}^{w}}, \quad \hat{g}_{j}^{cy} = \frac{g_{j}^{cy} - d_{i}^{cy}}{d_{i}^{h}}, \quad \hat{g}_{j}^{w} = \log\left(\frac{g_{j}^{w}}{d_{i}^{w}}\right), \quad \hat{g}_{j}^{h} = \log\left(\frac{g_{j}^{h}}{d_{i}^{h}}\right),$ where $d_{i}^{cx}$, $d_{i}^{cy}$, $d_{i}^{w}$, and $d_{i}^{h}$ are the centre coordinates, width, and height of the i-th default box; m indexes the box coordinates; $x_{ij}^{p}$ (written $x_{ij}^{k}$ in Equation (3)) indicates whether the i-th default box is matched to the j-th ground-truth box of category p; and Pos and Neg are the sets of positive and negative default boxes, respectively.
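For a single matched pair of default box and ground-truth box, the encoding $\hat{g}$ and the two per-box loss terms can be sketched in numpy as below. Boxes are assumed to be `(cx, cy, w, h)` tuples, and the function names are hypothetical. The "variance" scaling that many SSD implementations apply to these offsets, hard negative mining, and the 1/N normalisation are omitted, so this illustrates the formulas rather than providing a training-ready loss.

```python
import numpy as np

def encode_box(gt, default):
    """Regression target g-hat of a ground-truth box relative to a default box, both (cx, cy, w, h)."""
    g_cx, g_cy, g_w, g_h = gt
    d_cx, d_cy, d_w, d_h = default
    return np.array([(g_cx - d_cx) / d_w,
                     (g_cy - d_cy) / d_h,
                     np.log(g_w / d_w),
                     np.log(g_h / d_h)])

def smooth_l1(x):
    """Smooth L1, elementwise: 0.5 * x^2 where |x| < 1, |x| - 0.5 elsewhere."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def loc_loss(pred_offsets, gt, default):
    """Localisation term for one positive default box: smooth L1 summed over cx, cy, w, h."""
    return float(smooth_l1(np.asarray(pred_offsets) - encode_box(gt, default)).sum())

def conf_loss(logits, label):
    """Softmax cross-entropy for one default box; label 0 is the background class."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                        # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

target = encode_box((0.5, 0.5, 0.2, 0.2), (0.4, 0.5, 0.2, 0.1))  # about [0.5, 0, 0, log 2]
```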

4.2. Dial plate instance segmentation based on YolactEdge

When processing nonkeyframes, YolactEdge reuses the convolutional features of specified layers from the previous keyframe, which reduces the computation in the layers with the most floating-point operations. As shown in Figure 3, C4 of the ResNet-101 backbone accounts for $> 66\%$ of the computational cost, so reusing keyframe features greatly reduces this cost and lets the video stream be processed efficiently across keyframes and nonkeyframes. Specifically, given a nonkeyframe $I^{n}$ and its previous keyframe $I^{k}$, YolactEdge first encodes the motion of objects between them as a two-dimensional flow field $M(I^{k}, I^{n})$ and then uses this flow to warp the keyframe features $F^{k} = \{P_{4}^{k}, P_{5}^{k}\}$ into alignment with frame $I^{n}$, giving the warped features $\tilde{F}^{n} = \{W_{4}^{n}, W_{5}^{n}\} = T(F^{k}, M(I^{k}, I^{n}))$.
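The warping step $T(F^{k}, M(I^{k}, I^{n}))$ amounts to sampling the keyframe feature map at positions displaced by the flow. The numpy sketch below shows a plain bilinear backward warp of a (C, H, W) feature map with a (2, H, W) flow field. In YolactEdge the flow is predicted by a learned network and the warp runs on the GPU, so this only illustrates the sampling; the flow convention (each current-frame location points to its source location in the keyframe, in feature-map pixels) and the function name are assumptions.

```python
import numpy as np

def warp_features(feat_key, flow):
    """Bilinearly sample keyframe features at (x + dx, y + dy) for every output location.

    feat_key: (C, H, W) keyframe feature map, e.g. P4 or P5.
    flow:     (2, H, W) displacements (dx, dy) in feature-map pixels (assumed convention).
    Sampling positions outside the map are clamped to the border.
    """
    _, h, w = feat_key.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[0], 0, w - 1)
    src_y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0                      # horizontal interpolation weight
    wy = src_y - y0                      # vertical interpolation weight
    top = feat_key[:, y0, x0] * (1 - wx) + feat_key[:, y0, x1] * wx
    bottom = feat_key[:, y1, x0] * (1 - wx) + feat_key[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 5, 7))    # a toy 3-channel feature map
shift = np.zeros((2, 5, 7))
shift[0] = 1.0                           # every location reads from one pixel to its right
warped = warp_features(feat, shift)
```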

Figure 3. YolactEdge reuses P4 and P5 from the previous keyframe in a nonkeyframe.

Taking feature layers P4 and P5 directly from the previous keyframe's feature pyramid saves significant computation. This may slightly reduce accuracy during video inference, but the trade-off is worthwhile given the computational savings. Furthermore, because the SSD model marks keyframes effectively, reusing keyframe features for nonkeyframes should not significantly affect the accuracy of meter reading.
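The SSD-driven keyframe marking described in Section 3 can be sketched as follows: within each block of five consecutive frames, the frame with the highest SSD confidence becomes the keyframe, and the others are nonkeyframes that reuse its P4/P5 features. How this rule interacts with the 98% confidence threshold in Subsection 4.1 is not specified, so the sketch leaves the threshold out. It also processes a whole buffered block at once, which in a live stream means waiting for five frames before the keyframe is known. The function name is hypothetical.

```python
def mark_keyframes(confidences, block=5):
    """Return one boolean per frame: True for the highest-confidence frame in each block.

    The last block may be shorter than `block`; ties go to the earliest frame.
    """
    if block < 1:
        raise ValueError("block must be >= 1")
    is_key = [False] * len(confidences)
    for start in range(0, len(confidences), block):
        window = confidences[start:start + block]
        best = max(range(len(window)), key=lambda i: window[i])
        is_key[start + best] = True
    return is_key

conf = [0.90, 0.99, 0.95, 0.97, 0.91, 0.93, 0.92, 0.995, 0.90, 0.94, 0.96]
flags = mark_keyframes(conf)
print([i for i, k in enumerate(flags) if k])  # [1, 7, 10]
```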

4.3. Meter readings based on elliptic affine transformations

The sector-area-ratio reading method avoids the conventional sequence of first locating the pointer and the scale, determining the positive direction of the pointer, and then obtaining a corrected dial through perspective transformations; that sequence introduces some deformation of the pointer and the scale. Moreover, these steps must be performed sequentially, cannot be parallelised across threads, and therefore consume more time.

When the camera does not squarely face the dial, the elliptical dial in the image can be approximated as the affine image of the circular dial seen from the front. The ratio of the sector area $S_{eff}$ swept by the pointer from the minimum scale to the sector area $S_{eff} + S_{max}$ from the minimum scale to the maximum scale does not change under this affine transformation. The proof is as follows:

An affine transformation maps any point $p(x, y)$ on the plane to $p'(x', y')$: $\begin{cases} x' = a_{11} x + a_{12} y + a_{13} \\ y' = a_{21} x + a_{22} y + a_{23}, \end{cases}$ (5) where the linear part is non-singular: $\Delta = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0.$ (6) The equation of a circle is: $x^{2} + y^{2} = a^{2}.$ (7) A special affine transformation, corresponding to rotating the circle about one of its diameters, maps the circle to an ellipse: $\begin{cases} x' = x \\ y' = \frac{b}{a} y, \end{cases}$ (8) $\frac{x'^{2}}{a^{2}} + \frac{y'^{2}}{b^{2}} = 1, \quad (a > b > 0).$ (9) A sector of the circle can be approximated by a fan of isosceles triangles $\Delta o p_{1} p_{2}, \Delta o p_{2} p_{3}, \dots, \Delta o p_{n-1} p_{n}$ that share the centre o, as shown in Figure 4.

Figure 4. Decomposition diagram of sector $\nabla O P_{1} P_{n}$.

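With o at the origin, the area of triangle $\Delta o p_{1} p_{2}$ is:

$S_{\Delta o p_{1} p_{2}} = \frac{1}{2} \left| \det \begin{pmatrix} 0 & 0 & 1 \\ x_{1} & y_{1} & 1 \\ x_{2} & y_{2} & 1 \end{pmatrix} \right| = \frac{1}{2} \left| x_{1} y_{2} - x_{2} y_{1} \right|.$ (10)

After the affine transformation in Equation (8), its area becomes:

$S_{\Delta o p_{1}' p_{2}'} = \frac{1}{2} \left| \det \begin{pmatrix} 0 & 0 & 1 \\ x_{1} & \frac{b}{a} y_{1} & 1 \\ x_{2} & \frac{b}{a} y_{2} & 1 \end{pmatrix} \right| = \frac{b}{2a} \left| x_{1} y_{2} - x_{2} y_{1} \right|.$ (11)

Hence every triangle is scaled by the same constant:

$\frac{S_{\Delta o p_{1}' p_{2}'}}{S_{\Delta o p_{1} p_{2}}} = \frac{b}{a},$ (12)

and the ratio of the area $S'$ of an arbitrary sector after the affine transformation to the original sector area S is also constant:

$\frac{S'}{S} = \lim_{n \to \infty} \frac{\sum_{i=1}^{n-1} S_{\Delta o p_{i}' p_{i+1}'}}{\sum_{i=1}^{n-1} S_{\Delta o p_{i} p_{i+1}}} = \frac{b}{a}.$ (13)

Because $S_{eff}$ and $S_{eff} + S_{max}$ are both scaled by the same factor $b/a$, their ratio, and hence the reading in Equation (1), is unchanged by the affine transformation.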
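The derivation can be checked numerically. The sketch below approximates the sectors $S_{eff}$ and $S_{eff} + S_{max}$ by fans of triangles (as in Figure 4), applies the squashing map of Equation (8) and an arbitrary general affine map, and compares areas with the shoelace formula. The scale angles, the ratio b/a = 0.6, and the general matrix are made-up illustration values, and the helper names are hypothetical. For a general affine map $p \mapsto Ap + t$ every area is multiplied by $|\det A|$, so the area ratio is preserved there too; Equation (8) is the special case with $\det A = b/a$.

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as an (n, 2) array of vertices."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def sector(radius, theta0, theta1, n=2000):
    """Polygon approximating a circular sector: the centre followed by n points on the arc."""
    t = np.linspace(theta0, theta1, n)
    arc = np.stack([radius * np.cos(t), radius * np.sin(t)], axis=1)
    return np.vstack([[0.0, 0.0], arc])

def apply_affine(pts, A, t=(0.0, 0.0)):
    """Map every point p to A p + t."""
    return pts @ np.asarray(A, dtype=float).T + np.asarray(t, dtype=float)

a, b = 1.0, 0.6
theta_min, theta_ptr, theta_max = np.radians(-225), np.radians(-90), np.radians(45)
eff = sector(a, theta_min, theta_ptr)     # minimum scale -> pointer
full = sector(a, theta_min, theta_max)    # minimum scale -> maximum scale

squash = [[1.0, 0.0], [0.0, b / a]]       # Equation (8)
general = [[0.9, 0.3], [-0.2, 0.7]]       # an arbitrary non-singular affine map

ratio_before = polygon_area(eff) / polygon_area(full)
for A, t in [(squash, (0.0, 0.0)), (general, (5.0, -3.0))]:
    ratio_after = polygon_area(apply_affine(eff, A, t)) / polygon_area(apply_affine(full, A, t))
    print(ratio_before, ratio_after)      # the two values agree up to rounding
```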

5. Experimental results and data analysis

In this section, we evaluate our system on several key metrics to assess its accuracy, reliability, and efficiency. We first describe the experimental setup, including the hardware and software configurations used for testing. We then evaluate the system in four areas: reading accuracy, system compatibility, inference speed, and system stability. From these results, we identify the strengths and limitations of the system and opportunities for further improvement.

5.1. Experimental setting

Next, we describe the experimental setup used in this study: the experimental environment, the datasets used, and the evaluation criteria used to assess the performance of our model.

5.1.1. Experimental environment

The model was implemented with PyTorch 1.8 and Python 3.6 and deployed on a server with an Intel i7-8700 CPU, 64 GB of memory, and an NVIDIA RTX 3090 GPU. The SSD model was trained with the adaptive moment estimation (Adam) optimiser and a batch size of 16 for a total of 60 epochs. Its learning rate followed a multistep schedule with an initial value of 0.001 and a gamma of 0.9, i.e. the learning rate was multiplied by 0.9 at the end of each epoch. YolactEdge also used a multistep learning rate schedule, with an initial value of 0.00001 and a gamma of 0.1, and was trained for 750,000 iterations in total.

5.1.2. Composition and characteristics of the gauge reading model training dataset

The dataset contains 7 types of gauges, as shown in Figure 5. Four are pressure gauges with ranges of 1.5 bar, 2.5 bar, 3 bar, and 4.5 bar, provided by Lauridsen et al. (2019), which we relabelled with the CVAT tool and converted to the COCO format; we named these categories Pressure-2.5bar, Pressure-1.5bar, Pressure-4.5bar, and Pressure-3bar. These gauges were chosen because their images have small camera deflection angles, their pointer deflection covers the entire scale range, and they show no overexposure or underexposure. The Oxygen-2.5bar, Nitrogen-2.5bar, and Propane-2.5bar gauges were filmed specifically for this study; their videos show the pointer moving from the minimum to the maximum scale under normal, overexposed, and overcast lighting conditions. In addition, five sets of videos were recorded under varying light intensities and with large camera deflection angles.

Figure 5. The seven meter types in the dataset.

The dataset used in this article contains 4977 images in total, but they are not evenly distributed across categories. Because the public dataset contains only one video per meter category, those categories have relatively few images, whereas the meters we filmed ourselves have multiple videos and therefore more images. The number of images for each meter type is listed in Table 1. The dataset was split into training, validation, and test sets in a ratio of 80%:10%:10%.
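A reproducible 80%:10%:10% split can be sketched as below, with a fixed random seed and the remainder assigned to the test set. The source does not say whether images were split individually or by video; frames from the same video are nearly identical, so splitting by video instead of by frame would keep near-duplicates out of the test set. The function name and seed are hypothetical.

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle `items` reproducibly and split them into train/val/test by the given ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]    # the remainder goes to the test set
    return train, val, test

train, val, test = split_dataset(range(4977))
print(len(train), len(val), len(test))  # 3981 497 499
```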

Table 1. The Quantity of Images for Each Meter Category in the Dataset.

5.1.3. Evaluation indicators

Frames per second (FPS) measured during video inference was used as the indicator of inference speed, and the mean average precision (mAP) over IoU thresholds from 0.5 to 0.95 is reported for segmentation quality. To verify the accuracy of the proposed reading recognition algorithm, its average relative error $\bar{\delta}$ and average reference error $\bar{\gamma}$ were compared with those of current state-of-the-art meter reading systems. The two indicators are defined as

$$\bar{\delta} = \frac{1}{n}\sum_{i=1}^{n} \frac{|a_{i} - A_{i}|}{A_{i}} \times 100\%, \tag{14}$$

$$\bar{\gamma} = \frac{1}{n}\sum_{i=1}^{n} \frac{|a_{i} - A_{i}|}{S} \times 100\%, \tag{15}$$

where $a_{i}$ is the reading recognised by the algorithm for the $i$-th sample, $A_{i}$ is the corresponding actual reading, $S$ is the scale range of the pointer meter, and $n$ is the total number of samples.
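Equations (14) and (15) translate directly into code. The sketch below is an illustration with made-up readings on a hypothetical 0 to 4 bar gauge. One design point is worth noting: Eq. (14) divides by the actual reading $A_i$, so it is undefined when the pointer sits at zero, and the function raises in that case; Eq. (15) normalises by the fixed scale range $S$ and stays defined.

```python
def average_relative_error(readings, actuals):
    """Average relative error, Eq. (14), in percent."""
    if len(readings) != len(actuals) or not readings:
        raise ValueError("need equally sized, non-empty sequences")
    if any(A == 0 for A in actuals):
        # Eq. (14) divides by the actual reading, so A_i = 0 is undefined.
        raise ZeroDivisionError("relative error is undefined when A_i = 0")
    return sum(abs(a - A) / A for a, A in zip(readings, actuals)) / len(readings) * 100


def average_reference_error(readings, actuals, scale_range):
    """Average reference error, Eq. (15), in percent of the scale range S."""
    if len(readings) != len(actuals) or not readings:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(abs(a - A) / scale_range for a, A in zip(readings, actuals)) / len(readings) * 100


# Toy example on a hypothetical 0-4 bar gauge (S = 4).
a = [1.00, 2.10, 2.95]  # readings recognised by the algorithm
A = [1.00, 2.00, 3.00]  # actual readings
print(round(average_relative_error(a, A), 4))        # 2.2222
print(round(average_reference_error(a, A, 4.0), 4))  # 1.25
```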

5.2. Performance comparison of instance segmentation models for meter reading

Because the accuracy of the instance segmentation model largely determines the accuracy of the overall meter reading system, in this section we evaluate the instance segmentation model and compare it with other meter reading systems on a publicly available dataset. YolactEdge was trained with two backbone networks, MobileNet-V2 and ResNet101-FPN. Training ran for a maximum of 400,000 iterations (1,400 epochs) and took 49.93 hours for MobileNet-V2 and 90.58 hours for ResNet101-FPN. The models were evaluated on the validation set at IoU thresholds of 0.5, 0.75, 0.9, and 0.95, and the results are presented in Table 2. The model trained with the ResNet101-FPN backbone outperformed the MobileNet-V2 model, as also shown in Figure 6. Because our scheme reduces the training time and the number of training iterations required for YolactEdge, the longer training time of ResNet101-FPN is acceptable, so we chose it for its higher accuracy in the subsequent comparison experiments.
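At each IoU threshold in Table 2, a predicted mask counts as a correct detection only if its intersection over union with the ground-truth mask reaches that threshold. The minimal sketch below computes mask IoU for toy binary masks and shows which of the four thresholds a prediction passes; it omits the rest of the mAP computation (ranking detections by confidence and integrating the precision-recall curve, as done by the COCO evaluation tools). The masks are hypothetical.

```python
THRESHOLDS = (0.5, 0.75, 0.9, 0.95)


def mask_iou(pred, gt):
    """IoU of two binary masks given as equally sized 2-D lists of 0/1."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += bool(p) and bool(g)  # pixel in both masks
            union += bool(p) or bool(g)   # pixel in either mask
    return inter / union if union else 0.0


def matches_at_thresholds(pred, gt, thresholds=THRESHOLDS):
    """For each threshold, whether the prediction would count as a match."""
    iou = mask_iou(pred, gt)
    return {t: iou >= t for t in thresholds}


# Toy masks: the prediction misses one pixel of a 10-pixel ground truth.
gt = [[0, 1, 1, 1, 0],
      [0, 1, 1, 1, 0],
      [0, 1, 1, 1, 0],
      [0, 0, 1, 0, 0]]
pred = [[0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]

print(mask_iou(pred, gt))               # 0.9
print(matches_at_thresholds(pred, gt))  # {0.5: True, 0.75: True, 0.9: True, 0.95: False}
```

A single missed pixel is enough to fail the 0.95 threshold, which is why results at high IoU thresholds are the most sensitive to fine mask boundaries.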

Figure 6. Loss value decay process of the YolactEdge model.

Table 2. Comparison of target detection and mask extraction performance of the two backbones.

We evaluated four meter videos from the dataset of Lauridsen et al. (2019) on the Nvidia RTX3090 without TensorRT acceleration (Figure 7). Note that Pressure-3bar was not included in the training set. Linear quantisation of the model with TensorRT can increase inference speed but also reduces precision, so we further measured the average reference error and average relative error on the Jetson NX in three settings: FP32 denotes the default 32-bit floating-point precision, while FP16 and INT8 denote inference with 16-bit floating-point and 8-bit integer precision, respectively. Table 3 lists the average reference error ($\bar{\gamma}$) and average relative error ($\bar{\delta}$) of the four meters. Our readings are very close to the manual readings, even for Pressure-3bar, whose video was not added to the training set.
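Why INT8 inference can lose precision is easiest to see with generic symmetric linear quantisation: each value is divided by a scale, rounded to the nearest integer in [-127, 127], and multiplied back, so every value can move by up to half a quantisation step. The sketch below illustrates that idea only. TensorRT derives its INT8 scales from a calibration step rather than necessarily from the maximum magnitude used here, and the values are hypothetical. FP16 keeps a floating-point format, which is consistent with its smaller accuracy loss in the experiments.

```python
def quantize_int8(values):
    """Symmetric per-tensor linear quantisation to signed 8-bit integers.

    The scale maps the largest magnitude onto 127; each value is rounded to
    the nearest integer step and clamped to [-127, 127].
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate real values."""
    return [x * scale for x in q]


weights = [0.731, -0.052, 0.0019, -1.27, 0.4444]  # hypothetical weights/activations
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q)                          # [73, -5, 0, -127, 44]
print(max(errors) <= scale / 2)   # True: rounding error is at most half a step
```

Small values such as 0.0019 collapse to zero, which is the kind of detail loss that can slightly enlarge reading errors under INT8.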

Figure 7. Demonstration of readings from four reference meters.

Table 3. Reading accuracy on the four meters provided by Lauridsen et al. (2019).

Table 4 compares the average reference error and average relative error over all meters with those of current state-of-the-art methods; our values are the averages over the meters in Table 3. They are significantly better than those of previous methods, even though linear acceleration reduces the inference accuracy of our model. The average reference error is reduced by a factor of 90.08, and the average relative error by a factor of 5.69.

Table 4. Accuracy comparison with previous studies.

5.3. System compatibility for meter recognition

Our system can add new meter types quickly and effectively, which gives it high compatibility. Two pieces of evidence support this. First, the instance segmentation model achieves high inference accuracy even on meters that were not in the training set: in the accuracy tests, the system accurately predicted the readings of such a meter, as illustrated in Figure 7, which shows that the model generalises to previously unseen meters. Second, the SSD model can quickly learn new meter types while maintaining accurate inference. To verify this, we analysed the accuracy of the SSD model and its training process. As Figure 8 shows, the SSD-based meter classification and localisation model achieved high recognition accuracy: on a randomly selected sample of 498 positive and negative examples from the dataset, it correctly identified the meter types and accurately marked their bounding boxes.

Figure 8. The proportion of mAP and various meters in the test set.

The SSD training process is shown in Figure 9. After 40 epochs (43 minutes), the loss of the SSD model stabilised, indicating convergence; retraining the SSD model for a new meter type is therefore fast. When a new meter type is added, part of the weights can be frozen and transfer learning applied, which would further reduce the training time.

Figure 9. Loss value decay process of SSD model.

Overall, these experiments support the high compatibility and fast adaptation of our system.

5.4. Inference speed

To demonstrate that our system can obtain meter readings in real time, we measured its inference speed on an Nvidia RTX3090 and an Nvidia Jetson Xavier NX, and then measured the further speed-up obtained with linear acceleration.
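FPS here is the number of processed frames divided by the elapsed wall-clock time. The sketch below shows one way to measure it; `fake_infer` is a hypothetical stand-in for the full detection, segmentation, and reading pipeline, and the five warm-up frames are an assumption rather than a setting reported in the text.

```python
import time


def measure_fps(frames, infer, warmup: int = 5) -> float:
    """Average frames per second of `infer` over `frames`.

    The first `warmup` frames are processed but not timed, because the first
    calls on a GPU often include one-off initialisation cost.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")
    for frame in frames[:warmup]:
        infer(frame)
    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


def fake_infer(frame):
    # Hypothetical stand-in for one full pipeline pass (about 10 ms).
    time.sleep(0.01)
    return frame


fps = measure_fps(range(35), fake_infer)
print(f"{fps:.1f} FPS")
```

On a GPU, execution is asynchronous, so the timer should only be read after queued work has finished (in PyTorch, for example, by calling `torch.cuda.synchronize()`); otherwise the measured FPS can be inflated.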

5.4.1. Inference speed on RTX3090

We evaluated the real-time capability of our system by running inference on the Nvidia RTX3090, using as the test set five instrument videos recorded under natural lighting. The five environments were: normal soft light without camera angle deflection, overexposure without angle deflection, underexposure without angle deflection, soft light with camera angle deflection, and sunlight with camera angle deflection. The inference results are illustrated in Figure 10.

Figure 10. Readings of the three meters introduced in this study.

Table 5 shows the detection speed for each meter on 720p MP4 videos. The system maintained a detection speed of at least 14 FPS. However, when 8K video was used as the test set, the inference speed fell below 2 FPS because of the much larger data volume.
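For scale, assuming the standard frame sizes of 1280 × 720 pixels for 720p and 7680 × 4320 pixels for 8K, each 8K frame carries 36 times as many pixels:

$$\frac{7680 \times 4320}{1280 \times 720} = \frac{33\,177\,600}{921\,600} = 36.$$

The measured slowdown (from at least 14 FPS to below 2 FPS, a factor of more than 7) is smaller than this pixel ratio. That would be plausible if frames are resized to a fixed network input size, so that much of the extra cost falls on decoding and pre-processing rather than on the network itself; this breakdown is an assumption, not a measurement.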

Table 5. Inference speed in different environments.

5.4.2. Inference speed on Jetson Xavier NX

To characterise the inference speed under different linear acceleration settings, we evaluated our system on the Nvidia Jetson Xavier NX, using as the test set seven instrument videos recorded under soft light without camera angle deflection. Table 6 presents the results without TensorRT acceleration, with FP16 acceleration, and with INT8 acceleration. The average inference speed of 10.26 FPS achieved with INT8 inference indicates that the system is capable of real-time meter reading.

Table 6. Inference speed on Jetson NX.

5.5. System stability

To evaluate the stability of our system in real-world scenarios, we deployed our models on Nvidia RTX3090 and Nvidia Jetson NX. Our proposed dataset covers various interference conditions, and we randomly selected 50 images from each scenario to measure error statistics and test the system's ability to accurately read meter information in real-time despite multiple interferences.

5.5.1. Accuracy on our proposed dataset

Here we assess the stability of our model when reading meters in real time in realistic scenarios. To do this, we deployed the model on the Nvidia RTX3090 and tested it under multiple interference conditions using our proposed dataset. The inference results on the three meters under different scenarios are displayed in Figure 10; in most scenes, our readings closely match the manual readings. However, in the underexposed scenario, the fine-grained instance segmentation of the meter deteriorated, producing jagged boundaries between mask regions, as seen in Figure 11. The reading error calculation nevertheless showed that this jaggedness had an almost negligible effect on the error and did not materially affect the calculated area ratio of the instance masks.
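The robustness described above follows from reading the meter through a ratio of mask areas. The exact formulation is given earlier in the paper and is not reproduced here; the sketch below is a hypothetical simplification that assumes the scale sector is segmented into two labelled regions, the part already swept by the pointer and the part not yet swept, and maps their area fraction linearly onto the scale range. Label ids, helper names, and the toy mask are all illustrative. Two properties motivate this kind of rule: a few jagged boundary pixels change the ratio only slightly when the sector contains many pixels, and an affine map scales every area by the same factor, so area ratios are unchanged (shown here with an anisotropic stretch).

```python
PASSED, REMAINING = 1, 2  # hypothetical label ids for the two scale-sector regions


def count_label(mask, label):
    """Number of pixels carrying `label` in a 2-D list label map."""
    return sum(row.count(label) for row in mask)


def reading_from_mask(mask, scale_min, scale_max):
    """Hypothetical reading rule: fraction of the scale sector swept by the
    pointer, measured by pixel area, mapped linearly onto the scale range."""
    passed = count_label(mask, PASSED)
    remaining = count_label(mask, REMAINING)
    total = passed + remaining
    if total == 0:
        raise ValueError("no scale-sector pixels found")
    return scale_min + (scale_max - scale_min) * passed / total


def stretch(mask, sx, sy):
    """Nearest-neighbour anisotropic scaling, a simple affine map:
    each pixel becomes an sx-by-sy block."""
    return [[v for v in row for _ in range(sx)] for row in mask for _ in range(sy)]


# Toy label map: 10 swept and 40 unswept pixels on a hypothetical 0-4 bar gauge.
mask = [[PASSED] * 2 + [REMAINING] * 8 for _ in range(5)]
print(reading_from_mask(mask, 0.0, 4.0))                 # 0.8
# Stretching 3x horizontally and 2x vertically leaves the area ratio unchanged.
print(reading_from_mask(stretch(mask, 3, 2), 0.0, 4.0))  # 0.8
```

A real oblique view is a perspective rather than a strictly affine transformation, so the invariance holds only approximately, in line with the error growth at large deflection angles reported later in this section.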

Figure 11. The wavy line between different areas.

To further evaluate the model's accuracy, we extracted fifty inference results for each of the six gauges added to the dataset and calculated the average relative error and average reference error, as shown in Table 7. Despite the harsh real-world test conditions, the average relative and reference errors of our readings remained close to those on the public dataset (Table 3). However, our model lacks noise-reduction processing, so the error increases significantly when 8K video introduces additional noise into the images.

Table 7. The reading accuracy of three kinds of meters under different conditions.

5.5.2. Stability testing on Jetson NX

To assess the stability of the system under varying conditions, we simulated a realistic scenario, as depicted in Figure 12. The light intensity was kept constant while the camera shooting angle was varied from 90 degrees to 140 degrees, capturing the meter panel at different degrees of deflection. Real-time inference was performed on the Jetson Xavier NX.

Figure 12. Scenarios for stability testing.

For the error analysis, we randomly selected 50 images of each meter at each angle, as shown in Figure 13, and calculated the error from the inference results on these frames. These results indicate how stable the system is in real-world scenarios.

Figure 13. Inference results when the camera angle deflection.

We evaluated the stability of our model by simulating a realistic scenario and changing the camera angle, as depicted in Figure . The results show the average error of the three meters as the camera angle changes. The propane meter's relative error was larger at the initial stage; we attribute this to its frosted dial, whose light reflection makes it harder for the model to extract the dial features. In contrast, the oxygen and nitrogen meters maintained a low and stable error between 90 and 120 degrees of camera deflection, but the error rose sharply beyond that range. This tolerance to camera angle deflection suggests that our model could run stably on a free-moving meter reading trolley even when the camera is not aligned with the dial.

Figure 14. Inference results when camera angle deflection and linear acceleration.

We also tested the impact of different linear acceleration settings on inference accuracy and speed. Figure 14 indicates that the error does not change significantly with FP16 acceleration, whereas with INT8 acceleration the error is slightly larger than without acceleration.

6. Conclusion

In this paper, we proposed a highly available and accurate analog meter reading system for real-time use on edge devices. The system is divided into two models: a lightweight single-stage meter classification and localisation model, and a more complex instance segmentation model. The single-stage model can easily be updated when new meters are added, and the elliptic affine transformation enables real-time meter reading through the camera by directly reading the mask area ratio of the different regions.

Experiments were conducted to analyse the system's inference speed, generalisation capability, accuracy, and stability. The results confirmed that reading meters from the mask area of the sectors is more efficient than algorithmically correcting deflected meter images. The system maintained a stable inference speed of 10.26 FPS with a low relative error of 0.16% when the camera angle deflection was between 60 and 120 degrees.

Although this study proposes a new method for designing analog meter reading systems, it has some limitations. First, the system currently applies only to analog instruments with a single pointer and a circular dial, although these cover most industrial instruments. Second, the instrument dataset exhibits a long-tail distribution, which reduces the accuracy of our classification model. In future research, we aim to expand our meter dataset by collecting samples from a wider range of meters with varying styles and configurations. Because collected instrument data naturally exhibit a long-tail distribution, we will also improve the model training framework to reduce the impact of this distribution on model accuracy.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was partially supported by the National Natural Science Foundation of China under Grant 62072170, the Key Research and Development Program of Hunan Province under Grant 2022GK2015, the Hunan Provincial Natural Science Foundation of China under Grant 2021JJ30141, the Natural Science Research Project in Hechi University under Grant 2022YLXK003, and the Research Project of Improve the Basic Research Ability of Young Teachers in Guangxi Universities under Grant 2022KY0602.

References

Bao, H., Tan, Q., Liu, S., & Miao, J. (2019). Computer vision measurement of pointer meter readings based on inverse perspective mapping. Applied Sciences, 9(18), 3729. https://doi.org/10.3390/app9183729
Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2019). Yolact: Real-time instance segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9157–9166).
Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2020). Yolact++: Better real-time instance segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Charles, J., Bucciarelli, S., & Cipolla, R. (2021). Scaling digital screen reading with one-shot learning and re-identification. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 2635–2643).
Chen, S., Yang, C., Huang, W., Liang, W., Ke, N., Souri, A., & Li, K. C. (2023a). Fairness constraint efficiency optimization for multiresource allocation in a cluster system serving internet of things. International Journal of Communication Systems, 36(3), e5395. https://doi.org/10.1002/dac.v36.3
Chen, S., Yang, C., Huang, W., Liang, W., Ke, N., Souri, A., & Li, K. C. (2023b). Fairness constraint efficiency optimization for multiresource allocation in a cluster system serving internet of things. International Journal of Communication Systems, 36(3), e5395. https://doi.org/10.1002/dac.v36.3
Chi, J., Liu, L., Liu, J., Jiang, Z., & Zhang, G. (2015). Machine vision based automatic detection method of indicating values of a pointer gauge. Mathematical Problems in Engineering, 2015.
Diao, C., Zhang, D., Liang, W., Li, K. C., Hong, Y., & Gaudiot, J. L. (2022). A novel spatial-temporal multi-scale alignment graph neural network security model for vehicles prediction. IEEE Transactions on Intelligent Transportation Systems.
Donnelly, M. K., Davis, W. D., Lawson, J. R., & Selepak, M. J. (2006). Thermal environment for electronic equipment used by first responders. National Institute of Standards and Technology. Building and Fire Research….
Fan, Y., Zhang, W., Bai, J., Lei, X., & Li, K. (2023). Privacy-preserving deep learning on big data in cloud. China Communications, 1–11.
Gao, J., Guo, L., Lv, Y., Wu, Q., & Mu, D. (2018). Research on algorithm of pointer instrument recognition and reading based on the location of the rotation center of the pointer. In 2018 IEEE international conference on mechatronics and automation (ICMA) (pp. 1405–1410).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961–2969).
Howells, B., Charles, J., & Cipolla, R. (2021). Real-time analogue gauge transcription on mobile phone. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2369–2377).
Hu, N., Zhang, D., Xie, K., Liang, W., Diao, C., & Li, K. C. (2022). Multi-range bidirectional mask graph convolution based GRU networks for traffic prediction. Journal of Systems Architecture, 133, 102775. https://doi.org/10.1016/j.sysarc.2022.102775
Illingworth, J., & Kittler, J. (1987). The adaptive Hough transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(5), 690–698. https://doi.org/10.1109/TPAMI.1987.4767964
Lauridsen, J. S., Graasmé, J. A., Pedersen, M., Jensen, D. G., Andersen, S. H., & Moeslund, T. B. (2019). Reading circular analogue gauges using digital image processing. In 14th international joint conference on computer vision, imaging and computer graphics theory and applications (VISIGRAPP 2019) (pp. 373–382).
Li, Y., Liang, W., Peng, L., Zhang, D., Yang, C., & Li, K. C. (2022). Predicting drug-target interactions via dual-stream graph neural network. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Liang, W., Li, Y., Xie, K., Zhang, D., Li, K. C., Souri, A., & Li, K. (2022). Spatial-temporal aware inductive graph neural network for C-ITS data recovery. IEEE Transactions on Intelligent Transportation Systems.
Liang, W., Long, J., Li, K. C., Xu, J., Ma, N., & Lei, X. (2021). A fast defogging image recognition algorithm based on bilateral hybrid filtering. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 17(2), 1–16. https://doi.org/10.1145/3391297
Liang, W., Xie, S., Cai, J., Wang, C., Hong, Y., & Kui, X. (2021). Novel private data access control scheme suitable for mobile edge computing. China Communications, 18(11), 92–103. https://doi.org/10.23919/JCC.2021.11.007
Liang, W., Xie, S., Zhang, D., Li, X., & Li, K. C. (2021). A mutual security authentication method for RFID-PUF circuit based on deep learning. ACM Transactions on Internet Technology (TOIT), 22(2), 1–20. https://doi.org/10.1145/3426968
Liu, H., Soto, R. A. R., Xiao, F., & Lee, Y. J. (2021). YolactEdge: Real-time instance segmentation on the edge. In 2021 IEEE international conference on robotics and automation (ICRA) (pp. 9579–9585).
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21–37).
Liu, Y., Liu, J., & Ke, Y. (2020). A detection and recognition system of pointer meters in substations based on computer vision. Measurement, 152, 107333. https://doi.org/10.1016/j.measurement.2019.107333
Long, J., Liang, W., Li, K. C., Wei, Y., & Marino, M. D. (2022). A regularized cross-layer ladder network for intrusion detection in industrial internet of things. IEEE Transactions on Industrial Informatics, 19(2), 1747–1755. https://doi.org/10.1109/TII.2022.3204034
Ma, Y., & Jiang, Q. (2018). A robust and high-precision automatic reading algorithm of pointer meters based on machine vision. Measurement Science and Technology, 30(1), 015401. https://doi.org/10.1088/1361-6501/aaed0a
Mai, X., Li, W., Huang, Y., & Yang, Y. (2018). An automatic meter reading method based on one-dimensional measuring curve mapping. In 2018 IEEE international conference of intelligent robotic and control engineering (IRCE) (pp. 69–73).
NVIDIA (n.d.). Nvidia TensorRT. https://developer.nvidia.com/tensorrt. Accessed 2022-7-10.
Sablatnig, R., & Kropatsch, W. G. (1994). Automatic reading of analog display instruments. In Proceedings of 12th international conference on pattern recognition (Vol. 1, pp. 794–797).
Wang, C., Fang, Y., & Jia, L. (2018). The comparison of canny and structured forests edge detection application in precision identification of pointer instrument. In 2018 Chinese control and decision conference (CCDC) (pp. 6361–6365).
Wang, J., Huang, J., & Cheng, R. (2018). Automatic reading system for analog instruments based on computer vision and inspection robot for power plant. In 2018 10th international conference on modelling, identification and control (ICMIC) (pp. 1–6).
Xiao, W., Tang, Z., Yang, C., Liang, W., & Hsieh, M. Y. (2022). ASM-VoFDehaze: A real-time defogging method of zinc froth image. Connection Science, 34(1), 709–731. https://doi.org/10.1080/09540091.2022.2038543
Xie, E., Sun, P., Song, X., Wang, W., Liu, X., Liang, D., Shen, C., & Luo, P. (2020). Polarmask: Single shot instance segmentation with polar representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12193–12202).
Zhang, S., Hu, B., Liang, W., Li, K. C., & Gupta, B. B. (2023). A caching-based dual K-anonymous location privacy-preserving scheme for edge computing. IEEE Internet of Things Journal.
Zheng, C., Wang, S., Zhang, Y., Zhang, P., & Zhao, Y. (2016). A robust and automatic recognition system of analog instruments in power system by using computer vision. Measurement, 92, 413–420. https://doi.org/10.1016/j.measurement.2016.06.045
Zhuo, H. B., Bai, F. Z., & Xu, Y. X. (2020). Machine vision detection of pointer features in images of analog meter displays. Metrology and Measurement Systems, 589–599.
Zuo, L., He, P., Zhang, C., & Zhang, Z. (2020). A robust approach to reading recognition of pointer meters based on improved mask-RCNN. Neurocomputing, 388, 90–101. https://doi.org/10.1016/j.neucom.2020.01.032
