Research Article

A networked smart home system based on recurrent neural networks and reinforcement learning

Pages 775-783 | Received 28 Jul 2021, Accepted 30 Oct 2021, Published online: 11 Nov 2021

Abstract

With the widespread application of smart home systems, their optimal design has received considerable research attention. This paper puts forward a design scheme for a networked smart home system based on the analysis of the indoor environment and the forecast of future indoor environment changes. By building a multi-level network model, an integrated system covering analysis, prediction and decision-making is formed. The swarm intelligent decision-making ability of the networked smart home system is realized by applying a recurrent neural network together with a reinforcement learning method. Meanwhile, an indoor simulation environment is built, the indoor environment variables are simulated, and the performance of the system is verified in this environment. The simulation results show that the networked smart home system outperforms single smart home equipment in improving indoor comfort.

1. Introduction

As one of the main application fields of edge computing, the concept of the smart home was proposed in the 1980s. A smart home system monitors and manages various communications, home appliances and security equipment, and realizes intelligent management of the home environment. Its development is generally divided into three phases: smart single products, interconnection and artificial intelligence. Current systems are mainly based on the fourth generation of intelligent products built on all-IP technology and technology integration, using short-range real-time communication solutions such as WiFi and Zigbee. In the smart home network environment, services are provided through a cloud centre, achieving personalized services customized to user needs (Al-Ali et al., 2017; Ansarey et al., 2014; Babou et al., 2018; Barsoum & Hasan, 2015; Keshtkar & Arzanpour, 2017; Xing & Deng, 2019). For example, Al-Ali et al. (2017) introduced an energy management system for smart homes that better meets consumer needs through big data analysis and business intelligence. Cheikh et al. (Babou et al., 2018) proposed the concept of home edge computing (HEC) in the context of edge computing in the smart home: a three-tiered edge computing architecture that provides ultra-low latency for applications near the user and focuses on handling the problem of overloaded data transmission. Utilizing edge computing in the smart home scenario can better meet real-time business needs and deliver a better service experience. The development of smart home systems using edge computing is, however, still at an experimental stage.

Machine learning is a core discipline of artificial intelligence; its main research question is how algorithms improve their performance on specific tasks through experience. Reinforcement learning is an important branch of machine learning that describes and solves the problem of agents learning strategies to maximize returns, or achieve specific goals, through interaction with the environment. Its inspiration comes from behaviourist theory in psychology, that is, how an organism gradually forms expectations about stimuli under the rewards or punishments given by the environment. Since the late 1980s, with growing support from developments in mathematics, reinforcement learning has drawn much attention owing to its ability to learn online and to adapt without a tutor system, and it has been applied in many fields such as manufacturing, robot control, optimization and scheduling, and simulation (Hao et al., 2018; Mathe et al., 2016; Nguyen et al., 2019; Sun et al., 2018; Yin et al., 2018). Research in the past decades has investigated the various methods, techniques and algorithms used in reinforcement learning, and their defects and problems have gradually been revealed. Finding better methods and algorithms to promote the development and wide application of reinforcement learning remains a focus of research. In this respect, neural networks and their algorithms have attracted many researchers due to their unique generalization and storage capabilities.

A recurrent neural network is an artificial neural network whose nodes process input information recursively along their connection order (Tu et al., 2016; Wu & Yue, 2019; Xu et al., 2019; Yan et al., 2018; Zhang et al., 2017). Its neurons accept not only the information of other neurons but also their own past information, which forms a network structure with loops and gives the network a short-term memory. It is a nonlinear dynamic system and can be used to realize associative memory and to solve optimization problems with electronic circuits (Haykin, 1999; Huaguang & Zhanshan, 2009). A large number of researchers have strengthened the effects and applications of reinforcement learning through the characteristics of neural networks (Ding et al., 2017; Guo et al., 2019; Teich et al., 2014). For example, in Guo et al. (2019) and Teich et al. (2014), the authors adopted a neural network algorithm as the core adaptive algorithm, motivated by the relationship between the complexity of the smart home environment and its nonlinear variables; adding wavelet, particle swarm and convolutional optimizations gives neural network algorithms better performance in adaptive control systems, with different optimization methods yielding better results under different problem backgrounds. In Ding et al. (2017), the neural network is studied further and a neural network algorithm based on deep learning is proposed. This algorithm controls system performance and accuracy better than ordinary, single or otherwise optimized neural network algorithms; its advantages lie in multi-layer processing, rich hidden-layer modes, complex connection-weight relationships, and large-scale probabilistic simulation, which reduce the entanglement between factors and variables in complex and changeable environments. In Xie et al. (2020) and Yu et al. (2021), the authors comprehensively investigated applications of deep reinforcement learning in smart building energy management at different system scales. Yu et al. (2019) researched the problem of minimizing the sum of energy cost and thermal discomfort cost over a long time horizon for a sustainable smart home with a heating, ventilation and air conditioning load, and proposed a stochastic program to minimize the time-averaged expected total cost.
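To make the recurrence concrete, the following minimal PyTorch sketch (illustrative only, not taken from the paper) shows how a hidden state is fed back into the cell at each step, which is what gives the network its short-term memory.

```python
import torch
import torch.nn as nn

# Minimal illustration of recurrence: the hidden state h is fed back into
# the cell at every time step, so h summarizes the recent input history.
cell = nn.RNNCell(input_size=4, hidden_size=8)

x_seq = torch.randn(10, 1, 4)   # 10 time steps, batch of 1, 4 features
h = torch.zeros(1, 8)           # initial hidden state
for x_t in x_seq:
    h = cell(x_t, h)            # new h depends on the input and the previous h
print(h.shape)                  # torch.Size([1, 8])
```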

This paper focuses on a neural network algorithm that serves as the core of the prediction algorithm, and proposes improvements and optimizations of that algorithm based on the prediction results. The research explores the extent to which the designed model of the indoor environment can automatically improve the comfort of the indoor environment in simulation. Taking indoor environment analysis and the forecast of future indoor environment changes as the starting point, a multi-level network model is constructed, forming an integrated modelling solution from analysis to prediction and subsequent decision-making. A recurrent neural network, reinforcement learning techniques and the proposed improvements are combined to realize the group intelligent decision-making ability of the smart home system.

The main contributions of this paper are:

  1. Temperature, humidity, light, dust and wind speed are integrated into the smart home system, so as to improve the overall comfort of the living environment.

  2. A recurrent neural network is used to capture the temporal information of the environmental variables, improving the intelligence of the smart home system.

The rest of this paper is organized as follows. Section 2 analyses the basic problem and constructs the prediction model. Section 3 analyses the prediction model further and integrates it with the reinforcement learning method. Section 4 discusses the details of the algorithm and presents numerical results in a simulation environment that verify the performance of the models. Finally, the conclusion is given in Section 5.

2. Analysis of problems and construction of the prediction model

In the indoor environment, human comfort is related to a variety of environmental attributes, including temperature, humidity, somatosensory wind speed, airborne dust, sunlight and light intensity, and depends strongly on spatial location and time. Accordingly, I consider the effects of spatial position and time, and model temperature, humidity, wind speed, dust and light intensity as time-varying fields in space, with corresponding variables $T(x,y,z,t)$, $W(x,y,z,t)$, $V(x,y,z,t)$, $P(x,y,z,t)$ and $L(x,y,z,t)$, where $x,y,z\in\mathbb{R}$ and $t\in\mathbb{R}^+$. These variables together affect the physical comfort of the human body, as shown in Figure 1(a).

Figure 1. (a) Variables affecting human comfort. (b) Directed graph of coupling relationship between different variables.


In view of the coupling effects between these variables (for example, the wind speed affects the interplay of the humidity field and the temperature field), I represent these coupling relationships in Figure 1(b). Let $u_T, u_W, u_V, u_P, u_L$ respectively denote the intelligent controller's control parameters for temperature, humidity, wind speed, dust and light intensity. Then the system model can be established as follows:
$$
\begin{cases}
\dfrac{\partial T}{\partial t}=F_T(W,V,L,T,u_T),\\[2pt]
\dfrac{\partial W}{\partial t}=F_W(W,V,L,T,u_W),\\[2pt]
\dfrac{\partial P}{\partial t}=F_P(W,V,L,u_P),\\[2pt]
\dfrac{\partial V}{\partial t}=F_V(V,u_V),\\[2pt]
L=F_L(u_L),
\end{cases}\tag{1}
$$
where $F_T, F_W, F_P, F_V$ and $F_L$ are undetermined functionals. Under ideal conditions these functionals have specific expressions. However, even if such expressions exist, the existence and uniqueness of the solution of the equations cannot be guaranteed. Moreover, the boundary conditions of an indoor environment are often complicated, and it is difficult to determine the solution of the continuous-space problem under the premise of limited sensors (state observation). Therefore, I do not consider the analytical form of the functionals above but, starting from the coupling relationships of (1), parameterize and discretize the abstract system to obtain a new optimization model.

First, $L$ is not interfered with by other variables, so to simplify the expression, let $\eta=(T,W,P,V)$ and $u_\eta=(u_T,u_W,u_P,u_V)$; by discretizing each variable, we obtain the state sequences $\eta\{n\}, L\{n\}$ and control sequences $u_\eta\{n\}, u_L\{n\}$. According to Figure 1(b), the topological relationship among the variables can be described by the coupling matrix
$$
A=\begin{pmatrix}
a_0 & a_1 & 0 & a_2\\
a_3 & a_4 & 0 & a_5\\
a_6 & a_7 & a_8 & a_9\\
0 & 0 & 0 & a_{10}
\end{pmatrix},
$$
where each $a_i$ is an undetermined matrix and $0$ is the zero matrix. I then use a recurrent neural network (Tu et al., 2016; Wu & Yue, 2019; Xu et al., 2019; Yan et al., 2018; Zhang et al., 2017) to describe the model (1) as
$$
\begin{cases}
\eta\{n+1\}=\mathrm{RNN}\left(\langle A\eta\{k\},L\{k\},u_\eta\{k\}\rangle_{k=n-n_0}^{n}\right),\\
L\{n+1\}=F_L(u_L\{n\},E\{n\}),
\end{cases}\tag{2}
$$
where $n>n_0$; $\eta\{0\},\eta\{1\},\dots,\eta\{n_0\}$ and $L\{0\},L\{1\},\dots,L\{n_0\}$ are initial conditions; $u_\eta\{0\}=u_\eta\{1\}=\dots=u_\eta\{n_0\}=0$; $E\{n\}$ is the outdoor light intensity; and $\mathrm{RNN}$ is a recurrent neural network. Define the comfort function $f_0(\eta\{k\},L\{k\})\in\mathbb{R}^+$, whose value represents the level of comfort. Our goal is to maintain the level of comfort after the limited initialization time $n_0$, i.e.
$$
\max_{\langle u_\eta\{j\},u_L\{j\}\rangle_{j=n_0+1}^{\infty}}\ \inf_{k>N} f_0(\eta\{k\},L\{k\}).
$$
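As a concrete illustration, the following PyTorch sketch implements a simplified version of the predictor in (2). It is a minimal sketch under assumptions the paper does not fix: each of $T, W, P, V$ is reduced to a scalar, the learnable matrix $A$ carries the zero pattern of Figure 1(b) via a fixed mask, and a single GRU stands in for the RNN layer.

```python
import torch
import torch.nn as nn

class CoupledStatePredictor(nn.Module):
    """Sketch of (2): predict eta{n+1} from a history window of A-coupled states."""
    def __init__(self, hidden=32):
        super().__init__()
        self.A = nn.Parameter(torch.randn(4, 4) * 0.1)
        # Fixed mask enforcing the zero entries of A (rows/cols ordered T, W, P, V).
        self.register_buffer("mask", torch.tensor([[1., 1., 0., 1.],
                                                   [1., 1., 0., 1.],
                                                   [1., 1., 1., 1.],
                                                   [0., 0., 0., 1.]]))
        # Input per step: A @ eta{k} (4) + light L{k} (1) + control u_eta{k} (4).
        self.rnn = nn.GRU(input_size=9, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)  # maps the last hidden state to eta{n+1}

    def forward(self, eta_hist, L_hist, u_hist):
        # eta_hist: (B, n0+1, 4); L_hist: (B, n0+1, 1); u_hist: (B, n0+1, 4)
        coupled = eta_hist @ (self.A * self.mask).T   # A eta{k} at each step
        x = torch.cat([coupled, L_hist, u_hist], dim=-1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])                  # predicted eta{n+1}, shape (B, 4)

# Usage with a hypothetical window of n0+1 = 6 steps:
model = CoupledStatePredictor()
eta_next = model(torch.randn(1, 6, 4), torch.randn(1, 6, 1), torch.randn(1, 6, 4))
```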

3. Further analysis of the prediction model and reinforcement learning integration

I integrate autonomy as an important design concept into the model of system (2): all variables that may be related to time are embedded into system (2) as state variables or intrinsic parameters, such as the outdoor light intensity. The outdoor temperature, which varies from day to night, is introduced into the system by arranging sensors outdoors, so that $T$ becomes a high-dimensional state variable whose dimension equals the total number of temperature sensors, some of which are placed outdoors. This is why each $a_i$ in the coupling matrix $A$ is an undetermined matrix; $a_0, a_4, a_8, a_{10}$ represent the self-coupling matrices of $T, W, P, V$ respectively, and the remaining entries of $A$ represent the coupling between variables.

To be more precise, the first equation in (2) needs to be modified, because the outdoor state is not affected by the control parameters. I therefore apply a direct-sum decomposition to the state variable, $\eta\{n\}=\eta_I\{n\}\oplus\eta_O\{n\}$, where $\eta_I\{n\}$ represents the indoor state and $\eta_O\{n\}$ the outdoor state. The control variables can only affect the indoor state variables, and the comfort degree is only related to the indoor state variables, so let $f(\eta_I\{k\},L\{k\})=f_0(\eta\{k\},L\{k\})$. Moreover, the lower-bound condition in the optimization model is not conducive to the later solution process, and the original intention of the lower bound was to maintain comfort at a high level. Referring to risk theory, I substitute for the lower bound by composing with a concave function $g$, so that in the later optimization the comfort can reach a higher level while its volatility is reduced. The model is thus modified to
$$
\max_{\langle u_\eta\{j\},u_L\{j\}\rangle_{j=n_0+1}^{M}}\ \frac{1}{M}\sum_{k=0}^{M} g\circ f(\eta_I\{k\},L\{k\})
$$
subject to
$$
\begin{cases}
\eta_I\{n+1\}=\mathrm{RNN}\left(\langle A\eta\{k\},L\{k\},u_\eta\{k\}\rangle_{k=n-n_0}^{n}\right),\\
\eta\{n+1\}=\eta_I\{n+1\}\oplus\eta_O\{n+1\},\\
L\{n+1\}=F_L(u_L\{n\},E\{n\}),\quad n\ge n_0,
\end{cases}\tag{3}
$$
where $\eta\{0\},\eta\{1\},\dots,\eta\{n_0\}$ and $L\{0\},L\{1\},\dots,L\{n_0\}$ are initial conditions, $u_\eta\{0\}=u_\eta\{1\}=\dots=u_\eta\{n_0\}=0$, and $\eta_O\{n\}$ and $E\{n\}$ are external inputs. $M$ is a positive integer large enough to represent the duration. The function $F_L$ can be modelled directly from the indoor structure, window positions, light-source positions and wall reflection, which will not be described here. The RNN layer is composed of four mutually independent RNNs, which are applied to the coupling matrix $A$ to form the state output. Let the observation sequence of the indoor state be $\tilde{\eta}\{n\}$; then the loss function is defined as
$$
\mathrm{Loss}_\eta\left(A,N_p,\langle u_\eta\{k\}\rangle_{k=n_0+1}^{M},\langle u_L\{k\}\rangle_{k=n_0+1}^{M}\right)=\frac{1}{M}\sum_{n=n_0+1}^{M}\left\|\hat{\eta}\{n\}-\tilde{\eta}\{n\}\right\|^2,
$$
where $N_p$ denotes the parameters of the neural network and $\hat{\eta}\{n\}$ the predicted state. Summing the loss functions of the different strategies, we minimize the target
$$
\inf_{A,\ N_p,\ \langle u_\eta\{k\}\rangle_{k=n_0+1}^{M},\ \langle u_L\{k\}\rangle_{k=n_0+1}^{M}} \mathrm{Loss}_\eta,
$$
as shown in Figure 2.

Figure 2. Three-layer network structure of the system.

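A possible realization of $\mathrm{Loss}_\eta$ is sketched below, continuing the hypothetical CoupledStatePredictor from the previous sketch; the sliding-window indexing follows (3), and the mean squared error is taken over $n=n_0+1,\dots,M$.

```python
import torch

# Sketch of Loss_eta: mean squared error between predicted and observed
# indoor states, using history windows of length n0+1 as in system (3).
# `model` is the hypothetical CoupledStatePredictor from the earlier sketch.
def prediction_loss(model, eta_obs, L_seq, u_seq, n0):
    # eta_obs: (M+1, 4) observed states; L_seq: (M+1, 1); u_seq: (M+1, 4)
    losses = []
    for n in range(n0, eta_obs.shape[0] - 1):
        w = slice(n - n0, n + 1)                       # window k = n-n0 .. n
        pred = model(eta_obs[None, w], L_seq[None, w], u_seq[None, w])
        losses.append(((pred.squeeze(0) - eta_obs[n + 1]) ** 2).sum())
    return torch.stack(losses).mean()
```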

Next, I use the theory and methods of reinforcement learning to optimize the control strategy. Reinforcement learning developed from the dynamic programming equation: the learning system obtains the mapping between states and behaviours through interaction with the external environment, and the learning goal is to optimize the value of a reward function. Q-learning, used here, is a classical reinforcement learning algorithm. First define the function $Q$,
$$
Q\left(\langle \eta_I\{k\}\rangle_{k=n-n_0}^{n},\langle L\{k\}\rangle_{k=n-n_0}^{n};\,u_\eta\{n\},u_L\{n\}\right)=\max_{\langle u_\eta\{j\},u_L\{j\}\rangle_{j=n+1}^{M}}\ \frac{1}{M}\sum_{k=n}^{M} g\circ f(\eta_I\{k\},L\{k\}).
$$

According to (3), the transition
$$
\left(\langle \eta_I\{k\}\rangle_{k=n-n_0}^{n},\langle L\{k\}\rangle_{k=n-n_0}^{n}\right)\ \longrightarrow\ \left(\langle \eta_I\{k\}\rangle_{k=n-n_0+1}^{n+1},\langle L\{k\}\rangle_{k=n-n_0+1}^{n+1}\right)
$$
is a Markov decision process (MDP) (Chen et al., 2017; Doshi-Velez, 2009; Li, 2019; Qiao et al., 2018; Xie et al., 2020; Yu et al., 2019; Yu et al., 2021; Zheng & Cho, 2011). According to the principle of dynamic programming (Redouane & Cherif, 2018; Zheng et al., 2008), the Bellman update holds:
$$
Q(S_n;a_n)\leftarrow Q(S_n;a_n)+\alpha\left(r_{n+1}+\lambda\max_{a_{n+1}}Q(S_{n+1};a_{n+1})-Q(S_n;a_n)\right),
$$
where $\alpha<1$ is the learning rate. Defining the loss function
$$
\mathrm{Loss}_Q=\mathbb{E}\left(r_{n+1}+\lambda\max_{a_{n+1}}Q(S_{n+1};a_{n+1})-Q(S_n;a_n)\right)^2
$$
and combining it with system (3), the flow chart of the whole algorithm is shown in Figure 3.

Figure 3. The algorithm framework.

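In code, $\mathrm{Loss}_Q$ can be realized as a standard one-step temporal-difference loss. The sketch below assumes hypothetical q_net and target_net modules mapping a state window $S_n$ to Q-values over a discretized action set, with $\alpha$ absorbed into the optimizer's learning rate and the discount $\lambda=0.95$ an assumed value.

```python
import torch

# Sketch of Loss_Q: squared one-step temporal-difference error.
# q_net/target_net: hypothetical modules mapping S_n -> Q-values over actions.
def td_loss(q_net, target_net, S_n, a_n, r_next, S_next, lam=0.95):
    q_sa = q_net(S_n).gather(1, a_n.unsqueeze(1)).squeeze(1)  # Q(S_n; a_n)
    with torch.no_grad():                                     # target held fixed
        q_next = target_net(S_next).max(dim=1).values         # max_a' Q(S_{n+1}; a')
    return ((r_next + lam * q_next - q_sa) ** 2).mean()       # E[TD error^2]
```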

The total loss function of the model is then
$$
\mathrm{Loss}=\mathrm{Loss}_\eta+\sum_{\substack{a_n=(u_\eta\{n\},u_L\{n\})\\ n=n_0+1,\dots,M}}\left(r_{n+1}+\lambda\max_{a_{n+1}}Q(S_{n+1};a_{n+1})-Q(S_n;a_n)\right)^2.
$$
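Combining the two terms in code is direct; a minimal sketch reusing the hypothetical prediction_loss and td_loss helpers above (the equation sums the terms directly, so any relative weighting would be an added assumption):

```python
# Total loss = Loss_eta + summed squared TD errors, as in the equation above.
loss = prediction_loss(model, eta_obs, L_seq, u_seq, n0) \
     + td_loss(q_net, target_net, S_n, a_n, r_next, S_next)
loss.backward()   # backpropagate through both the predictor and the Q-network
```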

4. Details of algorithm and numerical result in the simulation environment

In the established simulation environment, I summarize the algorithm framework as follows (a condensed code sketch follows the list):

  1. Initialize the state space sequence and clear the system time

  2. Initialize DQN

  3. Enter the state into the DQN, use an ϵ-greedy strategy to generate the control strategy, and use it to generate several control sets

  4. Match the control sets with the state, enter them into system (3), update the predicted state and the simulation state in the simulation system, and return to step 3 until the system time reaches the set time

  5. Calculate Loss, use backpropagation to optimize the network

  6. Return to step 1 until the Loss converges or the set number of training iterations is reached.
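The following condensed sketch maps steps 1-6 onto code. Here sim, actions and compute_total_loss are hypothetical stand-ins (the simulation environment, a discretized control set, and the total loss of Section 3); the learning rate 0.05 and the 100,000 episodes mirror Section 4's settings, while the other hyperparameters are assumptions.

```python
import random
import torch

# Condensed sketch of the training framework (steps 1-6). `dqn` maps a state
# window to Q-values over the discretized control set `actions`; `sim` and
# `compute_total_loss` are hypothetical stand-ins for the simulation
# environment and the total loss of Section 3. Step 2 (initialize the DQN)
# is done by the caller before invoking train().
def train(dqn, predictor, sim, actions, episodes=100_000,
          horizon=800, eps=0.1, lam=0.95, lr=0.05):
    opt = torch.optim.SGD(list(dqn.parameters()) + list(predictor.parameters()), lr=lr)
    for _ in range(episodes):                  # step 6: repeat until convergence
        S = sim.reset()                        # step 1: init state, clear system time
        transitions = []
        for _ in range(horizon):               # step 4: run until the set time
            if random.random() < eps:          # step 3: epsilon-greedy control
                a = random.randrange(len(actions))
            else:
                with torch.no_grad():
                    a = dqn(S.unsqueeze(0)).argmax(dim=1).item()
            S_next, r = sim.step(actions[a])   # update simulation/prediction state
            transitions.append((S, a, r, S_next))
            S = S_next
        loss = compute_total_loss(dqn, predictor, transitions, lam)  # step 5
        opt.zero_grad()
        loss.backward()                        # backpropagation to optimize the nets
        opt.step()
```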

In training, the initial indoor temperature is 0, the initial humidity is 80 and the initial dust level P is 30; g is the square root function and f is the L2 distance from the state to the ideal state. The numbers of temperature, humidity and dust observers are each set to 1, the light intensity is set to a constant value, and the wind speed is set by the equipment, as shown in Figure 4. In the environment, the intelligent networked equipment includes an air conditioner, a humidifier/dehumidifier and an air cleaner. The simulation runs on a computer with an Intel Core i7-10700 CPU, 64 GB of memory and an NVIDIA 2080 Ti graphics card. The software environment is Python/PyTorch.

Figure 4. Distribution of networked home intelligent devices in the simulation environment.

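The comfort terms just described can be written down directly; a minimal sketch, assuming a hypothetical ideal-state vector and a sign convention (the objective maximizes comfort, so the distance-based penalty is negated here):

```python
import torch

# Sketch of the reward: f = L2 distance from the state to the ideal state,
# g = square root. The ideal-state values below are hypothetical placeholders.
ideal = torch.tensor([22.0, 50.0, 5.0, 0.2])   # assumed ideal T, W, P, V

def reward(eta_indoor):
    f = torch.linalg.norm(eta_indoor - ideal)  # f: L2 distance to the ideal state
    return -torch.sqrt(f)                      # g = sqrt; negated so larger is better
```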

When the equipment has just started, the distribution of indoor environment variables is as shown in Figure 5.

Figure 5. Distribution of indoor environment variables when the equipment is just started.


The learning rate for algorithm training is set to 0.05 and the number of outermost training cycles to 100,000. The growth of the rewards during training is shown in Figure 6. After 100,000 training iterations, the reward stabilizes at about 0.75, from 0.5 at the beginning. At this point, the probability that the error of the environmental state value in the prediction system exceeds 10% is less than 0.7%, the probability that the error exceeds 5% is less than 1.7%, and the probability that the error exceeds 1% is less than 7.2%. The relationship between the error bound and the probability of the prediction exceeding it is shown in Figure 7.

Figure 6. The growth of the rewards during the neural network in training.


Figure 7. The curve of the relationship between the error bound and probability of the prediction exceeding the error bound.


After training, the effect of the networked smart home system in the simulation system is shown in Figure 8, and the changes in indoor temperature, humidity and fine dust under the networked smart home system, together with a comparison, are shown in Figure 9.

Figure 8. Distribution of indoor environment variables after a period of control.


Figure 9. Comparison of indoor temperature, humidity and dust change between networked smart home system and single smart home equipment.


It can be seen from Figure 9 that after 800 simulation time units, the indoor environment variables of the networked smart home system are basically controlled within a suitable range. Compared with single smart home equipment, the networked smart home system has clear advantages.

In addition, I compare the proposed method with some typical methods in Table 1, investigating the adjustment time each method needs to reach the ±5% zone of the comfort state. The experimental results show that the proposed method has certain advantages in the speed of comfort adjustment.

Table 1. The adjustment time cost by different methods to reach the ±5% zone of the comfort state.

5. Conclusion

In this paper, a networked smart home system has been established. Through the construction of a multi-level network model, an integrated scheme from analysis to final decision-making has been formed to realize the group intelligent decision-making ability of the smart home system. By integrating a recurrent neural network, a graph structure layer and reinforcement learning techniques, a smart home network system with independent intelligent decision-making ability has been realized after training. The simulation results have shown that the system outperforms single smart home equipment in improving indoor comfort.

In future work, an LSTM or GRU can be put into the network in place of the plain RNN, which may achieve better results. In addition, the indoor temperature and humidity suitable for sleeping and dining may differ, which is also a question that needs to be studied in the future.

Acknowledgements

This work is jointly supported by the Key Project of University Outstanding Young Talents Support Program of Anhui Province (gxyqZD2017140), the Scientific and Technological Innovation Team Construction Project of Wuhu Institute of Technology of China (Wzykj2018A03, Wzykjtd202005), and the Key Project of Natural Science Research of Anhui Higher Education Institutions of China (KJ2019A0974, KJ2020A0911).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Scientific and Technological Innovation Team Construction Project of Wuhu Institute of Technology of China: [Grant Number (Wzykj2018A03)]; the Key Project of Natural Science Research of Anhui Higher Education Institutions of China: [Grant Number (KJ2020A0911)].

References