
Output Feedback Controller for a Class of Unknown Nonlinear Discrete Time Systems Using Fuzzy Rules Emulated Networks and Reinforcement Learning

C. Treesatayapun

Abstract

A model-free adaptive control for non-affine discrete time systems is developed by utilising output feedback and action-critic networks. A fuzzy rules emulated network (FREN) is employed as the action network and its multi-input version (MiFREN) is implemented as the critic network. Both networks are constructed using human knowledge in the form of IF–THEN rules according to the controlled plant, and the learning laws are established by reinforcement learning without any off-line learning phase. The theoretical derivation of the convergence of the tracking error and internal signals is provided. A numerical simulation and an experimental system are given to validate the proposed scheme.

1. Introduction

Due to the complexity of modern controlled plants, it is commonly difficult or impossible to establish their mathematical models, especially for discrete time systems [1]. By utilising only the input–output data of the controlled plant, model-free approaches have been developed [2, 3]. On the other hand, the performance of these controllers depends on the quality and quantity of the data [4]. For some engineering applications, it is very difficult to access all state variables, thus output feedback is still the preferable scheme [5, 6]. Furthermore, closed-loop analysis and stability approaches have been proposed [7, 8, 9] to guarantee the performance of controllers. From an engineering point of view, stability analysis, besides closed-loop performance, is only a basic minimum requirement, even for artificial intelligence controllers [10]. Therefore, optimal controllers are more desirable for modern applications [11] or from a nature-inspired point of view [12].

To ensure the closed-loop performance with optimisation of a predefined cost function, schemes based on adaptive dynamic programming have been utilised, but mathematical models have been required for their iterative learning [13, 14]. Within the model-free setting, reinforcement learning (RL) algorithms have been developed to solve optimal control [15, 16] with an estimated solution of the Hamilton–Jacobi–Bellman equation [17, 18]. To mimic the RL process, approaches based on action-critic networks have been derived with artificial neural networks (ANNs), considering the controlled plant as a black box [19, 20]. Nevertheless, even when the mathematical model is unknown, the engineer still has basic human knowledge of the controlled plant, such as 'IF a higher output is required, THEN more control effort should be supplied'. Thus, the controlled plant can be considered as a grey box.

To integrate human knowledge in IF–THEN format into the controller, fuzzy logic systems (FLSs) have been utilised in control applications [21], including optimal control problems [22]. By adding learning ability to FLSs, integrations between FLS and ANN have been developed, such as the fuzzy neural network (FNN) [23] and the fuzzy rules emulated network (FREN) [24, 25]. Thereafter, approaches using FNN and FREN for solving the optimal control problem with RL have been proposed [26, 27], where the controlled plants have been considered as a class of affine systems. On the other hand, the problem of non-affine systems has been studied in Ref. [28] with the approach of critic-action networks, where state feedback has been utilised to gain enough information to tune the ANNs.

In this work, an output feedback model-free controller is proposed for the case in which the control effort is non-affine with respect to the system dynamics. The controller is designed by the action network called FRENa with a set of IF–THEN rules according to the controlled plant. Thereafter, the long-term cost function is estimated by the multi-input version of FREN called MiFRENc, whose IF–THEN rules are established from the general aim of minimising both the tracking error and the control energy. The learning laws are derived with the RL approach to tune all adjustable parameters of FRENa and MiFRENc, aiming to minimise the tracking error and the estimated cost function. Furthermore, the closed-loop analysis is provided by the Lyapunov method to demonstrate the convergence of the tracking error and internal signals.

This paper is organised as follows. Section 2 introduces a class of systems under our investigation and problem formulation. The proposed scheme is introduced in Section 3 including the network architectures with IF–THEN rules of FRENa and MiFRENc and their formulations. The learning laws and closed-loop analysis are derived in Section 4. Section 5 provides the results of the simulation and experimental system.

2. Controlled Plant as a Class of Nonlinear Discrete-Time Systems

In this work, the controlled plant for a class of non-affine discrete time systems is considered as (1) $y(k+1)=f\big(y(k),\ldots,y(k-n_y),u(k),\ldots,u(k-n_u)\big)+d(k)$, where $y(k+1)\in\mathbb{R}$ is the plant's output with respect to the control effort $u(k)\in\mathbb{R}$, $f(\cdot)$ is an unknown nonlinear function, $n_u$ and $n_y$ are unknown system orders and $d(k)$ denotes a bounded disturbance such that $|d(k)|\leq d_M$. For further analysis, the following assumption is expressed according to the unknown nonlinear function $f(\cdot)$ with respect to the control effort $u(k)$.

Assumption 2.1

The derivative of $y(k+1)$ with respect to $u(k)$ exists and is bounded such that (2) $0<g_m\leq\dfrac{\partial y(k+1)}{\partial u(k)}\leq g_M$, where $g_m$ and $g_M$ are positive constants.

Remark 2.2

The condition in (2) indicates that the controlled plant in (1) has a positive control direction. This will assist in setting the IF–THEN rules that relate the change of control effort $\Delta u(k)$ to the change of output $\Delta y(k+1)$.

Referring to condition (2), it is clear that the change of output $\Delta y(k+1)$ with respect to the change of control effort $\Delta u(k)$ can be rewritten as (3) $g_{md}\leq\dfrac{\Delta y(k+1)}{\Delta u(k)}\leq g_{Md}$, where $\Delta u(k)\neq0$ and $g_{md}$ and $g_{Md}$ are constants according to $g_m$ and $g_M$, respectively. This leads to the setting of IF–THEN rules such that

IF Δu(k) is positive-large, THEN Δy(k+1) should be positive-large

or

IF Δu(k) is negative-small, THEN Δy(k+1) should be negative-small.

By utilising those IF–THEN rules, the adaptive controller based on FRENs will be established in the next section.
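
Because only input–output data are available, condition (3) can also be checked empirically before the IF–THEN rules are fixed. The following Python sketch is purely illustrative: the logged data and the bounds g_md, g_Md are assumptions, not values from this paper. It estimates the ratio Δy(k+1)/Δu(k) from samples and verifies the positive control direction.

```python
import numpy as np

# Hypothetical logged input-output data (assumed for illustration only):
# u[i] is the control effort u(k) and y[i] the resulting output y(k+1).
u = np.array([0.0, 0.2, 0.5, 0.9, 1.4, 1.1, 0.7])
y = np.array([0.0, 0.6, 1.5, 2.6, 4.1, 3.2, 2.0])

g_md, g_Md = 0.5, 5.0            # assumed bounds corresponding to condition (3)

du = np.diff(u)                  # Delta u(k)
dy = np.diff(y)                  # Delta y(k+1)
ratio = dy[du != 0] / du[du != 0]

print("Delta y / Delta u samples:", ratio)
print("positive control direction:", bool(np.all(ratio > 0)))
print("within assumed bounds:", bool(np.all((ratio >= g_md) & (ratio <= g_Md))))
```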

3. RL Controller

The proposed controller is illustrated by the block diagram in Figure 1. In this work, the plant is selected as a DC motor current control. Only the armature current is measured as the output $y(k+1)$ (mA), while the control effort $u(k)$ (V) is the voltage fed to the driver unit. Thus, the IF–THEN rules mentioned in Section 2 can be restated according to the physical nature of the plant such that

Figure 1. Closed-loop system architecture.

IF we apply a positive-large change of control voltage [Δu(k)], THEN we should have a positive-large change of armature current [Δy(k+1)].

According to this knowledge, the action network (FRENa) is first established to generate the control effort $u(k)$, where its input is the tracking error $e(k)$ defined as (4) $e(k)=r(k)-y(k)$, with $r(k)$ the desired trajectory. Second, the critic network is designed using MiFRENc to produce the estimated long-term cost function $\hat{L}(k)$ for the controller FRENa. The details of the two networks and their IF–THEN rules are given as follows.

3.1. Controller or Action Network

To utilise the action network, the IF–THEN rules relating the tracking error $e(k)$ to the control effort $u(k)$ are first established. Consider the basic knowledge that a positive-large $e(k)$ means the output $y(k)$ is lacking by a positive-large amount; to compensate, the control effort $u(k)$ clearly needs to be positive-large. In conclusion, we have: IF $e(k)$ is positive-large, THEN $u(k)$ should be positive-large. With seven linguistic levels, this leads to the design of the IF–THEN rules as

IF $e(k)$ is NL, THEN $u(k)$ should be NL,
IF $e(k)$ is NM, THEN $u(k)$ should be NM,
IF $e(k)$ is NS, THEN $u(k)$ should be NS,
IF $e(k)$ is Z, THEN $u(k)$ should be Z,
IF $e(k)$ is PS, THEN $u(k)$ should be PS,
IF $e(k)$ is PM, THEN $u(k)$ should be PM,
IF $e(k)$ is PL, THEN $u(k)$ should be PL,

where the notations of the linguistic variables N, P, L, M, S and Z denote negative, positive, large, medium, small and zero, respectively.

Employing this set of IF–THEN rules, the network architecture of FRENa is illustrated in Figure 2. According to the network architecture in Figure 2 and the function formulation of FREN in Ref. [24], the control effort $u(k)$ is determined by (5) $u(k)=\beta_a^T(k)\phi_a(k)$, where (6) $\phi_a(k)=[\mu_{NL}(e_k)\;\;\mu_{NM}(e_k)\;\;\cdots\;\;\mu_{PL}(e_k)]^T$ and (7) $\beta_a(k)=[\beta_{aNL}(k)\;\;\beta_{aNM}(k)\;\;\cdots\;\;\beta_{aPL}(k)]^T$.
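
A minimal sketch of the FRENa computation in (5)–(7) is given below. The Gaussian membership shapes, their centres and the weights are assumptions for illustration only; the settings actually used in this paper are those of Figure 4 and Table 2.

```python
import numpy as np

# Seven linguistic levels of e(k): NL, NM, NS, Z, PS, PM, PL.
# Gaussian membership functions with assumed centres/width (not the Figure 4 setting).
CENTERS = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
WIDTH = 1.0

def phi_a(e):
    """Membership vector phi_a(k) of FRENa, cf. (6)."""
    return np.exp(-((e - CENTERS) ** 2) / (2.0 * WIDTH ** 2))

def fren_a(e, beta_a):
    """Control effort u(k) = beta_a(k)^T phi_a(k), cf. (5)."""
    return float(beta_a @ phi_a(e))

# Assumed initial weights with the same sign ordering as the IF-THEN rules above.
beta_a = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
print(fren_a(0.5, beta_a))
```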

Figure 2. Action network or controller based on FREN.

Let us consider FRENa as a function estimator of the unknown control effort; thus there exists an ideal control effort $u^*(k)$ with the ideal parameter $\beta_a^*$ such that (8) $u^*(k)=\beta_a^{*T}\phi_a(k)+\varepsilon_a(k)$, where $\varepsilon_a(k)$ is a bounded residual error, $|\varepsilon_a(k)|\leq\varepsilon_{aM}$.

By using the dynamics (1) with the control laws (5) and (8), the tracking error $e(k+1)$ is rearranged as (9) $e(k+1)=r(k+1)-y(k+1)=f(u_k^*)-f(u_k)-d(k)$. Recalling Assumption 2.1 and using the mean value theorem, the error dynamics (9) can be rewritten as (10) $e(k+1)=\dfrac{\partial f(x)}{\partial x}\Big|_{x=u_m(k)}[u^*(k)-u(k)]-d(k)=g(k)[u^*(k)-u(k)]-d(k)$, where (11) $g(k)=\dfrac{\partial f(u_m(k))}{\partial u_m(k)}$, and $u_m(k)\in[\min\{u_k,u_k^*\},\max\{u_k,u_k^*\}]$. Employing the control laws (8) and (5), it yields (12) $e(k+1)=g(k)[\beta_a^*-\beta_a(k)]^T\phi_a(k)+g(k)\varepsilon_a(k)-d(k)$. Let us define $\tilde{\beta}_a(k)=\beta_a^*-\beta_a(k)$, $d_a(k)=g(k)\varepsilon_a(k)-d(k)$ and (13) $\Lambda_a(k)=\tilde{\beta}_a^T(k)\phi_a(k)$, and we obtain (14) $e(k+1)=g(k)\Lambda_a(k)+d_a(k)$. It is worth noting that the tracking error obtained by (14) is a function of $\tilde{\beta}_a(k)$ and the unknown but bounded $d_a(k)$ with $|d_a(k)|\leq d_{aM}$. This relation will be used for the performance analysis afterwards.

3.2. Estimated Cost Function or Critic Network

In this work, the long-term cost function $L(k)$ is employed as an infinite-horizon sum of the tracking error $e(k)$ and the control effort $u(k)$ with the discount factor $\gamma_L$ as (15) $L(k)=\sum_{i=k}^{\infty}\gamma_L^{\,i-k}l(i)$, where (16) $l(k)=pe^2(k)+qu^2(k)$, with $p$ and $q$ positive constants and $0<\gamma_L\leq1$.
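
For reference, the sketch below evaluates the instantaneous cost (16) and a truncated-horizon approximation of the long-term cost (15); the weights p, q, the discount factor and the sample sequences are assumed values, and the finite sum only approximates the infinite horizon.

```python
# Instantaneous cost (16) and a truncated-horizon approximation of the long-term cost (15).
# p, q, gamma_L and the error/effort sequences below are assumed values.
p, q, gamma_L = 1.0, 0.1, 0.95

def l(e, u):
    """Instantaneous cost l(k) = p*e(k)^2 + q*u(k)^2, cf. (16)."""
    return p * e ** 2 + q * u ** 2

def long_term_cost(errors, efforts):
    """Finite-horizon approximation of L(k) = sum_{i>=k} gamma_L^(i-k) * l(i), cf. (15)."""
    return sum(gamma_L ** i * l(e, u) for i, (e, u) in enumerate(zip(errors, efforts)))

print(long_term_cost([1.0, 0.5, 0.2, 0.1], [0.8, 0.6, 0.3, 0.1]))
```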

$L(k)$ in (15) is a function of two input arguments through the quadratic function ($f_x=x^2$) of $e(k)$ and $u(k)$. Thus, an adaptive network MiFRENc is utilised to estimate $L(k)$, as shown in the block diagram in Figure 3. In order to design MiFRENc, the IF–THEN rules are first established in Table 1. Thereafter, the network architecture of MiFRENc is illustrated in Figure 3. By utilising the network in Figure 3 and the results in Ref. [24], the estimated cost function $\hat{L}(k)$ is determined by (17) $\hat{L}(k)=\beta_c^T(k)\phi_c(k)$, where (18) $\beta_c(k)=[\beta_{ZZ}(k)\;\;\beta_{ZS}(k)\;\;\cdots\;\;\beta_{LL}(k)]^T$ and (19) $\phi_c(k)=[\phi_1(k)\;\;\phi_2(k)\;\;\cdots\;\;\phi_9(k)]^T$.

Figure 3. Estimated cost function or critic network.

Table 1. MiFRENc: IF–THEN rules.

Using the universal approximation property of MiFREN [24], there exists an ideal parameter $\beta_c^*$ such that (20) $L(k)=\beta_c^{*T}\phi_c(k)+\varepsilon_c(k)$, where $\varepsilon_c(k)$ is a bounded residual error such that $|\varepsilon_c(k)|\leq\varepsilon_{cM}$. Adding and subtracting $\beta_c^{*T}\phi_c(k)$ on the right-hand side of (17) yields (21) $\hat{L}(k)=\tilde{\beta}_c^T(k)\phi_c(k)+\beta_c^{*T}\phi_c(k)=\Lambda_c(k)+\beta_c^{*T}\phi_c(k)$, where $\tilde{\beta}_c(k)=\beta_c(k)-\beta_c^*$ and $\Lambda_c(k)=\tilde{\beta}_c^T(k)\phi_c(k)$.
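
A minimal sketch of the MiFRENc computation in (17)–(19) follows. It assumes, as suggested by the weight indices in (18), that the nine rule strengths are products of three membership grades (Z, S, L) of |e(k)| and |u(k)|; the membership shapes, centres and weights are illustrative assumptions rather than the Table 1 and Figure 5 settings.

```python
import numpy as np

# Two inputs e(k), u(k) with three membership levels each (Z, S, L) give the nine rule
# strengths phi_1..phi_9 of (19).  The Gaussian shapes, centres and the product t-norm
# are assumptions for illustration; the exact Table 1 / Figure 5 settings differ.
LEVEL_CENTERS = {"Z": 0.0, "S": 1.0, "L": 2.0}
WIDTH = 0.8

def memberships(x):
    """Membership grades of |x| for the levels Z, S and L (assumed Gaussian)."""
    return {name: np.exp(-((abs(x) - c) ** 2) / (2 * WIDTH ** 2))
            for name, c in LEVEL_CENTERS.items()}

def phi_c(e, u):
    """Rule-strength vector phi_c(k), ordered ZZ, ZS, ..., LL as in (18)."""
    me, mu = memberships(e), memberships(u)
    return np.array([me[a] * mu[b] for a in "ZSL" for b in "ZSL"])

def mifren_c(e, u, beta_c):
    """Estimated long-term cost L_hat(k) = beta_c(k)^T phi_c(k), cf. (17)."""
    return float(beta_c @ phi_c(e, u))

beta_c = np.linspace(0.0, 4.0, 9)   # assumed weights: larger cost for larger |e|, |u|
print(mifren_c(0.5, 0.3, beta_c))
```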

In order to improve the performance of FRENa and MiFRENc, the learning laws will be developed and explained in the next section.

4. Learning Algorithms and Performance Analysis

4.1. Action Network Learning Law

Considering the tracking error within $\Lambda_a(k)$ as in (14) and the estimated cost function $\hat{L}(k)$, the error function of the action network is given in this work as (22) $e_a(k)=g(k)\Lambda_a(k)+\dfrac{1}{g(k)}\hat{L}(k)$. Thereafter, the cost function to be minimised is utilised as (23) $E_a(k)=\frac{1}{2}e_a^2(k)$. Applying gradient descent, the tuning law for $\beta_a$ is derived as (24) $\beta_a(k+1)=\beta_a(k)-\eta_a\dfrac{\partial E_a(k)}{\partial\beta_a(k)}$, where $\eta_a$ is the learning rate. By using the chain rule and (13), it yields (25) $\dfrac{\partial E_a(k)}{\partial\beta_a(k)}=\dfrac{\partial E_a(k)}{\partial e_a(k)}\dfrac{\partial e_a(k)}{\partial\Lambda_a(k)}\dfrac{\partial\Lambda_a(k)}{\partial\beta_a(k)}=-e_a(k)g(k)\phi_a(k)$. Recalling (24) with (25) and using $e_a(k)$ in (22), it leads to (26) $\beta_a(k+1)=\beta_a(k)+\eta_a e_a(k)g(k)\phi_a(k)=\beta_a(k)+\eta_a\Big[g(k)\Lambda_a(k)+\dfrac{1}{g(k)}\hat{L}(k)\Big]g(k)\phi_a(k)=\beta_a(k)+\eta_a\big[g(k)\Lambda_a(k)+\hat{L}(k)\big]\phi_a(k)$. By eliminating $d_a(k)$ in (14), the learning law (26) is rewritten as (27) $\beta_a(k+1)=\beta_a(k)+\eta_a\big[e(k+1)+\hat{L}(k)\big]\phi_a(k)$. The final learning law of FRENa given by (27) is a practical one because all quantities required on the right-hand side are available at the time index $k+1$.
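
The practical learning law (27) reduces to a single vector update per sampling step, as in the following sketch; the numerical values are assumptions used purely for illustration.

```python
import numpy as np

def update_action_weights(beta_a, phi_a_k, e_next, L_hat_k, eta_a=0.01):
    """One step of the FRENa learning law (27):
    beta_a(k+1) = beta_a(k) + eta_a * [e(k+1) + L_hat(k)] * phi_a(k)."""
    return beta_a + eta_a * (e_next + L_hat_k) * phi_a_k

# Assumed values for illustration: 7 weights and the membership vector of the current step.
beta_a = np.zeros(7)
phi_a_k = np.array([0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0])
print(update_action_weights(beta_a, phi_a_k, e_next=0.2, L_hat_k=0.05))
```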

4.2. Critic Network Learning Law

In general, the error function of a critic network is constructed from the estimated cost function $\hat{L}(k)$. Therefore, in this work, the error function $e_c(k)$ is given as (28) $e_c(k)=\delta\hat{L}(k)-\hat{L}(k-1)+l(k)$, where $\delta$ is a positive constant. In order to tune $\beta_c$, the cost function $E_c(k)$ is defined as (29) $E_c(k)=\frac{1}{2}e_c^2(k)$. Applying gradient descent to (29) with respect to $\beta_c(k)$, we have (30) $\beta_c(k+1)=\beta_c(k)-\eta_c\dfrac{\partial E_c(k)}{\partial\beta_c(k)}$, where $\eta_c$ is the learning rate. Using the chain rule along $E_c(k)$ in (29), $e_c(k)$ in (28) and $\hat{L}(k)$ in (17), it yields (31) $\dfrac{\partial E_c(k)}{\partial\beta_c(k)}=\dfrac{\partial E_c(k)}{\partial e_c(k)}\dfrac{\partial e_c(k)}{\partial\hat{L}(k)}\dfrac{\partial\hat{L}(k)}{\partial\beta_c(k)}=e_c(k)\,\delta\,\phi_c(k)$. Rewriting (30) with (31), it leads to (32) $\beta_c(k+1)=\beta_c(k)-\eta_c e_c(k)\delta\phi_c(k)=\beta_c(k)-\eta_c\delta\big[l(k)-\hat{L}(k-1)+\delta\hat{L}(k)\big]\phi_c(k)$. Finally, we have a practical tuning law for MiFRENc.
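
Similarly, the critic tuning law (32) can be coded as one update per step, as sketched below with assumed values.

```python
import numpy as np

def update_critic_weights(beta_c, phi_c_k, l_k, L_hat_k, L_hat_prev,
                          eta_c=0.5, delta=0.75):
    """One step of the MiFRENc learning law (32):
    beta_c(k+1) = beta_c(k) - eta_c*delta*[l(k) - L_hat(k-1) + delta*L_hat(k)] * phi_c(k)."""
    e_c = delta * L_hat_k - L_hat_prev + l_k      # error function (28)
    return beta_c - eta_c * delta * e_c * phi_c_k

# Assumed values for illustration: nine critic weights and rule strengths.
beta_c = np.ones(9)
phi_c_k = np.full(9, 1.0 / 9.0)
print(update_critic_weights(beta_c, phi_c_k, l_k=0.3, L_hat_k=1.1, L_hat_prev=1.2))
```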

4.3. Closed-Loop Analysis

In the following theorem, the closed-loop performance of the output feedback controller is demonstrated, showing that the tracking error and internal signals are bounded.

Theorem 4.1

For the non-affine discrete time system described in Section 2, the performance of the closed-loop system configured by the structure of FRENa and MiFRENc in Section 3 is guaranteed in the sense of a bounded tracking error and bounded internal signals when the design parameters are selected as follows: (33) $\frac{1}{2}<\delta\leq1$, (34) $0<\eta_a\leq\dfrac{g_m}{\nu_a g_M^2}$ and (35) $0<\eta_c\leq\dfrac{1}{\nu_c\delta^2}$, where $\nu_a$ and $\nu_c$ are the upper limits of $\|\phi_a(k)\|^2$ and $\|\phi_c(k)\|^2$, respectively.

Proof: The proof is given in the Appendix.

The validation of the proposed control scheme will be presented in the next section for a computer simulation with a non-affine discrete time system and a hardware implementation with the DC motor current control plant.

5. Simulation and Experimental Systems

5.1. Simulation System and Results

The controller developed in this work is first implemented on the nonlinear discrete time system given as (36) $y(k+1)=\sin(y_k)+[5+\cos(y_k u_k)]u(k)$. It is worth mentioning that the mathematical model in (36) is used only to build the simulation. In this test, the desired trajectory is given as (37) $r(k+1)=A_r\sin\Big(\dfrac{\omega_r\pi k}{k_M}\Big)$, where $k_M=500$ is the maximum time index, $A_r=1.0$ and $\omega_r=8$. To satisfy (33), $\delta$ is selected as $\delta=0.75$, and $\nu_a=\nu_c=1.5$. By using this setting and (35), the learning rate of MiFRENc is determined by (38) $0<\eta_c\leq\dfrac{1}{\delta^2\nu_c^2}=\dfrac{1}{0.75^2\cdot1.5^2}=0.7901$. In this case, the learning rate for MiFRENc is selected as $\eta_c=0.5$. To select the learning rate of FRENa, let us choose $g_m$ and $g_M$ as 1 and 6, respectively. By using (34), the learning rate of FRENa is determined by (39) $0<\eta_a\leq\dfrac{g_m}{\nu_a^2 g_M^2}=\dfrac{1}{1.5^2\cdot6^2}=0.0123$. Thus, the learning rate for FRENa is selected as $\eta_a=0.01$.
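
The learning-rate bounds (38) and (39) follow from simple arithmetic, reproduced in the short check below with the values quoted above.

```python
# Learning-rate bounds of the simulation case, reproducing (38) and (39).
delta, nu_a, nu_c = 0.75, 1.5, 1.5
g_m, g_M = 1.0, 6.0

eta_c_max = 1.0 / (delta ** 2 * nu_c ** 2)   # (38): bound for the critic learning rate
eta_a_max = g_m / (nu_a ** 2 * g_M ** 2)     # (39): bound for the action learning rate

print(f"eta_c <= {eta_c_max:.4f}  (selected: 0.5)")
print(f"eta_a <= {eta_a_max:.4f}  (selected: 0.01)")
```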

Figures 4 and 5 illustrate the settings of the membership functions for FRENa and MiFRENc, respectively. The initial setting of the adjustable parameters $\beta(1)$ for FRENa and MiFRENc is given in Table 2.

Figure 4. FRENa membership functions: simulation case.

Figure 5. MiFRENc membership functions: simulation case.

Table 2. Initial setting β(1): simulation system.

Figure 6 displays the tracking performance with plots of both $y(k)$ and $e(k)$, and Figure 7 presents the control effort $u(k)$. The estimated cost function $\hat{L}(k)$ is illustrated in Figure 8. The phase plane trajectory of $u(k)$ and $e(k)$ is depicted in Figure 9 to demonstrate the closed-loop system's behaviour.

Figure 6. Tracking performance y(k) and e(k): simulation system.

Figure 7. Control effort u(k): simulation system.

Figure 8. Estimated cost function $\hat{L}(k)$: simulation system.

Figure 9. u(k) and e(k): simulation system.

5.2. Experimental System and Results

The experimental system is constructed as a DC motor current control. The output $y(k+1)$ is the armature current (mA) and the input $u(k)$ is the control voltage applied to the driver circuit depicted in Figure 1. As in the simulation system, let us select $\delta=0.75$, $\nu_a=\nu_c=1.5$, $g_m=5$ and $g_M=10$. Thus, the learning rate of FRENa is designed as (40) $0<\eta_a\leq\dfrac{g_m}{\nu_a^2 g_M^2}=\dfrac{5}{1.5^2\cdot10^2}=0.0222$. In this case, we select $\eta_a=0.01$. For MiFRENc, we use the same learning rate as in the simulation system, $\eta_c=0.5$, because the network architecture is the same. The desired trajectory is given as (41) $r(k+1)=I_r\sin\Big(\dfrac{\omega_r\pi k}{k_M}\Big)$, where (42) $I_r=\begin{cases}15\ \mathrm{mA} & \text{if } 0\leq k<k_M/2,\\ 30\ \mathrm{mA} & \text{otherwise},\end{cases}$ (43) $\omega_r=\begin{cases}8 & \text{if } 0\leq k<k_M/2,\\ 4 & \text{otherwise},\end{cases}$ and $k_M=2000$.
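
The experimental learning-rate bound (40) and the piecewise reference (41)–(43) can be reproduced by the short sketch below; the printed sample indices are arbitrary choices for illustration.

```python
import numpy as np

# Piecewise reference trajectory of the experiment, cf. (41)-(43), with k_M = 2000,
# and the learning-rate bound (40).
k_M = 2000

def reference(k):
    I_r = 15.0 if k < k_M / 2 else 30.0    # amplitude in mA, cf. (42)
    w_r = 8.0 if k < k_M / 2 else 4.0      # frequency factor, cf. (43)
    return I_r * np.sin(w_r * np.pi * k / k_M)

nu_a, g_m, g_M = 1.5, 5.0, 10.0
print("eta_a <=", g_m / (nu_a ** 2 * g_M ** 2))                 # (40): about 0.0222
print("r(k) samples:", [round(reference(k), 2) for k in (0, 125, 1250, 1875)])
```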

Figures 10 and 11 represent the settings of the membership functions of FRENa and MiFRENc, respectively. All adjustable parameters $\beta(1)$ for FRENa and MiFRENc are initialised as the setting in Table 3.

Figure 10. FRENa membership functions: experimental system.

Figure 11. MiFRENc membership functions: experimental system.

Table 3. Initial setting β(1): experimental system.

Figure 12 displays the motor current $y(k)$ and the tracking error $e(k)$ to demonstrate the performance of the closed-loop system. The maximum absolute value of the tracking error is $|e(k)|_{\max}=48.2936$ mA and the average absolute value of the tracking error at steady state ($k=1500$–$2000$) is 0.4924 mA. Figure 13 shows the control effort $u(k)$. The estimated cost function $\hat{L}(k)$ is illustrated in Figure 14. The phase plane trajectory of $u(k)$ and $e(k)$ is plotted in Figure 15; a large variation is detected because of the back-EMF. In order to evaluate the proposed scheme under the back-EMF condition, a pulse-train trajectory is implemented, with the response displayed in Figure 16. It is clear that the effect of the back-EMF is eliminated within the second pulse (B).

Figure 12. Tracking performance y(k) and e(k): experimental system.

Figure 13. Control effort u(k): experimental system.

Figure 14. Estimated cost function $\hat{L}(k)$: experimental system.

Figure 15. u(k) and e(k): experimental system.

Figure 16. Pulse response: experimental system.

6. Conclusions

A model-free adaptive control for a class of non-affine discrete time systems has been developed by RL. The closed-loop system has been established by output feedback with two adaptive networks, FRENa and MiFRENc. The initial settings of FRENa and MiFRENc have been obtained from human knowledge of the controlled plant in the form of IF–THEN rules. The performance has been enhanced by the learning laws for both FRENa and MiFRENc, while the convergence of the tracking error and internal signals to reasonable compact sets has been guaranteed. Numerical simulation and experimental results have been presented to verify the theoretical analysis.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work has been supported by Fundamental Research Funds for CINVESTAV-IPN and Mexican Research Organization CONACyT [grant number 257253].

Notes on contributors

C. Treesatayapun

C. Treesatayapun received the Ph.D. in electrical engineering from Chiang-Mai University, Thailand, in 2004. He was a production engineer at SAGA Electronics (JRC-NJR) from 1998 to 2000 and was head of the electrical engineering program at North Chiang-Mai University, Thailand, from 2001 to 2007. He is currently a senior researcher at the Department of Robotic and Advanced Manufacturing, Mexican Research Center and Advanced Technology, CINVESTAV-IPN, Saltillo campus, Mexico. His current research interests include automation and robotic system control and optimization, adaptive and learning algorithms, and electric machine drives.

References

  • Hou ZS, Wang Z. From model-based control to data-driven control: survey, classification and perspective. Inf Sci. 2013;235:3–35.
  • Zhu Y, Hou ZS. Data-driven MFAC for a class of discrete-time nonlinear systems with RBFNN. IEEE Trans Neural Netw Learn Syst. 2014;25(5):1013–1020.
  • Wang X, Li X, Wang J, et al. Data-driven model-free adaptive sliding mode control for the multi degree-of-freedom robotic exoskeleton. Inf Sci. 2016;327:246–257.
  • Lin N, Chi R, Huang B. Data-driven recursive least squares methods for non-affined nonlinear discrete-time systems. Appl Math Modell. 2020;81:787–798.
  • Kaldmae A, Kotta U. Input–output linearization of discrete-time systems by dynamic output feedback. Eur J Control. 2014;20:73–78.
  • Treesatayapun C. Data input–output adaptive controller based on IF–THEN rules for a class of non-affine discrete-time systems: the robotic plant. J Intell Fuzzy Syst. 2015;28:661–668.
  • Liu YJ, Tong S. Adaptive NN tracking control of uncertain nonlinear discrete-time systems with nonaffine dead-zone input. IEEE Trans Cybern. 2015;45(3):497–505.
  • Zhang CL, Li JM. Adaptive iterative learning control of non-uniform trajectory tracking for strict feedback nonlinear time-varying systems with unknown control direction. Appl Math Model. 2015;39:2942–2950.
  • Precup RE, Radac MB, Roman RC, et al. Model-free sliding mode control of nonlinear systems: algorithms and experiments. Inf Sci. 2017;381:176–192.
  • Raj R, Mohan BM. Stability analysis of general Takagi–Sugeno fuzzy two-term controllers. Fuzzy Inf Eng. 2018;10(2):196–212.
  • Zhang X, Zhang HG, Sun QY, et al. Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence. Neurocomputing. 2012;35:48–55.
  • Eftekhari M, Zeinalkhani M. Extracting interpretable fuzzy models for nonlinear systems using gradient-based continuous ant colony optimization. Fuzzy Inf Eng. 2013;5(3):255–277.
  • Liu D, Wang D, Yang X. An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs. Inf Sci. 2013;220(20):331–342.
  • Jiang H, Zhang H. Iterative ADP learning algorithms for discrete-time multi-player games. Artif Intell Rev. 2018;50(1):75–91.
  • Liu D, Wang D, Zhao D, et al. Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Trans Autom Sci Eng. 2012;9(3):628–634.
  • Kiumarsi B, Lewis FL, Modares H, et al. Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica. 2014;50(4):1167–1175.
  • Yang Q, Jagannathan S. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators. IEEE Trans Syst Man Cybern B Cybern. 2012;42(2):377–390.
  • Ha M, Wang D, Liu D. Event-triggered constrained control with DHP implementation for nonaffine discrete-time systems. Inf Sci. 2020;519:110–123.
  • Xu B, Yang C, Shi Z. Reinforcement learning output feedback NN control using deterministic learning technique. IEEE Trans Neural Netw Learn Syst. 2014;25(3):635–641.
  • Liu YJ, Li S, Tong S, et al. Adaptive reinforcement learning control based on neural approximation for nonlinear discrete-time systems with unknown nonaffine dead-zone input. IEEE Trans Neural Netw Learn Syst. 2019;30(1):295–305.
  • Allam E, Elbab HF, Hady MA, et al. Vibration control of active vehicle suspension system using fuzzy logic algorithm. Fuzzy Inf Eng. 2010;2(4):361–387.
  • Niftiyev AA, Zeynalov CI, Poormanuchehri M. Fuzzy optimal control problem with non–linear functional. Fuzzy Inf Eng. 2011;3(3):311–320.
  • Fei J, Wang T. Adaptive fuzzy-neural-network based on RBFNN control for active power filter. Int J Mach Learn Cybern. 2019;10:1139–1150.
  • Treesatayapun C, Uatrongjit S. Adaptive controller with fuzzy rules emulated structure and its applications. Eng Appl Artif Intell. 2005;18:603–615.
  • Treesatayapun C. Adaptive control based on IF–THEN rules for grasping force regulation with unknown contact mechanism. Robot Comput Integr Manuf. 2014;30:11–18.
  • Abouheaf M, Gueaieb W. Neurofuzzy reinforcement learning control schemes for optimized dynamical performance. 2019 IEEE International Symposium on Robotic and Sensors Environments (ROSE). Ontario, Canada; 2019 June. p. 17–18.
  • Treesatayapun C. Fuzzy-rule emulated networks based on reinforcement learning for nonlinear discrete-time controllers. ISA Trans. 2008;47:362–373.
  • Wei Q, Lewis FL, Sun Q, et al. Discrete-time deterministic q-learning: a novel convergence analysis. IEEE Trans Cybern. 2017;47(5):1224–1237.

Appendix 1.

Proof of Theorem 4.1

Let us refer to the standard Lyapunov function as (A1) $V(k)=V_1(k)+V_2(k)+V_3(k)+V_4(k)=\rho_1 e^2(k)+\dfrac{\rho_2}{\eta_a}\tilde{\beta}_a^T(k)\tilde{\beta}_a(k)+\dfrac{\rho_3}{\eta_c}\tilde{\beta}_c^T(k)\tilde{\beta}_c(k)+\rho_4\Lambda_c^2(k-1)$, where $\rho_1$, $\rho_2$, $\rho_3$ and $\rho_4$ are positive constants satisfying the following conditions: (A2) $\rho_1>\frac{3}{4}p\rho_3$, (A3) $\rho_2>\dfrac{\rho_1 g_M^2+(\rho_3/8)q}{g_m}$, (A4) $\rho_3>\dfrac{\rho_4}{\delta^2}$ and (A5) $\rho_4>\dfrac{\rho_3}{4}$.

Utilising (14), $\Delta V_1(k)$ is obtained as (A6) $\Delta V_1(k)=\rho_1[e^2(k+1)-e^2(k)]=\rho_1\big[[g(k)\Lambda_a(k)+d_a(k)]^2-e^2(k)\big]\leq\rho_1[2g^2(k)\Lambda_a^2(k)+2d_a^2(k)-e^2(k)]\leq-\rho_1 e^2(k)+2\rho_1 g_M^2\Lambda_a^2(k)+2\rho_1 d_{aM}^2$.

Recalling the tuning law in (26), $\Delta V_2(k)$ is expressed as (A7) $\Delta V_2(k)=\dfrac{\rho_2}{\eta_a}\big[\tilde{\beta}_a^T(k+1)\tilde{\beta}_a(k+1)-\tilde{\beta}_a^T(k)\tilde{\beta}_a(k)\big]=-2\rho_2\big[g(k)\Lambda_a(k)+\hat{L}(k)\big]\tilde{\beta}_a^T(k)\phi_a(k)+\rho_2\eta_a\big[g(k)\Lambda_a(k)+\hat{L}(k)\big]^2\phi_a^T(k)\phi_a(k)=-2\rho_2\Lambda_a(k)[g(k)\Lambda_a(k)]-2\rho_2\Lambda_a(k)\hat{L}(k)+\rho_2\eta_a\|\phi_a(k)\|^2\big[g(k)\Lambda_a(k)+\hat{L}(k)\big]^2$. Applying the lower and upper bounds of $g(k)$, it leads to (A8) $\Delta V_2(k)\leq-2\rho_2 g_m\Lambda_a^2(k)-2\rho_2\Lambda_a(k)\hat{L}(k)+\rho_2\eta_a\|\phi_a(k)\|^2 g_M^2\Lambda_a^2(k)+\rho_2\eta_a\|\phi_a(k)\|^2\big[\hat{L}^2(k)+2g(k)\Lambda_a(k)\hat{L}(k)\big]=-\rho_2\Big[g_m\Lambda_a^2(k)+\big(g_m-\eta_a\|\phi_a(k)\|^2 g_M^2\big)\Lambda_a^2(k)+2\Lambda_a(k)\big[1-\eta_a\|\phi_a(k)\|^2 g(k)\big]\hat{L}(k)-\eta_a\|\phi_a(k)\|^2\hat{L}^2(k)\Big]=-\rho_2 g_m\Lambda_a^2(k)-\rho_2\big(g_m-\eta_a\|\phi_a(k)\|^2 g_M^2\big)\Big\|\Lambda_a(k)+\dfrac{\big[1-\eta_a\|\phi_a(k)\|^2 g(k)\big]\hat{L}(k)}{g_m-\eta_a\|\phi_a(k)\|^2 g_M^2}\Big\|^2+\rho_2\dfrac{1-\eta_a\|\phi_a(k)\|^2 g_m}{g_m-\eta_a\|\phi_a(k)\|^2 g_M^2}\hat{L}^2(k)\leq-\rho_2 g_m\Lambda_a^2(k)+\rho_2 g_m\hat{L}^2(k)-\rho_2\big(g_m-\eta_a\|\phi_a(k)\|^2 g_M^2\big)\Big\|\Lambda_a(k)+\dfrac{\big[1-\eta_a\|\phi_a(k)\|^2 g(k)\big]\hat{L}(k)}{g_m-\eta_a\|\phi_a(k)\|^2 g_M^2}\Big\|^2$.

By using the learning law of MiFRENc in (32), $\Delta V_3(k)$ is derived as (A9) $\Delta V_3(k)=\dfrac{\rho_3}{\eta_c}\big[\tilde{\beta}_c^T(k+1)\tilde{\beta}_c(k+1)-\tilde{\beta}_c^T(k)\tilde{\beta}_c(k)\big]=\dfrac{\rho_3}{\eta_c}\big[-2\eta_c\delta e_c(k)\tilde{\beta}_c^T(k)\phi_c(k)+\eta_c^2\delta^2 e_c^2(k)\|\phi_c(k)\|^2\big]=-2\rho_3\delta\Lambda_c(k)e_c(k)+\rho_3\eta_c\delta^2\|\phi_c(k)\|^2 e_c^2(k)$. Recalling $e_c(k)$ in (28) with $\pm\delta L(k)$ and $\pm L(k-1)$ and using (17) and (20), it yields (A10) $e_c(k)=\delta[\hat{L}(k)-L(k)]+\delta L(k)-[\hat{L}(k-1)-L(k-1)]-L(k-1)+l(k)=\delta\big[\beta_c^T(k)\phi_c(k)-\beta_c^{*T}\phi_c(k)-\varepsilon_c(k)\big]+\delta L(k)-L(k-1)+l(k)-\big[\beta_c^T(k-1)\phi_c(k-1)-\beta_c^{*T}\phi_c(k-1)-\varepsilon_c(k-1)\big]=\delta\tilde{\beta}_c^T(k)\phi_c(k)-\tilde{\beta}_c^T(k-1)\phi_c(k-1)+\delta L(k)-L(k-1)+l(k)-\delta\varepsilon_c(k)+\varepsilon_c(k-1)=\delta\Lambda_c(k)-\Lambda_c(k-1)+\delta L(k)-L(k-1)+l(k)-\delta\varepsilon_c(k)+\varepsilon_c(k-1)$ or (A11) $\delta\Lambda_c(k)=e_c(k)-\delta L(k)+\Lambda_c(k-1)+L(k-1)-l(k)+\delta\varepsilon_c(k)-\varepsilon_c(k-1)$.

By using (A11) and (16), (A9) can be derived as (A12) $\Delta V_3(k)=-2\rho_3 e_c(k)\big[e_c(k)-\delta L(k)+\Lambda_c(k-1)+L(k-1)-l(k)+\delta\varepsilon_c(k)-\varepsilon_c(k-1)\big]+\rho_3\eta_c\delta^2\|\phi_c(k)\|^2 e_c^2(k)=-\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]e_c^2(k)-\rho_3 e_c^2(k)+2\rho_3 e_c(k)\big[\delta L(k)-\Lambda_c(k-1)-L(k-1)+l(k)-\delta\varepsilon_c(k)+\varepsilon_c(k-1)\big]=-\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]e_c^2(k)-\rho_3\delta^2\Lambda_c^2(k)+\rho_3\big[\delta L(k)-\Lambda_c(k-1)-L(k-1)+l(k)-\delta\varepsilon_c(k)+\varepsilon_c(k-1)\big]^2\leq-\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]e_c^2(k)-\rho_3\delta^2\Lambda_c^2(k)+\dfrac{\rho_3}{4}\Lambda_c^2(k-1)+\dfrac{\rho_3}{4}l^2(k)+\dfrac{\rho_3}{4}\big[\delta L(k)-L(k-1)\big]^2+\dfrac{\rho_3}{4}\big[\delta\varepsilon_c(k)-\varepsilon_c(k-1)\big]^2\leq-\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]e_c^2(k)-\rho_3\delta^2\Lambda_c^2(k)+\dfrac{\rho_3}{4}\Lambda_c^2(k-1)+\dfrac{\rho_3}{4}p e^2(k)+\dfrac{\rho_3}{8}q\Lambda_a^2(k)+\dfrac{\rho_3}{8}\|\beta_a^{*T}\phi_a(k)\|^2+\dfrac{\rho_3}{4}\big[\delta L(k)-L(k-1)\big]^2+\rho_3\varepsilon_{cM}^2$.

Finally, $\Delta V_4(k)$ is formulated as (A13) $\Delta V_4(k)=\rho_4\big[\Lambda_c^2(k)-\Lambda_c^2(k-1)\big]$.

Recalling (A6), (A8), (A12) and (A13), $\Delta V(k)$ is rewritten as (A14) $\Delta V(k)\leq-\dfrac{\rho_1}{3}e^2(k)+\rho_1 g_M^2\Lambda_a^2(k)+\rho_1 d_{aM}^2-\rho_2 g_m\Lambda_a^2(k)-\rho_2\big(g_m-\eta_a\|\phi_a(k)\|^2 g_M^2\big)\Big\|\Lambda_a(k)+\dfrac{\big[1-\eta_a\|\phi_a(k)\|^2 g(k)\big]\hat{L}(k)}{g_m-\eta_a\|\phi_a(k)\|^2 g_M^2}\Big\|^2+\rho_2 g_m\hat{L}^2(k)-\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]e_c^2(k)-\rho_3\delta^2\Lambda_c^2(k)+\dfrac{\rho_3}{4}\Lambda_c^2(k-1)+\dfrac{\rho_3}{4}p e^2(k)+\dfrac{\rho_3}{8}q\Lambda_a^2(k)+\dfrac{\rho_3}{8}\|\beta_a^{*T}\phi_a(k)\|^2+\dfrac{\rho_3}{4}\big[\delta L(k)-L(k-1)\big]^2+\rho_3\varepsilon_{cM}^2+\rho_4\big[\Lambda_c^2(k)-\Lambda_c^2(k-1)\big]\leq-\Big[\dfrac{\rho_1}{3}-\dfrac{\rho_3}{4}p\Big]e^2(k)-\Big[\rho_2 g_m-\rho_1 g_M^2-\dfrac{\rho_3}{8}q\Big]\Lambda_a^2(k)-\big[\rho_3\delta^2-\rho_4\big]\Lambda_c^2(k)-\Big[\rho_4-\dfrac{\rho_3}{4}\Big]\Lambda_c^2(k-1)-\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]e_c^2(k)+V_M\leq-V_e e^2(k)-V_a\Lambda_a^2(k)-V_{c0}\Lambda_c^2(k)-V_{c1}\Lambda_c^2(k-1)-V_c e_c^2(k)+V_M$, where (A15) $V_M\equiv\rho_1 d_{aM}^2+\rho_3\varepsilon_{cM}^2+\dfrac{\rho_3}{8}\beta_{aM}^2+\Big[\dfrac{\rho_3}{8}(\gamma-1)^2+\rho_2 g_m\Big]L_M^2-\rho_2\big[g_m-\eta_a\|\phi_a(k)\|^2 g_M^2\big]\Big\|\Lambda_a(k)+\dfrac{\big[1-\eta_a\|\phi_a(k)\|^2 g(k)\big]\hat{L}(k)}{g_m-\eta_a\|\phi_a(k)\|^2 g_M^2}\Big\|^2$, (A16) $V_e=\dfrac{\rho_1}{3}-\dfrac{\rho_3}{4}p$, (A17) $V_a=\rho_2 g_m-\rho_1 g_M^2-\dfrac{\rho_3}{8}q$, (A18) $V_{c0}=\rho_3\delta^2-\rho_4$, (A19) $V_{c1}=\rho_4-\dfrac{\rho_3}{4}$ and (A20) $V_c=\rho_3\big[1-\eta_c\delta^2\|\phi_c(k)\|^2\big]$. Here $L_M$ and $\beta_{aM}$ denote upper bounds of the long-term cost and of $\|\beta_a^{*T}\phi_a(k)\|$, respectively. According to the conditions in (A2)–(A5), $V_e$, $V_a$, $V_{c0}$ and $V_{c1}$ are always positive.

Furthermore, by the setting of the membership functions of FRENa and MiFRENc, the upper limits exist such that (A21) $0<\|\phi_a(k)\|^2\leq\nu_a$ and (A22) $0<\|\phi_c(k)\|^2\leq\nu_c$. Combining these with (34) and (35), it leads to (A23) $g_m-\eta_a\|\phi_a(k)\|^2 g_M^2>0$ and (A24) $V_M\leq\rho_1 d_{aM}^2+\rho_3\varepsilon_{cM}^2+\dfrac{\rho_3}{8}\beta_{aM}^2+\Big[\dfrac{\rho_3}{8}(\gamma-1)^2+\rho_2 g_m\Big]L_M^2$. By these means, we have (A25) $|e(k)|\leq\Omega_e\triangleq\sqrt{\dfrac{V_M}{V_e}}$, (A26) $|\Lambda_a(k)|\leq\Omega_a\triangleq\sqrt{\dfrac{V_M}{V_a}}$ and (A27) $|\Lambda_c(k)|\leq\Omega_c\triangleq\sqrt{\dfrac{V_M}{V_{c0}}}$. The proof is completed here.