
Controlling Rayleigh–Bénard convection via reinforcement learning

Pages 585-605 | Received 31 Mar 2020, Accepted 30 Jun 2020, Published online: 29 Jul 2020

Figures & data

Figure 1. Schematics of the linear (a) and Reinforcement Learning (b) control methods applied to a Rayleigh–Bénard system, aiming at reducing convective effects (i.e. the Nusselt number). The system consists of a domain with height H, aspect ratio Γ=1, no-slip boundary conditions, constant temperature TC on the top boundary, adiabatic side boundaries and a controllable bottom boundary where the imposed temperature TB(x,t) can vary in space and time (according to the control protocol) while keeping a constant average ⟨TB(x,t)⟩x = TH. Because the average temperature of the bottom plate is constant, the Rayleigh number is well defined and constant over time. The control protocol of the linear controller (a) works by calculating the distance measure E(x,t) from the ideal state (cf. Equations (10) and (11): T^B(x,t) = R(T~B(x,t)) = −R((kP − kD ∂t) E(x,t)), with E(x,t) = ⟨uy(x,y,t)⟩y / V0) and, based on linear relations, applies temperature corrections to the bottom plate. The RL method (Figure 1(b)) uses a Neural Network controller which acquires the flow state from a number of probes at fixed locations and returns a temperature profile (see details in Figure 2). The parameters of the Neural Network are automatically optimised by the RL method during training. Moreover, in the RL case, the temperature fluctuations at the bottom plate are piece-wise constant and can only take one of two prefixed values, hot or cold (cf. Equation (12): T~k ∈ {C, −C}). Operating with discrete actions significantly reduces the complexity and the computational resources needed for training.
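
A minimal Python sketch of one update of this linear control protocol is given below. It assumes that the derivative term is approximated by a finite difference over the control interval and that the operator R subtracts the horizontal mean (Equation (7)) and caps the amplitude at C (Equations (9) and (A7)); the function and parameter names are illustrative, not the authors' implementation.

```python
# Hedged sketch, not the authors' code: one proportional-derivative (PD) update of the
# bottom-plate temperature following Equations (10)-(11).  Assumptions: the time
# derivative of E is a finite difference over the control interval dt, and R subtracts
# the horizontal mean (Eq. 7) and rescales the profile so that |T^B| <= C (Eqs. 9, A7).
import numpy as np

def pd_control_step(uy, E_prev, kP, kD, V0, C, dt):
    """uy: vertical velocity on an (ny, nx) grid; E_prev: E from the previous step."""
    E = uy.mean(axis=0) / V0                        # Eq. (11): y-average of u_y at each x
    T_tilde = -(kP * E - kD * (E - E_prev) / dt)    # Eq. (10): -(kP - kD d/dt) E
    T_tilde = T_tilde - T_tilde.mean()              # enforce <T_B>_x = T_H (Eq. (7))
    T_hat = T_tilde / max(1.0, np.abs(T_tilde).max() / C)  # cap amplitude at C (Eqs. (9), (A7))
    return T_hat, E                                 # E is kept for the next derivative estimate

# Illustrative call with placeholder values; setting kD = 0 recovers the P controller of Table 2.
T_hat, E = pd_control_step(np.random.randn(64, 64), np.zeros(64),
                           kP=1.0, kD=0.1, V0=1.0, C=0.5, dt=1.0)
```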

Figure 2. Sketch of the neural network determining the policy adopted by the RL-based controller (policy network, π, cf. Figure 1(b) for the complete setup). The input of the policy network is a state vector, s, which stacks the values of temperature and of both velocity components for the current and the previous D−1 = 3 timesteps. Temperature and velocity are read on an evenly spaced grid of size GX=8 by GY=8. Hence, s has dimension nf⋅GX⋅GY⋅D = 3⋅8⋅8⋅4 = 768. The policy network π is composed of two fully connected feed-forward layers with nneurons=64 neurons and σ(⋅)=tanh(⋅) activation. The network output is provided by one fully connected layer with σ(⋅)=ϕ(⋅) activation (Equation (16): yk = ϕ(zk) = 1/(1 + exp(−zk)), k = 1, 2, …, ns). This returns the probability vector y = [y1, y2, …, yns]. The kth bottom segment has temperature C with probability yk (Equation (17): πk(T~k = +C | si) = Bernoulli(p = yk)). This probability distribution is sampled to produce a proposed temperature profile T~ = (T~1, T~2, …, T~ns). The final temperature fluctuations T^1, T^2, …, T^ns are generated with the normalization step of Equation (A7), T^B(x,t) = T~B″(x,t) / maxx′(1, |T~B″(x,t)|/C) (cf. Equations (7), TH = ⟨TB(x,t)⟩x, and (9), |T^B(x,t)| ≤ C ∀ x, t). The resulting temperature profile is applied to the bottom plate during a time interval Δt (control loop), after which the procedure is repeated.
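
A minimal Python (NumPy) sketch of this policy is given below: two tanh layers, the sigmoid head of Equation (16), Bernoulli sampling of hot/cold segments (Equations (17) and (12)) and a zero-mean, amplitude-capping step standing in for Equations (7), (9) and (A7). The weights are random placeholders and the values of ns and C are illustrative, not the trained network or the paper's parameters.

```python
# Hedged sketch, not the trained network: the policy of Figure 2 as a plain NumPy
# forward pass.  Layer sizes follow the caption; weights, ns and C are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3 * 8 * 8 * 4, 64        # nf*GX*GY*D = 768 inputs, 64 neurons per hidden layer
ns, C = 10, 0.75                          # illustrative: number of bottom segments and amplitude

W1, b1 = 0.05 * rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = 0.05 * rng.normal(size=(n_hidden, n_hidden)), np.zeros(n_hidden)
W3, b3 = 0.05 * rng.normal(size=(ns, n_hidden)), np.zeros(ns)

def policy_action(s):
    h = np.tanh(W1 @ s + b1)                           # hidden layer 1, tanh activation
    h = np.tanh(W2 @ h + b2)                           # hidden layer 2, tanh activation
    y = 1.0 / (1.0 + np.exp(-(W3 @ h + b3)))           # Eq. (16): one probability per segment
    T_tilde = np.where(rng.random(ns) < y, C, -C)      # Eqs. (17), (12): Bernoulli hot/cold sample
    T_centered = T_tilde - T_tilde.mean()              # enforce <T_B>_x = T_H (Eq. (7))
    return T_centered / max(1.0, np.abs(T_centered).max() / C)  # amplitude cap, cf. Eq. (A7)

T_profile = policy_action(rng.normal(size=n_in))       # state vector s of dimension 768
```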

Table 1. Rayleigh–Bénard system and simulation parameters.

Table 2. Parameters used for the linear controls in Lattice Boltzmann units (cf. Equation (10); note: a P controller is obtained by setting kD=0). At Ra=10⁷ we were unable to find PD controllers performing better than P controllers, which were anyway ineffective.

Figure 3. Comparison of the Nusselt number, averaged over time and ensemble, measured for uncontrolled and controlled Rayleigh–Bénard systems (note that all the systems are initialised in a natural convective state). We observe that the critical Rayleigh number, Rac, increases when we control the system, with Rac=10⁴ for the state-of-the-art linear control and Rac=3⋅10⁴ for the RL-based control. Furthermore, for Ra>Rac, the RL control achieves a Nusselt number consistently lower than that measured for the uncontrolled system and for the state-of-the-art linear controls (P and PD controllers have comparable performance at all the considered Rayleigh numbers, see also [Citation13]). The error bars are estimated as μNu/N, where N = 161 is the number of statistically independent samplings of the Nusselt number.

Figure 4. Time evolution of the Nusselt number in four different Rayleigh regimes, with the control starting at t = 0. The time axis is in units of the control loop length, Δt (cf. Table A1). Up to Ra=3⋅10⁴ (a,b), the RL control is able to stabilise the system (i.e. Nu≈1), in contrast with the linear methods, which result in an unsteady flow. At Ra=10⁵ (c), the RL control is likewise unable to fully stabilise the system; yet, contrary to the linear case, it still results in a flow with a stationary Nu. For Ra=10⁶ (d), the performance of RL is not as stable as at lower Ra; the control, however, still manages to reduce the average Nusselt number significantly. (a) Ra=1⋅10⁴, (b) Ra=3⋅10⁴, (c) Ra=1⋅10⁵ and (d) Ra=1⋅10⁶.

Figure 5. Instantaneous streamlines at different times at Ra=10⁵: comparison of the cases without control (a), with linear control (b) and with RL-based control (c). The time is given in units of Δt (i.e. the control loop length, cf. Table A1); note that the snapshots are taken at different instants. The RL control establishes a flow regime resembling a ‘double cell’ mode, which features a steady Nusselt number (see Figure 4(c)). This is in contrast with the linear methods, which rather produce a flow with a fluctuating Nusselt number. The double-cell flow field has a significantly lower Nusselt number than the uncontrolled case, as heat transport to the top boundary can only happen via diffusion through the interface between the two cells. This ‘double cell’ control strategy is established by the RL control without any external supervision.

Figure 6. Instantaneous streamlines at Ra=3⋅10⁶: comparison of the cases without control (a), with linear control (b) and with RL-based control (c). We observe that the RL control still tries to enforce a ‘double cell’ flow, as in the lower Rayleigh case (Figure 5). The structure is, however, far less pronounced and convective effects are much stronger. This is likely due to the increased instability and chaoticity of the system, which makes learning and controlling more difficult. Yet, we can observe a small alteration in the flow structure (cf. the lower cell, larger in size than in uncontrolled conditions), which results in a slightly lower Nusselt number.

Figure 7. Performance loss due to an artificial delay imposed on an RL controller operating on the Lorenz attractor. The controller, acting on the y variable of the system (a reduced model for the RBC horizontal temperature fluctuations), aims at either maximising (LA oscillator) or minimising (LA stabiliser) the number of sign changes of x (in RBC terms, the angular velocity of the flow). The delay, on the horizontal axis, is scaled by the Lyapunov time, λ⁻¹, of the system (with λ the largest Lyapunov exponent). When the delay in the control loop becomes comparable to the Lyapunov time, the control performance diminishes significantly.
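
A minimal Python sketch of such a delayed-control experiment on the Lorenz system is given below; the bang-bang actuation rule and its amplitude are assumptions standing in for the trained RL policy, while the count of x sign changes is the performance measure named in the caption.

```python
# Hedged sketch of the delay experiment: a Lorenz system whose y-equation receives an
# additive control computed from the state observed `delay` integration steps earlier.
# The control rule and amplitude A are illustrative assumptions, not the RL policy.
import numpy as np

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
dt, n_steps, delay, A = 1e-3, 100_000, 500, 5.0        # delay in steps (assumed value)

state = np.array([1.0, 1.0, 1.0])
history = [state.copy()]
sign_changes = 0

for n in range(n_steps):
    observed = history[max(0, len(history) - 1 - delay)]   # stale state seen by the controller
    u = -A * np.sign(observed[0])                          # illustrative stabilising rule based on x
    x, y, z = state
    dstate = np.array([sigma * (y - x),
                       x * (rho - z) - y + u,              # control enters the y-equation
                       x * y - beta * z])
    new_state = state + dt * dstate                        # explicit Euler step
    if np.sign(new_state[0]) != np.sign(x) and np.sign(x) != 0:
        sign_changes += 1                                  # x sign changes: minimised by the LA stabiliser
    history.append(new_state.copy())
    state = new_state

print(f"x sign changes with a delay of {delay} steps: {sign_changes}")
```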

Figure 8. Average Nusselt number for an RL agent observing the Rayleigh–Bénard environment at different probe densities, all of which are lower than the baseline employed in Section 3. At the Ra considered, the grid size (i.e. the amount of information) used to sample the state of the system does not seem to play a key role in limiting the final control performance of the RL agent.
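
A minimal Python sketch of how such a lower-density observation can be built, by subsampling the full field on an evenly spaced probe grid, is given below; the field resolution and grid sizes are placeholders, not the simulation parameters of Table A1.

```python
# Hedged sketch: sample a full 2-D field on an evenly spaced probe grid of lower density.
import numpy as np

def probe_grid(field, gx, gy):
    """Sample a 2-D field of shape (ny, nx) on an evenly spaced gy-by-gx probe grid."""
    ny, nx = field.shape
    ix = np.linspace(0, nx - 1, gx).round().astype(int)   # probe columns
    iy = np.linspace(0, ny - 1, gy).round().astype(int)   # probe rows
    return field[np.ix_(iy, ix)]

field = np.random.default_rng(1).normal(size=(100, 100))  # placeholder temperature field
coarse = probe_grid(field, gx=4, gy=4)                    # lower probe density than the 8x8 baseline
```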

Table A1. Rayleigh–Bénard environments considered. For each Rayleigh number we report the LBM grid employed (size NX×NY), the uncontrolled Nusselt number measured from LBM simulations and a validation reference from the literature.

Figure A1. Performance of RL during training (case of the Rayleigh–Bénard system). We report the average Nusselt number and its fluctuations computed over a batch of 512 concurrent training environments. (a) ‘low’ Ra (0 ≲ Ra ≲ 3⋅10⁴), in which the control achieves Nu≈1 in a stable way; (b) ‘mid’ Ra (3⋅10⁴ ≲ Ra ≲ 1⋅10⁶), which still gives stable learning behaviour but converges to Nu>1; and, lastly, (c) ‘high’ Ra (1⋅10⁶ ≲ Ra), in which the higher chaoticity of the system makes a full stabilisation impossible. (a) Ra=3⋅10⁴ (‘low’), (b) Ra=3⋅10⁵ (‘mid’) and (c) Ra=3⋅10⁶ (‘high’).

Figure A2. System trajectories with RL control, aiming respectively at minimising (a) and maximising (b) the number of x sign changes. In (a.1) and (b.1), we report two reference uncontrolled cases which differ only by the initialization state, x(−5), y(−5), z(−5). Panel (a) shows that the RL agent is able to fully stabilise the system on an unstable equilibrium through a three-step strategy: (I) the system is driven towards x, y, z = 0, which produces a peak (II) that, after going through x = 0, ends close enough to an unstable equilibrium (III) for the control to fully stabilise the system. Furthermore, panel (b) shows that the RL agent is able to find and stabilise an unstable periodic orbit with the desired property of a high frequency of sign changes of x.