
AI-based Lagrange optimization for designing reinforced concrete columns

Pages 2330-2344 | Received 06 Apr 2021, Accepted 18 Aug 2021, Published online: 15 Nov 2021

ABSTRACT

Structural engineers face numerous code-governed design decisions. Design codes impose many conditions and requirements on structural frames, such as columns and beams, and it is difficult to intuitively find optimized solutions that satisfy all code requirements simultaneously. Engineers commonly make design decisions based on empirical observations. Optimization techniques can be employed to make more rational engineering decisions, resulting in designs that meet various code restrictions simultaneously. Lagrange optimization with constraints, which is not based on explicit parameterization, is implemented here to find minimized or maximized design values by solving nonlinear optimization problems under the strict constraints imposed by design codes. However, it is difficult to express objective functions analytically in terms of the design variables, as required by derivative-based methods such as Lagrange multipliers. This study proposes the use of neural networks to approximate well-behaved objective functions and other output parameters with one universal function, which also yields a generalizable way of computing the Jacobian and Hessian matrices needed to solve the Lagrangian function. The proposed method was applied successfully to minimizing the cost of a reinforced concrete column under various design requirements. The efficacy of the optimal results was also verified against five million datasets.


1. Introduction

Several studies, including (Aghaee, Yazdi, and Tsavdaridis 2014), (Fanaie, Aghajani, and Dizaj 2016), (Madadi et al. 2018), (Nasrollahi et al. 2018), and (Paknahad, Bazzaz, and Khorami 2018), have been conducted to optimize reinforced concrete (RC) structures. These studies mainly focused on minimizing manufacturing and construction costs; only a few considered structural capacities against external forces, which are governed by design codes. Studies on the optimization of RC beams have been reported by (Shariati et al. 2010), (Fanaie, Aghajani, and Shamloo 2012), (Toghroli et al. 2014), (Awal, Shehu, and Ismail 2015), (Kaveh and Shokohi 2015), (Safa et al. 2016), (Shah et al. 2016), (Korouzhdeh, Eskandari-Naddaf, and Gharouni-Nik 2017), and (Heydari and Shariati 2018).

Barros, Martins, and Barros (2005) presented stress–strain diagrams using the conventional Lagrangian multiplier method (LMM) to develop nominal moment strengths based on the optimal areas of the upper and lower steel sections for four classes of concrete. Shariat et al. (2018) obtained analytical objective functions for the cost of frames as functions of the design parameters of structural systems in limited circumstances; however, it is generally very difficult to derive analytical objective functions that represent the entire behavior of structural components such as columns and beams. Villarrubia et al. (2018) employed artificial neural networks (ANNs) to approximate objective functions when analytical objective functions could not be optimized directly. They approximated objective functions by nonlinear regression, employing a multilayer perceptron when the use of linear programming or Lagrange multipliers was not feasible.

2. Significance of the study

There are numerous computer-aided engineering tools, including CAD packages, FEM software, and self-written calculation codes, that are used to study the performance of structures. Objective functions, however, can constitute a mixture of numerical simulations, analytical calculations, and catalog selections, which makes it difficult to apply differentiation for derivative-based optimization methods such as the Lagrange multiplier. Nonderivative optimization methods such as genetic algorithms have been applied to structural design problems (Rajeev and Krishnamoorthy 1998; Camp, Pezeshk, and Hansson 2003) because they do not require any derivatives to find an optimal solution. However, the computing times of nonderivative methods rely heavily on the computational speed of the engineering tools because each trial requires one run of the software. In this study, artificial neural networks were adopted to universally approximate objective functions obtained from any computer-aided engineering tool. The new objective functions, hence, not only enhance the computational speed compared to conventional software but can also be differentiated and implemented in Lagrange optimization.

Optimization and sensitivity analyses using the computational LMM can be based on ANNs. The optimized results were verified using rectangular RC columns. The analysis was conducted to obtain the minimum design cost of reinforced concrete columns, as specified by the American Concrete Institute regulations (ACI-318 2014). Moreover, a sensitivity analysis was performed on the cost with respect to the effective parameters, including the rebar ratios and failure criteria. Accordingly, various failure criteria were developed for designing RC columns. Numerical examples are also presented to better illustrate the design steps. Complex but inaccurate analytical objective functions, such as those describing the cost of a structural frame and CO2 emissions, were replaced by ANN-based objective functions. The sensitivity analysis of the LMM revealed that the best optimal values based on the constraints can be identified for specific situations. Optimization using artificial intelligence (AI)-based objective functions trained on large datasets, without the need for prior optimization knowledge, can effectively aid the selection of design parameters for best practices.

An advantage of the proposed method is that its performance is less dependent on the problem type (column, beam, frame, seismic design, etc.) than on the characteristics of the big datasets of the considered problem. Once the big data are good enough to generate approximate objective functions and other parameters using ANNs, the optimization solution is generalizable using the AI-based Lagrange method. Therefore, applications of the proposed method are not restricted to optimizing RC columns but can be expanded to other problems, such as optimizing beam–column connections (Ye et al. 2021), GFRP RC columns (Sun et al. 2020), and RC shear walls (Zhang et al. 2019; Yazdani, Fakhimi, and Alitalesh 2018) subjected to lateral impact loading (Zhang, Gholipour, and Mousavi 2021) or even severe impulsive loading (Abedini and Zhang 2021).

3. Lagrange procedures based on ANNs

The LMM, introduced by Joseph-Louis Lagrange, optimizes objective functions with constraints by identifying the saddle point of the Lagrange function, as described by Walsh (1975) and Kalman (2009); the saddle point can be identified among the local stationary points using the Hessian matrix of second derivatives (Silberberg and Wing 2001). To find the stationary (saddle) points of a Lagrangian function, the function must be formed as a function of the constrained input variables and the Lagrange multipliers λ (Protter and Morrey 1985). This is achieved by solving systems of nonlinear equations that identify the maximum or minimum of the Lagrange function subjected to equality and inequality constraints (Hoffmann and Bradley 2004).

3.1. Optimization using LMM and Newton–Raphson method

The LMM finds the stationary points (saddle points; maxima or minima of a Lagrange function) when a Lagrange function, L, is considered as a function of the variables x = [x1, x2, …, xn]^T and the Lagrange multipliers of the equality and inequality constraints, λc = [λc1, λc2, …, λcm]^T and λv = [λv1, λv2, …, λvl]^T, respectively, as shown in Eq. (1):

(1) \[ L(\mathbf{x}, \boldsymbol{\lambda}_c, \boldsymbol{\lambda}_v) = f(\mathbf{x}) - \boldsymbol{\lambda}_c^T \mathbf{c}(\mathbf{x}) - \boldsymbol{\lambda}_v^T S\, \mathbf{v}(\mathbf{x}) \]

where f(x) is a multivariate objective function subjected to the equality and inequality constraints c(x) = [c1(x), c2(x), …, cm(x)]^T = 0 and v(x) = [v1(x), v2(x), …, vl(x)]^T ≥ 0, respectively. The diagonal status matrix of the inequalities, S (Eq. (2)), activates an inequality constraint as an equality if the inequality condition is not satisfied, or deactivates it if the considered parameters are within the range defined by the inequality constraint.

(2) \[ S = \begin{bmatrix} s_1 & & 0 \\ & \ddots & \\ 0 & & s_l \end{bmatrix} \]

where si is the status of the inequality constraint vi; si = 1 when vi is active and si = 0 when vi is inactive. The stationary points of the Lagrange function (Eq. (1)) can then be identified by solving the partial differential equations with respect to x, λc, and λv (Eq. (3)), which express the parallel tangency of the objective function f(x) and the constraints c(x) and v(x).

(3) \[ \nabla L(\mathbf{x}, \boldsymbol{\lambda}_c, \boldsymbol{\lambda}_v) = \begin{bmatrix} \nabla f(\mathbf{x}) - \mathbf{J}_c(\mathbf{x})^T \boldsymbol{\lambda}_c - \mathbf{J}_v(\mathbf{x})^T S \boldsymbol{\lambda}_v \\ -\mathbf{c}(\mathbf{x}) \\ -S\, \mathbf{v}(\mathbf{x}) \end{bmatrix} \]

where

(4) \[ \mathbf{J}_c(\mathbf{x}) = \begin{bmatrix} \nabla c_1(\mathbf{x})^T \\ \nabla c_2(\mathbf{x})^T \\ \vdots \\ \nabla c_m(\mathbf{x})^T \end{bmatrix} \qquad \text{and} \qquad \mathbf{J}_v(\mathbf{x}) = \begin{bmatrix} \nabla v_1(\mathbf{x})^T \\ \nabla v_2(\mathbf{x})^T \\ \vdots \\ \nabla v_l(\mathbf{x})^T \end{bmatrix} \]

are the Jacobians of the constraint vectors c and v, respectively, at x. The main advantage of Lagrange multipliers is that they convert constrained optimization problems into unconstrained ones. Lagrangian functions are formulated from the relationships between the gradients of the objective functions and those of the constraints of the original problems (Beavis and Dobbs 1990), such that the derivative test for unconstrained problems can still be applied. Lagrange algorithms linearize the restrictions and objective functions at a specific point of the design space by employing the derivatives and partial derivatives solved with the equality constraints, as shown in Eq. (3). The Newton–Raphson method is employed to solve the set of partial differential equations representing the tangent of the Lagrange function (Eqs. (3) and (5)), ∇L(x, λc, λv), which must be differentiated once more to linearize the partial differential Lagrange functions with respect to x, λc, and λv (Eq. (5)), leading to the stationary points of the Lagrange function L(x, λc, λv). The linear approximation of the tangent of the Lagrange function can be evaluated at x0 + Δx, which is very close to x0, as shown in Eq. (5), where the partial differential equations ∇L(x, λc, λv) are differentiable at x0. The Newton–Raphson method is based on a first-order approximation, which works for any system of equations whose functions are differentiable in the considered region.

(5) \[ \nabla L(\mathbf{x}_0 + \Delta\mathbf{x},\ \boldsymbol{\lambda}_{c0} + \Delta\boldsymbol{\lambda}_c,\ \boldsymbol{\lambda}_{v0} + \Delta\boldsymbol{\lambda}_v) \approx \nabla L(\mathbf{x}_0, \boldsymbol{\lambda}_{c0}, \boldsymbol{\lambda}_{v0}) + \mathbf{H}_L(\mathbf{x}_0, \boldsymbol{\lambda}_{c0}, \boldsymbol{\lambda}_{v0}) \begin{bmatrix} \Delta\mathbf{x} \\ \Delta\boldsymbol{\lambda}_c \\ \Delta\boldsymbol{\lambda}_v \end{bmatrix} \]

where H_L(x0, λc0, λv0) is the Hessian matrix of the Lagrangian L at (x0, λc0, λv0). Since ∇L(x0 + Δx, λc0 + Δλc, λv0 + Δλv) = 0 is sought, [Δx, Δλc, Δλv]^T can be computed from Eq. (6). The variables x and the Lagrange multipliers λ are then updated after every iteration, as in Eq. (7).

(6) \[ \begin{bmatrix} \Delta\mathbf{x} \\ \Delta\boldsymbol{\lambda}_c \\ \Delta\boldsymbol{\lambda}_v \end{bmatrix} \approx -\mathbf{H}_L(\mathbf{x}_0, \boldsymbol{\lambda}_{c0}, \boldsymbol{\lambda}_{v0})^{-1}\, \nabla L(\mathbf{x}_0, \boldsymbol{\lambda}_{c0}, \boldsymbol{\lambda}_{v0}) \]

(7) \[ \begin{bmatrix} \mathbf{x}_{k+1} \\ \boldsymbol{\lambda}_{c,k+1} \\ \boldsymbol{\lambda}_{v,k+1} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_k \\ \boldsymbol{\lambda}_{c,k} \\ \boldsymbol{\lambda}_{v,k} \end{bmatrix} - \mathbf{H}_L(\mathbf{x}_k, \boldsymbol{\lambda}_{c,k}, \boldsymbol{\lambda}_{v,k})^{-1}\, \nabla L(\mathbf{x}_k, \boldsymbol{\lambda}_{c,k}, \boldsymbol{\lambda}_{v,k}) \]

The first derivative, ∇L(xk, λck, λvk), can be obtained from Eq. (3), while the Hessian matrix, H_L(xk, λck, λvk), of second-order partial derivatives of the Lagrange function L(x, λc, λv) = f(x) − λc^T c(x) − λv^T S v(x) (Eq. (1)) is derived as follows:

(8) \[ \mathbf{H}_L(\mathbf{x}_k, \boldsymbol{\lambda}_{c,k}, \boldsymbol{\lambda}_{v,k}) = \begin{bmatrix} \mathbf{H}_L(\mathbf{x}_k) & -\mathbf{J}_c(\mathbf{x}_k)^T & -(S\,\mathbf{J}_v(\mathbf{x}_k))^T \\ -\mathbf{J}_c(\mathbf{x}_k) & 0 & 0 \\ -S\,\mathbf{J}_v(\mathbf{x}_k) & 0 & 0 \end{bmatrix} \]

(9) \[ \mathbf{H}_L(\mathbf{x}) = \mathbf{H}_f(\mathbf{x}) - \sum_{i=1}^{m} \lambda_{ci}\, \mathbf{H}_{c_i}(\mathbf{x}) - \sum_{i=1}^{l} s_i \lambda_{vi}\, \mathbf{H}_{v_i}(\mathbf{x}) \]

where H_f(x), H_ci(x), and H_vi(x) are the Hessian matrices of the objective function f(x), the equality constraint ci, and the inequality constraint vi, respectively, with respect to the variable vector x. The Newton–Raphson procedure is repeated until convergence is achieved.
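A minimal sketch of this Newton–Raphson iteration on the KKT system (Eqs. (3) and (6)–(9)) is given below; it is not the authors' implementation, and the callback names (grad_f, hess_f, c, Jc, Hc, v, Jv, Hv) are hypothetical stand-ins for the derivative evaluations developed in Section 3.2. Inactive inequalities (si = 0) are dropped from the system so that the KKT matrix remains nonsingular, which is equivalent to applying the status matrix S of Eq. (2).

```python
import numpy as np

def newton_lagrange(x, lc, lv, grad_f, hess_f, c, Jc, Hc, v, Jv, Hv, s,
                    tol=1e-8, max_iter=100):
    """Solve grad L = 0 for L = f - lc^T c - lv^T S v (Eqs. (1), (3), (7)).

    x, lc, lv are float arrays; s lists the inequality statuses of Eq. (2).
    """
    act = [i for i in range(len(s)) if s[i] == 1]   # active inequalities only
    n, m, a = len(x), len(lc), len(act)
    for _ in range(max_iter):
        Jc_x = Jc(x)                                # m x n Jacobian, Eq. (4)
        Jv_x = Jv(x)[act]                           # a x n, active rows of J_v
        gL = np.concatenate([grad_f(x) - Jc_x.T @ lc - Jv_x.T @ lv[act],
                             -c(x), -v(x)[act]])    # gradient of L, Eq. (3)
        if np.linalg.norm(gL) < tol:
            break
        # upper-left Hessian block, Eq. (9)
        HL_xx = hess_f(x) \
            - sum(lc[i] * Hc(x, i) for i in range(m)) \
            - sum(lv[j] * Hv(x, j) for j in act)
        # full KKT matrix, Eq. (8)
        HL = np.block([[HL_xx, -Jc_x.T, -Jv_x.T],
                       [-Jc_x, np.zeros((m, m)), np.zeros((m, a))],
                       [-Jv_x, np.zeros((a, m)), np.zeros((a, a))]])
        step = np.linalg.solve(HL, gL)              # Newton step, Eq. (6)
        x = x - step[:n]                            # update, Eq. (7)
        lc = lc - step[n:n + m]
        lv[act] = lv[act] - step[n + m:]            # mutates the caller's lv
    return x, lc, lv
```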

3.2. Generalization of objective functions and their derivatives using neural network

3.2.1. Formulation of universal approximation function using neural network

One problem in Lagrange optimization is that the objective function f(x) and/or the output parameters that appear in the constraints c(x) and v(x) are sometimes complex or impossible to derive as the twice-differentiable functions required by the Lagrange optimization method. Even when it is possible, deriving the Jacobian and Hessian matrices of the objective function and the constraints is not only difficult and expensive but also yields a solution that does not generalize to other optimization problems (e.g. optimizing columns, beams, and/or any structural systems).

In this study, an artificial neural network was employed to approximate any well-behaved objective function and the other output parameters with one universal function, as shown in Eq. (10), which also gives a generalizable solution for the Jacobian and Hessian matrices.

3.2.2. Neural network-based universal approximation function

(10) \[ f(\mathbf{x}) = g_D\Big( f_{lin}^{L}\big( \mathbf{W}^L f_t^{L-1}( \mathbf{W}^{L-1} \cdots f_t^{1}( \mathbf{W}^1 g_N(\mathbf{x}) + \mathbf{b}^1 ) \cdots + \mathbf{b}^{L-1} ) + \mathbf{b}^L \big) \Big) \]

where x is the input (vector of features); L is the number of layers, including the hidden and output layers; W^l is the weight matrix between layer l−1 and layer l; b^l is the bias vector of layer l; and g_N and g_D are the normalization and denormalization functions, respectively. Min–max normalization, as shown in Eq. (10b), is used in this study.

(10b) \[ \bar{\mathbf{x}} = g_N(\mathbf{x}) = \alpha_x (\mathbf{x} - \mathbf{x}_{min}) + \bar{\mathbf{x}}_{min} \]

(10c) \[ \alpha_x = \frac{\bar{x}_{max} - \bar{x}_{min}}{x_{max} - x_{min}} \]

where x̄ is the normalization of x between the minimum and maximum values x̄_min = −1 and x̄_max = 1, respectively. The coefficient αx is the ratio of the normalized data range (x̄_max − x̄_min) to the original data range (x_max − x_min), as shown in Eq. (10c). Similarly, the denormalization function is expressed in Eq. (10d).

(10d) \[ \mathbf{x} = g_D(\bar{\mathbf{x}}) = \frac{1}{\alpha_x} (\bar{\mathbf{x}} - \bar{\mathbf{x}}_{min}) + \mathbf{x}_{min} \]
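A minimal sketch of these normalization and denormalization mappings (Eqs. (10b)–(10d)) is given below, assuming per-feature bounds taken from the training data; the function names are illustrative, not from the original implementation.

```python
import numpy as np

def fit_normalizer(X, lo=-1.0, hi=1.0):
    """Return g_N, g_D, and alpha for min-max scaling of the columns of X to [lo, hi]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    alpha = (hi - lo) / (x_max - x_min)               # Eq. (10c)
    g_N = lambda x: alpha * (x - x_min) + lo          # Eq. (10b): x -> x_bar
    g_D = lambda x_bar: (x_bar - lo) / alpha + x_min  # Eq. (10d): x_bar -> x
    return g_N, g_D, alpha
```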

The activation functions (tansig, tanh) f_t^l at layer l, shown in Figure 1, were implemented to formulate the nonlinear relationships within the networks. As noted by Goodfellow (Bengio, Goodfellow, and Courville 2017), the hyperbolic tangent activation function (tansig, tanh) generally performs better than a sigmoid activation function. The function takes any real value as input and outputs values in the range −1 to 1; the larger the input, the closer the output is to 1.0, whereas the smaller the input, the closer the output is to −1.0. Figure 1 also illustrates the first and second derivatives of the tansig/tanh activation function, which are needed for the Jacobian and Hessian calculations. A linear activation function, f_lin^L, was selected for the output layer because the output values are unbounded. For example, the safety factor (SF) output varies from 0.5 (normalized as −1) to 2 (normalized as 1) in the training datasets; however, SF could be either greater than 2 (normalized as 1) or smaller than 0.5 (normalized as −1) depending on the design values. Sigmoid or ReLU activation functions are even worse for the output layer in this case because their lower bound of 0 truncates any normalized safety factor smaller than 0 (denormalized as 1.25), which means they suffer vanishing problems in these ranges. Linear activation at the output layer, on the other hand, predicts output values that are not distorted by the activation function.

Figure 1. Tansig activation function and its derivatives.
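For reference, a short sketch of the tansig/tanh activation and the two derivatives plotted in Figure 1, written in terms of the activation output so they can be reused directly in the Jacobian (Eq. (13)) and Hessian (Eq. (17)) propagations:

```python
import numpy as np

def tanh_derivatives(a):
    """Return tanh(a) together with its first and second derivatives."""
    z = np.tanh(a)          # activation output, bounded in (-1, 1)
    dz = 1.0 - z**2         # f'(a) = 1 - tanh(a)^2
    d2z = -2.0 * z * dz     # f''(a) = -2 tanh(a) (1 - tanh(a)^2)
    return z, dz, d2z
```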

3.2.3. Formulation of Jacobian matrix for universal approximation function

In neural networks, the universal approximation function can also be expressed as a series of composite mathematical operations, as shown in Eq. (11a):

(11a) \[ \mathbf{y} = f(\mathbf{x}) = (z_D \circ z_L \circ z_{L-1} \circ \cdots \circ z_1 \circ z_N)(\mathbf{x}) \]

where z_l is the output vector at layer l, which can be calculated using Eq. (11b):

\[ z_N = g_N(\mathbf{x}) = \alpha_x \odot (\mathbf{x} - \mathbf{x}_{min}) + \bar{\mathbf{x}}_{min} \]
\[ z_1 = f_t^1(\mathbf{W}^1 z_N + \mathbf{b}^1) \]
\[ z_l = f_t^l(\mathbf{W}^l z_{l-1} + \mathbf{b}^l) \]
\[ z_{L-1} = f_t^{L-1}(\mathbf{W}^{L-1} z_{L-2} + \mathbf{b}^{L-1}) \]
\[ z_L = f_{lin}^L(\mathbf{W}^L z_{L-1} + \mathbf{b}^L) \]
(11b) \[ \mathbf{y} = z_D = g_D(z_L) = \frac{1}{\alpha_y} (z_L - \bar{y}_{min}) + y_{min} \]

where ⊙ denotes the Hadamard (element-wise) product. αy, ȳ_min, and y_min are the normalization parameters of the output y, whereas αx = [αx1, αx2, …, αxn]^T, x_min = [x1,min, x2,min, …, xn,min]^T, and x̄_min = [x̄1,min, x̄2,min, …, x̄n,min]^T are vectors containing the normalization parameters of the input vector x.

In calculus, the chain rule is employed to calculate the derivatives of such composite functions. Formally, the Jacobian matrix of z_l with respect to x can be derived as the Jacobian matrix of z_l with respect to z_{l−1} multiplied by that of z_{l−1} with respect to x (Eq. (12)):

(12) \[ \mathbf{J}^l = \frac{\partial z_l}{\partial \mathbf{x}} = \frac{\partial z_l}{\partial z_{l-1}} \frac{\partial z_{l-1}}{\partial \mathbf{x}} = \frac{\partial z_l}{\partial z_{l-1}} \mathbf{J}^{l-1} \]

The Jacobian matrix of the universal approximation function is then computed by forward propagation as follows:

\[ \mathbf{J}^N = \frac{\partial z_N}{\partial \mathbf{x}} = \frac{\partial \big( \alpha_x \odot (\mathbf{x} - \mathbf{x}_{min}) + \bar{\mathbf{x}}_{min} \big)}{\partial \mathbf{x}} = \mathbf{I}_n \odot \alpha_x \]
\[ \mathbf{J}^1 = (1 - z_1^2) \odot (\mathbf{W}^1 \mathbf{J}^N) \]
\[ \mathbf{J}^{L-1} = (1 - z_{L-1}^2) \odot (\mathbf{W}^{L-1} \mathbf{J}^{L-2}) \]
\[ \mathbf{J}^L = \mathbf{W}^L \mathbf{J}^{L-1} \]
(13) \[ \mathbf{J}^D = \frac{1}{\alpha_y} \mathbf{J}^L \]
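The following sketch implements this value-and-Jacobian forward propagation (Eqs. (11b)–(13)) for tanh hidden layers and a linear output layer; it is an illustrative implementation under those assumptions, with Ws and bs as lists of layer weights and biases and the normalization parameters taken from Eqs. (10b)–(10d).

```python
import numpy as np

def forward_jacobian(x, Ws, bs, alpha_x, x_min, xbar_min,
                     alpha_y, ybar_min, y_min):
    """Propagate the output y and its Jacobian J^D through the network."""
    z = alpha_x * (x - x_min) + xbar_min     # z_N, Eq. (11b)
    J = np.diag(alpha_x)                     # J^N = I_n (.) alpha_x
    for W, b in zip(Ws[:-1], bs[:-1]):       # tanh hidden layers 1 .. L-1
        z = np.tanh(W @ z + b)               # z_l
        J = (1.0 - z**2)[:, None] * (W @ J)  # J^l = (1 - z_l^2) (.) (W^l J^{l-1})
    zL = Ws[-1] @ z + bs[-1]                 # linear output layer, z_L
    y = (zL - ybar_min) / alpha_y + y_min    # denormalized output, Eq. (11b)
    JD = (Ws[-1] @ J) / alpha_y              # J^D = (1/alpha_y) W^L J^{L-1}, Eq. (13)
    return y, JD
```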

3.2.4. Formulation of Hessian matrix for universal approximation function

This section summarizes the derivation of the AI-based Hessian matrix, which was developed with the MathWorks Technical Support Department (MATLAB 2020b). The Jacobian matrix of a hidden layer l with ml neurons with respect to the n input variables (x = [x1, x2, …, xn]^T) is a matrix of size ml × n. The corresponding Hessian is then a third-order tensor, as illustrated in Figure 2, which could be an expensive operation. A convenient method is to explicitly calculate the slices of the Hessian as second derivatives and then obtain the full Hessian by appropriately reshaping the slices (MATLAB 2020b). A slice of the Hessian, H_i^l, is the derivative of the Jacobian J^l with respect to one of the input elements, xi (Eq. (14)).

(14) \[ \mathbf{H}_i^l = \frac{\partial^2 z_l}{\partial x_i\, \partial z_{l-1}} \mathbf{J}^{l-1} + \frac{\partial z_l}{\partial z_{l-1}} \mathbf{H}_i^{l-1} \]

Figure 2. Size configuration of output vector, Jacobian, and Hessian of hidden layer l.

where ∂²z_l/(∂x_i ∂z_{l−1}) can be obtained by applying the chain rule as:

(15) \[ \frac{\partial^2 z_l}{\partial x_i\, \partial z_{l-1}} = \frac{\partial^2 z_l}{\partial z_{l-1}^2} \frac{\partial z_{l-1}}{\partial x_i} \]

The expression ∂z_{l−1}/∂x_i can be written as j_i^{l−1}, the i-th column of the Jacobian J^{l−1}. Substituting Eq. (15) into Eq. (14), we obtain

(16) \[ \mathbf{H}_i^l = \frac{\partial z_l}{\partial z_{l-1}} \mathbf{H}_i^{l-1} + \frac{\partial^2 z_l}{\partial z_{l-1}^2}\, \mathbf{j}_i^{l-1}\, \mathbf{J}^{l-1} \]

Applying forward propagation to Eq. (16), the slices of the Hessian at the final layer, L, can be obtained as:

\[ \mathbf{H}_i^N = \frac{\partial^2 z_N}{\partial x_i\, \partial \mathbf{x}} = 0 \]
\[ \mathbf{H}_i^1 = -2 z_1 \odot (1 - z_1^2) \odot (\mathbf{W}^1 \mathbf{j}_i^N) \odot (\mathbf{W}^1 \mathbf{J}^N) \]
\[ \mathbf{H}_i^{L-1} = (1 - z_{L-1}^2) \odot (\mathbf{W}^{L-1} \mathbf{H}_i^{L-2}) - 2 z_{L-1} \odot (1 - z_{L-1}^2) \odot (\mathbf{W}^{L-1} \mathbf{j}_i^{L-2}) \odot (\mathbf{W}^{L-1} \mathbf{J}^{L-2}) \]
\[ \mathbf{H}_i^L = \mathbf{W}^L \mathbf{H}_i^{L-1} \]
(17) \[ \mathbf{H}_i^D = \frac{1}{\alpha_y} \mathbf{H}_i^L \]

Finally, the Hessian matrix at the output layer is obtained by reshaping the n Hessian slices H_i^D, which for the scalar output are row vectors of size 1 × n, into an n × n matrix, as expressed in Eq. (18):

(18) \[ \mathbf{H}^D = \begin{bmatrix} \mathbf{H}_1^D \\ \vdots \\ \mathbf{H}_n^D \end{bmatrix}_{n \times n} \]

The Lagrange optimization problem presented in Section 3.1 can then be solved in a generalizable way by using a neural network to generate the AI-based objective function and its Jacobian and Hessian matrices, as shown in Eqs. (10), (13), and (18), respectively.
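A sketch of this Hessian-slice propagation (Eqs. (14)–(18)), extending the forward_jacobian sketch above under the same assumptions (tanh hidden layers, linear scalar output), is given below; each slice H_i is carried forward layer by layer, and the denormalized slices are stacked into the n × n Hessian of Eq. (18).

```python
import numpy as np

def forward_hessian(x, Ws, bs, alpha_x, x_min, xbar_min, alpha_y):
    """Propagate the Hessian slices H_i (Eqs. (16)-(17)) and stack them (Eq. (18))."""
    n = len(x)
    z = alpha_x * (x - x_min) + xbar_min            # z_N, Eq. (11b)
    J = np.diag(alpha_x)                            # J^N
    H = [np.zeros((n, n)) for _ in range(n)]        # H_i^N = 0, Eq. (17)
    for W, b in zip(Ws[:-1], bs[:-1]):              # tanh hidden layers
        z = np.tanh(W @ z + b)                      # z_l
        dz = 1.0 - z**2                             # f'(a) = 1 - z_l^2
        WJ = W @ J                                  # W^l J^{l-1}
        H = [dz[:, None] * (W @ Hi)                 # first term of Eq. (16)
             - 2.0 * (z * dz * (W @ j_i))[:, None] * WJ  # second term, Eq. (17)
             for Hi, j_i in zip(H, J.T)]            # j_i = i-th column of J^{l-1}
        J = dz[:, None] * WJ                        # J^l, reused at the next layer
    # linear output layer and denormalization: H_i^D = (1/alpha_y) W^L H_i^{L-1}
    return np.vstack([(Ws[-1] @ Hi) / alpha_y for Hi in H])   # n x n, Eq. (18)
```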

The concept of the AI-based Lagrange optimization method is explained in the following three steps, as shown in Figure 3.

Figure 3. AI-based Lagrange optimization flowchart.

- Step 1: Structural big datasets are generated from conventional design software, such as AutoCol. The number of datasets needed for training should be selected carefully based on the complexity of the considered problem.

- Step 2: AI-based objective functions are obtained from neural networks trained on the big datasets generated in Step 1. The accuracy of the AI-based models is considerably affected not only by the number of datasets but also by the neural network parameters, such as the number of hidden layers, neurons, and training epochs. Therefore, a proper neural network framework should be employed to obtain good training results.

- Step 3: The Lagrange multiplier method is applied to optimize the AI-based objective functions. The aim of the AI-based objective functions is to approximate any well-behaved objective function and to generalize the calculation of the Jacobian and Hessian matrices, which are needed to solve the KKT conditions with the Newton–Raphson method.

4. Application of AI-based Lagrange method on optimizing RC columns

In this study, conventional structural design software (AutoCol) is employed to generate big datasets for neural network training. The column configuration (b × h), rebar ratio (ρs), and material properties are used to evaluate the structural performance of the RC column, such as the design axial force (ϕPn), design bending moment (ϕMn), and rebar strain (εs), against the factored load pair (Pu–Mu). Table 1 and Figure 4 describe the seven inputs (b, h, ρs, fc, fy, Pu, Mu) needed to calculate the nine corresponding output parameters (ϕPn, ϕMn, SF, b/h, εs, CIc, CO2, Wc, αe/h).

Table 1. Summary of RC column parameters.

Figure 4. RC column.

4.1. Formulation of objective function and other parameters based on ANNs

The nine output parameters, including the objective function (CIc), are functions of seven variables; these functions are complex and difficult not only to implement analytically but also to differentiate into the Jacobian and Hessian matrices needed to solve the Lagrange function (Eq. (7)). AI-based neural networks are developed to universally approximate all the output parameters, as expressed in Eq. (10), so that a single process of finding their Jacobian and Hessian matrices can be applied. For example, the objective function of the column cost index (CIc) is obtained using Eq. (19) for the given input parameters (b, h, ρs, fc, fy, Pu, Mu). The equation is a forward-network weight–bias function with L layers and 80 neurons per hidden layer, linked through weighted interconnections and biases via activation functions, thereby performing nonlinear numerical computations. The activation function (tansig, tanh) shown in Figure 1 is used in Eq. (19).

The input, x = [b, h, ρs, fc, fy, Pu, Mu]^T, is related to the neurons of fully connected successive layers using the weights of each neuron and the bias of each hidden layer. The layers then produce the outputs, such as CIc, CO2 emissions, and Wc (Hong 2019). The neural networks are formulated to generalize trends (recognized as machine learning) between inputs and outputs to obtain objective functions, rather than being based on analytical engineering mechanics or knowledge (Berrais 1999).

(19) \[ CI_{c\,(1\times1)} = g_D^{CI_c}\Big( f_{lin}^{L}\big( \mathbf{W}_L^{CI_c}{}_{(1\times80)}\, f_t^{L-1}( \mathbf{W}_{L-1}^{CI_c}{}_{(80\times80)} \cdots f_t^{1}( \mathbf{W}_1^{CI_c}{}_{(80\times7)}\, g_N^{CI_c}(\mathbf{x}_{(7\times1)}) + \mathbf{b}_1^{CI_c}{}_{(80\times1)} ) \cdots + \mathbf{b}_{L-1}^{CI_c}{}_{(80\times1)} \big) + b_L^{CI_c}{}_{(1\times1)} \Big) \]
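As an illustrative check of the dimensions in Eq. (19), the forward_jacobian sketch from Section 3.2.3 can be evaluated with randomly initialized placeholder weights of the stated sizes (these are not trained values, and the normalization parameters below are arbitrary placeholders rather than quantities from the training data):

```python
import numpy as np

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((80, 7)),    # W_1: 80 x 7
      rng.standard_normal((80, 80)),   # hidden-to-hidden: 80 x 80
      rng.standard_normal((1, 80))]    # W_L: 1 x 80 (scalar CI_c output)
bs = [rng.standard_normal(80), rng.standard_normal(80), rng.standard_normal(1)]

x = np.array([500.0, 500.0, 0.02, 40.0, 500.0, 1000.0, 3000.0])  # [b,h,rho_s,fc,fy,Pu,Mu]
y, J = forward_jacobian(x, Ws, bs,
                        alpha_x=np.full(7, 1e-3), x_min=np.zeros(7),
                        xbar_min=np.full(7, -1.0),
                        alpha_y=1e-5, ybar_min=-1.0, y_min=0.0)
print(y.shape, J.shape)  # (1,) and (1, 7): CI_c and its gradient w.r.t. the 7 inputs
```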

Similarly, the remaining outputs (ϕPn, ϕMn, SF, b/h, εs, CO2, Wc, αe/h) can also be formulated as functions of the seven inputs based on forward ANNs. Table 2 presents the training results of the forward networks, in which one, two, and five hidden layers with 80 neurons each are implemented. A structural dataset of 100,000 samples is randomly divided into three subsets: a training set, a validation set, and a test set. According to Ripley (1996), the training set (70% of the dataset) is the set of examples used for learning, that is, to fit the parameters, whereas the validation set is used to tune the neural network and avoid overfitting. The test subset, on the other hand, is independent of the training procedure and is used only to assess the performance of the fully specified model. Hence, in Table 2, the MSE of the test set (MSE T. Perf) is suitable for evaluating the goodness of the designs, indicating the capability of the trained model on unseen data.

Table 2. Training accuracies based on PTM.

4.2. Optimization of RC column using AI-based Lagrange method

The neural network models of the RC column presented in Table 2 are used to minimize the cost index (CIc) of an RC column under several design requirements, as shown in Table 3. All design requirements can be expressed in terms of equality constraints c(x) = [c1(x), …, c6(x)]^T, as stated in Table 4. In addition, the rebar ratio ρs is constrained by the ACI-318 code requirement (ρs,min ≤ ρs ≤ ρs,max), which is expressed in terms of two inequality constraints: v1(x) = ρs − ρs,min ≥ 0 and v2(x) = −ρs + ρs,max ≥ 0 (Table 4).

Table 3. Column design scenarios for optimization.

Table 4. Summary of equality and inequality constraints.

In Eq. (19), the CIc function for forward optimization is defined as CIc = f_CIc^FW(x), a function of the seven input parameters. According to Eqs. (1) and (3), the Lagrange optimization function of the cost index, L_CIc^FW, and its KKT conditions are then expressed as functions of the input variables x = [b, h, ρs, fc, fy, Pu, Mu]^T and the Lagrange multipliers of the equality and inequality constraints, λc = [λc1, λc2, …, λc6]^T and λv = [λv1, λv2]^T, respectively, as shown in Eqs. (20) and (21).

CIc Lagrangian function:

(20) \[ L_{CI_c}^{FW}(\mathbf{x}, \boldsymbol{\lambda}_c, \boldsymbol{\lambda}_v) = f_{CI_c}^{FW}(\mathbf{x}) - \boldsymbol{\lambda}_c^T \mathbf{c}(\mathbf{x}) - \boldsymbol{\lambda}_v^T S\, \mathbf{v}(\mathbf{x}) \]

KKT condition:

(21) \[ \nabla L_{CI_c}^{FW}(\mathbf{x}, \boldsymbol{\lambda}_c, \boldsymbol{\lambda}_v) = \begin{bmatrix} \nabla f_{CI_c}^{FW}(\mathbf{x}) - \mathbf{J}_c(\mathbf{x})^T \boldsymbol{\lambda}_c - \mathbf{J}_v(\mathbf{x})^T S \boldsymbol{\lambda}_v \\ -\mathbf{c}(\mathbf{x}) \\ -S\, \mathbf{v}(\mathbf{x}) \end{bmatrix} \]

It is well known that the Newton–Raphson method relies heavily on a good initial vector, x0 = [b, h, ρs, fc, fy, Pu, Mu]^T, to expedite convergence as well as to enhance accuracy. A good initial vector is predetermined based on the simple equality constraints c2(x), c3(x), c4(x), and c5(x) and the active inequality constraints v1(x) and v2(x) (Table 4).

Inequality constraint (v1) is activated

The initial vector when v1 is activated is obtained from

(22) \[ \mathbf{x}_0 = [b, h, \rho_s, f_c, f_y, P_u, M_u]^T = [b, h, 0.01, 40, 500, 1000, 3000]^T \]

where five of the input variables (ρs, fc, fy, Pu, Mu) are predetermined from the simple equality constraints c2(x), c3(x), c4(x), and c5(x) and the active inequality v1(x). The initial vector x0 = [b, h, 0.01, 40, 500, 1000, 3000]^T is used to find the saddle point of the Lagrange optimization function (Eq. (20)) using the Newton–Raphson method. The remaining input parameters (b, h) of the initial vector are initialized randomly within the training data range and are determined during optimization.

The Newton–Raphson method is implemented to solve the partially differentiated Eq. (21) using one initial Lagrange multiplier vector, (λc1, λc2) = (0, 0), and 52 initial vectors x0 = [b, h, 0.01, 40, 500, 1000, 3000]^T, in which b and h are randomly distributed within the training data range. Initial values of 0 are used for the Lagrange multipliers because they have no boundaries; they can be any number while the Newton–Raphson iteration computes the exact Lagrange multipliers. (b, h) = (1070.1, 1070.1) is the best value among the 25 trials based on the network with five hidden layers and 80 neurons, producing an optimal value of 202,275.4 when inequality v1 is activated, as shown in Table 5(c). The optimized results based on one and two hidden layers with 80 neurons are also shown in Table 5(a) and (b), with optimal values of CIc = 185,069.2 and 201,942.3, respectively. Similarly, the optimized results of Case 2 (inequality constraint v2 activated) and Case 3 (none activated) are obtained (Table 5).
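The setup of this run can be sketched as follows using the newton_lagrange function from Section 3.1; the constraint values follow Eq. (22) and the dataset filter of Section 4.3 (SF = 1, b/h = 1), while the surrogate callbacks (sf_net, cic_grad, cic_hess, c_jac, c_hess, v_jac, v_hess) are hypothetical stand-ins for the trained networks and their ANN-based Jacobians and Hessians, and ρs,max = 0.08 follows the ACI-318 limit for columns.

```python
import numpy as np

# x = [b, h, rho_s, fc, fy, Pu, Mu]
def c(x):                                  # six equality constraints (Table 4)
    b, h, rho_s, fc, fy, Pu, Mu = x
    return np.array([sf_net(x) - 1.0,      # c1: safety factor SF = 1
                     fc - 40.0,            # c2: fc = 40 MPa
                     fy - 500.0,           # c3: fy = 500 MPa
                     Pu - 1000.0,          # c4: factored axial load
                     Mu - 3000.0,          # c5: factored moment
                     b / h - 1.0])         # c6: square section, b/h = 1

def v(x):                                  # ACI-318 rebar-ratio bounds
    rho_s = x[2]
    return np.array([rho_s - 0.01,         # v1: rho_s >= rho_s,min
                     0.08 - rho_s])        # v2: rho_s <= rho_s,max

x0 = np.array([600.0, 600.0, 0.01, 40.0, 500.0, 1000.0, 3000.0])  # Eq. (22); b, h random
x_opt, lc_opt, lv_opt = newton_lagrange(
    x0, np.zeros(6), np.zeros(2),
    grad_f=cic_grad, hess_f=cic_hess,      # ANN gradient/Hessian of CI_c
    c=c, Jc=c_jac, Hc=c_hess,
    v=v, Jv=v_jac, Hv=v_hess,
    s=[1, 0])                              # Case 1: v1 active, v2 inactive
```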

Table 5. Lowest cost index of columns (CIc) for given constraints based on forward networks

The optimized design results obtained using the Lagrange multipliers based on forward networks trained with one, two, and five layers of 80 neurons are listed in Table 6(a), (b), and (c), respectively, and are compared with those obtained using the structural software AutoCol. In Table 6(a), the largest error, 11.79% in SF, is observed with one layer of 80 neurons. With two and five layers of 80 neurons (Table 6(b) and (c)), the errors are reduced to 1.55% for the design moment (ϕMn) and 2.7% for the design axial force (ϕPn), respectively, by the LMM based on forward networks.

Table 6. Accuracies of optimized CIc based on forward Lagrange optimizations (a) Based on forward training with 1 layer – 80 neurons.

4.3. Verification of CIc by large datasets

The goodness of the optimal designs presented in Table 6 is evaluated against large datasets, as shown in Figure 5. The lowest CIc in the large datasets is 195,716, where five million datasets are filtered by fc = 40 MPa, fy = 500 MPa, SF = 1, and b/h = 1. The accuracy is demonstrated by CIc values of 184,684.6 (−5.64%; one layer, Figure 5(a)), 195,507.9 (−0.11%; two layers, Figure 5(b)), and 195,671.1 (−0.02%; five layers, Figure 5(c)) from the AI-based networks, compared with the optimal CIc (195,716) of the large datasets. The CIc obtained by the AI-based Lagrange method with two and five layers agrees well with that obtained from the large datasets.

Figure 5. Verification of CIc based on large datasets.

4.4. P–M diagram

The optimal CIc-based P–M interaction diagrams for RC columns that satisfy the various design criteria (Tables 3 and 4) are plotted on the basis of the forward networks, as shown in Figure 6. The P–M diagrams pass through (Pu, Mu), indicated by a solid point. In Figure 6, the P–M diagrams indicated by Legends 1, 2, and 3 were constructed with the parameters shown in the dashed black boxes of Table 6(a), (b), and (c), respectively, which were obtained by AI-based Lagrange optimization. These parameters were used to construct the P–M diagrams because their ANN-based accuracies are acceptable. The P–M diagrams were plotted using AutoCol with the corresponding input parameters. The interaction diagrams shown in Figure 6 meet the minimum CIc listed in Table 5. CIc was optimized by Lagrange multipliers based on forward neural networks with one, two, and five layers of 80 neurons. The dotted curve does not pass through (Pu, Mu) (the solid point), indicating that the training accuracy of the one-layer model is insufficient. All three optimized P–M diagrams (Legends 1, 2, and 3) in Figure 6 would converge to one P–M diagram when the training accuracies are sufficient.

Figure 6. Axial force–moment (P–M) interaction diagrams for optimal CIc.

5. Conclusions

This study presents hybrid optimization techniques in which the objective functions are formulated using ANNs. Lagrange optimization with constraints was implemented to achieve rational engineering decisions and find minimized or maximized design values by solving nonlinear optimization problems under the strict constraints, conditions, and requirements imposed by design codes on structural frames such as columns and beams. This study helps engineers base final design decisions not on empirical observations but on more rational designs that meet various design requirements, including code restrictions and/or architectural criteria, while objective parameters such as the cost index (CIc), CO2 emissions, and structural weight (Wc) are optimized. The conclusions drawn from the study are as follows:

(1) Constructing objective functions and their derivatives, which is a challenge in the Lagrange multiplier method, can now be generalized to any structural design optimization by using ANN to generate universal AI-based objective functions, Jacobian, and Hessian matrices.

(2) Automatic designs of structural frames are proposed for realistic engineering applications to identify design solutions that optimize all design requirements simultaneously, thereby achieving design decisions not based on engineers’ intuition. The results of the sensitivity analysis of LMM show that the best optimal values based on constraints can be identified for specific situations such as official design codes.

(3) Analytical objective functions were difficult to obtain. This study developed the objective functions for CIc, CO2 emissions, and Wc based on ANNs to be implemented in the LMM to find the optimized solutions.

(4) The Karush–Kuhn–Tucker conditions were considered to account for inequality constraints, leading to automatic designs of structural frames that meet various code restrictions simultaneously.

(5) The optimization process for CIc was performed with negligible errors, which were verified using large structural datasets. Engineering calculations also validated the design accuracies when the optimized CIc was implemented in the designs.

(6) P–M diagrams were uniquely designed to optimize columns. The proposed optimization will offer generic designs for many types of structures, including machinery and structural frames.

(7) The AI-based objective functions developed in this study can be implemented in broad areas, including engineering, general science, and economics.

The generalizable optimization method proposed herein can be applied to any optimization problem once sufficient data can be collected to establish the approximated objective functions and other parameters that formulate the ANN. The new objective functions not only enhance the computational speed compared to conventional software but also produce a generalizable calculation method for the Jacobian and Hessian matrices of Lagrange optimization. In future work, comprehensive design optimization of the dynamic design of tall buildings will be performed based on AI-based Lagrange optimization. One concern is that the computational time of Lagrange optimization depends heavily on the number of inequality constraints, because the number of runs corresponds to the number of combinations of active inequality constraints. Likewise, a building design comprises many design requirements considered as inequality constraints, such as the lateral displacement, story drift, design strength of each component, and required total base shear.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT 2019R1A2C2004965).

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government [MSIT 2019R1A2C2004965].

Notes on contributors

Won-Kee Hong

Dr. Won-Kee Hong is a Professor of Architectural Engineering at Kyung Hee University. Dr. Hong received his Master's and Ph.D. degrees from UCLA, and he worked for Englekirk and Hart, Inc. (USA), Nihon Sekkei (Japan), and Samsung Engineering and Construction Company (Korea) before joining Kyung Hee University (Korea). He also holds professional engineering licenses from both Korea and the USA. Dr. Hong has more than 30 years of professional experience in structural engineering. His research interests include new approaches to construction technologies based on value engineering with hybrid composite structures. He has provided many useful solutions to issues in current structural design and construction technologies through his research combining structural engineering with construction technologies. He is the author of numerous papers and patents, both in Korea and the USA. Currently, Dr. Hong is developing new connections that can be used with various types of frames, including hybrid steel–concrete precast composite frames, precast frames, and steel frames. These connections would contribute to the modular construction of heavy plant structures and buildings as well. He recently published a book titled "Hybrid Composite Precast Systems: Numerical Investigation to Construction" (Elsevier).

Manh Cuong Nguyen

Manh Cuong Nguyen is currently enrolled as a Combined Master's & Ph.D. candidate in the Department of Architectural Engineering at Kyung Hee University, Republic of Korea. His research interests include precast structures.

References

  • Abedini, M., and C. Zhang. 2021. “Dynamic Vulnerability Assessment and Damage Prediction of RC Columns Subjected to Severe Impulsive Loading.” Structural Engineering and Mechanics 77 (4): 441–461. doi:10.12989/sem.2021.77.4.441.
  • ACI-318. 2014. Building Code Requirements for Structural Concrete (ACI 318-14). Farmington Hills, MI: American Concrete Institute.
  • Aghaee, K., M. A. Yazdi, and K. D. Tsavdaridis. 2014. “Mechanical Properties of Structural Lightweight Concrete Reinforced with Waste Steel Wires.” Magazine of Concrete Research 66 (1): 1–9. doi:10.1680/macr.14.00232.
  • Awal, A. A., I. A. Shehu, and M. Ismail. 2015. “Effect of Cooling Regime on the Residual Performance of High-volume Palm Oil Fuel Ash Concrete Exposed to High Temperatures.” Construction and Building Materials 98: 875–883. doi:10.1016/j.conbuildmat.2015.09.001.
  • Barros, M. H. F. M., R. A. F. Martins, and A. F. M. Barros. 2005. “Cost Optimization of Singly and Doubly Reinforced Concrete Beams with EC2-2001.” Structural and Multidisciplinary Optimization 30 (3): 236–242. doi:10.1007/s00158-005-0516-2.
  • Beavis, B., and I. Dobbs. 1990. Optimisation and Stability Theory for Economic Analysis. Cambridge: Cambridge University Press.
  • Bengio, Y., I. Goodfellow, and A. Courville. 2017. Deep Learning. Vol. 1. Massachusetts, USA: MIT press.
  • Berrais, A. 1999. “Artificial Neural Networks in Structural Engineering: Concept and Applications.” Engineering Sciences 12: 1.
  • Camp, C. V., S. Pezeshk, and H. Hansson. 2003. “Flexural Design of Reinforced Concrete Frames Using a Genetic Algorithm.” Journal of Structural Engineering 129 (1): 105–115. doi:10.1061/(ASCE)0733-9445(2003)129:1(105).
  • Fanaie, N., S. Aghajani, and S. Shamloo. 2012. “Theoretical Assessment of Wire Rope Bracing System with Soft Central Cylinder.” Proceedings of the 15th World Conference on Earthquake Engineering.
  • Fanaie, N., S. Aghajani, and E. A. Dizaj. 2016. “Theoretical Assessment of the Behavior of Cable Bracing System with Central Steel Cylinder.” Advances in Structural Engineering 19 (3): 463–472. doi:10.1177/1369433216630052.
  • Heydari, A., and M. Shariati. 2018. “Buckling Analysis of Tapered BDFGM Nano-beam under Variable Axial Compression Resting on Elastic Medium.” Structural Engineering & Mechanics 66 (6): 737–748. doi:10.12989/sem.2018.66.6.737.
  • Hoffmann, L. D., and G. L. Bradley. 2004. Calculus for Business, Economics, and the Social and Life Sciences. 8th ed., 575–588. New York: McGraw-Hill Education. ISBN 0-07-242432-X.
  • Hong, W. K. 2019. Hybrid Composite Precast Systems: Numerical Investigation to Construction, 427–478. Kidlington, UK: Woodhead Publishing, Elsevier.
  • Kalman, D. 2009. “Leveling with Lagrange: An Alternate View of Constrained Optimization.” Mathematics Magazine 82 (3): 186–196. doi:10.1080/0025570X.2009.11953617.
  • Kaveh, A., and F. Shokohi. 2015. “Optimum Design of Laterally Supported Castellated Beams Using CBO Algorithm.” Steel & Composite Structures 18 (2): 305–324. doi:10.12989/scs.2015.18.2.305.
  • Korouzhdeh, T., H. Eskandari-Naddaf, and M. Gharouni-Nik. 2017. “An Improved Ant Colony Model for Cost Optimization of Composite Beams.” Applied Artificial Intelligence 31 (1): 44–63. doi:10.1080/08839514.2017.1296681.
  • Madadi, A., H. Eskandari-Naddaf, R. Shadnia, and L. Zhang. 2018. “Characterization of Ferrocement Slab Panels Containing Lightweight Expanded Clay Aggregate Using Digital Image Correlation Technique.” Construction and Building Materials 180: 464–476. doi:10.1016/j.conbuildmat.2018.06.024.
  • MATLAB. 2020b. Version 9.9.0 (R2020b). Natick, MA: The MathWorks.
  • Nasrollahi, S., S. Maleki, M. Shariati, A. Marto, and M. Khorami. 2018. “Investigation of Pipe Shear Connectors Using Push Out Test.” Steel & Composite Structures 27 (5): 537–543. doi:10.12989/scs.2018.27.5.537.
  • Paknahad, M., M. Bazzaz, and M. Khorami. 2018. “Shear Capacity Equation for Channel Shear Connectors in Steel-concrete Composite Beams.” Steel & Composite Structures 28 (4): 483–494. doi:10.12989/scs.2018.28.4.483.
  • Protter, M. H., and C. B. Morrey Jr. 1985. Intermediate Calculus. 2nd ed., 267. New York: Springer. ISBN 0-387-96058-9.
  • Rajeev, S., and C. S. Krishnamoorthy. 1998. “Genetic Algorithm–based Methodology for Design Optimization of Reinforced Concrete Frames.” Computer-Aided Civil and Infrastructure Engineering 13 (1): 63–74. doi:10.1111/0885-9507.00086.
  • Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
  • Safa, M., M. Shariati, Z. Ibrahim, A. Toghroli, S. B. Baharom, N. M. Nor, and D. Petkovic. 2016. “Potential of Adaptive Neuro Fuzzy Inference System for Evaluating the Factors Affecting Steel-concrete Composite Beam’s Shear Strength.” Steel & Composite Structures 21 (3): 679–688. doi:10.12989/scs.2016.21.3.679.
  • Shah, S. N. R., N. R. Sulong, R. Khan, M. Z. Jumaat, and M. Shariati. 2016. “Behavior of Industrial Steel Rack Connections.” Mechanical Systems and Signal Processing 70–71: 725–740.
  • Shariat, M., M. Shariati, A. Madadi, and K. Wakil. 2018. “Computational Lagrangian Multiplier Method by Using for Optimization and Sensitivity Analysis of Rectangular Reinforced Concrete Beams.” Steel & Composite Structures 29 (2): 243–256. doi:10.12989/scs.2018.29.2.243.
  • Shariati, M., S. N. H. Ramli, S. Maleki, and K. M. M. Arabnejad. 2010. “Experimental and Analytical Study on Channel Shear Connectors in Light Weight Aggregate Concrete.” Proceedings of the 4th International Conference on Steel & Composite Structures, 21–23, Sydney, Australia. doi:10.3850/978-981-08-6218-3_CC-Fr031.
  • Silberberg, E., and S. Wing. 2001. The Structure of Economics: A Mathematical Analysis. 3rd ed., 134–141. Boston: Irwin McGraw-Hill. ISBN 0-07-234352-4.
  • Sun, L., Z. Yang, Q. Jin, and W. Yan. 2020. “Effect of Axial Compression Ratio on Seismic Behavior of GFRP Reinforced Concrete Columns.” International Journal of Structural Stability and Dynamics 20 (6): 2040004. doi:10.1142/S0219455420400040.
  • Toghroli, A., M. Mohammadhassani, M. Suhatril, M. Shariati, and Z. Ibrahim. 2014. “Prediction of Shear Capacity of Channel Shear Connectors Using the ANFIS Model.” Steel & Composite Structures 17 (5): 623–639. doi:10.12989/scs.2014.17.5.623.
  • Villarrubia, G., J. F. De Paz, P. Chamoso, and F. De la Prieta. 2018. “Artificial Neural Networks Used in Optimization Problems.” Neurocomputing 272: 10–16. doi:10.1016/j.neucom.2017.04.075.
  • Walsh, G. R. 1975. “Saddle-point Property of Lagrangian Function.” In Methods of Optimization, 39–44. New York: John Wiley & Sons. ISBN 0-471-91922-5.
  • Yazdani, M., A. Fakhimi, and M. Alitalesh. 2018. “Numerical Analysis of Effective Parameters in Direct Shear Test by Hybrid Discrete–Finite Element Method.”
  • Ye, M., J. Jiang, H. M. Chen, H. Y. Zhou, and D. D. Song. 2021. “Seismic Behavior of an Innovative Hybrid Beam-column Connection for Precast Concrete Structures.” Engineering Structures 227: 111436. doi:10.1016/j.engstruct.2020.111436.
  • Zhang, C., G. Gholipour, and A. A. Mousavi. 2021. “State-of-the-Art Review on Responses of RC Structures Subjected to Lateral Impact Loads.” Arch Computat Methods Eng 28: 2477–2507. doi:10.1007/s11831-020-09467-5.
  • Zhang, C., Z. Alam, L. Sun, Z. Su, and B. Samali. 2019. “Fibre Bragg Grating Sensor‐based Damage Response Monitoring of an Asymmetric Reinforced Concrete Shear Wall Structure Subjected to Progressive Seismic Loads.” Structural Control & Health Monitoring 26 (3): e2307. doi:10.1002/stc.2307.