MODELING OF THIN FILM PROCESS DATA USING A GENETIC ALGORITHM-OPTIMIZED INITIAL WEIGHT OF BACKPROPAGATION NEURAL NETWORK

Pages 168-178 | Published online: 04 Feb 2009

Abstract

Artificial neural networks, particularly the backpropagation neural network (BPNN), have been used to construct prediction models of plasma processes. Building a BPNN model typically involves many training factors, of which the most difficult to optimize is the initial weight distribution (IWD). In this study, a technique that uses a genetic algorithm (GA) to optimize the IWD effect on BPNN prediction performance is presented. The experimental data were collected from the etching of silica thin films in a CHF3-CF4 inductively coupled plasma. The etch process was statistically characterized, and the modeled etch responses were the silica etch rate, Al etch rate, Al selectivity, and silica profile angle (denoted as anisotropy). The effect of the GA parameters (mutation and crossover probabilities) was also evaluated by conducting a 4² full factorial experiment. The performances of the GA-BPNN models were compared to those of conventional BPNN models. The comparison revealed improved prediction by the GA-BPNN models for all etch responses, with improvements exceeding 15% for all but the Al etch rate. These improvements indicate that the presented technique is effective in optimizing the IWD effect on BPNN modeling.

In manufacturing integrated circuits, plasma plays a key role in depositing films and etching fine patterns. Because of the inherently complex interactions between the process (or equipment) parameters and the plasma, plasma characteristics are extremely difficult to predict. To circumvent this difficulty, neural networks and neuro-fuzzy systems, in conjunction with statistical experimental design, have been applied to construct prediction models of plasma process data (Himmel and May 1993; Geisler, Lee, and May 2000; Rietman and Lory 1993; Venkateswaran, Rai, Govindan, and Meyyappan 2002; Kim, Bae, and Han 2006; Kim, Park, and Lee 2006). Among the many neural network paradigms, feed-forward multilayer perceptrons, typically called backpropagation neural networks (BPNNs) (Rumelhart and McClelland 1986), are the most widely used for modeling complex chemical and physical data. Despite their high prediction accuracy, building a BPNN model is complicated by the presence of many training factors (Kim and Park 2001). These include the number of hidden neurons, the type of activation functions, the gradients of those functions, and the initial weight distribution (IWD). Several attempts have been made to optimize the effects of training factors on BPNN prediction performance. These include model-based optimization of training factor effects (Kim and Park 2001) and direct optimization of factor effects (Kim and Bae 2005) using a genetic algorithm (GA) (Goldberg 1989). Among the training factors, the IWD is the most difficult to optimize (Sarkar 1996; Bartlett 1998). In general, the initial weights are randomly generated within a numerical interval such as [−1, +1] before neural network training begins. Depending on the generated IWD, the prediction performance of a BPNN may vary considerably. Unlike the other training factors, the IWD can hardly be optimized directly because of this randomness.
Hence, the IWD is frequently optimized by adjusting it experimentally (Kim et al. 2006; Kim and Park 2001). To overcome this difficulty, classification-based (Teo, Wang, and Lin 2001) and rule-based (Kim 2005) optimization techniques were proposed and evaluated with time-series data. Meanwhile, owing to its evolutionary nature, the GA is expected to be a viable means of optimizing training factor effects, as demonstrated in optimizing multiple training factors (Kim and Park 2001; Kim and Bae 2005) and a plasma-enhanced chemical vapor deposition process (Han and May 1997). For plasma-driven thin film process data, however, the GA has not been applied to optimize the IWD. Moreover, the effects of its two probability parameters (the mutation and crossover probabilities) on GA optimization have not been examined.

In this study, a new technique was used to optimize the IWD effect on the BPNN model of plasma process data. A GA was used to search for an optimized IWD, and the effects of the two GA parameters mentioned above were examined by means of a statistical experiment. The technique was evaluated with plasma etch data obtained by etching silica thin films in a CHF3-CF4 inductively coupled plasma. The performances of the GA-BPNN models were compared to those of conventional BPNN models.

EXPERIMENTAL DETAILS

A schematic of the experimental inductively coupled plasma (ICP) etch system is shown in Figure 1. The process chamber, which serves as a ground plane, is pumped by a turbomolecular pump, and the pressure is controlled by a downstream throttle valve. Gases are introduced via a multi-hole showerhead, and radio frequency (RF) bias power at 13.56 MHz is fed through a matching network to the lower electrode, which has a diameter of 203.2 mm. The upper chamber section was modified with a ceramic cylinder, through which RF source power at 2 MHz is coupled via a multi-turn helical coil. The cylinder is closed at its upper end by a grounded electrode.

FIGURE 1 Schematic of inductively coupled plasma equipment.

Test patterns were fabricated on 5-inch-diameter p-type silicon wafers. A buffer-clad layer about 25 µm thick was deposited by flame hydrolysis deposition. An 8-µm-thick core layer of SiO2-P2O5-B2O3-GeO2 was subsequently formed, in which the percentages of P, B, and Ge were each less than 1%. After an AlSi (1%) layer about 400 nm thick was evaporated, the waveguide was patterned using a contact aligner. The AlSi (1%) layer was first etched in a BCl3/Cl2/CHF3 plasma using a Plasma-Therm RIE system, with the RF power and pressure set to 150 W and 1.33 Pa, respectively. To remove the layer completely, it was overetched by 30%, and the resist was then solvent-stripped. The silica core layer was then etched in a CF4/CHF3 plasma using a Plasma-Therm 690 ICP etch system. The factors varied were source power, bias power, and gas ratio. The total flow rate of CHF3 and CF4 was fixed at 60 sccm, and the CHF3 flow rate was varied from 10 sccm to 50 sccm. The experimental ranges for the source power, bias power, and gas ratio were 100–800 W, 100–400 W, and 0.2–5.0, respectively, where the gas ratio is defined as the flow rate of CHF3 divided by that of CF4. To characterize and optimize the etch process, a 2³ full factorial experiment (Montgomery 1991) with one center point was conducted; the resulting nine experiments were used to train the BPNN. Six additional experiments were conducted to prepare the test data for model evaluation, for a total of 15 experiments.
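The run layout described above (a 2³ full factorial over source power, bias power, and gas ratio plus one center point, with the CHF3 and CF4 flows summing to 60 sccm) can be sketched in Python as follows. This is an illustrative sketch, not the authors' procedure: the function names are hypothetical, and the center point is assumed to lie at the arithmetic midpoint of each range, which the text does not specify. The flow split follows from the ratio definition: with total flow F and ratio r = CHF3/CF4, CF4 = F/(1 + r) and CHF3 = F − CF4, so r = 0.2 and r = 5.0 recover the 10 sccm and 50 sccm CHF3 limits.

```python
from itertools import product

# Factor ranges (low, high) taken from the text.
FACTORS = {
    "source_power_W": (100.0, 800.0),
    "bias_power_W": (100.0, 400.0),
    "gas_ratio": (0.2, 5.0),  # CHF3 flow / CF4 flow
}
TOTAL_FLOW_SCCM = 60.0


def factorial_with_center(factors):
    """Return the 2^k corner runs in natural units, then one center point.

    Assumption: the center point is the arithmetic midpoint of each range.
    """
    names = list(factors)
    runs = []
    for codes in product((-1, 1), repeat=len(names)):
        # Map each coded level (-1 or +1) to the low or high end of its range.
        runs.append({
            name: factors[name][0] if code < 0 else factors[name][1]
            for name, code in zip(names, codes)
        })
    runs.append({name: (lo + hi) / 2.0 for name, (lo, hi) in factors.items()})
    return runs


def gas_flows(ratio, total=TOTAL_FLOW_SCCM):
    """Split a fixed total flow into (CHF3, CF4) so that CHF3 / CF4 == ratio."""
    cf4 = total / (1.0 + ratio)
    return total - cf4, cf4


if __name__ == "__main__":
    for run in factorial_with_center(FACTORS):
        chf3, cf4 = gas_flows(run["gas_ratio"])
        print(run, f"CHF3={chf3:.1f} sccm, CF4={cf4:.1f} sccm")
```

Printing the runs lists the eight factorial corners followed by the assumed center point, i.e., the nine training experiments.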

The modeled responses were the silica etch rate, the Al etch rate, the Al selectivity, and the silica profile angle. The vertical etch rate of silica, R, was measured using a Hitachi S800 scanning electron microscope (Hitachi Inc., Japan). The vertical etch rate of the Al protective layer was also measured. The Al selectivity was defined as the ratio of the silica etch rate to the Al etch rate. The anisotropy (A), which quantifies the profile angle, was calculated as

where U and L represent the widths of the original and etched patterns, respectively. The original pattern width U was 20 µm.

BACKPROPAGATION NEURAL NETWORK

A typical BPNN architecture is shown in Figure 2. The BPNN adopted in this study consisted of three input neurons, four hidden neurons, and one output neuron. The number of input neurons equals the number of process parameters, and the number of output neurons equals the number of etch responses modeled. During training, the network searches the weight space for weights that minimize the accumulated error (E), a sum-of-squares (Euclidean) distance between the desired and calculated outputs, over all input-output pairs. For a single input-output pair, E is expressed as

$$E = \frac{1}{2}\sum_{j=1}^{q}\left(d_j - out_j\right)^2 \qquad (2)$$

where $q$ is the number of output neurons, $d_j$ is the desired output of the jth neuron in the output layer, and $out_j$ is the calculated output of that same neuron. In the BP algorithm, this error is minimized by gradient descent, in which the weights are adjusted in the direction of decreasing E in Equation (2). The basic weight update scheme, commonly known as the generalized delta rule (Rumelhart and McClelland 1986), is expressed as

$$W_{i,j,k}(m+1) = W_{i,j,k}(m) + \Delta W_{i,j,k}(m) \qquad (3)$$

where $W_{i,j,k}$ is the connection strength between the jth neuron in layer (k − 1) and the ith neuron in layer k, and $\Delta W_{i,j,k}$ is the calculated change in the weight that reduces E in Equation (2), defined as

$$\Delta W_{i,j,k}(m) = -\eta\,\frac{\partial E}{\partial W_{i,j,k}(m)} \qquad (4)$$

The parameters m and η denote the iteration number and learning rate, respectively. By adjusting the weighted connections recursively using the rule in Equation (3) for all units in the network, the accumulated E over all input vectors is minimized.

FIGURE 2 Schematic of a backpropagation neural network.
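The forward pass, the error of Equation (2), and one generalized-delta-rule update of Equations (3) and (4) can be sketched for the 3-4-1 network as follows. This sketch assumes logistic (sigmoid) activations with a gain of 1 in both layers, inputs and outputs scaled to [0, 1], and no bias terms (consistent with the 16 initial weights stated in the GA section); the function names, sample inputs, and learning rate are hypothetical. A per-pattern (online) update is shown for brevity.

```python
import math
import random

GAIN = 1.0  # gradient (slope) of the logistic activation; set to 1 in the text


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-GAIN * x))


def init_weights(n_in=3, n_hidden=4, n_out=1, bound=1.0, rng=random):
    """Draw initial weights uniformly from [-bound, +bound]; no bias terms."""
    w_hidden = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-bound, bound) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Return hidden activations and network outputs for one input vector."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
    return h, out


def error(d, out):
    """Equation (2): E = 1/2 * sum_j (d_j - out_j)^2 for one input-output pair."""
    return 0.5 * sum((dj - oj) ** 2 for dj, oj in zip(d, out))


def gradients(x, d, w_hidden, w_out):
    """Backpropagate dE/dW for one pair (logistic derivative = GAIN * y * (1 - y))."""
    h, out = forward(x, w_hidden, w_out)
    delta_out = [-(dj - oj) * GAIN * oj * (1.0 - oj) for dj, oj in zip(d, out)]
    g_out = [[dk * hj for hj in h] for dk in delta_out]
    # Hidden deltas: output deltas propagated back through the output weights.
    delta_h = [
        sum(delta_out[k] * w_out[k][j] for k in range(len(w_out))) * GAIN * h[j] * (1.0 - h[j])
        for j in range(len(h))
    ]
    g_hidden = [[dj * xi for xi in x] for dj in delta_h]
    return g_hidden, g_out


def delta_rule_step(x, d, w_hidden, w_out, eta=0.5):
    """Equations (3)-(4): W(m+1) = W(m) + dW(m), with dW(m) = -eta * dE/dW(m)."""
    g_hidden, g_out = gradients(x, d, w_hidden, w_out)
    new_hidden = [[w - eta * g for w, g in zip(wr, gr)] for wr, gr in zip(w_hidden, g_hidden)]
    new_out = [[w - eta * g for w, g in zip(wr, gr)] for wr, gr in zip(w_out, g_out)]
    return new_hidden, new_out


if __name__ == "__main__":
    rng = random.Random(0)
    w_h, w_o = init_weights(bound=1.0, rng=rng)
    x, d = [0.2, 0.5, 0.9], [0.7]  # hypothetical scaled process inputs and target
    for m in range(5):
        print(m, error(d, forward(x, w_h, w_o)[1]))
        w_h, w_o = delta_rule_step(x, d, w_h, w_o)
```

The `bound` argument of `init_weights` is where the IWD enters: the conventional models below differ only in this interval.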

RESULTS

Conventional Models

First, for comparison, BPNN models were constructed in the conventional way. The IWD was varied from ±0.4 to ±1.4 in increments of 0.2, while the other training factors were fixed at their default values: the number of hidden neurons, the training tolerance, and the gradient of the neuron activation functions were set to 4, 0.1, and 1, respectively. Each of these training factors is explained in detail in our previous work (Kim and Park 2001). For each IWD, 100 models were constructed, and only the best of them was selected, namely the model with the smallest prediction error on the test data described earlier. The prediction error was quantified as the root mean-square error (RMSE), defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\left(T_p - O_p\right)^2} \qquad (5)$$

where P is the size of the test data set, $T_p$ is the measured etch response for the pth input, and $O_p$ is the corresponding prediction. As an example, the silica etch rate was modeled, and the RMSEs of the resulting BPNN models are plotted as a function of IWD in Figure 3. As Figure 3 shows, the smallest RMSE, about 267 Å/min, is obtained at ±1.0. The other etch responses were modeled in the same way; Table 1 lists the smallest RMSE and the corresponding optimized IWD for each response.
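The conventional procedure (a sweep of the IWD bound from ±0.4 to ±1.4, 100 randomly initialized models per bound, and selection by the test RMSE of Equation (5)) can be sketched as follows. `build_and_test` is a hypothetical callable standing in for initializing a BPNN uniformly in [−bound, +bound], training it, and returning its RMSE on the six test experiments; the toy stand-in in the usage example only exercises the loop and says nothing about real etch data.

```python
import math
import random


def rmse(targets, predictions):
    """Equation (5): sqrt((1/P) * sum_p (T_p - O_p)^2) over the P test points."""
    p = len(targets)
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, predictions)) / p)


def iwd_levels(start=0.4, stop=1.4, step=0.2):
    """IWD bounds 0.4, 0.6, ..., 1.4 (weights drawn from [-bound, +bound])."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(n)]


def sweep_iwd(build_and_test, levels, repeats=100, seed=0):
    """Keep the smallest test RMSE out of `repeats` models for each IWD bound.

    `build_and_test(bound, rng)` is hypothetical: it should initialize a BPNN in
    [-bound, +bound], train it, and return its RMSE on the test data.
    """
    rng = random.Random(seed)
    best = {bound: min(build_and_test(bound, rng) for _ in range(repeats)) for bound in levels}
    best_bound = min(best, key=best.get)
    return best, best_bound


if __name__ == "__main__":
    # Toy stand-in for training: error grows with distance from a bound of 1.0, plus noise.
    def toy(bound, rng):
        return abs(bound - 1.0) + rng.uniform(0.0, 0.05)

    best, best_bound = sweep_iwd(toy, iwd_levels())
    print(best_bound, round(best[best_bound], 4))
```

Keeping only the best of 100 random initializations per bound is what makes the conventional result sensitive to chance, which is the motivation for searching the initial weights directly.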

FIGURE 3 Prediction performance of a silica etch rate model as a function of initial weight distribution.

TABLE 1 Prediction Errors at Optimized Initial Weight Distribution

GA-BPNN Models

A GA is widely used to search for optimized parameters that satisfy given constraints. In GA optimization, the parameters involved are the size of the initial population, the crossover probability, and the mutation probability. In this study, the size of the initial population was set to 100. Each solution consisted of slots, one for each initial weight to be optimized. For the 3-4-1 BPNN architecture described earlier, there were 16 initial weights to optimize (3 × 4 input-to-hidden plus 4 × 1 hidden-to-output connections). A random number generator was used to assign random values to each slot. The fitness of each solution was calculated with the fitness function expressed as

The RMSE in Equation (6) represents the training error calculated with the training data, which consisted of the nine experiments described earlier. The GA was terminated after 100 generations. For convenience, the BPNN model optimized by using the GA is referred to as GA-BPNN. The efficacy of GA optimization may vary with the combination of crossover and mutation probabilities, denoted as Pc and Pm, respectively. To examine their effects, a 4² full factorial experiment (Montgomery 1991) was conducted over the ranges 0.80–0.95 for Pc and 0.05–0.20 for Pm, with both probabilities varied in increments of 0.05 (four levels each). As a result, 16 GA optimizations were conducted in modeling each etch response.
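A GA search over the 16 initial weights (population 100, 100 generations, crossover and mutation probabilities Pc and Pm) might look like the sketch below. The text does not give the fitness function of Equation (6) or the selection, crossover, and mutation operators, so this sketch assumes fitness = 1/(1 + RMSE), binary tournament selection, one-point crossover, per-gene random-reset mutation within [−1, +1], and elitism. `training_rmse` is a hypothetical callable that would build the BPNN from a candidate weight vector and return its training RMSE; the usage example substitutes a toy objective.

```python
import math
import random


def ga_optimize(training_rmse, n_genes=16, pop_size=100, generations=100,
                pc=0.9, pm=0.1, bound=1.0, seed=0):
    """Real-coded GA sketch searching for a vector of initial weights.

    Assumed (not from the text): fitness = 1 / (1 + RMSE), binary tournament
    selection, one-point crossover, random-reset mutation, and elitism.
    """
    rng = random.Random(seed)

    def fitness(ind):
        return 1.0 / (1.0 + training_rmse(ind))

    def tournament(pop, fits):
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[a] if fits[a] >= fits[b] else pop[b]

    # Initial population: each slot holds one randomly generated initial weight.
    pop = [[rng.uniform(-bound, bound) for _ in range(n_genes)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    history = [max(fits)]
    for _ in range(generations):  # termination: fixed number of generations
        elite = pop[max(range(pop_size), key=fits.__getitem__)]
        children = [elite[:]]  # elitism keeps the best solution unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fits), tournament(pop, fits)
            if rng.random() < pc:  # one-point crossover with probability Pc
                cut = rng.randrange(1, n_genes)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for g in range(n_genes):  # reset each gene with probability Pm
                    if rng.random() < pm:
                        child[g] = rng.uniform(-bound, bound)
                if len(children) < pop_size:
                    children.append(child)
        pop = children
        fits = [fitness(ind) for ind in pop]
        history.append(max(fits))
    i_best = max(range(pop_size), key=fits.__getitem__)
    return pop[i_best], training_rmse(pop[i_best]), history


if __name__ == "__main__":
    # Toy stand-in for the BPNN training RMSE: distance to a fixed weight vector.
    target = [0.5] * 16

    def toy_rmse(w):
        return math.sqrt(sum((wi - ti) ** 2 for wi, ti in zip(w, target)) / len(w))

    weights, err, hist = ga_optimize(toy_rmse, pc=0.95, pm=0.15)
    print(f"best training RMSE {err:.4f}; best fitness {hist[0]:.3f} -> {hist[-1]:.3f}")
```

Because the best individual is copied unchanged into each new generation, the best fitness in `history` never decreases; other operator choices would change this behavior.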

First, the silica etch rate was modeled; the results are shown in Figure 4. Each point in Figure 4 represents the prediction error of the model with the smallest training error. As Figure 4 shows, the prediction error of the GA-BPNN model varies in a complex way with the (Pc, Pm) combination. The smallest prediction error, about 202 Å/min, is obtained at Pc = 0.95 and Pm = 0.15. The other etch responses were modeled in the same way, and the results for the Al etch rate, anisotropy, and Al selectivity models are shown in Figures 5, 6, and 7, respectively. The smallest prediction error in each figure is listed in Table 2 along with the corresponding (Pc, Pm) combination. Comparing the errors in Table 2 with those in Table 1 reveals the improvement of the GA-BPNN models over the conventional BPNN models; the calculated improvements are shown in Figure 8. As Figure 8 shows, the GA-BPNN models demonstrated improved prediction for all etch responses, and the improvement exceeded 15% for all but the Al etch rate. These improvements indicate that the proposed technique is effective in optimizing the IWD.
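The 4² screening of (Pc, Pm) and the comparison against the conventional models can be sketched as below. `run_ga` is a hypothetical callable returning the prediction (test) RMSE of the GA-BPNN model with the smallest training error for one (Pc, Pm) cell. The improvement is assumed here to be the relative RMSE reduction, 100 × (RMSE_BPNN − RMSE_GA-BPNN)/RMSE_BPNN, since the text does not state how Figure 8 was computed. With the approximate silica etch rate errors reported above (267 Å/min and 202 Å/min), this definition gives about 24%.

```python
from itertools import product


def probability_levels(low, step=0.05, n=4):
    """Four equally spaced probability levels, rounded to avoid float drift."""
    return [round(low + i * step, 2) for i in range(n)]


PC_LEVELS = probability_levels(0.80)  # 0.80, 0.85, 0.90, 0.95
PM_LEVELS = probability_levels(0.05)  # 0.05, 0.10, 0.15, 0.20


def screen_ga_parameters(run_ga):
    """Run the 4^2 design; `run_ga(pc, pm)` (hypothetical) returns a test RMSE."""
    results = {(pc, pm): run_ga(pc, pm) for pc, pm in product(PC_LEVELS, PM_LEVELS)}
    best_cell = min(results, key=results.get)
    return best_cell, results


def improvement_percent(rmse_bpnn, rmse_ga_bpnn):
    """Assumed definition: relative RMSE reduction of GA-BPNN versus conventional BPNN."""
    return 100.0 * (rmse_bpnn - rmse_ga_bpnn) / rmse_bpnn


if __name__ == "__main__":
    # Approximate silica etch rate errors quoted in the text (angstrom/min).
    print(f"{improvement_percent(267.0, 202.0):.1f}%")
```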

FIGURE 4 Prediction performance of a silica etch rate model as a function of combinations of Pc and Pm.

FIGURE 5 Prediction performance of an Al etch rate model as a function of combinations of Pc and Pm.

FIGURE 6 Prediction performance of an anisotropy model as a function of combinations of Pc and Pm.

FIGURE 7 Prediction performance of an Al selectivity model as a function of combinations of Pc and Pm.

TABLE 2 Prediction Errors at Optimized Pc and Pm

FIGURE 8 Comparison of GA-BPNN and BPNN models.

CONCLUSIONS

A new prediction model of plasma processes was presented. It was built by applying a GA to search for an optimized IWD. In addition, the effects of the GA parameters were examined and optimized by means of a statistical experiment. The technique was evaluated with statistically characterized plasma etch data. Compared with conventional BPNN models, the GA-BPNN models demonstrated improved prediction for all etch responses, indicating that the complex IWD effect can be effectively optimized by the presented technique. Further improvement might be achieved by applying this technique after the other training factors have been optimized experimentally or systematically.

This work was supported by Sejong University.

REFERENCES

  • Bartlett, P. L. 1998. The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Trans. Information Theory 44(2): 525–536.
  • Geisler, P., C. S. G. Lee, and G. S. May. 2000. Neurofuzzy modeling of chemical vapor deposition process. IEEE Trans. Semicond. Manufact. 13(1): 46–60.
  • Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization & Machine Learning. Boston, MA, USA: Addison-Wesley.
  • Han, S. S. and G. S. May. 1997. Using neural network process models to perform PECVD silicon dioxide recipe synthesis via genetic algorithms. IEEE Trans. Semicond. Manufact. 10(2): 279–287.
  • Himmel, C. D. and G. S. May. 1993. Advantages of plasma etch modeling using neural networks over statistical techniques. IEEE Trans. Semicond. Manufact. 6(2): 103–111.
  • Kim, D. 2005. Improving prediction performance of neural networks in pattern classification. Internat. J. Comput. Math. 82(4): 391–399.
  • Kim, B. and J. Bae. 2005. Prediction of plasma processes using neural network and genetic algorithm. Solid-State Electronics 49(10): 1576–1580.
  • Kim, B. and S. Park. 2001. An optimal neural network plasma model: A case study. Chemometr. Intell. Lab. Syst. 56(1): 39–50.
  • Kim, B., J. Bae, and S. S. Han. 2006. Prediction of radio frequency power effect on silicon nitride deposition using a genetic algorithm based neural network. Surf. Eng. February: 63–68.
  • Kim, B., J. Park, and K. Lee. 2006. Temperature effect on deposition rate of silicon nitride film. Appl. Surf. Sci. 252(12): 4138–4145.
  • Montgomery, D. C. 1991. Design and Analysis of Experiments. New York: John Wiley & Sons.
  • Rietman, E. and E. Lory. 1993. Use of neural networks in semiconductor manufacturing processes: An example for plasma etch modeling. IEEE Trans. Semicond. Manufact. 6(4): 343–347.
  • Rumelhart, D. E. and J. L. McClelland. 1986. Parallel Distributed Processing. Cambridge: MIT Press.
  • Sarkar, D. 1996. Randomness in generalization ability: A source to improve it. IEEE Trans. Neural Networks 7(3): 676–685.
  • Teo, K. K., L. P. Wang, and Z. P. Lin. 2001. Wavelet packet multi-layer perceptron for chaotic time series prediction: Effects of weight initialization. Lecture Notes in Computer Science 2074: 310–317.
  • Venkateswaran, S., M. M. Rai, T. R. Govindan, and M. Meyyappan. 2002. Neural network modeling of growth process. J. Electrochem. Soc. 149(2): G137–G142.