
Dimensionality reduction, and function approximation of poly(lactic-co-glycolic acid) micro- and nanoparticle dissolution rate

Pages 1119-1129 | Published online: 04 Feb 2015

Abstract

Prediction of poly(lactic-co-glycolic acid) (PLGA) micro- and nanoparticle dissolution rates plays a significant role in the pharmaceutical and medical industries, and a model that predicts the PLGA dissolution rate would therefore be beneficial for drug manufacturing. PLGA dissolution is influenced by numerous factors (features); counting the known factors yields a dataset with 300 features. This large number of features and the high redundancy within the dataset make the prediction task difficult and inaccurate. In this study, dimensionality reduction techniques were applied in order to simplify the task and to eliminate irrelevant and redundant features. A heterogeneous pool of regression algorithms was independently tested and evaluated. In addition, several ensemble methods were tested in order to improve the accuracy of prediction. The empirical results revealed that the proposed evolutionary weighted ensemble method offered the lowest margin of error and significantly outperformed the individual algorithms and the other ensemble techniques.

Introduction

Predicting the poly(lactic-co-glycolic acid) (PLGA) micro- and nanoparticle dissolution profiles presents a complex and vital problem. The complexity of the problem can be understood from the fact that the academic literatureCitation1–Citation18 provides 300 potential factors that may influence the dissolution of the PLGA protein particles.Citation19 Once such a dataset has been collected, the primary approach adopted in most research has been to reduce its dimensionality. Dimensionality reduction techniques transform high-dimensional datasets into low-dimensional datasets, thereby improving the model’s computational speed, predictability, and generalization ability. Dimensionality reduction techniques are classified into two categories: feature selection and feature extraction. The feature selection technique is useful when the available dataset has a large dimension and relatively few cases (samples), whereas the feature extraction technique is useful when the dataset has a large dimension and high redundancy.Citation20

The dataset in the present research had a large dimension, and the features appeared to have high redundancy. Therefore, it was not immediately clear to us whether we should use feature selection or feature extraction. Hence, we explored both feature selection and feature extraction techniques in order to find the best possible solution. Several regression models were employed to evaluate the relationship between the obtained input variables (features) and output variable.

In the scope of the present study, our focus was on PLGA nano- and microsphere dissolution properties and drug release rate. Szlęk et alCitation21 and Fredenberg et alCitation22 described that drug release from the PLGA matrix is mainly governed by two mechanisms: diffusion, and degradation/erosion. Several factors influencing the diffusion and degradation rates of PLGA, as described by Kang et al,Citation23,Citation24 Blanco and Alonso,Citation25 and Mainardes and Evangelista,Citation26 include pore diameters, matrix–active pharmaceutical ingredient (API) interactions, API–API interactions, and the composition of the formulation. Szlęk et alCitation21 developed a predictive model to describe the underlying relationship between those influencing factors and the drug’s release profile, focusing on feature selection, artificial neural network, and genetic programming approaches to arrive at a suitable prediction model. In the past, several mathematical models, including the Monte Carlo and cellular automata microscopic models, were proposed by Zygourakis and Markenscoff,Citation27 and Gopferich.Citation28 A partial differential equations model was proposed by Siepmann et alCitation29 to address the influence of underlying PLGA properties on the drug’s release rate/protein dissolution.

The highlights of the present article are as follows:

  • a comprehensive discussion on the drug release problem and dataset collection mechanisms;

  • a comprehensive discussion on various computational tools used to reduce dimensionality of dataset;

  • a concise discussion on the elementary regression models available in the literature;

  • a concise discussion on the ensemble methods used for making ensembles of the elementary regression models;

  • a comprehensive discussion and conclusion on the experimental results mentioned in the present article.

Methodology

A description of the problem

PLGA micro- and nanoparticles could play a significant role in the medical application and toxicity evaluation of PLGA-based multi-particulate dosages.Citation30 PLGA micro-particles are important diluents used to produce drugs in their correct dosage form. Apart from playing the role of a filler, PLGA as an excipient, alongside pharmaceutical APIs, plays other crucial roles. It helps in the dissolution of drugs, thereby increasing their absorbability and solubility.Citation31,Citation32 It also aids pharmaceutical manufacturing processes by improving the flow and non-stickiness of API powders.

The dataset collected from various academic literatureCitation1–Citation18 contains 300 input features categorized into four groups: protein descriptor, plasticizer, formulation characteristics, and emulsifier. A detailed description of the dataset is given in Table 1. For example, the formulation characteristics group contains features such as PLGA-inherent viscosity, PLGA molecular weight, lactide-to-glycolide ratio, inner and outer phase polyvinyl alcohol (PVA) concentration, PVA molecular weight, inner phase volume, encapsulation rate, mean particle size, PLGA concentration, and experimental conditions (dissolution pH, the number of dissolution additives, dissolution additive concentration, production method, and dissolution time). The protein descriptor, plasticizer, and emulsifier feature groups contain 85, 98, and 101 features, respectively. The regression model sought to predict the dissolution percentage or solubility of PLGA, which depends on the features mentioned above. In order to avoid over-fitting, the collected data were preprocessed by adding noise to them. The dataset was then normalized, in other words, scaled to the range −1.0 to 1.0.
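As an illustration, the sketch below shows one way such preprocessing could be carried out in Python with scikit-learn; the file name, target column name, and the noise magnitude are assumptions, since the paper does not specify them.

```python
# A minimal preprocessing sketch: add small Gaussian noise to the inputs and
# scale every feature to the range [-1, 1]. File name, target column, and the
# 1% noise level are assumptions, not values reported in the paper.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

data = pd.read_csv("plga_dataset.csv")                 # hypothetical file: 300 features + target
X = data.drop(columns=["dissolution_pct"]).to_numpy()  # hypothetical target column name
y = data["dissolution_pct"].to_numpy()

X_noisy = X + rng.normal(scale=0.01 * X.std(axis=0), size=X.shape)  # assumed noise level

scaler = MinMaxScaler(feature_range=(-1.0, 1.0))       # scale to [-1, 1] as described above
X_scaled = scaler.fit_transform(X_noisy)
```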

Table 1 The PLGA dataset description

Dimensionality reduction

Feature selection tools

Feature selection techniques enable us to identify the most relevant input features from the available set of input features and allow us to avoid expensive (in both time and cost) experimental examination while developing a prediction model.Citation33

Backward feature elimination

Backward feature elimination filtering starts with the maximum number of features (in this case, 300 features) and eliminates them one-by-one in an iterative manner. At each iteration, the resulting accuracy of prediction is evaluated for all combinations of the remaining attributes. The subsets of attributes with the highest accuracies are propagated to the next iteration. Finally, the subset with the highest accuracy (the lowest root mean square error [RMSE]) is selected as the best subset.
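A rough sketch of this idea, using scikit-learn's greedy backward selector as a stand-in for the exhaustive per-iteration evaluation described above (X_scaled and y are taken from the preprocessing sketch):

```python
# Backward elimination sketch: starting from all features, drop at each step the
# feature whose removal degrades 10-fold cross-validated RMSE the least.
# The 18-feature target size mirrors the GPReg result reported below.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.gaussian_process import GaussianProcessRegressor

selector = SequentialFeatureSelector(
    GaussianProcessRegressor(),
    n_features_to_select=18,
    direction="backward",
    scoring="neg_root_mean_squared_error",   # maximizing this minimizes RMSE
    cv=10,
)
selector.fit(X_scaled, y)
X_reduced = selector.transform(X_scaled)
```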

Correlation-based feature selection

Correlation-based feature selection assesses the worth of a group of attributes by considering the individual predictive ability of each feature along with the degree of redundancy among the features.Citation34

Classifier-based feature selection

Classifier-based feature selection evaluates attribute subsets on training data and uses a classifier to estimate the merit of a set of attributes. A search algorithm is then applied to search for a suitable feature set from among all the available feature sets.

Wrapper feature selection

Wrapper-based feature selection evaluates attribute sets by using a learning scheme, and then uses cross-validation (CV) to estimate the accuracy of the learning scheme for a particular set of attributes.Citation35 A search algorithm is then applied to search for a suitable feature set from among all the available feature sets.
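The sketch below outlines a simplified wrapper of this kind: a greedy search scored by the cross-validated RMSE of the chosen learning scheme. Linear regression and the maximum subset size are assumptions used only for illustration.

```python
# A simplified greedy wrapper: each candidate feature subset is scored by the
# 10-fold cross-validated RMSE of the learning scheme, and the single feature
# that improves the score most is added at every step.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def greedy_wrapper(X, y, estimator, max_features=10, cv=10):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {
            f: cross_val_score(estimator, X[:, selected + [f]], y,
                               scoring="neg_root_mean_squared_error", cv=cv).mean()
            for f in remaining
        }
        best = max(scores, key=scores.get)   # highest score = lowest RMSE
        selected.append(best)
        remaining.remove(best)
    return selected

# features = greedy_wrapper(X_scaled, y, LinearRegression())
```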

Feature extraction

When test features can be generated easily and cheaply, feature extraction techniques may be useful for dimensionality reduction. A regression model with a reduced input dimension may perform as well as one given the complete set of features.Citation20 Feature extraction therefore helps to reduce the computational overhead that would be incurred when using the complete input dimension.

Principal component analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique that transforms correlated data into uncorrelated data in a reduced dimension by finding a linear basis of reduced dimensionality for the data with maximal variance. More specifically, it transforms correlated variables into a set of linearly uncorrelated variables called principal components.Citation36,Citation37
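A minimal sketch of this step; the 50-component target mirrors the reduced dimension used in the experiments below, and X_scaled comes from the preprocessing sketch.

```python
# Project the 300-dimensional inputs onto their first 50 principal components.
from sklearn.decomposition import PCA

pca = PCA(n_components=50)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_.sum())   # share of variance retained
```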

Factor analysis

Factor analysis (FA), as opposed to PCA, determines whether a number of features of interest are linearly related to a smaller/reduced number of newly-defined features called factors. In other words, it discovers a reduced number of relatively independent features by mapping correlated features to a small set of features known as factors.Citation38

Independent component analysis

Independent component analysis (ICA), proposed by Hyvärinen and OjaCitation39 and Hyvärinen,Citation40 is a linear dimension reduction technique that transforms multidimensional feature vectors into components that are statistically as independent as possible. More specifically, ICA maps the observed variables (features) to a small number of latent variables (features) that are non-Gaussian and mutually independent, and they are called the independent components of the observed data.Citation41
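A corresponding sketch using FastICA (the fixed-point algorithm of Hyvärinen); 50 components matches the ICA setting used in the experiments, while the random seed is an assumption.

```python
# Map the observed features to 50 statistically independent components.
from sklearn.decomposition import FastICA

ica = FastICA(n_components=50, random_state=0)
X_ica = ica.fit_transform(X_scaled)
```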

Kernel PCA

Kernel PCA (kPCA) is an extension of PCA that uses kernel methods. kPCA computes the principal eigenvectors of the kernel matrix, rather than those of the covariance matrix.Citation42 Reformulating PCA in the kernel space is straightforward, since the kernel matrix corresponds to the inner products of the data points in the high-dimensional space constructed by the kernel function. Typically, Gaussian, tangent hyperbolic, polynomial, and other functions are used for the kernel.
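A brief sketch of kPCA with the kernel choices mentioned above; the Gaussian (RBF) kernel and its gamma value here are assumptions rather than the settings used in the paper.

```python
# Kernel PCA: eigen-decompose the kernel matrix instead of the covariance matrix.
from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=50, kernel="rbf", gamma=0.01)  # "sigmoid" (tanh) and "poly" are also available
X_kpca = kpca.fit_transform(X_scaled)
```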

Multidimensional scaling

Multidimensional scaling (MDS) is a non-linear dimension reduction technique that maps high-dimensional data representation into a low-dimensional representation while retaining the pairwise distances between the data points as much as possible. More specifically, MDS is used to analyze similarities or proximities between pairs of data points.Citation43

Function approximation algorithms

A regression/prediction model tries to build the relationship between the independent variables X (input) and the dependent variable y (output).Citation44 Moreover, it tries to find the unknown parameters β such that the error in Equation 2 is minimized, given the predicted output ŷ as:

$\hat{y} = f(X, \beta)$. (1)

Let $e_i = (\hat{y}_i - y_i)$ be the difference between the dependent variable $y_i$ and the predicted value $\hat{y}_i$. Therefore, the RMSE $\xi$ over data samples of size $n$ may be given as:

$\xi = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n} e_i^2}$. (2)
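Equations 1 and 2 translate directly into code; the sketch below assumes a predictor has already produced the output ŷ.

```python
# RMSE of Equation 2: square root of the mean squared residual e_i = y_hat_i - y_i.
import numpy as np

def rmse(y_true, y_pred):
    e = y_pred - y_true
    return float(np.sqrt(np.mean(e ** 2)))
```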

Regression models such as linear regression (LReg), Gaussian process regression (GPReg), multilayer perceptron (MLP), and sequential minimal optimization regression (SMOReg) are as follows.

LReg

LReg is the simplest predictive model, in which the independent variables ($X \in \mathbb{R}^{n \times p}$), the dependent variable ($y \in \mathbb{R}^{n}$), and the noise/error ($\varepsilon \in \mathbb{R}^{n}$) are related as:

$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i = x_i^{T}\beta + \varepsilon_i$. (3)

GPReg

The GPReg described by Rasmussen and WilliamsCitation45 and Rasmussen and NickischCitation46 is completely specified by its mean function m(x) and covariance function k(x, x′). It is a natural generalization of the Gaussian distribution, whose mean m and covariance k are a vector and a matrix, respectively. The Gaussian distribution is defined over vectors, whereas the Gaussian process is defined over functions f. Therefore, we may write:

$f \sim \mathcal{GP}(m, k)$. (4)

Considering a zero mean, linear and non-linear covariance functions may be given as:

$k(x, x') = \alpha x^{T} x' + \gamma$, (5)
$k(x, x') = \alpha \exp\left(-\tfrac{\gamma}{2}(x - x')^{T}(x - x')\right)$, (6)
where $\alpha$ and $\gamma$ are the parameters of the basis function.
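A minimal GPReg sketch with the two covariance functions of Equations 5 and 6, using scikit-learn's DotProduct (linear) and RBF (squared-exponential) kernels; the hyperparameter values shown are assumptions.

```python
# Gaussian process regression with a linear kernel (Equation 5) and an
# RBF/squared-exponential kernel (Equation 6).
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct

gp_linear = GaussianProcessRegressor(kernel=DotProduct(sigma_0=1.0))
gp_rbf = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
# gp_rbf.fit(X_reduced, y); y_hat = gp_rbf.predict(X_reduced)
```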

MLP

MLP is a feed-forward neural network with one or more hidden layers between the input and output layers.Citation47,Citation48 A neuron in an MLP first computes a linear-weighted combination of real-valued inputs, and then limits its amplitude using a non-linear activation function. In the present research, MLP was trained using the backpropagation algorithmCitation49 and the resilient propagation (RPROP) algorithm.Citation50
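A sketch of an MLP regressor in scikit-learn; MLPRegressor trains with backpropagation-style optimizers and does not implement resilient propagation, so this only approximates the setup described above. The hidden layer size and other settings are assumptions.

```python
# One-hidden-layer MLP trained by gradient-based backpropagation.
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                   max_iter=2000, random_state=0)
# mlp.fit(X_reduced, y)
```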

Reduced error pruning tree

Reduced error pruning (REP) tree is a fast decision tree learner. It builds a decision tree based on information gain or reduction of variance and prunes it using reduced-error pruning (with backfitting).Citation51,Citation52

SMOReg

Sequential minimal optimization (SMO), an algorithm for training support vector regression proposed by Smola and SchölkopfCitation53,Citation54 and Schölkopf and Burges,Citation55 is an extension of the SMO algorithm proposed by PlattCitation56 for the support vector machine classifier. The idea of support vector regression is based on the computation of a linear regression function in a high-dimensional feature space into which the input data are mapped using a non-linear function; support vector regression tries to minimize the generalization error in order to achieve good generalization performance.
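A support vector regression sketch; scikit-learn's SVR solves the same ε-insensitive regression problem that SMOReg optimizes with SMO, and the kernel and hyperparameters shown are assumptions.

```python
# Epsilon-insensitive support vector regression with an RBF kernel.
from sklearn.svm import SVR

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
# svr.fit(X_reduced, y)
```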

The ensemble of function approximators

Getting the best regression algorithm is not a trivial task. Apart from having a plethora of options as listed in the present section, one has to decide what the optimal sets of parameters for each algorithm are. There is generally very little guidance available to address the question of how to select an algorithm and adjust its parameters for a specific problem. In such cases, experimental tests can help the user to make decisions. Still, in many cases the obtained results are not satisfactory or even not acceptable. In such situations, the ensemble approach can be used. Basically, it relies on the assumption that the properly-modeled fusion of responses of several elementary predictors will produce more accurate results and reduce the regression error.Citation57 Formally, let Π be a set of k predictors given as:

$\Pi = \{f_1, f_2, \ldots, f_k\}$, (7)
where $f_k$ indicates the $k$th predictor. Each of the predictors is trained independently. The ensemble system fuses the outputs produced by the predictors in set Π. In the simplest form, the ensemble can take the form of a simple average called the mean output regression, given as:
$F'(x) = \tfrac{1}{k}\sum_{i=1}^{k} f_i(x)$, (8)
where F′ is an ensemble system. The natural advantage of this model is its simplicity, since the output of the ensemble can easily be obtained by simple mathematical transformation without the necessity of setting any additional parameters. On the other hand, the main drawback of this model is that it treats all the elementary predictors as equally important, regardless of their quality. Weak predictors affect the final output to the same degree as strong ones. As a result, the quality of the ensemble is close to the average of all its constituents. Better results can be obtained when the contribution of a particular predictor depends on its quality. The greater the accuracy of the predictor, the greater its weight in the ensemble. The ensemble method is therefore called the quality weighted output regression, given as:
$F'(x) = \tfrac{1}{k}\sum_{i=1}^{k} w_i f_i(x)$, (9)
where $\sum_{i=1}^{k} w_i = 1$. In its simplest form, the weights should be inversely proportional to the RMSE of the given predictors. However, in more advanced algorithms, the weights can be tuned over the course of time, eg, by applying evolutionary algorithms.
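The two fusion rules can be sketched as follows; the inverse-RMSE weighting is one simple way of making the weights "inversely proportional" to the error, and the normalization used here is an assumption.

```python
# Equation 8 (mean output regression) and a quality-weighted variant of
# Equation 9, with weights inversely proportional to each predictor's RMSE.
import numpy as np

def mean_output(predictions):                 # predictions: array of shape (k, n_samples)
    return predictions.mean(axis=0)

def quality_weighted_output(predictions, rmses):
    w = 1.0 / np.asarray(rmses, dtype=float)
    w /= w.sum()                              # normalize so the weights sum to 1
    return w @ predictions                    # weighted combination of the k outputs
```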

Diversity of the ensemble

There are several issues that have to be dealt with in order to make the application of the ensemble approach effective. One of the most essential issues is maintaining diversity among the predictors in the ensemble. Collecting a set of similar regression algorithms does not allow users to take any advantage from their fusion. Diversity can be ensured by applying one of the following procedures:

  1. collecting predictors based on different models;

  2. differentiating elementary predictor inputs.

In the first approach, it is assumed that different regression algorithms naturally make errors that are uncorrelated, even when they are trained on the same data. The second group consists of algorithms that create an ensemble based on the same regression model, with diversity introduced by training each constituent on different data partitions (as in the bagging algorithm) or on heterogeneous feature sets (the technique used in random subspace [RS] algorithms).

RS algorithms

RS is a method of constructing an ensemble of predictors where a pseudorandom procedure is used to select components of a feature vector separately for each ensemble constituent. The output of the ensemble is then obtained by averaging the outputs.Citation58

Bagging algorithms

BreimanCitation59 introduced the bagging method, which is basically a combination of multiple predictors. At first, subsets are prepared by resampling the original dataset using bootstrapping. A sequence of predictors is then run over the subsets of the dataset. Finally, the results from each of the predictors are aggregated using voting in order to obtain the final result. This method is supposed to enhance the performance of ensemble systems and to reduce variance in order to improve predictability.Citation60,Citation61
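Both diversity mechanisms can be sketched with scikit-learn's BaggingRegressor: bootstrapped data subsets give bagging, while disabling bootstrapping and subsampling the features gives a random subspace ensemble. The base learner and ensemble sizes are assumptions.

```python
# Bagging (bootstrap samples) versus random subspaces (random feature subsets).
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

bagging = BaggingRegressor(DecisionTreeRegressor(), n_estimators=25,
                           bootstrap=True, random_state=0)
random_subspace = BaggingRegressor(DecisionTreeRegressor(), n_estimators=25,
                                   bootstrap=False, max_features=0.5,
                                   random_state=0)
```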

The evolutionary weighted ensemble

The evolutionary weighted ensemble (EWE) makes decisions based on Equation 9. The learning process searches for a set of weights that minimizes the RMSE of the ensemble, and for that purpose, the learning set is used. Therefore, the objective function for the learning procedure of the ensemble system can be written as:

$\mathrm{RMSE}_{F'}(w_1, w_2, \ldots, w_k) = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N}\left(\sum_{j=1}^{k} w_j f_j(x_i) - y_i\right)^2}$, (10)
where $x_i$ and $y_i$ denote the ith input–target pair in the learning set, which consists of a total of N samples.

We used the evolutionary algorithm,Citation62 which processes a population of possible solutions encoded as chromosomes. An overview of the EWE training procedure is presented in Figure 1. The components of the EWE algorithm are defined as follows:

Figure 1 Evolutionary weighted ensemble algorithm.

Initial population

The first step in the learning algorithm is generating an initial population. This consists of an arbitrarily chosen number of individuals with randomly selected weights that are scaled in order to ensure that their sum is 1.

Evaluation of the population

Each individual is evaluated using an objective function. Obtained values determine the further behavior of the algorithm, especially selection procedures.

Selection of the elite

The stability of the learning procedure is maintained by selecting two individuals with the smallest RMSE values. Those individuals, called the elite, are not affected by mutation or crossover operators and join the offspring population.

Selection of the parents

Only selected individuals participate in generating offspring for the new generation population. The selection is based on their fitness and is done in a probabilistic manner, ie, the smaller the RMSE of an individual, the greater the probability of its selection.

Mutation

The mutation operator of an evolutionary algorithm is supposed to ensure some amount of diversity within the population. In a classical implementation, it adds random noise to the chromosomes of selected individuals.

Crossover

The crossover operator exchanges data between two selected parents to form two new individuals; for that purpose, a standard one-point crossover procedure can be used, in which the cutting point is selected randomly.

Offspring generation

At the end of each generation, the offspring population is created by merging the elite, the mutated individuals, and the children created by the crossover operator. The new population replaces the previous one, and the entire process is repeated until a satisfactory solution is found or the maximum number of iterations is reached.
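The sketch below puts the steps above together as a compact training loop; population size, mutation scale, and iteration count are assumptions, `predictions` holds the base predictors' outputs on the learning set (shape (k, N)), and `y` holds the targets.

```python
# A compact sketch of the EWE training loop: evolve weight vectors for the k
# base predictors so that the ensemble RMSE of Equation 10 is minimized.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_rmse(w, predictions, y):                      # objective of Equation 10
    return float(np.sqrt(np.mean((w @ predictions - y) ** 2)))

def normalise(w):
    w = np.clip(w, 0.0, None)                              # keep weights non-negative
    return w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))

def evolve_weights(predictions, y, pop_size=30, generations=200,
                   n_elite=2, mutation_scale=0.05):
    k = predictions.shape[0]
    population = [normalise(rng.random(k)) for _ in range(pop_size)]    # initial population
    for _ in range(generations):
        fitness = np.array([ensemble_rmse(w, predictions, y) for w in population])
        order = np.argsort(fitness)
        elite = [population[i] for i in order[:n_elite]]                # elite joins offspring unchanged
        probs = (fitness.max() - fitness) + 1e-12                       # smaller RMSE -> larger probability
        probs /= probs.sum()
        offspring = list(elite)
        while len(offspring) < pop_size:
            i, j = rng.choice(pop_size, size=2, replace=False, p=probs) # parent selection
            cut = rng.integers(1, k)                                    # one-point crossover
            child1 = np.concatenate([population[i][:cut], population[j][cut:]])
            child2 = np.concatenate([population[j][:cut], population[i][cut:]])
            for child in (child1, child2):
                child = child + rng.normal(scale=mutation_scale, size=k)  # mutation
                offspring.append(normalise(child))
        population = offspring[:pop_size]
    return min(population, key=lambda w: ensemble_rmse(w, predictions, y))
```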

Experiment setup and results

To accomplish dimensionality reduction and identification of the corresponding regression model, the experiment was set up as follows. The dataset obtained for the PLGA dissolution profile had 300 features; therefore, the primary objective was to reduce the dimensions of the dataset. To accomplish this, the feature selection and feature extraction techniques discussed earlier were used. Subsequently, elementary prediction models were employed and their performances were assessed using ten-fold cross-validation (10-CV) sets. Selection of the prediction model was based on the average of the RMSE computed over the set of ten results. In the final part of our experiment, we explored ensemble methods in order to exploit the elementary regression/prediction models.
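For reference, the per-model evaluation can be sketched as follows; X_reduced and y are taken from the earlier sketches, and linear regression is only a placeholder model.

```python
# Score a candidate model by its mean RMSE over ten cross-validation folds.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

scores = cross_val_score(LinearRegression(), X_reduced, y, cv=10,
                         scoring="neg_root_mean_squared_error")
mean_rmse = -scores.mean()    # flip the sign back to an RMSE
```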

Feature selection method results

After cleaning and preprocessing the dataset, a feature selection treatment was applied in which we used a backward feature elimination technique with the GPReg, LReg, MLP, SMOReg, and REP prediction models. The parameter settings of the prediction models are provided in Table 2. The combination of attributes that offered the lowest RMSE was considered the optimal feature set. For example, the optimal feature sets obtained using the GPReg, LReg, MLP, SMOReg, and REP regression models contain 18, 32, 31, 30, and 31 features, with RMSE values (on the normalized dataset) of 0.143, 0.156, 0.121, 0.153, and 0.126, respectively. The backward feature elimination results were converging in terms of RMSE. Therefore, for each of the predictors, we also selected the feature sets with fewer attributes, ie, the sets with ten, five, and one attribute.

Table 2 Parameters setting of the respective regression models used for the feature selection and feature extraction experiments

We also used stochastic feature selection techniques, namely correlation-based, classifier-based, and wrapper-based methods. These feature selection methods were used to determine the merits (predictability) of different combinations of features. After assigning merits to the several sets of features, the best first search (BFS) and greedy search (greedy) methods were used to select the desired optimal feature set. Interestingly, in the present problem, when we used correlation-based feature selection, both the BFS and greedy searches produced identical feature sets with five attributes. The classifier-based feature selection was paired with GPReg, MLP, and LReg, respectively, in order to evaluate the merits of the feature sets. Subsequently, BFS and greedy searches were used to determine the optimal feature set. Therefore, we had class-GPReg-BFS, class-GPReg-greedy, class-MLP-BFS, class-MLP-greedy, class-LReg-BFS, and class-LReg-greedy feature selection methods, where, for example, class-GPReg-BFS indicates a classifier-based method with GPReg as the feature set merit evaluator and BFS as the method used to select the optimal feature set. Similarly, wrapper-GPReg-greedy, wrapper-MLP-greedy, and wrapper-LReg-greedy indicate combinations of wrapper-based feature selection in which GPReg, MLP, and LReg were used to evaluate the feature sets. Interestingly, both BFS and greedy searches offered identical feature sets. A list of the feature selection methods and the corresponding selected features is given in Table 3.

Table 3 Experimental results for 10-CV datasets prepared with distinct random partitions of the complete dataset using feature selection technique (Identification of regression model)

Results of the feature extraction technique

Unlike feature selection, feature extraction finds a new set of reduced features by computing linear or non-linear combinations of the features in the available dataset. A comprehensive result is presented in Table 4, which illustrates the performance of the feature extraction methods and regression models.

Table 4 Experimental results for 10-CV datasets prepared with distinct random partitions of the complete dataset using feature extraction techniques

Dimensionality reduction tools provided by van der Maaten et alCitation20 were used for the feature extraction. The PCA and FA linear dimensionality reduction methods, and non-linear dimensionality reduction methods such as kPCA and MDS, were used to reduce the dimensions of the dataset from 300 to 50, 30, 20, 10, and 5. ICA was used to reduce the dimension of the dataset from 300 to 50. The results obtained using ICA are as follows: the mean RMSE and variance corresponding to GPReg, LReg, MLP, and SMOReg are 14.83, 17.23, 13.94, and 17.92 and 3.61, 2.34, 2.77, and 2.87, respectively. It may be observed from Table 4 that lower dimensions offer less significant improvement in terms of RMSE. However, if we compare the best results (obtained by reducing the dimension to 50) of PCA (an RMSE of 13.59 corresponding to MLP) and ICA (an RMSE of 13.94 corresponding to MLP) with the result obtained using all features (an RMSE of 16.812 corresponding to GPReg), it is evident that reducing the dimension significantly improves the performance of the prediction model. Examining the RMSE and variance comparison between the chosen regression models applied to the dataset reduced to a dimension of 50 using the ICA, PCA, FA, kPCA, and MDS feature extraction techniques, we may conclude that feature extraction using PCA performed best, both in terms of RMSE and variance, when the MLP regression model was used, whereas feature extraction using ICA was second to PCA when MLP was used. When it came to GPReg, ICA had an edge over PCA.

Figure 2 Results of the feature extraction experiment for the reduced dimension set of 30 features: a comparison between the regression models using average RMSE (A) and using variances (B).

Abbreviations: RMSE, root mean square error; ICA, independent component analysis; PCA, principal component analysis; FA, factor analysis; kPCA, kernel PCA; MDS, multidimensional scaling; GPReg, Gaussian process regression; LReg, linear regression; MLP, multilayer perceptron; SMOReg, sequential minimal optimization regression.

The regression model and ensemble results

In order to identify a suitable regression model, we evaluated several regression models. The parameter settings corresponding to the regression models are given in Table 2. A comprehensive feature selection result using 10-CV is presented in Table 3, from which we may draw the following conclusions. First of all, in Table 3, the feature selection methods are arranged in ascending order of the number of selected features; the first row, which indicates no feature selection (ie, all 300 features were used), is the exception. We compared the results of the prediction models arranged in the columns of Table 3. The feature selection process was able to find the most significant features that influence the drug release rate. It may be observed that the feature vectors from all the mentioned feature selection methods constitute reduced sets of the most influential features. Therefore, a general theory may be drawn about how and which features are the most dominant with regard to the PLGA drug release rate.

It is worth mentioning that the best results presented by Szlęk et alCitation21 are an RMSE of 15.4 using MLP with eleven selected features and an RMSE of 14.3 using MLP with 17 features. From Table 3, it may be observed that when considering all 300 features, the best result we could achieve was obtained with REP, an RMSE of 13.05 (the average of the 10-CV results). Therefore, any regression model tested with a reduced feature set must compete with this result. In our study, the best results for the reduced sets were obtained with the feature set produced by the wrapper-GPReg-greedy method, with RMSEs of 14.88, 20.22, 15.20, 13.31, and 20.86 using the GPReg, LReg, MLP, REP, and SMOReg elementary models, respectively. Therefore, we may consider the features “fused ring count”, “heteroaromatic ring count”, “largest ring system size”, “chain atom count”, “chain bond count”, and “quaternary structure” from the protein descriptors group; “PVA concentration inner phase”, “PVA concentration outer phase”, “PVA molecular weight”, and “PLGA to plasticizer” from the formulation characteristics group; “acetylsalicylic acid”, “Szeged index”, and “pH=12 logD” from the plasticizer group; and “a(yy)” and “time in days” from the emulsifier group as the most influential features obtained in the wrapper-GPReg-greedy experiment. A complete list of the feature names can be found in Szlęk et al.Citation21

After we obtained the best features, we turned to ensemble techniques. A comprehensive comparison of the results obtained using the ensemble methods and the elementary regression models is given in Table 5. From the results presented there, it is evident that some of the listed ensemble methods provide better results than the best elementary predictor, ie, the reduced error pruning tree. The average RMSEs obtained by the ensembles, namely RS using REP, RS using MLP, RS using GPReg, bagging using REP, bagging using MLP, mean output regression, quality weighted output regression, and EWE, are 13.85, 18.20, 18.72, 11.49, 12.30, 10.43, 10.06, and 7.67, respectively.

Table 5 A comprehensive conclusion of the results obtained from each regression model, including the ensemble techniques used

Discussion and analysis

In this article, experimental results obtained using both feature selection and feature extraction techniques are presented. The primary objective of the experiments was to find the lowest RMSE. In addition, we took advantage of the feature selection methods to obtain the best set of features. Our benchmarks for the present experiment were the RMSE obtained using the complete set of 300 features and the results obtained by Szlęk et al.Citation21 The results obtained by the feature selection, feature extraction, and ensemble experiments are provided in Tables 3, 4, and 5, respectively. The wrapper-based feature selection technique provided us with the set of the most significant features. On the other hand, PCA offered a new set of features with solutions that were better than those obtained with the complete dataset. The ensemble methods were only applied to the feature selection results. The ensemble methods enabled us to exploit all the evaluated regression models, and the best result (the lowest RMSE) among all the trained regressors was obtained using the EWE ensemble method. As mentioned above, predicting the PLGA dissolution rate is an important problem for the pharmaceutical industry. More significantly, identifying the influencing factors (features) is crucial for predicting the PLGA dissolution rate.

Conclusion

Analyzing the effectiveness of the ensemble methods should be based on a comparison with the results obtained using the best elementary predictors. In our case, among the tested simple predictors, the lowest RMSE was reached with REP (13.34). The ensemble methods should improve regression accuracy over the best elementary predictor. The EWE ensemble method offered the lowest RMSE, which proves that, in certain cases, combining the outputs of several predictors allows us to improve overall accuracy. It is essential to ensure diversity among the ensemble’s constituents. Among the tested techniques, an ensemble of five heterogeneous regression algorithms provided the best results. Weighting their outputs was most effective when the weights were set using an evolutionary-based algorithm. Perhaps this is not the best method for creating a diversified ensemble of regression methods in general, but it appeared to be the best one for the problem we considered. We suggest that, in all cases, a broad range of experiments with a variety of elementary regression algorithms and ensemble methods be used in order to find the best solution. Nonetheless, the obtained results show that the proposed EWE method is an effective option for finding a solution to the present problem.

Acknowledgments

This work was supported by the IPROCOM Marie Curie Initial Training Network, funded through the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007–2013/, under REA grant agreement number 316555. This work was also supported by the Polish National Science Center under grant number DEC-2013/09/B/ST6/02264.

Disclosure

The authors report no conflicts of interest in this work.

References

  • KangFSinghJEffect of additives on the release of a model protein from PLGA microspheresAAPS Pharm Sci Tech20012430
  • ZhouXLHeJTDuHJPharmacokinetic and pharmacodynamic profiles of recombinant human erythropoietin-loaded poly(lactic-co-glycolic acid) microspheres in ratsActa Pharmacol Sin20123311374422139004
  • FanDDe RosaEMurphyMBMesoporous silicon-PLGA composite microspheres for the double controlled release of biomolecules for orthopedic tissue engineeringAdv Funct Mater2012222282293
  • KimTHLeeHParkTGPegylated recombinant human epidermal growth factor (rhEGF) for sustained release from biodegradable PLGA microspheresBiomaterials200223112311231712013178
  • BlancoDAlonsoMJProtein encapsulation and release from poly (lactide-co-glycolide) microspheres: effect of the protein and polymer properties and of the co-encapsulation of surfactantsEur J Pharm Biopharm19984532852949653633
  • MokHParkTGWater-free microencapsulation of proteins within PLGA microparticles by spray drying using PEG-assisted protein solubilization technique in organic solventEur J Pharm Biopharm200870113714418515053
  • BuskeJKönigCBassarabSLamprechtAMühlauSWagnerKGInfluence of PEG in PEG-PLGA microspheres on particle properties and protein releaseEur J Pharm Biopharm2012811576322306701
  • CorriganOILiXQuantifying drug release from PLGA nanoparticulatesEur J Pharm Sci2009373–447748519379812
  • PurasGSalvadorAIgartuaMHernándezRMPedrazJLEncapsulation of Aβ (1–15) in PLGA microparticles enhances serum antibody response in mice immunized by subcutaneous and intranasal routesEur J Pharm Sci201144320020621820509
  • KimHKParkTGMicroencapsulation of dissociable human growth hormone aggregates within poly(D,L-lactic-co-glycolic acid) microparticles for sustained releaseInt J Pharm20012291–210711611604263
  • HanYTianHHePChenXJingXInsulin nanoparticle preparation and encapsulation into poly(lactic-co-glycolic acid) microspheres by using an anhydrous systemInt J Pharm20093781–215916619465100
  • HeJFengMZhouXStabilization and encapsulation of recombinant human erythropoietin into PLGA microspheres using human serum albumin as a stabilizerInt J Pharm20114161697621699969
  • GasperMMBlancoDCruzMEAlonsoMJFormulation of L-asparaginase-loaded poly(lactide-co-glycolide) nanoparticles: influence of polymer properties on enzyme loading, activity and in vitro releaseJ Control Release1998521–253629685935
  • KawashimaYYamamotoHTakeuchiHFujiokaSHinoTPulmonary delivery of insulin with nebulized DL-lactide/glycolide copolymer (PLGA) nanospheres to prolong hypoglycemic effectJ Control Release1999621–227928710518661
  • UngaroFd’Emmanuele di Villa BiancaRGiovinoCInsulin-loaded PLGA/cyclodextrin large porous particles with improved aerosolization properties: in vivo deposition and hypoglycaemic activity after delivery to rat lungsJ Control Release20091351253419154761
  • JiangHLJinJFHuYQZhuKJImprovement of protein loading and modulation of protein release from poly(lactide-co-glycolide) microspheres by complexation of proteins with polyanionsJ Microencapsul200421661562415762319
  • PiroozniaNHasanniaSLotfASGhaneiMEncapsulation of alpha-1 antitrypsin in PLGA nanoparticles: in vitro characterization as an effective aerosol formulation in pulmonary diseasesJ Nanobiotechnology2012102022607686
  • CastellanosIJFloresGGriebenowKEffect of cyclodextrins on alpha-chymotrypsin stability and loading in PLGA microspheres upon S/O/W encapsulationJ Pharm Sci200695484985816493595
  • AsteteCESabliovCMSynthesis and characterization of PLGA nanoparticlesJ Biomater Sci Polymer Ed2006173247289
  • van der MaatenLJPostmaEOvan den HerikHJDimensionality reduction: a comparative reviewTechnical Report TiCC TR 2009-005
  • SzlękJPaclawskiALauRJachowiczRMendykAHeuristic modeling of macromolecule release from PLGA microspheresInt J Nanomedicine2013814601461124348037
  • FredenbergSWahlgrenMReslowMAxelssonAThe mechanisms of drug release in poly(lactic-co-glycolic acid)-based drug delivery systems: a reviewInt J Pharm20114151–2345221640806
  • KangJSchwendemanSPPore closing and opening in biodegradable polymers and their effect on the controlled release of proteinsMol Pharm20074110411817274668
  • KangJLambertOAusbornMSchwendemanSPStability of proteins encapsulated in injectable and biodegradable poly(lactide-co-glycolide)-glucose millicylindersInt J Pharm2008357123524318384984
  • BlancoMDAlonsoMJDevelopment and characterization of protein-loaded poly(lactide-co-glycolide) nanospheresEur J Pharm Biopharm1997433287294
  • MainardesRMEvangelistaRCPLGA nanoparticles containing praziquantel: effect of formulation variables on size distributionInt J Pharm20052901–213714415664139
  • ZygourakisKMarkenscoffPAComputer-aided design of bioerodible devices with optimal release characteristics: a cellular automata approachBiomaterials19961721251358624389
  • GopferichAMechanisms of polymer degradation and erosionBiomaterials19961721031148624387
  • SiepmannJFaisantNBenoitJPA new mathematical model quantifying drug release from bioerodible microparticles using Monte Carlo simulationsPharm Res200219121885189312523670
  • LangerRTirrellDADesigning materials for biology and medicineNature200442848749215057821
  • BrodbeckKJDesNoyerJRMcHughAJPhase inversion dynamics of PLGA solutions related to drug delivery. Part II. The role of solution thermodynamics and bath-side mass transferJ Control Release199962333334410528071
  • MakadiaHKSiegelSJPoly Lactic-co-Glycolic Acid (PLGA) as biodegradable controlled drug delivery carrierPolymers (Basel)2011331377139722577513
  • DashMLiuHFeature Selection for ClassificationIntelligent Data Analysis199713131156
  • HallMASmithLAPractical feature subset selection for machine learningMcDonaldCProceedings of the 21st Australasian Computer Science Conference ACSC ’98, Perth, WA, Australia, 4–6, February, 1998BerlinSpringer1998181191
  • KohaviRJohnGHWrappers for feature subset selectionArtif Intell1997971273324
  • PearsonKLIII. On lines and planes of closest fit to systems of points in spaceLond Edinburgh Dublin Philosoph Mag J Sci1901211559572
  • AbdiHWilliamsLJPrincipal component analysisWiley Interdisciplinary Reviews: Computational Statistics201024433459
  • HarmanHHModern Factor AnalysisChicagoUniversity of Chicago Press1960
  • HyvärinenAOjaEIndependent component analysis: algorithms and applicationsNeural Netw2000134–541143010946390
  • HyvärinenAFast and robust fixed-point algorithms for independent component analysisIEEE Trans Neural Netw199910362663418252563
  • HyvärinenAIndependent component analysis and blind source separation Technical reportHelsinkiHelsinki University of Technology2003 Available from: http://www.cs.helsinki.f/u/ahy-varin/presentations/Berlin05.pdfAccessed December 22, 2014
  • SchölkopfBSmolaAMüllerKRKernel principal component analysisGersnerWGermondAHaslerMNicoudJDArtificial Neural Networks — ICANN ’97BerlinSpringer1997583588
  • KruskalJBMultidimensional scaling by optimizing goodness of fit to a nonmetric hypothesisPsychometrika1964291127
  • NeterJApplied linear statistical models4th edChicagoIrwin1996
  • RasmussenCEWilliamsCKGaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)Cambridge, MAMIT Press2005
  • RasmussenCENickischHGaussian processes for machine learning (GPML) toolboxJMLR20101130113015
  • HaykinSNeural Networks: A Comprehensive Foundation1st edUpper Saddle River, NJPrentice Hall PRT1994
  • WerbosPJBeyond Regression: New Tools for Prediction and Analysis in the Behavioral SciencesPrinceton, NJHarvard University Press1975
  • RumelhartDEMcClellandJLParallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol 1: FoundationsCambridge, MAMIT Press1986
  • RiedmillerMBraunHA direct adaptive method for faster back-propagation learning: the RPROP algorithmIEEE International Conference on Neural NetworksIEEE1993586591
  • QuinlanJRSimplifying decision treesInt J Man-Mach Stud1987273221234
  • MohamedWNSallehMNOmarAHA comparative study of reduced error pruning method in decision tree algorithmsIEEE International Conference on Control System, Computing and Engineering (ICCSCE)November 23–25, 2012Penang, Malaysia11 20122012392397
  • SmolaAJSchölkopfBLearning with Kernels: Support Vector Machines, Regularization, Optimization, and BeyondCambridge, MAMIT Press1998
  • SmolaAJSchölkopfBA tutorial on support vector regressionStatistics Computing2004143199222
  • SchölkopfBBurgesCJCSmolaAJAdvances in Kernel Methods: Support Vector LearningCambridge, MAMIT Press1999
  • PlattJProbabilistic outputs for support vector machines and comparisons to regularized likelihood methodsAdvances in Large Margin Classifiers19991036174
  • BrownGWyattJHarrisRYaoXDiversity creation methods: A survey and categorisationJournal of Information Fusion200561520
  • HoTKThe random subspace method for constructing decision forestsIEEE Transac Pattern Anal Mach Intell1998208832844
  • BreimanLBagging predictorsMach Learn1996242123140
  • SaeysYInzaILarrañagaPA review of feature selection techniques in bioinformaticsBioinformatics200723192507251717720704
  • BauerEKohaviRAn empirical comparison of voting classification algorithms: Bagging, boosting, and variantsMach Learn1999361–2105139
  • GoldbergDEGenetic Algorithms in Search, Optimization, and Machine LearningBoston, MAAddison-Wesley1989