Abstract
In vitro study of the deposition of drug particles is commonly used during development of formulations for pulmonary delivery. The assay is demanding and complex, and its outcome depends on the properties of the drug and carrier particles, including size, surface characteristics, and shape; on interactions between the drug and carrier particles; and on assay conditions, including flow rate, type of inhaler, and impactor. The aerodynamic properties of an aerosol are measured in vitro using impactors and in most cases are presented as the fine particle fraction, which is the mass percentage of drug particles with an aerodynamic diameter below 5 μm. In the present study, a model in the form of a mathematical equation was developed for prediction of the fine particle fraction. Feature selection was performed using the R-environment package “fscaret”. The input vector was reduced from a total of 135 independent variables to 28. During the modeling stage, artificial neural networks, genetic programming, rule-based systems, and fuzzy logic systems were used. The 10-fold cross-validation technique was used to assess the generalization ability of the models created. The model obtained had good predictive ability, as confirmed by a root-mean-square error and normalized root-mean-square error of 4.9 and 11%, respectively. Moreover, validation of the model using external experimental data resulted in a root-mean-square error and normalized root-mean-square error of 3.8 and 8.6%, respectively.
Introduction
Dry powder inhalers are frequently used in the treatment of respiratory diseases such as asthma and chronic obstructive pulmonary disease. The therapeutic efficiency of these drug formulations depends on the deposition of drug particles at different levels of the respiratory system. Aerodynamic diameter is usually used to describe particle behavior in the air stream and is determined by particle size, shape, and density. Small particles with an aerodynamic diameter of less than 5 μm are deposited deep in the lungs, but in most cases they have poor flow properties and are highly cohesive, leading to low stability and poor quality of formulations.Citation1,Citation2 The addition of a carrier can solve those problems. The fine particle fraction (FPF) represents the mass percentage of drug particles with an aerodynamic diameter below 5 μm, and is used for in vitro assessment of the aerodynamic properties of aerosols. Aerosol generation and in vitro deposition of particles are complex processes that depend on the properties of the drug and carrier particles, assay conditions, and inhaler characteristics.Citation1,Citation2 Carrier surface characteristics strongly influence the adhesion of drug particles to the carrier and their detachment from it. For effective drug delivery, interactions between the drug and carrier have to be strong enough to produce a homogeneous formulation, yet weak enough to release drug particles during inhalation. Carrier particles with a rough surface have more binding sites to attach drug particles and less pronounced interactions between particles.Citation3,Citation4 Surface attributes can be quantified and applied in model development using tools created by Chinga et al and images from a scanning electron microscope (SEM).Citation5
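To illustrate how size, shape, and density combine, a commonly used textbook expression for the aerodynamic diameter is sketched below, together with the FPF written as a ratio. Neither expression is given in the source text; the symbols are introduced here for illustration, the slip correction is neglected, and the choice of reference mass for the FPF (eg, emitted or recovered dose) is an assumption that differs between studies.

```latex
d_{ae} = d_{ve}\,\sqrt{\frac{\rho_{p}}{\rho_{0}\,\chi}}
\qquad\qquad
\mathrm{FPF} = \frac{m_{d_{ae} < 5\,\mu\mathrm{m}}}{m_{\mathrm{ref}}} \times 100\%
```

Here d_ve is the volume-equivalent diameter, ρ_p the particle density, ρ_0 the unit density (1 g/cm³), χ the dynamic shape factor, m with the subscript d_ae < 5 μm the drug mass recovered with an aerodynamic diameter below 5 μm, and m_ref the reference drug mass.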
During recent years, several models for prediction of particle deposition have been published. Vinchurkar et alCitation6 used a computational fluid dynamics method to model a Mark II Andersen cascade impactor (ACI) and applied it to evaluate the effects of charge on deposition of particles. A commercial computational fluid dynamics code was used for the flow field simulation. The model was validated based on predictions of the cut-off d50 diameters for each of the eight ACI stages. Kaialy et alCitation7 analyzed factors that influence the FPF in formulations composed of various types of commercial lactose and salbutamol particles. They found that a more elongated and irregular shape and a rougher surface of the carrier resulted in higher FPF values. The authors used only simple shape and surface factors calculated from the length and breadth of the carriers. Chen et alCitation8 combined computational fluid dynamics and the discrete element method to simulate the transport and deposition of spherical particles in a computer model of a three-generation pulmonary airway. These authors used only two physical parameters of the particles, namely size and density, for predictions of FPF. Sturm and HofmannCitation9 presented a system for predicting extrathoracic, bronchial, and alveolar fiber deposition in the human respiratory tract based on breathing conditions, fiber properties, and a morphometric lung model. In most cases, the available models can predict in vitro deposition for a system composed of one type of particle (drug or carrier), and use only one or two formulation-related factors, such as the charge, dimensions, or density of the particles. A general model that could support the formulation development process for carrier-based pulmonary delivery systems is lacking.
Artificial neural networks and genetic programming methods have been successfully employed for modeling purposes in pharmaceutical technology, eg, prediction of dissolution profiles,Citation10 in vitro-in vivo correlation,Citation11 and prediction of pellet properties.Citation12 The modeling approach has been presented as a useful method for constructing expert systems that support the formulation development process.Citation13 The present study introduces the concept of using empirical modeling based on data in the literature to obtain a predictive model for in vitro deposition of drug particles. The methodology is based on selection of important variables and on advanced modeling tools such as artificial neural networks, genetic programming, rule-based systems, and fuzzy logic. The prepared models are validated on a new experimental data set.
Materials and methods
Data set
The database was compiled from the literature. Scientific articles were screened and included in the analysis based on the following criteria: detailed information about the drug and carrier (name and particle size); availability of an SEM image of the carrier; and a description of the assay conditions (flow rate, impactor, and inhaler type).
After review of approximately 800 papers from the Scopus and PubMed databases, eleven met the inclusion criteria. The resulting database contained information about FPF for three different impactors, ie, the ACI, the next-generation impactor, and the multi-stage liquid impinger. A detailed list of the source publications is included in Table S1. Formulations were composed of five different types of substances as carriers, ie, trehalose, mannitol, lactose, erythritol, and hydroxyapatite. Based on the SEM pictures of the carriers, 13 variables describing surface properties were calculated, including the arithmetical mean deviation, root mean square deviation, skewness of the assessed profile (Rsk), kurtosis of the assessed profile, lowest valley, highest peak, total height of the profile, average height of an unleveled surface, mean polar facet orientation, variation of the polar facet orientation, direction of azimuthal facets, mean resultant vector, and surface area.Citation14 The carrier shape analysis was performed based on SEM pictures using ImageJ,Citation15 as described in the section on surface and shape analysis. The procedure resulted in six parameters describing the shape of the carriers: the circularity, longest distance between two points (Feret), angle between the Feret’s diameter and a line parallel to the x axis of the image (FeretAngle), minimum caliper diameter (MinFeret), ratio between particle height and particle width, and roundness.Citation16 The database contained information about formulations composed of nine active pharmaceutical ingredients (API), ie, salbutamol, budesonide, ciprofloxacin, cyclosporine A, disodium cromoglycate, fluticasone propionate, formoterol fumarate, ipratropium bromide, and salmeterol. The chemical structure and properties of the drug molecules were encoded by chemical descriptors computed using the Marvin cxcalc plugin (version 6.1; ChemAxon, Budapest, Hungary)Citation17 based on three-dimensional optimized structures.
Moreover, the mass percentage of API in the formulation, the particle size distributions of the carrier and the API, the inhaler device type (Novolizer®, Aerolizer®, Rotahaler®, powder dispatchment tube, SetA, SetD), and the flow rate (L/min) during the experiment were included according to data found in the articles. The complete structure of the database is shown in and the full database is available in Table S2. Overall, there were 91 data records with 136 variables. The FPF was the only dependent variable, and the other 135 input variables contained information about the carrier, drug, and assay conditions. The data set was processed to reduce the size of the input vector and was split according to the 10-fold cross-validation method in order to check the generalization ability of the models created and to simulate their real application, ie, prediction of in vitro deposition for new formulations and unknown conditions.
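The 10-fold split described above can be sketched as follows. This is a minimal illustration in Python rather than the R tooling used in the study; the function name `k_fold_indices`, the random seed, and the round-robin assignment of shuffled records to folds are assumptions made for the example, not details taken from the source.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Return k (train, test) index pairs covering all records exactly once as test data."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        pairs.append((train, test))
    return pairs


# 91 records, as in the database described above.
pairs = k_fold_indices(91, k=10)
for fold_number, (train, test) in enumerate(pairs, start=1):
    print(fold_number, len(train), len(test))
```

Each record appears in exactly one test set, so every model is always evaluated on formulations it has not seen during training.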
Surface and shape analysis
The surface analysis was performed based on the SEM photographs using the SurfCharJCitation14 plugin for ImageJ (version 1.47n), which allows calculation of parameters quantifying surface roughness. All parameters are described in the “Data set” section of this article. Prior to analysis, each picture was standardized in terms of scale and grayscale depth (32-bit). Ten randomly chosen square sections of the particle surface, each measuring 10 μm × 10 μm, were analyzed. Surface roughness was calculated with the standard settings of the SurfCharJ plugin and the additional “level surface” option, which aligns the surface by subtracting a regression plane from it. The final result for each particle was the average of the ten samples. The carrier shape study was performed using the standard ImageJ tools for particle analysis with manual marking of each particle.
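The leveling and roughness steps described above can be illustrated with a short sketch. It is not the SurfCharJ or ImageJ code: the plane fit, the function names, and the synthetic height maps are assumptions for the example. The roughness formulas follow the usual profile definitions (Ra, Rq, Rsk, Rku), and the shape formulas follow the definitions given in the ImageJ documentation for circularity and roundness.

```python
import math
import random


def level_surface(z):
    """Subtract the least-squares plane from a height map given as a list of rows.

    On a full rectangular grid the centred x and y coordinates are orthogonal,
    so the two slopes can be fitted independently of each other.
    """
    rows, cols = len(z), len(z[0])
    xc = [j - (cols - 1) / 2 for j in range(cols)]
    yc = [i - (rows - 1) / 2 for i in range(rows)]
    mean_z = sum(map(sum, z)) / (rows * cols)
    slope_x = sum(xc[j] * z[i][j] for i in range(rows) for j in range(cols)) / (
        rows * sum(v * v for v in xc))
    slope_y = sum(yc[i] * z[i][j] for i in range(rows) for j in range(cols)) / (
        cols * sum(v * v for v in yc))
    return [[z[i][j] - (mean_z + slope_x * xc[j] + slope_y * yc[i])
             for j in range(cols)] for i in range(rows)]


def roughness(z):
    """Return Ra, Rq, Rsk, and Rku of a height map."""
    heights = [v for row in z for v in row]
    n = len(heights)
    mean = sum(heights) / n
    d = [v - mean for v in heights]
    rq = math.sqrt(sum(v * v for v in d) / n)
    return {
        "Ra": sum(abs(v) for v in d) / n,
        "Rq": rq,
        "Rsk": sum(v ** 3 for v in d) / (n * rq ** 3),
        "Rku": sum(v ** 4 for v in d) / (n * rq ** 4),
    }


def mean_roughness(patches):
    """Average each parameter over several leveled patches (ten per particle in the text)."""
    results = [roughness(level_surface(p)) for p in patches]
    return {key: sum(r[key] for r in results) / len(results) for key in results[0]}


def circularity(area, perimeter):
    """4*pi*area / perimeter^2; equals 1 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


def roundness(area, major_axis):
    """4*area / (pi * major_axis^2), with major_axis taken from the fitted ellipse."""
    return 4.0 * area / (math.pi * major_axis ** 2)


# Synthetic, tilted, noisy patches standing in for grey-level SEM sections (hypothetical data).
rng = random.Random(0)
patches = [[[0.5 * j + 0.2 * i + rng.gauss(0, 1) for j in range(20)] for i in range(20)]
           for _ in range(10)]
print(mean_roughness(patches))
```

Leveling before computing the parameters matters because an overall tilt of the particle in the image would otherwise inflate Rq and distort Rsk.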
Selection of features
The aim of the feature selection was to reduce the number of inputs in the database before the modeling process in order to simplify the models created, to find the most important variables, and to save time and computational resources. The feature selection was performed using the “fscaret”Citation18 package for the R environment (The R Foundation for Statistical Computing, Vienna, Austria).Citation19 The main parameters of the method used are listed in . The results are presented as a ranking of variables with a calculated importance value for each variable. Cut-off points for creation of new databases were set at a 5% gradient decrease.
Model assessment
Model goodness of fit was expressed as root-mean-square error (RMSE, Equation 1) and normalized root-mean-square error (NRMSE, Equation 2).
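For orientation, the two error measures can be written as follows. The RMSE form is standard. The normalization of the NRMSE by the range of the observed values is an assumption, since other conventions (eg, division by the mean) also exist; the symbols pred, obs, and n are introduced here for illustration.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{pred}_{i} - \mathrm{obs}_{i}\right)^{2}} \tag{1}

\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\mathrm{obs}_{\max} - \mathrm{obs}_{\min}} \times 100\% \tag{2}
```

Here pred_i and obs_i are the predicted and observed FPF values for record i, n is the number of records, and obs_max and obs_min are the largest and smallest observed values.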
Artificial neural networks
Multilayer perceptron neural networks were created using the “monmlp”Citation20 package for the R environment. All of the prepared models had two hidden layers, each containing from 4 to 50 nodes. The transfer function for the hidden layers was the hyperbolic tangent (tansig), and a linear function was applied for the output layer. Ensemble systems containing ten or 20 neural networks were employed. Variables were scaled to the range 0.1 to 0.9, and the number of iterations was set to 10, 50, 80, 100, 200, 400, 500, 800, or 1,000. The multistart technique was used in order to avoid local minima: the “trials” parameter was set to 5.
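The scaling of variables to the range 0.1 to 0.9 and the use of an ensemble can be sketched as below. This is a Python illustration, not the monmlp code; the function names are hypothetical, and combining ensemble members by a simple average is an assumption about how the individual network outputs are merged.

```python
def scale(values, lo=0.1, hi=0.9):
    """Linearly map values so that their minimum becomes lo and their maximum becomes hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("a constant variable cannot be scaled")
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


def unscale(scaled, vmin, vmax, lo=0.1, hi=0.9):
    """Invert scale(), eg, to express predictions in the original FPF units."""
    return [vmin + (s - lo) * (vmax - vmin) / (hi - lo) for s in scaled]


def ensemble_mean(member_predictions):
    """Average the predictions of several networks, record by record."""
    return [sum(column) / len(column) for column in zip(*member_predictions)]


flow_rates = [30.0, 60.0, 90.0]  # hypothetical values in L/min
scaled = scale(flow_rates)
restored = unscale(scaled, min(flow_rates), max(flow_rates))
averaged = ensemble_mean([[0.2, 0.4], [0.4, 0.6]])
print(scaled, restored, averaged)
```

Keeping inputs and outputs away from 0 and 1 leaves headroom around the observed range, which is one common motivation for a 0.1 to 0.9 scaling.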
Rule-based systems
For modeling purposes, two rule-based systems were used, ie, “randomForest”Citation21 and “Cubist”.Citation22 The first one creates models based on a forest of decision trees using random inputs. The following parameters were used during the modeling process: automatic selection of the number of variables, a maximum number of nodes of 1,000, and a number of trees from 1 to 100. Cubist also creates regression models in the manner of decision trees, but it introduces linear equations at their terminal branches. During the modeling process, the maximum number of rules was fixed at 100, and the number of committees was set from one to 100. The extrapolation parameter, which controls the ability of the created models to make estimates beyond the original observation range, was set to 100. The sample parameter, which is the percentage of the data set randomly selected for model building, was set to zero, which means that no data subsampling was employed and all the models were built on the complete data sets available for each run.
Fuzzy systems
The “fugeR”Citation23 package for the R environment was used to create models based on fuzzy logic rules. This tool uses a genetic algorithm to build a fuzzy system based on a given training data set. At the beginning, the system generates a random population and tests it on the available data. Afterwards, the best models are used to generate a new population by means of genetic operators such as crossover and mutation. The maximum number of rules was set to 3, 40, 50, or 100 (“maxRules”). The maximum number of input variables per rule was set from two to five. The population size was varied from 100 to 5,000.
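The evolutionary loop described above (random population, evaluation, selection of the best individuals, crossover, and mutation) can be sketched generically. This is not the fugeR implementation: it evolves plain numeric vectors rather than fuzzy rule sets, and the function `evolve`, its parameters, and the toy straight-line data are all assumptions chosen to keep the example small.

```python
import math
import random


def evolve(fitness, n_genes, pop_size=50, generations=40, mutation_sd=0.3, seed=1):
    """Minimise fitness with a simple genetic algorithm; n_genes must be at least 2."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)               # evaluate and rank
        history.append(fitness(population[0]))
        parents = population[: pop_size // 5]      # keep the best fifth
        children = [parents[0][:]]                 # elitism: best individual survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)        # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [g + rng.gauss(0, mutation_sd) if rng.random() < 0.3 else g
                     for g in child]               # random mutation of some genes
            children.append(child)
        population = children
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population[0], history


# Toy data lying on y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def rmse_of_line(genes):
    slope, intercept = genes
    return math.sqrt(sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


best, history = evolve(rmse_of_line, n_genes=2)
print(best, history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best error in the population can never get worse from one generation to the next.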
Genetic programming
Mathematical models were produced with the genetic programming system available from the “rgp”Citation24 package of the R environment. The package implements symbolic regression, a method that allows automatic construction of a mathematical formula by evolutionary algorithms based on experimental data. The model obtained is of a white-box type, so the results are easier to interpret in comparison with artificial neural network models. The size of the chromosome, which is a representation of the maximum length of the equation, was varied from 5 to 100. The more complex the equation, the higher the probability of its overfitting and weak generalization ability, thus the final choice of the optimal model is always a trade-off between its complexity and best achievable goodness of fit criterion. The population size was set to 1,000 and the modeling process was set to 500 million evolution steps divided into 100 stages. After each stage, the models were tested according to the 10-fold cross-validation method. Apart from maximum evolution steps, minimum training error (RMSE) was set as an additional algorithm stop condition. According to the previous results (monmlp, randomForest, and Cubist), its value was established as 5. The genetic programming method was applied to the original database to find the mathematical relationship between FPF, formulation properties, and assay conditions. After selecting the best model, its parameters were optimized to assess the generalization ability. For optimization purposes, the “optimx” package for R environment was used.Citation25 The general scheme of work and the models are presented in and .
Hardware environment
All calculations were performed using 27 workstations with a total of 304 threads working under the openSUSE 13.1 operating system.
Results and discussion
Selection of important variables
The results of the fscaret package were used for selection of the most important variables. Based on this method, 51 new data sets were prepared, containing from 4 to 46 input variables.Citation26 These data sets were used in a further modeling process using artificial neural networks, Cubist, and randomForest tools in the 10-fold cross-validation mode. The best model in terms of generalization ability was found for the input vector of 28 variables (). The most numerous group of selected variables described drug properties, especially electronic characteristics, such as water accessible surface area (ASA), logP, and hydrogen donor bond count at pH 12. According to the definition of logP, hydrophilic substances have a low partition coefficient value and are more soluble in water than in n-octanol. Hydrophobic substances have a high logP value and their solubility in water is lower than in n-octanol. Chuman et alCitation27 created a model for calculation of logP based on solvation energy and ASA. It could be observed that molecules with a low ASA value are more hydrophilic than chemical compounds with a high ASA.
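The relationship between logP and solubility preference mentioned above follows directly from the usual definition of the n-octanol/water partition coefficient, shown here as a reminder. The concentration symbols are introduced for illustration and refer to the neutral solute at equilibrium.

```latex
\log P = \log_{10}\!\left(\frac{c_{\text{n-octanol}}}{c_{\text{water}}}\right)
```

A negative logP therefore corresponds to a solute that concentrates in the aqueous phase (hydrophilic), and a positive logP to one that concentrates in n-octanol (hydrophobic); a change of one logP unit corresponds to a tenfold change in the concentration ratio.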
In conclusion, ASA and logP values are associated with the hydrophilic properties of substances, which are related to their dipole moment and thus account for the electrostatic behavior of particles. According to Pilcer et alCitation2 the latter could be important for drug-carrier interactions in the same way that van der Waals forces are important. It could also be hypothesized that the drug’s hydrogen donor bond count at pH 12 is related to the pKa value of the carriers, which in most cases in the collected database is between 12 and 13. This may strengthen our earlier conclusion about the importance of electrostatic interactions between carrier and drug particles. Since no data about the actual charge on the particle surface were available, our reasoning is indirect and based on the assumption that the properties of a chemical compound influence and/or determine the properties of particles containing that compound as the main component. The other group of descriptors depicts the surface and shape of the carrier. Both factors are important for deposition of particles and influence adhesion forces between particles. There is still no clear explanation of how surface roughness influences aerosol performance. Several authors have reported contradictory observations, ie, that both smooth and rough carrier surfaces were beneficial for particle deposition.Citation2,Citation28 Based on those results, it may be hypothesized that the performance of an aerosol is related not only to the roughness of the particles, but also to the shape of the spikes and valleys observed on the carrier surface, as these variables were also included in our best model. These findings need further investigation. It was also found that assay conditions such as flow rate, type of inhaler device, and impactor type can affect particle deposition.Citation29,Citation30
Modeling
Based on the 10-fold cross-validation scheme, the new data sets with a reduced number of inputs were divided into learn-test pairs. A total of 51 different input vectors were used at the first stage of modeling by the Cubist, randomForest, and monmlp packages for the R environment. The best results were obtained for the input vector of 28 variables with artificial neural network-based models, which had an RMSE and NRMSE of 5.76 and 13%, respectively. A comparison of the best models created for all modeling methods is shown in . Thus, a further modeling process with the genetic programming method and fuzzy systems was performed using the 28-variable input vector. The structure of the data set is presented in . Models created using rule-based systems such as Cubist and randomForest showed an NRMSE that was slightly higher than that of the artificial neural networks (by 3% and 2%, respectively). The best fuzzy logic model had an RMSE and an NRMSE of 5.5 and 12%, respectively. The mathematical model was characterized by an RMSE of 4.9. Moreover, the genetic programming algorithm performed further automatic input vector reduction, ending up with only eleven input variables selected from the database (Equation 3). The observed versus predicted plot for the model is presented in .
As a summary of the modeling step, the architecture and settings of the created models are shown in . A more complex mathematical equation derived as an additional model is presented in Table S3. Additionally, GUI-based software implementing the presented models was written in Java and published for free download from the Internet (the details are presented in Table S4).
The features selected for FPF prediction utilize information on drug properties (three variables), assay conditions (two variables), drug content in the formulation, and the properties of the carrier, including surface, type, and particle size. Further analysis of Equation 3 revealed that in vitro deposition as predicted by the model increases with increasing flow rate and with decreasing drug content in the formulation (). Steckel and MüllerCitation31 showed that, in most cases, the FPF decreases when the drug content in the formulation increases. Experimental results show that an increased flow rate results in a higher FPF.Citation32–Citation34 The Rsk variable was found to be important for prediction of the FPF. It is a measure of the asymmetry of the probability distribution of the surface profile and is a more complex parameter than other surface descriptors, eg, the arithmetical mean deviation, root mean square deviation, lowest valley, highest peak, and total height of the profile.Citation5 A positive value of Rsk indicates a predominance of low areas on the surface, whereas a negative value of Rsk indicates a predominance of highly elevated areas. An Rsk value close to zero means that the distribution of height values is similar to a normal distribution. Moreover, according to the plot of Rsk versus FPF (), it can be observed that a surface with a low Rsk value () has narrow valleys and rough spikes. This could impede interaction between carrier and drug particles. In the case of a higher Rsk, valleys on the particle surface are wider (), which could provide a better chance for interactions between drug and carrier particles. Further experiments are needed to explain the influence of the surface topology of the carrier on drug-carrier interaction. Further, according to Equation 3, the more hydrophilic the API, the lower the FPF that can be achieved. This finding is probably related to the dipole moment, which is crucial for the strength of van der Waals forces between particles. It is known that a more hydrophilic substance has a larger dipole moment.
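The sign convention for Rsk discussed above can be read from the standard definition of profile skewness, given here for reference. The symbols z_i (height values), n (number of values), and the overbar (mean height) are introduced for illustration, and whether SurfCharJ implements exactly this form has not been checked against its source.

```latex
R_{q} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_{i} - \bar{z}\right)^{2}}
\qquad\qquad
R_{sk} = \frac{1}{n\,R_{q}^{3}}\sum_{i=1}^{n}\left(z_{i} - \bar{z}\right)^{3}
```

Because the deviations are cubed, a few high peaks above a mostly low surface give a positive Rsk, whereas a few deep valleys in a mostly elevated surface give a negative Rsk.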
Validation
The final mathematical model (Equation 3) was tested on new experimental data from the College of Engineering, Nanyang Technological University, Singapore. A new formulation composed of a hydroxyapatite carrier () and budesonide as a model drug was tested in an ACI. Before validation, the parameters of Equation 3 were optimized based on the complete data set gathered from the literature, so no further cross-validation (10-cv) was employed. The validation data set was prepared according to the described methodology and all inputs used for calculation are shown in . The results are summarized in . FPF values predicted for flow rates of 30 L/min and 60 L/min were 14.50% and 23.16%, respectively. The RMSE and NRMSE calculated for the external validation data set were 3.8 and 8.6%, respectively. These results indicate that the created model is reliable.
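As a rough consistency check on the reported errors, and assuming that the NRMSE is the RMSE divided by the range of observed FPF values (an assumption, not something stated in this text), the implied normalization range can be back-calculated from the two pairs of reported values:

```latex
\mathrm{obs}_{\max} - \mathrm{obs}_{\min} \approx \frac{\mathrm{RMSE}}{\mathrm{NRMSE}}:
\qquad \frac{3.8}{0.086} \approx 44.2,
\qquad \frac{4.9}{0.11} \approx 44.5
```

Under that assumption, both the cross-validation and the external validation errors appear to be normalized by a similar FPF range of roughly 44 to 45 percentage points; the small difference is within the rounding of the reported figures.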
Conclusion
In this study, an approach to empirical modeling based on data in the literature was developed. All stages of the experiment, eg, shape and surface analysis, feature selection, and modeling, were performed using open source software available to everyone for free.Citation19 As a result, the model described by the classical mathematical equation for prediction of in vitro deposition based on characteristics of formulation and assay conditions was obtained. A modeling technique like genetic programming can be useful for modeling of complex processes such as in vitro deposition. The feature selection led to reduction of the input variables from 135 to 28. During development of the model, three key elements were applied:
- The SurfCharJ plugin and the ImageJ program, which allowed calculation of the carrier’s surface and shape descriptors based on SEM images
- Marvin, a tool for computation of molecular descriptors
- Tools for feature selection and modeling.
The validation of the model confirmed its applicability for development of new inhalation formulations and decision support systems.
Acknowledgments
This work was funded by a bilateral Poland-Singapore cooperation project (2/3/POL-SIN/2012).
Disclosure
The authors report no conflicts of interest in this work.
References
- Rahimpour Y, Hamishehkar H. Lactose engineering for better performance in dry powder inhalers. Adv Pharm Bull. 2012;2(2):183–187. PMID: 24312791.
- Pilcer G, Wauthoz N, Amighi K. Lactose characteristics and the generation of the aerosol. Adv Drug Deliv Rev. 2012;64(3):233–256. PMID: 21616107.
- Hassan MS, Lau R. Inhalation performance of pollen-shape carrier in dry powder formulation: effect of size and surface morphology. Int J Pharm. 2011;413(1–2):93–102. PMID: 21540087.
- Hassan MS, Lau RWM. Effect of particle shape on dry particle inhalation: study of flowability, aerosolization, and deposition properties. AAPS Pharm Sci Tech. 2009;10(4):1252–1262.
- Chinga G, Johnsen PO, Dougherty R, Berli EL, Walter J. Quantification of the 3D microstructure of SC surfaces. J Microsc. 2007;227(Pt 3):254–265. PMID: 17760621.
- Vinchurkar S, Longest PW, Peart J. CFD simulations of the Andersen cascade impactor: Model development and effects of aerosol charge. J Aerosol Sci. 2009;40(9):807–822.
- Kaialy W, Ticehurst M, Nokhodchi A. Dry powder inhalers: mechanistic evaluation of lactose formulations containing salbutamol sulphate. Int J Pharm. 2012;423(2):184–194. PMID: 22197772.
- Chen X, Zhong W, Zhou X, Jin B, Sun B. CFD–DEM simulation of particle transport and deposition in pulmonary airway. Powder Technol. 2012;228:309–318.
- Sturm R, Hofmann W. A computer program for the simulation of fiber deposition in the human respiratory tract. Comput Biol Med. 2006;36(11):1252–1267. PMID: 16212953.
- Szlęk J, Pacławski A, Lau R, Jachowicz R, Mendyk A. Heuristic modeling of macromolecule release from PLGA microspheres. Int J Nanomedicine. 2013;8:4601–4611. PMID: 24348037.
- Mendyk A, Tuszyński PK, Polak S, Jachowicz R. Generalized in vitro-in vivo relationship (IVIVR) model based on artificial neural networks. Drug Des Devel Ther. 2013;7:223–232.
- Mendyk A, Kleinebudde P, Thommes M, Yoo A, Szlęk J, Jachowicz R. Analysis of pellet properties with use of artificial neural networks. Eur J Pharm Sci. 2010;41(3–4):421–429. PMID: 20659554.
- Mendyk A, Jachowicz R. Unified methodology of neural analysis in decision support systems built for pharmaceutical technology. Expert Syst Appl. 2007;32(4):1124–1131.
- SurfCharJ plugin for ImageJ. Available from: http://www.gcsca.net/IJ/SurfCharJ.html. Accessed January 1, 2014.
- Rasband WS. ImageJ. Bethesda, MD, USA: National Institutes of Health; 1997–2014. Available from: http://imagej.nih.gov/ij/.
- Particle analysis with ImageJ. Available from: http://rsb.info.nih.gov/ij/docs/menus/analyze.html. Accessed June 1, 2014.
- Marvin. ChemAxon. Available from: http://www.chemaxon.com. Accessed January 1, 2014.
- Szlęk J. fscaret: automated caret feature selection. Available from: http://cran.r-project.org/web/packages/fscaret/index.html. Accessed June 1, 2014.
- R Core Team. R: a language and environment for statistical computing. Vienna, Austria: The R Foundation for Statistical Computing. Available from: http://www.R-project.org/. Accessed June 1, 2014.
- Cannon AJ. monmlp: monotone multi-layer perceptron neural network. Available from: http://cran.r-project.org/web/packages/monmlp/index.html. Accessed June 1, 2014.
- Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18–22.
- Kuhn M, Weston S, Keefer C, Coulter N; C code for Cubist by Ross Quinlan. Cubist: Rule- and Instance-Based Regression Modeling. R package version 0.0.18. Available from: http://CRAN.R-project.org/package=Cubist. Accessed June 1, 2014.
- Bujard A. fugeR: FUzzy GEnetic, a machine learning algorithm to construct prediction model based on fuzzy logic. R package version 0.1.2. Available from: http://CRAN.R-project.org/package=fugeR. Accessed June 1, 2014.
- Flasch O, Mersmann O, Bartz-Beielstein T, Stork J, Zaefferer J. rgp: R genetic programming framework. Version 0.4-0. Available from: http://CRAN.R-project.org/package=rgp. Accessed June 1, 2014.
- Nash JC, Varadhan R. Unifying optimization algorithms to aid software system users: optimx for R. J Stat Softw. 2011;43(9):1–14. PMID: 22003319.
- Szlęk J. A short fscaret package introduction with examples. Available from: http://cran.r-project.org/web/packages/fscaret/vignettes/fscaret.pdf. Accessed August 28, 2014.
- Chuman H, Mori A, Tanaka H, Yamagami C, Fujita T. Analyses of the partition coefficient, log P, using ab initio MO parameter and accessible surface area of solute molecules. J Pharm Sci. 2004;93(11):2681–2697. PMID: 15389676.
- Kou X, Chan LW, Steckel H, Heng PWS. Physico-chemical aspects of lactose for inhalation. Adv Drug Deliv Rev. 2012;64(3):220–232. PMID: 22123598.
- Demoly P, Hagedoorn P, de Boer AH, Frijlink HW. The clinical relevance of dry powder inhaler performance for drug delivery. Respir Med. 2014;108(8):1195–1203. PMID: 24929253.
- Taki M, Marriott C, Zeng X-M, Martin GP. Aerodynamic deposition of combination dry powder inhaler formulations in vitro: a comparison of three impactors. Int J Pharm. 2010;388(1–2):40–51. PMID: 20026261.
- Steckel H, Müller B. In vitro evaluation of dry powder inhalers II: influence of carrier particle size and concentration on in vitro deposition. Int J Pharm. 1997;154:31–37.
- Hassan MS, Lau R. Inhalation performance of pollen-shape carrier in dry powder formulation with different drug mixing ratios: comparison with lactose carrier. Int J Pharm. 2010;386(1–2):6–14. PMID: 19922775.
- Coates MS, Chan H-K, Fletcher DF, Raper JA. Influence of air flow on the performance of a dry powder inhaler using computational and experimental analyses. Pharm Res. 2005;22(9):1445–1453. PMID: 16132356.
- Hoe S, Traini D, Chan H-K, Young PM. The contribution of different formulation components on the aerosol charge in carrier-based dry powder inhaler systems. Pharm Res. 2010;27(7):1325–1336. PMID: 20354768.