Review Article

Predictive modelling of the LD50 activities of coumarin derivatives using neural statistical approaches: Electronic descriptor-based DFT

Pages 451-461 | Received 16 Feb 2015, Accepted 10 Jun 2015, Published online: 16 Apr 2018

Abstract

A quantitative structure–activity relationship (QSAR) study was performed on a set of 30 coumarin-based molecules using multiple linear regression (MLR) and an artificial neural network (ANN). The predicted values of the antioxidant activities of the coumarins were in good agreement with the experimental results. Several statistical criteria, such as the mean square error (MSE) and the correlation coefficient (R), were used to evaluate the developed models. The best results were obtained with an [8-4-1] network architecture (R = 0.908, MSE = 0.032), the tansig–purelin pair of activation functions and the Levenberg–Marquardt learning algorithm. The model proposed in this study relies on a set of electronic descriptors that are used to describe these molecules. The results suggest that the proposed combination of calculated parameters may be useful for predicting the antioxidant activities of coumarin derivatives.

1 Introduction

Medicinal plants are both finished products destined for consumption and raw materials used for the production of active substances; they are a source of considerable value for many people and have many therapeutic qualities that have been demonstrated by experience. Coumarins are an important class of these natural products and have a characteristic odour similar to that of freshly mown hay. Coumarins are derived from the metabolism of phenylalanine via cinnamic acid and can be found in all parts of the plant, including the fruits, and in the essential oils of seeds [Citation1].

Several studies have shown that coumarins are biologically active molecules that express varied activities. Coumarins can prevent the peroxidation of membrane lipids and capture hydroxyl radicals, superoxides and peroxyls [Citation2]. Coumarins have been shown to be effective in blocking cancer chemically induced by ultraviolet radiation (i.e., anticancer activity). Degree and his team have shown that coumarins paralyze the growth of Saccharomyces cerevisiae. Coumarins also have other biological activities, including anti-platelet aggregation [Citation3], anti-inflammatory [Citation4], anticoagulant [Citation5], antitumor [Citation6], diuretic [Citation7], antimicrobial [Citation8], antiviral and analgesic effects [Citation9].

Antioxidants are now manufactured as essential candidates for fighting against several diseases [Citation10], and much current research has converged on the design and development of new chemical entities with potential antioxidant activities.

Coumarins are a natural source of essential antioxidants; these molecules exhibit activities against free radicals in human tissue via a variety of mechanisms that primarily rely on their structural equivalence with flavonoids and benzophenones [Citation11,Citation12].

The quantitative structure–activity relationship (QSAR) technique [Citation13] has been widely used for years to provide quantitative analyses of the relationships of the structures and biological activities of compounds.

Almost all QSAR techniques are based on the use of molecular descriptors, which are numerical series that codify useful chemical information and enable the identification of correlations between statistical and biological properties [Citation14,Citation15]. Different QSAR studies from different research groups have identified important structural features that are responsible for activities [Citation16,Citation17] and aided the development of toxicity models for diverse chemicals [Citation18–Citation21].

Applications of ANNs in the QSAR analyses of the biological potentials (cytotoxicity, binding affinity, enzyme inhibition, etc.) of different compounds have been presented in previous papers [Citation22–Citation24]. In those papers, the usefulness of the ANN methodology in the QSAR modelling of complex input–output relationships has been confirmed. These complex relationships are usually relevant to the prediction of biological activities that depend on many factors (e.g., stereochemistry, lipophilicity, functional groups, the type of organism/cell).

In the present work, we relied on a series of 30 coumarin derivatives studied by Andre Kimura et al. with the aim of developing a predictive QSAR model for the antioxidant activities of the coumarin molecules using calculation methods based on quantum chemistry, molecular structure, molecular geometry, the nature of molecular orbitals and molecular properties.

The more relevant molecular properties were calculated. These properties included the highest occupied molecular orbital energy (EHOMO), the lowest unoccupied molecular orbital energy (ELUMO), the energy gap, the dipole moment (μ), the total energy (ET), the activation energy (Ea), the absorption maximum (λmax) and the factor of oscillation (f(SO)).

We developed a neural model for predicting antioxidant activity from electronic variables. The best-performing model used the tansig transfer function in the hidden layer and the purelin transfer function in the output layer, together with the Levenberg–Marquardt (LM) learning algorithm and a PMC-type (multilayer perceptron) architecture [8-4-1].

2 Material and methods

2.1 Materials

Andre Kimura Okamoto et al. measured the inhibitory activities (LD50) of a series of 30 coumarin molecules against quinolin mutagenicity in Salmonella typhimurium. Fig. 1 illustrates the chemical structures of the studied compounds and their corresponding experimental LD50 activities.

Fig. 1 Chemical structures of the studied coumarins.

The experimental toxicities of the studied compounds have been reported in recent work. The range of antioxidant activities varied from 6.07 to 8.03.

2.2 Methods

2.2.1 Theoretical calculations for the molecular modelling

Quantum chemistry finds its place among today's scientific and technological developments as a powerful method for searching for what is supported by experience, and the development of computer technology will only support this trend. GaussView (03) is one of a very large number of molecular modelling software products used in both research and industry.

Density functional theory (DFT) methods were used in this study. These methods have become very popular in recent years because they can achieve precision levels similar to those of other methods in less time and at less cost from the computational perspective. In agreement with the DFT results, the energy of the fundamental state of a polyelectronic system can be expressed through the total electronic density; in fact, the use of the electronic density rather than the wave function for calculating the energy constitutes the fundamental basis of DFT [Citation25,Citation26] and involves the use of the B3LYP functional [Citation27,Citation28] and a 6-31G* basis set. The B3LYP is a version of the DFT method that uses Becke's three-parameter functional (B3) and includes a mixture of HF and DFT exchange terms that are associated with the gradient-corrected correlation functional of Lee, Yang and Parr (LYP). The geometries of all of the species under investigation were determined by optimizing all of the geometrical variables without any symmetry constraints.
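For reference, the mixture of HF and DFT exchange terms described above is usually written as follows. This is the standard textbook form of the B3LYP exchange-correlation energy with Becke's three fitted parameters; the source does not print the expression, so the notation here (including the use of the VWN local correlation term) is an assumption based on the common implementation.

```latex
E_{xc}^{\mathrm{B3LYP}}
  = (1 - a_0)\,E_x^{\mathrm{LSDA}}
  + a_0\,E_x^{\mathrm{HF}}
  + a_x\,\Delta E_x^{\mathrm{B88}}
  + a_c\,E_c^{\mathrm{LYP}}
  + (1 - a_c)\,E_c^{\mathrm{VWN}},
\qquad a_0 = 0.20,\quad a_x = 0.72,\quad a_c = 0.81
```

Here $E_x^{\mathrm{HF}}$ is the exact (Hartree–Fock) exchange, $\Delta E_x^{\mathrm{B88}}$ is Becke's gradient correction to the local exchange, and $E_c^{\mathrm{LYP}}$ is the Lee–Yang–Parr correlation functional.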

2.2.2 Multiple linear regressions

Multiple linear regression (MLR) is a statistical technique used to study the relation between one dependent variable and several independent variables. It fits the model by minimizing the differences between the actual and predicted values. The MLR model [Citation29–Citation31] was generated using the software SYSTAT, version 12 [Citation32] to predict the LD50 antioxidant activities. Multiple regression was also used to select the descriptors for use as the input parameters of a back-propagation artificial neural network (ANN).
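As an illustration of the least-squares idea behind MLR, the following minimal sketch fits an intercept and one coefficient per descriptor by solving the normal equations. It is not the SYSTAT procedure used in the study; the function names `fit_mlr` and `predict` and the toy data are hypothetical.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimising the sum of squared residuals."""
    # Design matrix with a leading column of ones for the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = aug[i][p] - sum(aug[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / aug[i][i]
    return beta


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one descriptor vector."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


if __name__ == "__main__":
    # Toy data generated from y = 1 + 2*x1 - 3*x2 (not from the paper).
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1, 3, -2, 0, 2]
    print(fit_mlr(X, y))
```

With real descriptors of very different magnitudes, a library routine based on QR or SVD would be numerically safer than the normal equations used here.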

2.2.3 Artificial Neural Networks (ANNs)

The ANN analysis was performed by applying the Neural Fitting tool (nntool) toolbox of MATLAB software (v 2014a) to a data set of the antioxidant activities of coumarin derivatives [Citation37].

Artificial neural networks are non-linear empirical models [Citation33,Citation34] that are rarely used in the prediction of biological activities, while their applications are rapidly growing in many disciplines. ANNs are among the interesting alternatives to traditional statistics for data processing. In this work, we explain the key concepts of ANNs and the multilayer perceptron, with a greater focus on the latter.

2.2.3.1 Architectures of neural networks

Typically, a neural network is defined by its architecture, and such architectures are characterized by the transfer function and by how the interconnections between neurons are made. There are several transfer functions, and the selection of a transfer function depends on the problem being solved. Transfer functions are also chosen for the ease of their implementation and of their differentiation, which is required by the optimization algorithms.

In our case, the selected network was a multilayer network. This choice was based on the ease and speed of construction and the fact that our problem has a limited number of input variables [Citation35,Citation36].

2.2.3.2 Multilayer perceptron (PMC)

The PMC is a layer propagation network model (Fig. 2). The neurons are organized in layers, i.e., an input layer, an output layer and one or more intermediate layers, which are also called hidden layers.

Fig. 2 Multilayer perceptron [4-5-1].

Although in theory a PMC can have multiple layers, in practice, single hidden layers are sufficient [Citation38]. A PMC was established to select the transfer functions, to identify the relevant inputs and the number of neurons in the hidden layer and to select an algorithm and then optimize and test the network.

Transfer functions:

Neural networks are used for the approximation of non-linear models. Nonlinearity is introduced by the selected transfer function, particularly in the nodes of the hidden layer. The transfer function of the output layer is linear. Although in theory any nonlinear function can be used, the functions that are typically selected are those that are easy to compute and to differentiate.

According to Dawson and Wilby [Citation39], the log-sigmoid transfer function (logsig) is the most widely used. It is bounded between 0 and 1 and is defined as follows:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Among these functions, those used in this work are primarily the linear transfer function (purelin), which is the most frequently used in hydrological modelling, and the hyperbolic tangent sigmoid transfer function (tansig) (Fig. 3).

Fig. 3 Graph and symbol of purelin and tansig.

*Purelin (linear transfer function): purelin calculates a layer's output from its net input n by returning it unchanged, f(n) = n.

*Tansig (hyperbolic tangent sigmoid transfer function): tansig calculates a layer's output from its net input n as f(n) = 2/(1 + e^(−2n)) − 1, which is the hyperbolic tangent and is bounded between −1 and 1.
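The three transfer functions named in this section can be sketched in a few lines. This is a plain-Python illustration using the standard mathematical definitions, not the MATLAB toolbox code; the function names simply mirror the MATLAB ones.

```python
import math


def logsig(n: float) -> float:
    """Log-sigmoid: maps a real net input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n: float) -> float:
    """Hyperbolic tangent sigmoid: maps a real net input into (-1, 1)."""
    return math.tanh(n)


def purelin(n: float) -> float:
    """Linear transfer function: returns the net input unchanged."""
    return n


if __name__ == "__main__":
    for n in (-2.0, 0.0, 2.0):
        print(n, logsig(n), tansig(n), purelin(n))
```

Both sigmoids have simple derivatives expressed through their own outputs, f(1 − f) for logsig and 1 − f² for tansig, which is one reason they are convenient for gradient-based training.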

3 Results and discussion

In this study, we focused on a series of 30 coumarin derivatives to determine the quantitative relationships between the structures of these derivatives and the biological activity LD50 values. In this section, we employ the same approach that we have already used in previous works [Citation29,Citation31].

Table 1 shows the values of the calculated parameters obtained from the optimized structures via DFT/B3LYP 6-31G (d) optimization.

Table 1 Values of the parameters obtained by DFT/B3LYP 6-31G (d) optimization of the studied compounds.

3.1 Multiple linear regressions (MLR)

Many attempts were made to relate the descriptors to the toxicity indicator LD50. The best relationship obtained with this method was a linear combination of seven descriptors, i.e., the total energy ET, the energy EHOMO, the energy ELUMO, the activation energy Ea, the dipole moment μ, the absorption maximum λmax and the factor of oscillation f(SO):

$$\mathrm{LD}_{50} = 19.563 - 4.056\times10^{-4}\,E_T - 8.712\times10^{-3}\,E_{HOMO} + 0.507\times10^{-2}\,E_{LUMO} + 3.297\times10^{-2}\,\mu + 4141.438\,E_a + 3.821\times10^{-2}\,\lambda_{max} + 1.531\,f(SO) \tag{1}$$

Fig. 4 Relationship between the observed values of LD50, their predictions and their residuals established by MLR.

For our 30 compounds, the correlation between the experimental and calculated toxicities based on this model was quite significant (Fig. 4), as indicated by the following statistical values: N = 30, R = 0.637, R² = 0.406, RMSE = 0.408.

Fig. 4 illustrates the very regular distribution of the predicted toxicity values plotted against the experimental values.
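The statistical indicators quoted for each model can be computed from paired observed and predicted values. The sketch below assumes that R is the Pearson correlation coefficient and that R² is its square, which is consistent with the reported MLR figures (0.637² ≈ 0.406); the function name and the sample numbers are hypothetical and are not the paper's data.

```python
import math
from typing import Dict, Sequence


def regression_metrics(observed: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """Return R, R2, MSE and RMSE for paired observed/predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Pearson correlation coefficient between observed and predicted values.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    # Mean squared error and its square root.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return {"R": r, "R2": r * r, "MSE": mse, "RMSE": math.sqrt(mse)}


if __name__ == "__main__":
    obs = [6.1, 6.8, 7.2, 7.9]    # illustrative values only
    pred = [6.3, 6.7, 7.4, 7.7]
    print(regression_metrics(obs, pred))
```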

3.2 Multiple nonlinear regression of the variable antioxidant activity (MNLR)

We also used a nonlinear regression model to quantitatively improve the structure–activity relationships by accounting for several parameters. MNLR is the most commonly used tool for the study of multidimensional data. The resulting equation was:

$$\begin{aligned}
\mathrm{LD}_{50} ={}& 23933.109 - 2.812\times10^{-3}\,E_T + 0.187\,E_{HOMO} - 9.337\,E_{LUMO} - 0.507\,\mu + 4141.438\,E_a + 49.468\,\lambda_{max} - 1.981\,f(SO) \\
& - 6.201\times10^{-7}\,E_T^{2} - 0.39\,E_{HOMO}^{2} - 1.179\,E_{LUMO}^{2} + 0.509\,\mathrm{gap}^{2} + 3.624\times10^{-2}\,\mu^{2} - 268.072\,E_a^{2} - 3.827\times10^{-2}\,\lambda_{max}^{2} + 9.177\,f(SO)^{2}
\end{aligned} \tag{2}$$

The obtained parameters describing the electronic aspects of the studied molecules were as follows: N = 30, R = 0.755, R² = 0.571, RMSE = 0.451.

The LD50 values predicted by this model are somewhat similar to the observed values. Fig. 5 displays a very regular distribution of the predicted activity values plotted against the observed values.

Fig. 5 Relationship between the observed values of LD50, their predictions and their residuals established by MNLR.

The coefficient of determination obtained from Eq. (2) is quite interesting (R² = 0.571). To reduce the standard deviation of the error and complete our model, we employed artificial neural networks (ANNs) in the next section.

As a part of this conclusion, we can state that the toxicity values obtained by nonlinear regression were highly correlated with the toxicity results obtained with the MLR method.

3.3 Artificial neural networks: PMC type

Concerning the classification or prediction of the antioxidant activities of coumarins, the learning of the PMC occurs in a supervised manner; thus, the ranking variable or the variable to be predicted must be known. In the case of the estimation of antioxidant activities, the collections to be observed are those for which we have this information.

The determination of the type of architecture, i.e., a PMC-type neural network, raises the questions of the selection of the number of hidden layers, the number of hidden neurons, the number of iterations and the transfer functions. To answer these questions, we randomly divided our database into three parts: 70% for training, 15% for testing and 15% for validation.
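A random 70/15/15 partition of this kind can be sketched as follows. The source does not say how the split was drawn, so the seed, the rounding rule and the function name are assumptions; with 30 compounds, 15% is 4.5, so the two smaller subsets cannot both be exactly 15% and share the nine compounds left after the 21 training compounds.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_frac: float = 0.70, test_frac: float = 0.15,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition range(n) into training, test and validation indices."""
    indices = list(range(n))
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = round(train_frac * n)
    n_test = round(test_frac * n)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]   # whatever remains
    return train, test, validation


if __name__ == "__main__":
    train, test, validation = split_indices(30)
    print(len(train), len(test), len(validation))
```

Every compound lands in exactly one subset, so the validation and test errors are measured on molecules the network never saw during training.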

3.4 Choice of the number of hidden layers

Table 2 presents the calculated R and MSE values for one, two, three and four hidden layers.

Table 2 Performance of the system according to the number of hidden layers.

Increasing the number of hidden layers increased the computational load without any gain in performance. Therefore, we concluded that a single hidden layer was preferable for the PMC-type model.

3.5 Choice of transfer functions and the number of iterations

In this study, we used the Levenberg–Marquardt (LM) algorithm as the learning algorithm because of its high performance.
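For readers unfamiliar with the method, the weight update of the Levenberg–Marquardt algorithm is commonly written as below. This is the generic textbook form and is given only as background; the symbols are not taken from the source.

```latex
\mathbf{w}_{k+1} = \mathbf{w}_k - \left(\mathbf{J}^{\top}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
```

Here $\mathbf{w}$ is the vector of weights and biases, $\mathbf{e}$ is the vector of network errors, $\mathbf{J} = \partial\mathbf{e}/\partial\mathbf{w}$ is the Jacobian of those errors, and $\mu > 0$ is a damping factor. A small $\mu$ gives a step close to Gauss–Newton, while a large $\mu$ gives a short gradient-descent step; $\mu$ is typically decreased after a successful step and increased after a failed one.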

In this case, we changed the number of neurons in the hidden layer and the pairs of transfer functions. The performance was evaluated via the mean squared error (MSE) and the correlation coefficient (R).

Table 3 displays the observed performances for various pairs of transfer functions.

Table 3 Pairs of transfer functions according to their performance.

Fig. 6 shows the variation in the mean squared error (MSE) according to the pair of transfer functions for the Levenberg–Marquardt (LM) algorithm.

Fig. 6 MSE variation with the pair of transfer functions for the LM algorithm.

The results in bold in Table 3 indicate that the tansig–purelin pair of transfer functions produced a correlation coefficient of R = 0.908 and a mean square error of MSE = 2.93 × 10⁻² with the [8-4-1] network architecture. This configuration gave the best performance with the LM learning algorithm, and this performance was reached after six iterations.

Based on these results, the most powerful model for predicting the antioxidant activity of the coumarins used the tansig transfer function in the hidden layer and the purelin function in the output layer, with the LM learning algorithm and a PMC configuration [8-4-1]. It contained three layers (Fig. 7), as follows:

eight neurons in the input layer, which represent the independent electronic variables;

four neurons in the hidden layer; and

one neuron in the output layer, which represents the antioxidant activity of the coumarin.

Fig. 7 The architecture of a PMC with eight input variables, four neurons in the hidden layer and one neuron in the output layer.
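The forward pass of such an [8-4-1] network can be sketched as follows, with a tansig hidden layer and a purelin output. The weights here are random placeholders, not the trained values from the study, and the helper names are hypothetical; training with the LM algorithm is not shown.

```python
import math
import random
from typing import Dict, Sequence

N_INPUTS, N_HIDDEN = 8, 4   # eight descriptors, four hidden neurons, one output


def init_network(seed: int = 0) -> Dict[str, object]:
    """Create placeholder weights and biases for an [8-4-1] perceptron."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
               for _ in range(N_HIDDEN)],                   # hidden weights
        "b1": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],  # output weights
        "b2": rng.uniform(-0.5, 0.5),
    }


def forward(net: Dict[str, object], x: Sequence[float]) -> float:
    """Propagate one descriptor vector through the network."""
    if len(x) != N_INPUTS:
        raise ValueError(f"expected {N_INPUTS} descriptors, got {len(x)}")
    # Hidden layer: tansig (hyperbolic tangent) of each neuron's net input.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    # Output layer: purelin, i.e. the net input is returned unchanged.
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


def count_parameters(net: Dict[str, object]) -> int:
    """Number of adjustable weights and biases."""
    return (sum(len(row) for row in net["W1"]) + len(net["b1"])
            + len(net["W2"]) + 1)


if __name__ == "__main__":
    net = init_network()
    print(count_parameters(net))        # 8*4 + 4 + 4 + 1 = 41
    print(forward(net, [0.1] * N_INPUTS))
```

In practice the descriptors would be scaled to a comparable range before being fed to the network, since tansig saturates for large net inputs.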

The ANN-calculated activity models were developed using the properties of several studied compounds. The correlation between the ANN-calculated and experimental activity values was very significant, as indicated by the R and R² values: N = 30, R = 0.908, R² = 0.811, RMSE = 0.032.

The relationship between the estimated LD50 values and their residuals, as established with the artificial neural network, is illustrated in Fig. 8.

Fig. 8 Relationship between the observed values of LD50, their predictions and their residuals established by ANN.

The correlation coefficient R obtained for this data set of coumarins was 0.908. This finding indicates that the artificial neural network was well suited to building the quantitative structure–activity relationship model. Next, we investigated the best linear QSAR regression equations established in this study. A comparison of the MLR and ANN models revealed that the ANN model exhibited substantially better predictive capability (R = 0.908 versus R = 0.637 for MLR). The ANN was able to establish satisfactory relationships between the electronic descriptors and the activities of the studied compounds.

4 Conclusion

In this work, we applied QSAR regression to predict the activities of several antioxidant compounds that are based on coumarins.

The results revealed that the relationship between the antioxidant activities and the electronic parameters of the molecules was not linear for the coumarins.

The Levenberg–Marquardt algorithm exhibited better performance in terms of statistical indicators and network architecture [8-4-1] when a non-linear activation function of the tansig type was used in the hidden layer and a linear activation function of the purelin type was used in the output layer. This configuration resulted in very good predictions of the antioxidant activities.

Comparisons of the key statistical terms, such as R and R², of the different models that involved the use of different statistical tools and various electronic descriptors are illustrated in Table 4.

Table 4 The observed and calculated values of LD50 by different methods with their residuals.

Acknowledgements

We are grateful to the Association Marocaine des Chimistes Théoriciens (AMCT) for its pertinent help concerning the programs.

Notes

Peer review under responsibility of Taibah University.

References

  • J.L. Guignard, Abrégé de botanique, Masson, Paris, 1998, p. 212.
  • C.M. Anderson, A. Hallberg, T. Hogberg, Advances in the development of pharmaceutical antioxidant drug, Food Chem. 28 (1996) 65–180.
  • R.J. Ochocka, D. Rajzer, H. Kowalski, Lamparczyk, Determination of coumarins from Chrysanthemum segetum L. by capillary electrophoresis, J. Chromatogr. A 709 (1995) 197–202.
  • G. Taguchi, S. Fujikawa, T. Yazawa, R. Kodaira, N. Hayashida, M. Shimosaka, M. Okazaki, Scopoletin uptake from culture medium and accumulation in the vacuoles after conversion to scopolin in 2,4-D-treated tobacco cells, Plant Sci. 151 (2000) 153–161.
  • T. Ojala, S. Rames, P. Haansu, H. Vuorela, R. Hiltunen, K. Haahtela, P. Vuerela, Antimicrobial activity of some coumarin containing herbal plants growing in Finland, J. Ethnopharmacol. 73 (2000) 299–305.
  • C.N. Chen, M.S. Weng, C. Wu, J.K. Lin, Comparison of radical scavenging activity, cytotoxic effects and apoptosis induction in human melanosoma cells, Food Chem. 12 (2004) 175–185.
  • I. Khan, M.V. Kulkari, M. Gopal, Shahabuddin F M.S., Synthesis and biological evaluation of novel angularly fused polycyclic coumarins, Bioorg. Med. Chem. Lett. 15 (2005) 3584–3587.
  • B. Thati, A. Noble, R. Rowan, S.B. Creaven, M. Walsh, D. Egan, K. Kavanagh, Mechanism of action of coumarin and silver coumarin complexes against the pathogenic yeast Candida albicans, Toxicol. In Vitro 21 (2007) 801–808.
  • T. Stefanova, N. Nikolova, A. Michailova, I. Mitov, I. Iancov, G.I. Zlabinger, H. Neychev, Enhanced resistance to Salmonella enterica serovar typhimurium infection in mice after coumarin treatment, Microb. Infect. 9 (2007) 7–14.
  • R.L.L. De Compadre, A.K. Debnath, A.J. Shusterman, C. Hansch, LUMO energies and hydrophobicity as determinants of mutagenicity by nitroaromatic compounds in Salmonella typhimurium, Environ. Mol. Mutagen. 15 (1) (1990) 44–55.
  • J.S. Felton, M.G. Knize, F.T. Hatch, M.J. Tanga, M.E. Colvin, Heterocyclic amine formation and the impact of structure on their mutagenicity, Cancer Lett. 143 (1999) 127–134.
  • U. Maran, M. Karelson, A.R. Katritzky, A comprehensive QSAR treatment of the genotoxicity of heteroaromatic and aromatic amines, Quant. Struct.–Act. Relatsh. 18 (1) (1999) 3–10.
  • C. Hansch, R.M. Muir, T. Fujita, P.P. Maloney, F. Geiger, M. Streich, J. Am. Chem. Soc. 85 (1963) 2817–2825.
  • H. González-Díaz, S. Vilar, L. Santana, E. Uriarte, Medicinal chemistry and bioinformatics – current trends in drugs discovery with networks topological indices, Curr. Top. Med. Chem. 7 (10) (2007) 1015–1029.
  • R. Concu, G. Podda, F.M. Ubeira, H. González-Díaz, Review of QSAR models for enzyme classes of drug targets: theoretical background and applications in parasites, hosts, and other organisms, Curr. Pharm. Des. 16 (24) (2010) 2710–2723.
  • A. Sabljic, QSAR models for estimating properties of persistent organic pollutants required in evaluation of their environmental fate and risk, Chemosphere 43 (2001) 363.
  • A. Sabljic, H. Gusten, H. Verhaar, J. Hermens, QSAR modelling of soil sorption. Improvements and systematics of logKoc vs. logP correlations, Chemosphere 31 (1995) 4489–4514.
  • R. Benigni, R. Zito, The second national toxicology program comparative exercise on the prediction of rodent carcinogenicity: definitive results, Mutat. Res. 566 (2004) 49–63.
  • D. Zakarya, E.M. Larfaoui, A. Boulaamail, M. Tollabi, T. Lakhlifi, QSARs for a series of inhibitory anilids, Chemosphere 36 (13) (1998) 2809–2818.
  • M. Elhallaoui, M. Elasri, F. Ouazzani, A. Mechaqrane, T. Lakhlifi, Quantitative structure–activity relationships of noncompetitive antagonists of the NMDA receptor: a study of a series of MK801 derivative molecules using statistical methods and neural network, Int. J. Mol. Sci. 4 (2003) 249–262.
  • G. Jing, Z. Zhou, J. Zhuo, Quantitative structure–activity relationship (QSAR) study of toxicity of quaternary ammonium compounds on Chlorella pyrenoidosa and Scenedesmus quadricauda, Chemosphere 86 (2012) 76–82.
  • H. González-Díaz, D.M. Herrera-Ibatá, A. Duardo-Sánchez, C.R. Munteanu, R.A. Orbegozo-Medina, A. Pazos, ANN multiscale model of anti-HIV drugs activity vs AIDS prevalence in the US at county level based on information indices of molecular graphs and social networks, J. Chem. Inf. Model. 54 (3) (2014) 744–755.
  • H. González-Díaz, S. Arrasate, N. Sotomayor, E. Lete, C.R. Munteanu, A. Pazos, L. Besada-Porto, J.M. Ruso, MIANN models in medicinal, physical and organic chemistry, Curr. Top. Med. Chem. 13 (5) (2013) 619–641.
  • E. Tenorio-Borroto, C.G. Peñuelas Rivas, J.C. Vásquez Chagoyán, N. Castañedo, F.J. Prado-Prado, X. García-Mera, H. González-Díaz, ANN multiplexing model of drugs effect on macrophages; theoretical and flow cytometry study on the cytotoxicity of the anti-microbial drug G1 in spleen, Bioorg. Med. Chem. 20 (20) (2012) 6181–6194.
  • C. Adamo, V. Barone, Chem. Phys. Lett. 330 (2000) 152–160.
  • M.J. Frisch, et al., Gaussian 03, Revision B.01, Gaussian, Inc., Pittsburgh, PA, 2003.
  • A.D. Becke, J. Chem. Phys. 98 (1993) 1372.
  • C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37 (1988) 785–789.
  • R. Hmamouchi, A.I. Taghki, M. Larif, A. Adad, A. Abdellaoui, M. Bouachrine, T. Lakhlifi, J. Chem. Pharm. Res. 5 (9) (2013) 198–202.
  • R. Hmamouchi, M. Larif, A. Adad, M. Bouachrine, T. Lakhlifi, Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4 (2) (2014) 241–251.
  • R. Hmamouchi, M. Larif, A. Adad, M. Bouachrine, T. Lakhlifi, J. Comput. Methods Mol. Des. 4 (3) (2014) 61–71.
  • STATITCF Software, Technical Institute of Cereals and Fodder, Paris, France, 1987.
  • D. Mantzaris, G. Anastassopoulos, Intelligent prediction of vesicoureteral reflux disease, WSEAS Trans. Syst. 4 (2005) 1440–1449.
  • S. Baboo, I. Shereef, An efficient weather forecasting system using artificial neural network, Int. J. Environ. Sci. 1 (2010) 321–326.
  • I. Manssouri, M. Manssouri, B. El Kihel, Fault detection by K-NN algorithm and MLP neuronal networks in distillation column, J. Inf. Intell. Knowl. 3 (2011) 72–75.
  • R. Nayak, L. Jain, B. Ting, Artificial neural networks in biomedical engineering: a review, Proc. 1st Asian-Pacific Congr. Comput. Mech. (2001) 887–892.
  • H. Demuth, M. Hagan, M. Beale, Neural Network Toolbox. For Use with MATLAB, User's Guide, Version 9, 2011.
  • K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Netw. 4 (2) (1991) 251–257.
  • C.W. Dawson, R.L. Wilby, A comparison of artificial neural networks used for rainfall runoff modelling, Hydrol. Earth Syst. Sci. 3 (2000) 529–540.