Reviews

Machine learning in the analysis of biomolecular simulations

Article: 2006080 | Received 19 Mar 2021, Accepted 09 Nov 2021, Published online: 10 Jan 2022

References

  • PDB Statistics: Overall growth of released structures per year. Nov 30, 2020. https://www.rcsb.org/stats/growth/growth-released-structures.
  • Hollingsworth SA, Dror RO. Molecular dynamics simulation for all. Neuron. 2018;99:1129–31.
  • Enkavi G, Javanainen M, Kulig W, et al. Multiscale simulations of biological membranes: the challenge to understand biological phenomena in a living substance. Chem Rev. 2019;119:5607–5774.
  • Shirts M, Pande VS. Screen savers of the world unite. Science. 2000;290:1903–1904.
  • Shaw DE, Deneroff MM, Dror RO, et al. Anton, a special-purpose machine for molecular dynamics simulation. Commun ACM. 2008;51:91–97.
  • Shaw DE, Grossman JP, Bank JA, et al. Anton 2: raising the bar for performance and programmability in a special-purpose molecular dynamics supercomputer. SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 41–53 (2014). New Orleans, USA. doi: 10.1109/SC.2014.9.
  • Shaw DE. Millisecond-long molecular dynamics simulations of proteins on a special-purpose machine. Biophys J. 2013;104:45a.
  • Lane TJ, Shukla D, Beauchamp KA, et al. To milliseconds and beyond: challenges in the simulation of protein folding. Curr Opin Struct Biol. 2013;23:58–65.
  • Chandler DE, Strümpfer J, Sener M, et al. Light harvesting by lamellar chromatophores in Rhodospirillum photometricum. Biophys J. 2014;106:2503–2510.
  • Dror RO, Dirks RM, Grossman JP, et al. Biomolecular simulation: a computational microscope for molecular biology. Annu Rev Biophys. 2012;41:429–452.
  • Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction, Second Edition. Springer Series in Statistics; New York, NY. 2009. DOI:10.1007/978-0-387-84858-7.
  • Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589.
  • Hegedűs T, Geisler M, Lukács G, et al. AlphaFold2 transmembrane protein structure prediction shines. bioRxiv. 2021;2021.08.21.457196. DOI:10.1101/2021.08.21.457196.
  • Haiech J, Koscielniak T, Grassy G. Use of TSAR as a new tool to analyze the molecular dynamics trajectories of proteins. J Mol Graph. 1995;13:46–48.
  • Gordon HL, Somorjai RL. Fuzzy cluster analysis of molecular dynamics trajectories. Proteins. 1992;14:249–264.
  • Troyer JM, Cohen FE. Protein conformational landscapes: energy minimization and clustering of a long molecular dynamics trajectory. Proteins. 1995;23:97–110.
  • Karpen ME, Tobias DJ, Brooks CL III. Statistical clustering techniques for the analysis of long molecular dynamics trajectories: analysis of 2.2-ns trajectories of YPGDV. Biochemistry. 1993;32:412–420.
  • Torda AE, van Gunsteren WF. Algorithms for clustering molecular dynamics configurations. J Comput Chem. 1994;15:1331–1340.
  • Bellman R. Dynamic programming. Science. 1966;153:34–37.
  • McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys. 1943;5:115–133.
  • LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444.
  • Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc A Math Phys Eng Sci. 2016;374:20150202.
  • Stein SAM, Loccisano AE, Firestine SM, et al. Principal components analysis: a review of its application on molecular dynamics data. Annu Rep Comput Chem. 2006;2:233–261.
  • Lange OF, Grubmüller H. Can principal components yield a dimension reduced description of protein dynamics on long time scales? J Phys Chem B. 2006;110:22842–22852.
  • Sittel F, Jain A, Stock G. Principal component analysis of molecular dynamics: on the use of cartesian vs. internal coordinates. J Chem Phys. 2014;141:014111.
  • Amadei A, Linssen ABM, de Groot BL, et al. An efficient method for sampling the essential subspace of proteins. J Biomol Struct Dyn. 1996;13:615–625.
  • Amadei A, Linssen ABM, Berendsen HJC. Essential dynamics of proteins. Proteins. 1993;17:412–425.
  • Daidone I, Amadei A. Essential dynamics: foundation and applications. Wiley Interdiscip Rev Comput Mol Sci. 2012;2:762–770.
  • Lange OF, Schäfer LV, Grubmüller H. Flooding in GROMACS: accelerated barrier crossings in molecular dynamics. J Comput Chem. 2006;27:1693–1702.
  • Branduardi D, Bussi G, Parrinello M. Metadynamics with adaptive Gaussians. J Chem Theory Comput. 2012;8:2247–2254.
  • Spiwok V, Lipovová P, Králová B. Metadynamics in essential coordinates: free energy simulation of conformational changes. J Phys Chem B. 2007;111:3073–3076.
  • Yang YI, Shao Q, Zhang J, et al. Enhanced sampling in molecular dynamics. J Chem Phys. 2019;151:070902.
  • Altis A, Nguyen PH, Hegger R, et al. Dihedral angle principal component analysis of molecular dynamics simulations. J Chem Phys. 2007;126:244111.
  • Sittel F, Filk T, Stock G. Principal component analysis on a torus: theory and application to protein dynamics. J Chem Phys. 2017;147:244101.
  • Wolf A, Kirschner KN. Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain. J Mol Model. 2013;19:539–549.
  • Mu Y, Nguyen PH, Stock G. Energy landscape of a small peptide revealed by dihedral angle principal component analysis. Proteins. 2005;58:45–52.
  • Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
  • Abraham MJ, Murtola T, Schulz R, et al. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25.
  • Molgedey L, Schuster HG. Separation of a mixture of independent signals using time delayed correlations. Phys Rev Lett. 1994;72:3634–3637.
  • Pérez-Hernández G, Noé F. Hierarchical time-lagged independent component analysis: computing slow modes and reaction coordinates for large molecular systems. J Chem Theory Comput. 2016;12:6118–6129.
  • Noé F, Clementi C. Kinetic distance and kinetic maps from molecular dynamics simulation. J Chem Theory Comput. 2015;11:5002–5011.
  • Sultan MM, Pande VS. TICA-metadynamics: accelerating metadynamics by using kinetically selected collective variables. J Chem Theory Comput. 2017;13:2440–2447.
  • Scherer MK, Trendelkamp-Schroer B, Paul F, et al. PyEMMA 2: a software package for estimation, validation, and analysis of Markov models. J Chem Theory Comput. 2015;11:5525–5542.
  • Hofmann T, Schölkopf B, Smola AJ. Kernel methods in machine learning. Ann Stat. 2008;36:1171–1220.
  • Schwantes CR, Pande VS. Modeling molecular kinetics with tICA and the kernel trick. J Chem Theory Comput. 2015;11:600–608.
  • Antoniou D, Schwartz SD. Toward identification of the reaction coordinate directly from the transition state ensemble using the kernel PCA method. J Phys Chem B. 2011;115:2465–2469.
  • Coifman RR, Lafon S, Lee AB, et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc Natl Acad Sci. 2005;102:7426–7431.
  • Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290:2319–2323.
  • Ferguson AL, Panagiotopoulos AZ, Debenedetti PG, et al. Systematic determination of order parameters for chain dynamics using diffusion maps. Proc Natl Acad Sci. 2010;107:13597–13602.
  • Rohrdanz MA, Zheng W, Maggioni M, et al. Determination of reaction coordinates via locally scaled diffusion map. J Chem Phys. 2011;134:124116.
  • Kim SB, Dsilva CJ, Kevrekidis IG, et al. Systematic characterization of protein folding pathways using diffusion maps: application to Trp-cage miniprotein. J Chem Phys. 2015;142:085101.
  • Das P, Moll M, Stamati H, et al. Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction. Proc Natl Acad Sci. 2006;103:9885–9890.
  • Spiwok V, Králová B. Metadynamics in the conformational space nonlinearly dimensionally reduced by Isomap. J Chem Phys. 2011;135:224504.
  • Hashemian B, Millán D, Arroyo M. Modeling and enhanced sampling of molecular systems with smooth and nonlinear data-driven collective variables. J Chem Phys. 2013;139:214101.
  • Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313:504–507.
  • Wehmeyer C, Noé F. Time-lagged autoencoders: deep learning of slow collective variables for molecular kinetics. J Chem Phys. 2018;148:241703.
  • Chen W, Ferguson AL. Molecular enhanced sampling with autoencoders: on-the-fly collective variable discovery and accelerated free energy landscape exploration. J Comput Chem. 2018;39:2079–2102.
  • Ribeiro JML, Bravo P, Wang Y, et al. Reweighted autoencoded variational Bayes for enhanced sampling (RAVE). J Chem Phys. 2018;149:072301.
  • Lamim Ribeiro JM, Tiwary P. Toward achieving efficient and accurate ligand-protein unbinding with deep learning and molecular dynamics through RAVE. J Chem Theory Comput. 2019;15:708–719.
  • Varolgüneş YB, Bereau T, Rudzinski JF. Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders. Mach Learn Sci Technol. 2020;1:015012.
  • Wang W, Gómez-Bombarelli R. Coarse-graining auto-encoders for molecular dynamics. PLoS Comput Biol. 2018;15:e1007033.
  • Degiacomi MT. Coupling molecular dynamics and deep learning to mine protein conformational space. Structure. 2019;27:1034–1040.e3.
  • Hub JS, De Groot BL. Detection of functional modes in protein dynamics. PLoS Comput Biol. 2009;5:e1000480.
  • Krivobokova T, Briones R, Hub JS, et al. Partial least-squares functional mode analysis: application to the membrane proteins AQP1, Aqy1, and CLC-ec1. Biophys J. 2012;103:786–796.
  • Kaptan S, Assentoft M, Schneider HP, et al. H95 is a pH-dependent gate in Aquaporin 4. Structure. 2015;23:2309–2318.
  • Saboe PO, Rapisarda C, Kaptan S, et al. Role of pore-lining residues in defining the rate of water conduction by Aquaporin-0. Biophys J. 2017;112:953–965.
  • Izvekov S, Voth GA. A multiscale coarse-graining method for biomolecular systems. J Phys Chem B. 2005;109:2469–2473.
  • Izvekov S, Voth GA. Effective force field for liquid hydrogen fluoride from ab initio molecular dynamics simulation using the force-matching method. J Phys Chem B. 2005;109:6573–6586.
  • Scherer C, Scheid R, Andrienko D, et al. Kernel-based machine learning for efficient simulations of molecular liquids. J Chem Theory Comput. 2020;16:3194–3204.
  • John ST, Csányi G. Many-body coarse-grained interactions using Gaussian approximation potentials. J Phys Chem B. 2017;121:10934–10949.
  • Wang J, Olsson S, Wehmeyer C, et al. Machine learning of coarse-grained molecular dynamics force fields. ACS Cent Sci. 2019;5:755–767.
  • Murtola T, Falck E, Karttunen M, et al. Coarse-grained model for phospholipid/cholesterol bilayer employing inverse Monte Carlo with thermodynamic constraints. J Chem Phys. 2007;126:075101.
  • Mehmood T, Ahmed B. The diversity in the applications of partial least squares: an overview. J Chemom. 2016;30:4–17.
  • Peters JH, de Groot BL, Levitt M. Ubiquitin dynamics in complexes reveal molecular recognition mechanisms beyond induced fit and conformational selection. PLoS Comput Biol. 2012;8:e1002704.
  • Sakuraba S, Kono H. Spotting the difference in molecular dynamics simulations of biomolecules. J Chem Phys. 2016;145:074116.
  • Sultan MM, Kiss G, Shukla D, et al. Automatic selection of order parameters in the analysis of large scale molecular dynamics simulations. J Chem Theory Comput. 2014;10:5217–5223.
  • Breiman L, Friedman J, Stone CJ, et al. Classification and regression trees. Taylor & Francis; 1984. https://books.google.fi/books?id=JwQx-WOmSyQC
  • Breiman L. Random forests. Mach Learn. 2001;45:5–32.
  • Šikić M, Tomić S, Vlahoviček K. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Comput Biol. 2009;5:e1000278.
  • Riniker S. Molecular dynamics fingerprints (MDFP): machine learning from MD data to predict free-energy differences. J Chem Inf Model. 2017;57:726–741.
  • Aghaaminiha M, Ghanadian SA, Ahmadi E, et al. A machine learning approach to estimation of phase diagrams for three-component lipid mixtures. Biochim Biophys Acta - Biomembr. 2020;1862:183350.
  • Wang F, Shen L, Zhou H, et al. Machine learning classification model for functional binding modes of TEM-1 β-lactamase. Front Mol Biosci. 2019;6:47.
  • Deisenroth MP, Faisal AA, Ong CS. Mathematics for machine learning. Cambridge University Press; 2020.
  • Prinz JH, Wu H, Sarich M, et al. Markov models of molecular kinetics: generation and validation. J Chem Phys. 2011;134:174105.
  • Sgourakis NG, Merced-Serrano M, Boutsidis C, et al. Atomic-level characterization of the ensemble of the Aβ(1-42) monomer in water using unbiased molecular dynamics simulations and spectral algorithms. J Mol Biol. 2011;405:570–583.
  • Abramyan TM, Snyder JA, Thyparambil AA, et al. Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important. J Comput Chem. 2016;37:1973–1982.
  • Bremer PL, De Boer D, Alvarado W, et al. Overcoming the heuristic nature of k-means clustering: identification and characterization of binding modes from simulations of molecular recognition complexes. J Chem Inf Model. 2020;60:3081–3092.
  • Ester M, Kriegel H-P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD96 Proceedings 226–231 (1996). Portland, Oregon, USA. https://dl.acm.org/doi/10.5555/3001460.3001507
  • Wang K, Chodera JD, Yang Y, et al. Identifying ligand binding sites and poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics. J Comput Aided Mol Des. 2013;27:989–1007.
  • Galindo-Murillo R, Cheatham TE. DNA binding dynamics and energetics of cobalt, nickel, and copper metallopeptides. ChemMedChem. 2014;9:1252–1259.
  • Kim M, Choi SH, Kim J, et al. Density-based clustering of small peptide conformations sampled from a molecular dynamics simulation. J Chem Inf Model. 2009;49:2528–2536.
  • Campello RJGB, Moulavi D, Zimek A, et al. Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discov Data. 2015;10:1–51.
  • Melvin RL, Xiao J, Godwin RC, et al. Visualizing correlated motion with HDBSCAN clustering. Protein Sci. 2018;27:62.
  • Maragakis P, Van Der Vaart A, Karplus M. Gaussian-mixture umbrella sampling. J Phys Chem B. 2009;113:4664–4673.
  • Westerlund AM, Delemotte L. InfleCS: clustering free energy landscapes with Gaussian mixtures. J Chem Theory Comput. 2019;15:6752–6759.
  • Debnath J, Parrinello M. Gaussian mixture-based enhanced sampling for statics and dynamics. J Phys Chem Lett. 2020;11:5076–5080.
  • Plante A, Shore DM, Morra G, et al. A machine learning approach for the discovery of ligand-specific functional mechanisms of GPCRs. Molecules. 2019;24:2097.
  • Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. In: Workshop at International Conference on Learning Representations (2014). Banff, Canada. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.746.3713