Review Article

Probable networks and plausible predictions — a review of practical Bayesian methods for supervised neural networks

Pages 469-505 | Received 09 Feb 1995, Published online: 09 Jul 2009

References

  • Abu-Mostafa Y S. The Vapnik–Chervonenkis dimension: information versus complexity in learning. Neural Comput. 1990; 1(3)312–7
  • Berger J. Statistical Decision Theory and Bayesian Analysis. Springer, Berlin 1985
  • Bishop C M. Exact calculation of the Hessian matrix for the multilayer perceptron. Neural Comput. 1992; 4(4)494–501
  • Box G E P, Tiao G C. Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA 1973
  • Breiman L. Stacked regressions. Technical Report 367. Department of Statistics, University of California, Berkeley 1992
  • Bretthorst G. Bayesian Spectrum Analysis and Parameter Estimation. Springer, Berlin 1988
  • Bridle J S. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Neurocomputing: Algorithms, Architectures and Applications, F Fogelman-Soulié, J Hérault. Springer, Berlin 1989
  • Buntine W, Weigend A. Bayesian back-propagation. Complex Systems 1991; 5: 603–43
  • Copas J B. Regression, prediction and shrinkage (with discussion). J. R. Statist. Soc. B 1983; 45(3)311–54
  • Cox R. Probability, frequency, and reasonable expectation. Am. J. Phys. 1946; 14: 1–13
  • Gull S F. Bayesian inductive inference and maximum entropy. Maximum Entropy and Bayesian Methods in Science and Engineering, Vol. 1: Foundations, G Erickson, C Smith. Kluwer, Dordrecht 1988; 53–74
  • Gull S F. Developments in maximum entropy data analysis. Maximum Entropy and Bayesian Methods Cambridge 1988, J Skilling. Kluwer, Dordrecht 1989; 53–71
  • Guyon I, Vapnik V N, Boser B E, Bottou L Y, Solla S A. Structural risk minimization for character recognition. Advances in Neural Information Processing Systems 4, J E Moody, S J Hanson, R P Lippmann. Morgan Kaufmann, San Mateo, CA 1992; 471–9
  • Hanson R, Stutz J, Cheeseman P. Bayesian classification with correlation and inheritance. Proc. 12th Int. Joint Conf. on Artificial Intelligence, Sydney, Australia. Morgan Kaufmann, San Mateo, CA 1991; 2: 692–8
  • Hassibi B, Stork D G. Second order derivatives for network pruning: Optimal brain surgeon. Advances in Neural Information Processing Systems 5, C L Giles, S J Hanson, J D Cowan. Morgan Kaufmann, San Mateo, CA 1993; 164–71
  • Hinton G E, Sejnowski T J. Learning and relearning in Boltzmann machines. Parallel Distributed Processing, D E Rumelhart, J E McClelland. MIT Press, Cambridge, MA 1986; 282–317
  • Hinton G E, van Camp D. Keeping neural networks simple by minimizing the description length of the weights. Proc. 6th Ann. Workshop on Computer Learning Theory. ACM Press, New York 1993; 5–13
  • Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. Advances in Neural Information Processing Systems 6, J D Cowan, G Tesauro, J Alspector. Morgan Kaufmann, San Mateo, CA 1994
  • Jaynes E T. Bayesian intervals versus confidence intervals. E T Jaynes: Papers on Probability, Statistics and Statistical Physics, R D Rosenkrantz. Kluwer, Dordrecht 1983; 151
  • Jeffreys H. Theory of Probability. Oxford University Press, Oxford 1939
  • LeCun Y, Denker J, Solla S A. Optimal brain damage. Advances in Neural Information Processing Systems 2, D Touretzky. Morgan Kaufmann, San Mateo, CA 1990; 598–605
  • Loredo T J. From Laplace to supernova SN 1987A: Bayesian inference in astrophysics. Maximum Entropy and Bayesian Methods, Dartmouth, USA, 1989, P Fougere. Kluwer, Dordrecht 1990; 81–142
  • MacKay D J C. Bayesian methods for adaptive models. California Institute of Technology. 1991, PhD Thesis
  • MacKay D J C. Bayesian interpolation. Neural Comput. 1992a; 4(3)415–47
  • MacKay D J C. A practical Bayesian framework for backpropagation networks. Neural Comput. 1992b; 4(3)448–72
  • MacKay D J C. The evidence framework applied to classification networks. Neural Comput. 1992c; 4(5)698–714
  • MacKay D J C. Bayesian non-linear modelling for the prediction competition. ASHRAE Trans. Vol 100, part 2. ASHRAE, Atlanta, GA 1994
  • MacKay D J C. Bayesian neural networks and density networks. Nucl. Instrum. Methods Phys. Res. A 1995a, in press
  • MacKay D J C. Hyperparameters: optimize, or integrate out? Maximum Entropy and Bayesian Methods, Santa Barbara, CA, 1993, G Heidbreder. Kluwer, Dordrecht 1995b
  • Moody J E. The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. Advances in Neural Information Processing Systems 4, J E Moody, S J Hanson, R P Lippmann. Morgan Kaufmann, San Mateo, CA 1992; 847–54
  • Neal R M. Bayesian learning via stochastic dynamics. Advances in Neural Information Processing Systems 5, C L Giles, S J Hanson, J D Cowan. Morgan Kaufmann, San Mateo, CA 1993; 475–82
  • Neal R M. Bayesian learning for neural networks. Department of Computer Science, University of Toronto. 1995, PhD Thesis
  • Patrick J D, Wallace C S. Stone circle geometries: an information theory approach. Archaeoastronomy in the Old World, D C Heggie. Cambridge University Press, Cambridge 1982
  • Pearlmutter B A. Fast exact multiplication by the Hessian. Neural Comput. 1994; 6(1)147–60
  • Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature 1986; 323: 533–6
  • Skilling J. Bayesian numerical analysis. Physics and Probability, W T Grandy, Jr, P Milonni. Cambridge University Press, Cambridge 1993
  • Skilling J, Robinson D R T, Gull S F. Probabilistic displays. Maximum Entropy and Bayesian Methods, Laramie, 1990, W T Grandy, L Schick. Kluwer, Dordrecht 1991; 365–8
  • Spiegelhalter D J, Lauritzen S L. Sequential updating of conditional probabilities on directed graphical structures. Networks 1990; 20: 579–605
  • Thodberg H H. Ace of Bayes: application of neural networks with pruning. Technical Report 1132 E. Danish Meat Research Institute. 1993
  • Wallace C, Boulton D. An information measure for classification. Comput. J. 1968; 11(2)185–94
  • Wallace C S, Freeman P R. Estimation and inference by compact coding. J. R. Statist. Soc. B 1987; 49(3)240–65
  • Weir N. Applications of maximum entropy techniques to HST data. Proc. ESO/ST-ECF Data Analysis Workshop, Garching, April, 1991, P J Grosbol, R H Warmels. European Southern Observatory/Space Telescope-European Coordinating Facility 1991; 115–29
  • Witten I H, Neal R M, Cleary J G. Arithmetic coding for data compression. Commun. ACM 1987; 30(6)520–40
  • Wolpert D H. On the use of evidence in neural networks. Advances in Neural Information Processing Systems 5, C L Giles, S J Hanson, J D Cowan. Morgan Kaufmann, San Mateo, CA 1993; 539–46
