Address correspondence to Geoffrey Sampson, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH, England, e‐mail: [email protected], tel.: +44 1273 678525, fax: +44 1273 671320. The authors are very grateful to Professor I.J. Good for detailed comments on a draft of this paper. Responsibility for the contents of the paper is the authors’ alone. W.A. Gale retired in March 1995.