Original Articles

Good‐Turing frequency estimation without tears*

W.A. Gale & Geoffrey Sampson
Pages 217-237 | Published online: 21 Jul 2008

References

  • Apostel, L., Mandelbrot, B. and Morf, A. 1957. Logique, langage, et théorie de l'information. Paris: Presses Universitaires de France.
  • Bachenko, Joan and Gale, W.A. 1993. A corpus‐based model of interstress timing and structure. Journal of the Acoustical Society of America, 94: 1797.
  • Box, G.E.P. and Tiao, G.C. 1973. Bayesian Inference in Statistical Analysis. London: Addison‐Wesley.
  • Chao, Y.R. 1968. A Grammar of Spoken Chinese. Berkeley and Los Angeles: University of California Press.
  • Chitashvili, R.J. and Baayen, R.H. 1993. “Word frequency distributions”. In Quantitative Text Analysis (Quantitative Linguistics, vol. 52), Edited by: Hrebíček, L. and Altmann, G. 54–135. Trier: Wissenschaftlicher Verlag.
  • Church, K.W. 1989. “A stochastic parts program and noun phrase parser for unrestricted text”. In IEEE 1989 International Conference on Acoustics, Speech, and Signal Processing, 23–26 May, Glasgow.
  • Church, K.W. and Gale, W.A. 1991. A comparison of the enhanced Good‐Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language, 5: 19–54.
  • Church, K.W., Gale, W.A. and Kruskal, J.B. 1991. “The Good‐Turing theorem”. In A comparison of the enhanced Good‐Turing and deleted estimation methods for estimating probabilities of English bigrams, Computer Speech and Language, 5, Edited by: Church, K.W. and Gale, W.A. 19–54. Appendix A.
  • Efron, B. and Thisted, R. 1976. Estimating the number of unseen species: How many words did Shakespeare know? Biometrika, 63: 435–447.
  • Fienberg, S.E. and Holland, P.W. 1972. On the choice of flattening constants for estimating multinomial probabilities. Journal of Multivariate Analysis, 2: 127–134.
  • Fisher, R.A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A, 222: 309–368.
  • Bennett, J.H., ed. Collected Papers of R.A. Fisher, vol. I, 1912–24. University of Adelaide Press.
  • Fisher, R.A., Corbet, A.S. and Williams, C.B. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology, 12: 42–58.
  • Gale, W.A. and Church, K.W. 1994. “What is wrong with adding one?”. In Corpus‐Based Research into Language, Edited by: Oostdijk, N. and De Haan, P. 189–198. Amsterdam: Rodopi.
  • Good, I.J. 1953. The population frequencies of species and the estimation of population parameters. Biometrika, 40: 237–264.
  • Good, I.J. 1965. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Cambridge, Mass.: M.I.T. Press.
  • Good, I.J. and Toulmin, G.H. 1956. The number of new species, and the increase in population coverage, when a sample is increased. Biometrika, 43: 45–63.
  • Goodman, L.A. 1949. On the estimation of the number of classes in a population. Annals of Mathematical Statistics, 20: 572–579.
  • Hinsley, F.H. and Stripp, A., eds. 1993. Codebreakers: The Inside Story of Bletchley Park. Oxford: Oxford University Press.
  • Hodges, A. 1983. Alan Turing: The Enigma of Intelligence. London: Burnett Books.
  • Jeffreys, H. 1948. Theory of Probability, 2nd ed. Oxford: Clarendon Press.
  • Jelinek, F. and Mercer, R. 1985. Probability distribution estimation from sparse data. IBM Technical Disclosure Bulletin, 28: 2591–2594.
  • Johnson, W.E. 1932. Probability: the deductive and inductive problems. Mind, 41: 409–423.
  • Katz, S.M. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP‐35: 400–401.
  • Lidstone, G.J. 1920. Note on the general case of the Bayes‐Laplace formula for inductive or a posteriori probabilities. Transactions of the Faculty of Actuaries, 8: 182–192.
  • McNeil, D. 1973. Estimating an author's vocabulary. Journal of the American Statistical Association, 68: 92–96.
  • Marshall, I. 1987. “Tag selection using probabilistic methods”. In The Computational Analysis of English, Edited by: Garside, R.G., Leech, G.N. and Sampson, G.R. 42–56. Harlow, Essex: Longman.
  • Mosteller, F. and Wallace, D.L. 1964. Inference and Disputed Authorship: The Federalist. London: Addison‐Wesley.
  • Nádas, A. 1985. On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP‐33: 1414–1416.
  • Perks, W. 1947. Some observations on inverse probability including a new indifference rule. Journal of the Institute of Actuaries, 73: 285–312.
  • Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. 1988. Numerical Recipes in C. London: Cambridge University Press.
  • Sampson, G.R. 1995. English for the Computer: The SUSANNE Corpus and Parsing Scheme. Oxford: Clarendon Press.
  • Sproat, R., Shih, C., Gale, W.A. and Chang, N. 1994. “A stochastic finite‐state word‐segmentation algorithm for Chinese”. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 66–73.
  • Weisberg, S. 1985. Applied Linear Regression, 2nd ed. London: Wiley.
  • Zipf, G.K. 1935. The Psycho‐Biology of Language: An Introduction to Dynamic Philology. London: Houghton Mifflin. Reprinted by M.I.T. Press (Cambridge, Mass.), 1965.
  • Zipf, G.K. 1949. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. London: Addison‐Wesley. Reprinted by Hafner, London, 1965.

* Address correspondence to Geoffrey Sampson, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH, England, e‐mail: [email protected], tel.: +44 1273 678525, fax: +44 1273 671320. The authors are very grateful to Professor I.J. Good for detailed comments on a draft of this paper. Responsibility for the contents of the paper is the authors’ alone. W.A. Gale has retired, March 1995.
