Abstract
Statistics textbooks for undergraduates have not caught up with the enormous amount of Internet data analysis now taking place. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. Yet these data provide numerous examples of skewed and bimodal distributions, of distributions with thick tails that do not follow the usual models studied in class, and of many other interesting statistical curiosities. This paper summarizes the results of research in two areas of Internet data analysis: users' Web browsing behavior and network performance. We present some of the main questions analyzed in the literature, some unsolved problems, and some typical data analysis methods used. We illustrate the questions and the methods with large data sets. The data sets were obtained from publicly available sources and had to be processed and transformed to make them suitable for classroom exercises. Students in Introductory Statistics classes, as well as in Probability and Mathematical Statistics courses, have responded very well to the stories behind these data sets and their analysis. The message in the stories can be conveyed at a descriptive or a more advanced level.
Acknowledgments
We thank the referees and W. Robert Stephenson, the editor of the JSE, for their very helpful comments and suggestions, which have improved the paper considerably. The research in this paper was funded by the Office of Instructional Development at UCLA, under the 2003 Instructional Improvement Grant OID IIP 03–20 awarded to the main author, to whom all correspondence should be addressed. The second author is a graduate student in Statistics who collaborated on the material in Section 3. The contents of the paper are a small part of a larger project intended to create activities and data sets for undergraduate students in Computer Science taking their first course in Statistics. The first author of the paper has used the activities in almost all her classes, at one level or another, with good response from students. We would also like to thank Walter Rosenkrantz, Jeff Mogul, Zhiyi Chi, and Mark Hansen, who provided some references for consultation at the beginning of the project.