Original Articles

Datasets and Stories: Introduction and Guidelines

Robin H. Lock & Tim Arnold

Abstract

We describe the purpose of the “Datasets and Stories” section of this journal. Guidelines for submitting datasets and articles to this section are discussed. Instructions are provided for retrieving data from the JSE data archives.

1. Introduction

The purpose of this section is to provide a forum for exchanging interesting data and discussing ways that such data can be used effectively in teaching statistics. In each issue, we intend to feature one or two datasets with full articles describing their use. In addition, an archive of these and other datasets has been established as a resource for readers. Below we describe procedures for accessing these datasets and guidelines for submitting your favorite data.

IMPORTANT: The success of this section is critically dependent upon the willingness of its readers to contribute both interesting datasets and descriptive articles.

2. Guidelines for Submitting Data

At least two files are associated with each archived dataset. A “doc” file should contain adequate documentation to explain the structure of the data, give the source, describe all variable codings, provide sufficient narrative to put the data in context, and suggest some interesting questions to pursue. A blank template for such a documentation file is stored in the data archives and appears as an appendix below.

A second “dat” file contains the raw data as a flat ASCII text file. The “doc” file should contain any format information needed to process the raw data by standard computer packages. In some cases, a dataset might require more than one raw data file.
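As a rough illustration of how the format information in a “doc” file might be used to process a “dat” file, the following Python sketch reads a flat ASCII file. It assumes whitespace-delimited fields and a `*` missing-value code; both are assumptions for this example, not requirements of the archive, and the function name `read_flat_ascii`, the variable names, and the sample values are hypothetical. The actual delimiter and codings for any dataset are whatever its documentation file states.

```python
import io


def read_flat_ascii(stream, names, missing="*"):
    """Parse whitespace-delimited rows into dicts.

    `names` lists the variables in column order (taken from the doc file);
    `missing` is the missing-value code (an assumed convention here).
    """
    rows = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError(
                f"expected {len(names)} fields, got {len(fields)}: {line!r}"
            )
        row = {}
        for name, raw in zip(names, fields):
            if raw == missing:
                row[name] = None  # missing value
            else:
                try:
                    row[name] = float(raw)  # numeric variable
                except ValueError:
                    row[name] = raw  # keep non-numeric codes as text
        rows.append(row)
    return rows


# Made-up sample standing in for the contents of a "dat" file.
sample = io.StringIO("A 1.5 10\nB * 20\n")
rows = read_flat_ascii(sample, ["label", "x", "y"])
```

For a real dataset, the `io.StringIO` object would be replaced by an open file handle, and the variable names and missing-value code would be copied from the documentation file.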

These two files are required for a dataset to be entered into the data archive. Some contributors will, in addition, have experiences to share using the dataset in the classroom. We encourage these contributors to write an article for the “Datasets and Stories” section of JSE.

3. Guidelines for Submitting Dataset Articles

An article for the “Datasets and Stories” section should expand on the narrative found in the “doc” file. It should follow the general guidelines for any JSE article and will be subject to a similar review process. Authors are encouraged to emphasize the “story” aspect of this section by elaborating on the circumstances and questions that led to the collection of the data. We also encourage descriptions of creative ways the data might be used in teaching statistics, particularly those based on actual classroom experience.

4. Criteria for Suitable Datasets

We hesitate to define in advance what are or are not “good” datasets, but several criteria will be considered before making data available in the archives.

(a) Copyright issues. It is the responsibility of the contributor to secure any permissions needed to make the data freely available to all.

(b) Reality. In general, we prefer “real” as opposed to “artificial” data, although we acknowledge the usefulness of some well-crafted “fake” data in certain teaching situations.

(c) Size. Very large datasets (e.g., greater than 1 megabyte of storage) are discouraged unless they have particularly interesting pedagogical appeal. On the other hand, very small datasets (e.g., a two-way table demonstrating Simpson's paradox) may not require computer analysis, but are still useful examples to have for teaching and should be included in the archives.

(d) General appeal. We are seeking datasets which other instructors might find useful. While that does not exclude examples which are specific to a given discipline, we caution contributors to avoid technical jargon and arcane situations which might be accessible or appealing to only a very limited audience of students.

(e) Other JSE articles. Authors of other articles in this journal may choose to make raw data relevant to their articles available through the JSE data archives.

(f) Textbook data. In general, datasets appearing in textbooks would require specific permission from the publisher in order to be included in the JSE data archives. We will consider requests from authors or publishers to make data files for an entire text available through the JSE data archives.

5. Accessing Archived Datasets

Documentation and data files are retrievable through e-mail by sending a message to the address:

[email protected]

Both the “doc” and “dat” files are found with a common root name in the directory “jse/data”. Thus a typical message to retrieve a “doc” or “dat” file should look like

send jse/data/93cars.doc

send jse/data/93cars.dat.txt

A special index file (http://ww2.amstat.org/publications/jse/archive.htm) contains a listing of datasets currently available in the JSE data archives. Descriptive articles (if available) are found in the appropriate JSE volumes.

To serve as an example for submissions to the “Datasets and Stories” section, this issue includes a description by Robin Lock of some data on 1993 model automobiles. The full article is found at http://ww2.amstat.org/publications/jse/v1n1/datasets.lock.html. The raw data and documentation are at http://ww2.amstat.org/publications/jse/datasets/93cars.dat.txt and http://ww2.amstat.org/publications/jse/datasets/93cars.txt.

In future issues we will use this space to list new additions to the JSE data archives and to direct readers to descriptive articles in the “Datasets and Stories” section.

6. Contributions and Comments

Data for the archives, articles for the “Datasets and Stories” section, and questions or suggestions should be directed to either of the section editors:

Robin H. Lock

Mathematics Department

St. Lawrence University

Canton, NY 13617

(315) 379-9021 (office)

(315) 379-5804 (fax)

[email protected]

Tim Arnold

Department of Statistics, Box 8203

North Carolina State University

Raleigh, NC 27695-8203

(919) 515-1927 (office)

(919) 515-7591 (fax)

[email protected]

Addendum (added July 7, 1999)

In November 1998, “doc” files were renamed “txt” files to avoid confusion with Word files. In July 1999, some obsolete file names and links in this article were updated.

E-mail access to data and documentation files has been replaced by access through the World Wide Web. Thus the e-mail instructions at the beginning of Section 5 of this paper are no longer correct.

Addendum (added November 2010)

In November 2010, Datasets and Stories Editor Dex Whittinghill changed the template for data documentation files. Please use the November 2010 Updated Template for Data Documentation Files, which supersedes the former Appendix A in the original Lock and Arnold paper. Do not use the template referenced in the original paper.

November 2010 Updated Template for Data Documentation Files

This form is available at http://ww2.amstat.org/publications/jse/v18n3/datasets_template.htm

NAME: A descriptive name for the dataset file (.txt or .dat.txt)

TYPE: e.g., Random sample, Census, Time series, Designed experiment,…

SIZE: Number of observations, number of variables

ARTICLE TITLE: Title of the article, when appropriate

DESCRIPTIVE ABSTRACT:

A brief (no more than 10 lines) description of the dataset.

SOURCES:

Acknowledge any published data sources or give brief description of origins of the data.

VARIABLE DESCRIPTIONS:

Provide a "key" for reading the ASCII data file. Explain how the data is delimited (tab, comma, space, etc.), any variable codings (including missing values) and/or measurement units.

SPECIAL NOTES:

Describe any special circumstances which should be brought to the attention of persons attempting to analyze the data.

STORY BEHIND THE DATA:

A brief narrative describing the origins of the data and the reasons they were collected. This is a good place to supply any background needed to understand the underlying variables, describe relevant issues, and suggest questions which might be of interest. This and the next section should be fairly concise. If you find them getting too long – it's time to write a full “Datasets” article!

PEDAGOGICAL NOTES:

Suggest some ways an instructor might use the data in class. Describe any interesting features and/or statistical concepts which are well illustrated.

REFERENCES:

Include any references not in the SOURCES section.

SUBMITTED BY:

Name

Affiliation

Surface address

e-mail address

(This gives you credit and provides a source for instructors who find the data useful to get clarifications if needed.)
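Contributors who want to check a draft documentation file against the template above before submitting could use something like the following Python sketch. The field names are taken from the November 2010 template; the function name `missing_fields` and the rule that each field name must start a line and be followed by a colon are assumptions made for this example, not a procedure used by the journal.

```python
# Field names as they appear in the November 2010 template.
TEMPLATE_FIELDS = [
    "NAME",
    "TYPE",
    "SIZE",
    "ARTICLE TITLE",
    "DESCRIPTIVE ABSTRACT",
    "SOURCES",
    "VARIABLE DESCRIPTIONS",
    "SPECIAL NOTES",
    "STORY BEHIND THE DATA",
    "PEDAGOGICAL NOTES",
    "REFERENCES",
    "SUBMITTED BY",
]


def missing_fields(doc_text):
    """Return the template fields that never start a line of doc_text."""
    present = set()
    for line in doc_text.splitlines():
        stripped = line.strip()
        for field in TEMPLATE_FIELDS:
            # Assumed convention: a field is present if a line begins "FIELD:".
            if stripped.startswith(field + ":"):
                present.add(field)
    return [field for field in TEMPLATE_FIELDS if field not in present]


# A deliberately incomplete, made-up draft.
draft = "NAME: example.dat.txt\nTYPE: Census\nSIZE: 10 observations, 2 variables\n"
still_needed = missing_fields(draft)
```

Here `still_needed` lists every field from ARTICLE TITLE onward. Since the template asks for an article title only "when appropriate", its absence is not necessarily an error.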

Appendix

EDITOR'S NOTE: The form below is no longer in use. Use Nov. 2010 update saved at http://ww2.amstat.org/publications/jse/v18n3/datasets_template.htm

A Template for Data Documentation (DOC) Files

This form is no longer in use.

NAME: A descriptive title

TYPE: e.g., Random sample, Census, Time series, Designed experiment,…

SIZE: Number of observations, number of variables

DESCRIPTIVE ABSTRACT:

A brief (no more than 10 lines) description of the dataset.

SOURCES:

Acknowledge any published data sources or give brief description of origins of the data.

VARIABLE DESCRIPTIONS:

Provide a “key” for reading the ASCII data file. Explain any variable codings (including missing values) and/or measurement units.

SPECIAL NOTES:

Describe any special circumstances which should be brought to the attention of persons attempting to analyze the data.

STORY BEHIND THE DATA:

A brief narrative describing the origins of the data and the reasons they were collected. This is a good place to supply any background needed to understand the underlying variables, describe relevant issues, and suggest questions which might be of interest. This and the next section should be fairly concise. If you find them getting too long – it's time to write a full “Datasets” article!

PEDAGOGICAL NOTES:

Suggest some ways an instructor might use the data in class. Describe any interesting features and/or statistical concepts which are well illustrated.

REFERENCES:

Include any references not in the SOURCES section.

SUBMITTED BY:

Name

Affiliation

Surface address

e-mail address

(This gives you credit and provides a source for instructors who find the data useful to get clarifications if needed.)
