Abstract
A large body of research in the learning sciences has focused on students' commonsense science knowledge—the everyday knowledge of the natural world that is gained outside of formal instruction. Although researchers studying commonsense science have employed a variety of methods, 1-on-1 clinical interviews have played a unique role. The data that result from these interviews take the form of video recordings, which in turn are often compiled into written transcripts and coded by human analysts. In this article, I explore the application of computational techniques to the analysis of this familiar type of data. I describe the success I have had using extremely simple methods from computational linguistics—methods that are based on rudimentary vector space models and simple clustering algorithms. These automated analyses are employed in an exploratory mode as a way to discover student conceptions in the data. The aims of this article are primarily methodological in nature: I attempt to show that it is possible to use techniques from computational linguistics to analyze data from commonsense science interviews in a manner that may provide convergent support for the work of human analysts. As a test bed, I draw on transcripts of a corpus of interviews in which 54 middle school students were asked to explain the seasons.
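The abstract describes the method only as "rudimentary vector space models and simple clustering algorithms." As a hedged illustration of what such a pipeline can look like, the Python sketch below turns each transcript segment into a unit-length term-frequency vector, compares segments by cosine similarity, and groups them with a k-means variant that assigns by cosine similarity and returns centroid vectors. The stop-word list, the farthest-first seeding, all function names, and the toy segments are assumptions for illustration only; this is not presented as the procedure used in the study.

```python
import math
from collections import Counter

# Hypothetical stop-word list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "in", "is", "of", "so", "the", "to"}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stop words."""
    words = (w.strip(".,?!;:\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_vectors(segments):
    """Return the vocabulary and one unit-length term-frequency vector per segment."""
    counts = [Counter(tokenize(s)) for s in segments]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def init_centroids(vectors, k):
    """Farthest-first seeding: start from the first segment, then repeatedly
    add the segment least similar to every centroid chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans_cosine(vectors, k, iters=20):
    """k-means variant that assigns by cosine similarity; returns labels and centroids."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(iters):
        # Assign each segment to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the mean of the vectors assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """Highest-weighted terms in a centroid, ties broken alphabetically."""
    ranked = sorted(zip(vocab, centroid), key=lambda p: (-p[1], p[0]))
    return [term for term, _ in ranked[:n]]


# Toy segments invented for illustration (not data from the study).
segments = [
    "The sun is closer in summer and farther in winter.",
    "In summer the earth is closer to the sun.",
    "The axis is tilted, so sunlight hits more directly.",
    "A tilted axis means more direct sunlight.",
]
vocab, vectors = build_vectors(segments)
labels, centroids = kmeans_cosine(vectors, k=2)
for j, c in enumerate(centroids):
    print(j, top_terms(vocab, c))
```

Reading the highest-weighted terms of each centroid is one simple way an analyst might interpret what a cluster is "about"; here the two toy clusters separate a distance-based explanation from a tilt-based one.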
Notes
1. National Science Foundation Grant No. REC-0092648. Conceptual dynamics in complex science interventions (B. Sherin, principal investigator).
2. Pushing the analysis to a larger number of clusters did not significantly increase the number of interpretable clusters. It is not yet clear whether this reflects a limitation of the analysis or the limited size of the data corpus.
3. Because segments include overlapping portions of the transcript, the challenge appears in more than one segment.
4. Although detailed analyses of the individual interviews for the remaining 26 transcripts are not presented here or in the online supplemental materials, these transcripts nonetheless figured in the analysis that produced the centroid vectors reported in and .
5. See, for example, Manning et al. (2008) for a discussion of some of these alternative clustering techniques and of similar applications in computational linguistics.
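Note 3 presupposes that each transcript was divided into overlapping segments before analysis. The Python sketch below assumes fixed-width word windows with hypothetical `size` and `step` parameters (not the segment length actually used in the study) and shows why a single utterance can land in more than one segment.

```python
def overlapping_segments(text, size, step):
    """Split text into windows of `size` words, starting a new window every `step` words.

    With step < size, consecutive windows share (size - step) words, so the same
    utterance can appear in more than one segment.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)] if words else []
    starts = list(range(0, len(words) - size + 1, step))
    # Add a final window so trailing words are covered when step does not divide evenly.
    if starts[-1] + size < len(words):
        starts.append(len(words) - size)
    return [" ".join(words[s:s + size]) for s in starts]


# Invented example utterance (not from the study's transcripts).
transcript = "it is hotter in summer because the earth is closer to the sun"
segments = overlapping_segments(transcript, size=6, step=3)
for seg in segments:
    print(seg)
```

In this example the word "closer" falls in the last two windows, which is the kind of duplication the note describes.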