Original Articles

SOCIAL SCIENCE RESEARCH METHODS IN INTERNET TIME

Pages 639-661 | Received 05 Nov 2011, Accepted 30 Jan 2012, Published online: 02 Mar 2012
 

Abstract

This article discusses three interrelated challenges involved in conducting social science research in ‘Internet Time’. (1) The rate at which the Internet is both diffusing through society and developing new capacities is unprecedented, and this creates novel challenges for scholarly research. (2) Many of our most robust research methods are based upon ceteris paribus assumptions that do not hold in the online environment; the rate of change online narrows the range of questions that can be answered using traditional tools. (3) Meanwhile, new research methods are untested and often rely upon data sources that are incomplete and systematically flawed. The paper details these challenges, then proposes that scholars embrace the values of transparency and kludginess in order to answer important research questions in a rapidly changing communications environment.

Notes

This was particularly evident with the 2001 closing of SixDegrees.com, the first social networking site, which demonstrated that being a first mover is not always such an advantage.

My favorite example of this phenomenon is technology writer Gleick's (1995) New York Times Magazine essay, ‘This Is Sex?’ In it, he argues that the Internet will never become a popular medium for pornography because the content is too grainy and slow. … Suffice it to say, as bandwidth, storage capacity, and processor speed all grew, the online market for prurient information developed some new dynamics.

I am borrowing terminology and concepts from the diffusion-of-innovation literature here. See Rogers (2003) and von Hippel (2005) for further discussion.

Trippi (2005) refers to this development as ‘snow plowing’. He argues that before the Dean campaign could set online fundraising records, commercial giants like Amazon.com and eBay first had to acclimate citizens-as-consumers to online purchasing habits.

As anecdotal evidence, in the course of preparing this article for submission, I have had conversations with two doctoral students who began with promising research questions, but had to abandon or deeply modify their projects when the Wayback Machine turned out to lack the relevant archived pages.

Except Hitwise. Sitemeter, moreover, is an opt-in decision for individual blogs. Alexa, comScore, and Quantcast provide traffic measures only for larger sites, and each has systematic flaws. The data landscape is really quite a headache.

The sole option is to contact their webmaster, hope they chose to save this data themselves, and then ask very, very nicely.

A parallel challenge faces large-scale text classification. Most content scrapers are built around RSS feeds. Provided with real-time data, they can offer sophisticated analysis; provided with archived data, they frequently get bogged down or require time-consuming cleaning. Enterprising computational social scientists would be well advised to set up several ‘lobster traps’: scrapers left running to capture real-time data that may prove useful later.

What is more, it would take months to gather this data and likely multiple years to publish, by which point the finding would likely be dated because of some new development on the two platforms.

To be clear, I do not mean ‘opposing’ in the Exchange Theory sense (Salisbury 1969), where peer groups are viewed as competitors. MoveOn regularly shares tips and best practices with ideological peers, and even occasionally raises money for them. They do not, however, share with conservatives.

To their credit, Bimber and Davis (2003) do provide such a transparent discussion. I am calling for an increased emphasis on this practice, rather than suggesting previous scholars have failed in it entirely.

Sitemeter is an opt-in system that measures unique visitors per day.

Quantcast measures unique visitors per month. It operates at the domain-name level, meaning political blogs hosted by large online news sites (Slate, NYTimes, Salon) simply cannot be estimated. Alexa and Quantcast work well for large websites, but poorly for mid-tier and smaller blogs.
