Making Big Data Small: Strategies to Expand Urban and Geographical Research Using Social Media

Pages 115-135 | Published online: 28 Jun 2017
 

ABSTRACT

While exciting, Big Data (particularly geotagged social media data) has proven difficult for many urbanists and social science researchers to use. As a partial solution, we propose a strategy that enables the fast extraction of only the relevant data from large sets of geosocial data. Although this runs contrary to many Big Data approaches, in which analysis is done on the entire dataset, much productive social science work can use smaller datasets, around the same size as census or survey data, within standard methodological frameworks. The approach we outline in this paper, including the example of a fully operating system, offers a solution for urban researchers interested in these types of data but reluctant to personally build data science skills.

Notes on Contributors

Ate Poorthuis is an assistant professor in the humanities, arts and social sciences at Singapore University of Technology and Design.

Matthew Zook is a professor of information and economic geography at the University of Kentucky, Lexington.

Notes

1 Particularly relevant for urbanists and geographers is that, for the subset of geotagged tweets, the difference between the sample and the firehose is negligible.

2 The DOLLY project received an academic white listing in May 2009 for a different project (Dugundji, Poorthuis, and van Meeteren, 2011; van Meeteren, Poorthuis, and Dugundji, 2009). This original white listing gave DOLLY elevated "garden hose" (10%) streaming access in 2011 without going through a third-party commercial vendor.

3 The fluctuations seen in and are not a result of the DOLLY methodology or system and are tied to changes in (1) actual Twitter usage and/or (2) Twitter’s public API. However, we have been unable to clarify with Twitter the exact cause of these changes.

4 Although the University’s data center was outfitted with its own power generator and UPS, the system was affected by power outages. Likewise, some configuration errors and software bugs resulted in gaps (generally of a few minutes or hours) while updates were performed. While we discussed filling these gaps, the combination of a short time horizon for action (before Tweets were no longer available), the weight of other demands on our time, and a lack of human resources made filling them a lower priority than other tasks crucial to keeping the system going. While DOLLY was up approximately 99.99 percent of the time, these gaps bring home the difficulty of maintaining 100 percent uptime in long-term data collection.

5 The open-source RabbitMQ message broker, which implements the open AMQP standard, is used here (Vinoski, 2006).

6 Using Natural Earth Data (“Natural Earth,” n.d.) and the PostGIS spatial database (Ramsey, 2005).
