Special Section: Social Media and Tracking Data

Representation and analytical models for location-based big data

Pages 707-713 | Received 13 Dec 2018, Accepted 17 Dec 2018, Published online: 07 Jan 2019

1. Introduction

The last decade has seen an exponential growth in location-based big data research. Indeed, the availability of fine-grained location-based big data has created unprecedented opportunities for researchers from various disciplines. A number of reasons may have contributed to the tremendous popularity of such data in research. For instance, big data are typically easier to obtain and are often cheaper than traditional questionnaires or surveys. Depending on the chosen method, acquisition of big data can also be less intrusive and may not require effort from volunteers to release data. Computational methods needed to process these data are currently developing at a rapid pace (Ahas et al. 2017). More importantly, big data may allow us to answer questions that cannot be answered otherwise. For example, geo-tagged tweet locations around the world within 24 hours can be used to illuminate cities on an hour-by-hour or even minute-by-minute basis (Jiang 2015). This shows how these cities ‘live’ the day. Similarly, the big data approach can be used to understand how cities have evolved over hundreds of years and to predict their dynamics or behavior in the future.

From location-based social media platforms or tracking devices, disaggregated data are available at high spatial and temporal granularities. Such data constitute an important type of emerging geospatial big data, which differ significantly from conventional small data in at least three aspects: sampled versus all, estimated versus measured, and aggregated versus individual-based. This perception of the dichotomy between big and small data is illustrated in Figure 1 (Jiang 2015). One may disagree on the first aspect, arguing that big data are also sampled from people on the planet. However, here ‘all’ refers to all social media or location-tracker users whose geo-locations and time-stamps are recorded or shared. All of these users constitute the population whose behaviors and related issues we study, and researchers do not need to impose any sampling strategy to extract a subset from the population.

Figure 1. The three data characteristics that differentiate between big data and small data.

The unprecedented research opportunities using location-based big data also come with abundant research challenges regarding theoretical, technical, ethical, and social questions. For instance, big data are often unstructured, multi-dimensional, and of various data types. We need to design appropriate representational models to organize the big data effectively and to develop new techniques to process and learn from them. This is an emerging field that draws attention and expertise from a wide range of disciplines. Therefore, we are making continuing and collective efforts to bring researchers together to share research findings from various perspectives. Following the first special issue of the series, which is about new insights gained from such data (Yao et al. 2016), this special section focuses specifically on representation and analytical models for location-based social media and tracking data. Many of the submissions were significantly expanded from presentations at a research symposium on the same theme (https://gam.icaci.org/symposium-2017/), which brought together researchers from different parts of the world in various fields of study. The symposium offered a forum for participants to exchange ideas and research findings and to discuss what can be pursued in the future. Following the successful symposium, authors of selected full papers were invited to submit a revised version of their papers for this special section of the IJGIS, while at the same time the special section was also publicly open for everyone to make submissions. Eighteen submissions were received and went through the rigorous peer-review process of the journal. Finally, six high-quality papers were accepted to the special section. We believe these are cutting-edge research articles related to representation and analytical models for location-based social media data and tracking data.

As a follow-up of this special section, another symposium on location-based big data is currently being organized for July 2019 (https://lbs.icaci.org/locbigdata/), with special focus on big data analytics and machine learning. It will offer another common ground for researchers to share ideas and research findings and to discuss future research directions on location-based big data.

2. Current trends in research using location-based social media and tracking data

Research on location-based big data can be generally classified into two categories: studies that use the data and studies about the data. Many research questions that could not be addressed or even raised with traditional small data are now enabled by the use of location-based big data. Burgeoning applications of such data have been found in all aspects of the social and natural sciences. When using the data, researchers have often adopted traditional techniques developed for their small data counterparts, either directly or with some limited modifications. These studies belong to the first category of geospatial big data research, and they account for most of the current research using such data. The second type of research focuses on the special characteristics of such data and attempts to contribute new theoretical investigations and/or technical innovations for better understanding or better management of such data. Summarized below are some characteristics of current state-of-the-art research developments in the field and the related research contributions of papers in this special section.

2.1. New data sources and diverse applications

The rapid growth and wide application of location-detecting and network technologies have opened many data sources. Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS), are used to provide geolocation and time information to a receiver anywhere on the Earth's surface. Built-in GPS receiving capability is widely available in smartphones, activity trackers, and location trackers. Thus, location tracking is made easy with the use of these devices that have built-in GPS or RFID chips. Social media platforms, such as Twitter and Facebook, allow users to share their locations via location-aware devices. The location-aware chips can not only be carried by humans but also be attached to animals or other moving objects. In addition to locations and time, other physiological or environmental context data can also be recorded at the same time. Context information, in combination with locations, provides much-enriched power in wayfinding (e.g., Xi et al. 2016) and other applications. Furthermore, indoor positioning systems have been developed to track locations of objects or people inside buildings. These systems use different technologies, including those based on Wi-Fi, Bluetooth, RFID, and magnetic positioning.

Two articles in this special section contributed studies based on other types of new data sources. Although less popular and more technically challenging, geographic data can be obtained from text-based social media. In this special section, Hu et al. (2018) harvested local place names from housing advertisements posted on local-oriented public websites. Liao et al. (2018) collected eye movement data in real-world pedestrian navigation scenarios using eye tracking glasses to infer user tasks for pedestrian navigation. Other papers in the special section contributed to the literature a wide array of creative investigations that use location-based social media and tracking data to study urban structure, movement trajectories, social event detection, and other topics.

2.2. New representational or conceptual frameworks

Location-based social media data and tracking data may consist of various data types and forms, such as text, images, geo-locations and timestamps, and possibly other measurement data. The mixed and often ill-structured data demand new representational models. The new models are expected to organize not only the data for efficient retrieval but also the intricate interconnections among data for further analysis. New conceptual frameworks are proposed when the big and ill-structured data are used to study sophisticated problems or processes. In this special section, Jiang and Ren (2018) put forward a new representation model to structure the unstructured big data for predicting human activities in geographic space. Dunkel et al. (2018) presented a conceptual framework to characterize and to compare collective reactions to social events. In this framework, location-based social media data were collected and analyzed in spatial, temporal, social, thematic, and interlinkage dimensions. Koylu (2018) introduced a framework for modeling and visualizing the semantic and spatio-temporal evolution of topics in an interpersonal communication network.

2.3. New analytical approaches

For the reasons discussed above, many conventional methods for small data cannot simply be applied directly to a big data research project. New techniques have been developed, and existing techniques have been creatively tailored to meet the special needs of the new types of data. One of the major reasons for organizing this special section is to share recent research developments in analytical approaches. In the special section, Liao et al. (2018) adopted a random forest classifier, a machine learning technique, to infer five common navigation tasks based on eye movement features. Koylu (2018) developed an analytical process that consists of topic modeling, geo-social network modeling, and geovisual exploration, for modeling and visualizing the evolution of topics on social media. Hu et al. (2018) applied natural language processing techniques to extract possible local place names from textual social media data, and then performed multiscale geospatial clustering analysis to filter out the non-place names. Guo and Karimi (2018) proposed an analytical model to predict urban movement trajectories and a MapReduce-based algorithm to simulate large-scale trajectory distributions under real-time constraints.

3. Research challenges: possible future research directions

Despite the rapid advances in location-based social media and tracking data research, many scientific challenges are still present. Discussed below are some of the research directions that appear to be particularly important to us.

3.1. A new paradigm

Compared with small data, location-based social media data and tracking data often show most (if not all) of the typical characteristics of big data, the so-called four Vs (volume, variety, velocity, and veracity). The four Vs are mainly concerned with data storage, retrieval, and processing. However, we believe that the aforementioned three characteristics that differentiate big data from small data also deserve special attention. These three characteristics imply that big data is a new paradigm rather than just a new type of data (Jiang 2015). Many conventional representations and analytical models that were developed for small data are invalid or inappropriate for big data. For instance, as shown in Figure 2, in the small data era, Gaussian statistics based on the normal distribution and Euclidean geometry for regular shapes were fairly sufficient for small data analytics. However, they are unsuitable for big data, which typically show different forms of distribution. Instead, Paretian statistics based on long-tailed distributions and fractal geometry for irregular shapes have been advocated for big data analytics.

Figure 2. Fractal or living geometry (Mandelbrot 1982, Alexander 2002–2005) and Paretian statistics (Zipf 1949, Newman 2005) for big data, while Euclidean geometry and Gaussian statistics for small data.
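The contrast between Gaussian and Paretian statistics can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the sample size, the seed, the Gaussian parameters, and the Pareto shape parameter of 1.5 are all hypothetical choices made for illustration, and the helper `share_above_mean` is not taken from any of the cited works. It reports the fraction of values that exceed the mean, which is roughly one half for a normal distribution but a small minority for a long-tailed one.

```python
import random
import statistics


def share_above_mean(values):
    """Return the fraction of values that are strictly greater than the mean."""
    mean = statistics.fmean(values)
    return sum(v > mean for v in values) / len(values)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 100_000              # hypothetical sample size

# Gaussian sample: values cluster symmetrically around the mean.
gaussian = [rng.gauss(100.0, 15.0) for _ in range(n)]

# Long-tailed sample: Pareto distribution with an assumed shape parameter of 1.5.
pareto = [rng.paretovariate(1.5) for _ in range(n)]

print(f"Gaussian share above mean: {share_above_mean(gaussian):.2f}")
print(f"Pareto share above mean:   {share_above_mean(pareto):.2f}")
```

In the long-tailed sample the mean is pulled upward by a few very large values, so it no longer describes a ‘typical’ observation, which is one reason why mean-and-variance summaries may be a poor fit for such data.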

3.2. Systematic and non-systematic biases of location-based big data

In addition to the quality issues common to any kind of data, location-based social media data and tracking data have inherent problems of representational bias and uncertainty. People who use social media and post messages on them do not necessarily share the demographic structure of the entire population. In other words, these users may not be representative of the entire population. We need to better understand the nature of these biases and uncertainty. To do that, research strategies need to be developed and research efforts made to deal with the biases. Being able to estimate the biases would allow us to better interpret research results and to generalize findings to a larger population.
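One familiar way to adjust for a known demographic imbalance is post-stratification weighting, sketched below. This is offered as an illustration of the kind of strategy that could be explored, not as a method proposed in this special section. All numbers, group labels, and function names are hypothetical, and the sketch assumes that the bias is fully explained by the chosen groups and that users within a group behave like non-users in the same group, which may not hold in practice.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share (groups with users only)."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
        if count > 0
    }


def weighted_mean(group_means, sample_counts, weights):
    """Mean over users, with each user weighted by the weight of their group."""
    num = sum(group_means[g] * sample_counts[g] * weights[g] for g in weights)
    den = sum(sample_counts[g] * weights[g] for g in weights)
    return num / den


# Hypothetical numbers: social media users per age group vs. census shares.
sample_counts = {"18-29": 600, "30-49": 300, "50+": 100}
population_shares = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}
# Hypothetical outcome per group, e.g. mean daily travel distance in km.
group_means = {"18-29": 12.0, "30-49": 9.0, "50+": 5.0}

weights = poststratification_weights(sample_counts, population_shares)
naive = sum(group_means[g] * sample_counts[g] for g in sample_counts) / sum(
    sample_counts.values()
)
adjusted = weighted_mean(group_means, sample_counts, weights)
print(f"naive mean: {naive:.2f}, post-stratified mean: {adjusted:.2f}")
```

With these made-up figures the over-represented younger group inflates the unweighted mean, and the weighted estimate moves toward the value implied by the population shares.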

3.3. Representation models

As discussed before, location-based social media data often consist of various types of data and are typically unstructured. On the one hand, the data are rich and fine-grained. On the other hand, traditional database models including the popular GIS data models are insufficient to support such data for efficient retrieval of data and relationships. This gap calls for future research efforts on representational data models that can effectively organize and retrieve not only all types of data elements, but also the explicit or implicit relationships among those elements.

3.4. Data issues

Some emerging research issues are directly associated with data handling. Data fusion is one of them. With more data sources becoming available for the same events or topics, it is necessary to develop analytical approaches for integrating heterogeneous data so that a more comprehensive picture of reality can be captured. Another research issue concerns real-time data processing. Recent developments have seen location-based big data streaming into servers in a real-time and continuous fashion. For many applications, the analysis results are only useful if they become available with little or no delay. This raises research challenges for the development of highly efficient data processing and analysis techniques to support stream processing.
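To make the idea of stream processing concrete, the sketch below keeps a running count of events per grid cell over a sliding time window, updating incrementally as each event arrives rather than re-scanning stored data. It is a toy, single-process illustration: the class name, the window length, the cell size, and the sample coordinates are all hypothetical, and it assumes that events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per grid cell over the most recent `window_seconds`."""

    def __init__(self, window_seconds, cell_size_deg):
        self.window_seconds = window_seconds
        self.cell_size_deg = cell_size_deg
        self.events = deque()    # (timestamp, cell) in arrival order
        self.counts = Counter()  # cell -> number of events inside the window

    def _cell(self, lon, lat):
        # Integer grid index of the cell that contains the point.
        return (int(lon // self.cell_size_deg), int(lat // self.cell_size_deg))

    def add(self, timestamp, lon, lat):
        """Ingest one event; assumes timestamps arrive in non-decreasing order."""
        cell = self._cell(lon, lat)
        self.events.append((timestamp, cell))
        self.counts[cell] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that are at least `window_seconds` older than `now`.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_cell = self.events.popleft()
            self.counts[old_cell] -= 1
            if self.counts[old_cell] == 0:
                del self.counts[old_cell]

    def top(self, k=3):
        """Return the k busiest cells in the current window."""
        return self.counts.most_common(k)


# Hypothetical stream of (seconds, longitude, latitude) events.
counter = SlidingWindowCounter(window_seconds=60, cell_size_deg=0.01)
for t, lon, lat in [(0, 18.061, 59.331), (10, 18.062, 59.332),
                    (30, 18.105, 59.305), (70, 18.063, 59.333)]:
    counter.add(t, lon, lat)
print(counter.top())
```

Each arrival costs a constant amount of work plus the removal of expired events, which is the property that makes this style of update suitable for continuous input. A production system would additionally need to handle out-of-order events and distribution across machines.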

3.5. Privacy-preserving data analytics

In many cases, location-based social media data and tracking data contain some identifiable information about a user (e.g., travel histories). This raises privacy concerns. There is a danger that social media data may open doors to invasions of the privacy of individuals. Zook et al. (2015) pointed out that the relative newness of these data sources and approaches means that researchers face the ongoing need to establish best practice for accuracy across a range of contexts. There is a real and important need to develop privacy-preserving analytical approaches when using location-based big data. Special research efforts are needed to investigate the privacy issues associated with location-based social media and tracking data.
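As one simple illustration of what a privacy-aware processing step might look like, the sketch below aggregates individual points into grid cells and releases only cells visited by at least k distinct users. The function name, cell size, threshold, and sample records are hypothetical, and this kind of small-count suppression is a basic heuristic that offers no formal privacy guarantee; it is not a technique prescribed by the works cited here.

```python
from collections import defaultdict


def aggregate_with_suppression(points, cell_size_deg, k):
    """Aggregate (user_id, lon, lat) records into grid cells and keep only
    the cells that were visited by at least k distinct users."""
    users_per_cell = defaultdict(set)
    for user_id, lon, lat in points:
        cell = (int(lon // cell_size_deg), int(lat // cell_size_deg))
        users_per_cell[cell].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= k
    }


# Hypothetical records: three users share one cell, a fourth is alone elsewhere.
records = [
    ("u1", 18.061, 59.331),
    ("u2", 18.062, 59.332),
    ("u3", 18.063, 59.333),
    ("u1", 18.064, 59.334),  # repeat visit by u1 is counted once
    ("u4", 18.505, 59.705),
]
released = aggregate_with_suppression(records, cell_size_deg=0.01, k=2)
print(released)
```

Only the aggregated user counts leave the function; individual identifiers and exact coordinates are not included in the output, and the cell occupied by a single user is withheld.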

3.6. Ethical and social issues

In addition to privacy issues, the growing popularity of location-based big data has also led to many other social and ethical issues. While the general topic has attracted much attention, significant research efforts are still needed on the following two questions: ‘what social and ethical issues are present in location-based big data research?’ and, more importantly, ‘how can these issues be addressed to ensure the proper use of location-based big data?’ The issues should be questioned and investigated from both technical and non-technical (e.g. regulation) perspectives (Huang et al. 2018).

References

  • Ahas, R., Krisp, J.M., and Toivonen, T., 2017. Methodological aspects of using geocoded data from mobile devices in transportation research. Journal of Location Based Services, 11, 75–77. doi:10.1080/17489725.2017.1427020
  • Alexander, C., 2002–2005. The nature of order: an essay on the art of building and the nature of the universe. Berkeley, CA: Center for Environmental Structure.
  • Dunkel, A., et al., 2018. A conceptual framework for studying collective reactions to events in location-based social media. International Journal of Geographical Information Science. doi:10.1080/13658816.2018.1546390
  • Guo, Q. and Karimi, H.A., 2018. A methodology with a distributed algorithm for large-scale trajectory distribution prediction. International Journal of Geographical Information Science. doi:10.1080/13658816.2018.1536981
  • Hu, Y., Mao, H., and McKenzie, G., 2018. A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. International Journal of Geographical Information Science. doi:10.1080/13658816.2018.1458986
  • Huang, H., et al., 2018. Location based services: ongoing evolution and research agenda. Journal of Location Based Services, 12, 63–93. doi:10.1080/17489725.2018.1508763
  • Jiang, B., 2015. Big data is a new paradigm. Available from: https://www.researchgate.net/publication/283017967_Big_Data_Is_a_New_Paradigm
  • Jiang, B. and Ren, Z., 2018. Geographic space as a living structure for predicting human activities using big data. International Journal of Geographical Information Science. doi:10.1080/13658816.2018.1427754
  • Koylu, C., 2018. Modeling and visualizing semantic and spatio-temporal evolution of topics in interpersonal communication on Twitter. International Journal of Geographical Information Science. doi:10.1080/13658816.2018.1458987
  • Liao, H., et al., 2018. Inferring user tasks in pedestrian navigation from eye movement data in real world environments. International Journal of Geographical Information Science. doi:10.1080/13658816.2018.1482554
  • Mandelbrot, B.B., 1982. The fractal geometry of nature. New York: W. H. Freeman and Co.
  • Newman, M.E.J., 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46 (5), 323–351. doi:10.1080/00107510500052444
  • Xi, D., et al., 2016. A visual salience model for wayfinding in 3D virtual urban environments. Applied Geography, 75, 176–187. doi:10.1016/j.apgeog.2016.08.014
  • Yao, X., et al., 2016. New insights gained from location-based social media data. Special issue. Computers, Environment and Urban Systems, 58.
  • Zipf, G.K., 1949. Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.
  • Zook, M., Kraak, M.J., and Ahas, R., 2015. Geographies of mobility: applications of location-based data. International Journal of Geographical Information Science, 29, 1935–1940. doi:10.1080/13658816.2015.1061667
