ABSTRACT
This paper considers the problem of partitioning an individual's GPS trajectory into homogeneous, meaningful segments such as stops and trips. Signal loss and signal noise are highly prevalent in human trajectory data, and handling the resulting uncertainty is a key challenge for segmentation algorithms. We propose a new trajectory segmentation algorithm that detects stop segments from GPS data with time gaps in a noise-robust manner. The algorithm consists of three steps: it imputes time gaps, splits the data into base segments, and estimates the state of each base segment. State-dependent path interpolation is proposed as a framework for gap imputation that deals with the locational and temporal uncertainties associated with signal loss. A spatiotemporal clustering-based segmentation method is proposed to detect spatiotemporal clusters of any shape, regardless of density, and thereby cut a trajectory into internally similar base segments. Fuzzy inference is employed to handle borderline cases when determining the state of each base segment from its input features. The proposed algorithm was applied to detect stop/move episodes in raw GPS trajectories collected from 20 urban and 19 suburban participants. A sensitivity analysis was conducted to guide the choice of parameters such as the temporal and spatial definitions of a stop. Experimental results show that the proposed method correctly identified 92% of stop/move episodes and correctly estimated 98% of episode duration. This study indicates that a sequence of state-dependent gap imputation, clustering-based segmentation and fuzzy-set-based state estimation can satisfactorily handle uncertainty in processing human GPS trajectory data.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. To detect data points with abrupt location changes, a location deviation score is calculated as the ratio of the distance between the current point p and the representative location of p's temporal neighborhood to the standard distance of that neighborhood. The higher the score, the more the current location deviates from its temporal neighborhood. Based on experimentation, data points with a location deviation score > 2.4 are marked as outliers.
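The score in Note 1 can be sketched in Python as below. This is a minimal sketch under assumptions the note does not state: coordinates are projected to metres, the temporal neighborhood is taken as up to k points before and after p (k = 5 is a hypothetical default), the representative location is the mean center, and the standard distance is the root-mean-square distance of the neighbors from that center. Only the 2.4 threshold comes from the note; the function names are hypothetical.

```python
import math
from typing import List, Tuple

# (x, y) coordinates in projected metres (assumption; the note does not specify)
Point = Tuple[float, float]


def location_deviation_scores(points: List[Point], k: int = 5) -> List[float]:
    """Ratio of each point's distance from its neighbours' mean centre to their standard distance."""
    n = len(points)
    scores: List[float] = []
    for i, (px, py) in enumerate(points):
        # Hypothetical temporal neighbourhood: up to k points on each side of p, excluding p
        nbrs = [points[j] for j in range(max(0, i - k), min(n, i + k + 1)) if j != i]
        if len(nbrs) < 2:
            scores.append(0.0)  # too few neighbours to judge
            continue
        # Representative location taken as the mean centre (assumption)
        cx = sum(x for x, _ in nbrs) / len(nbrs)
        cy = sum(y for _, y in nbrs) / len(nbrs)
        # Standard distance: root-mean-square distance of neighbours from the mean centre
        sd = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in nbrs) / len(nbrs))
        dist = math.hypot(px - cx, py - cy)
        if sd > 0:
            scores.append(dist / sd)
        else:
            # All neighbours coincide: any displacement is treated as maximally deviant
            scores.append(math.inf if dist > 0 else 0.0)
    return scores


def flag_outliers(points: List[Point], k: int = 5, threshold: float = 2.4) -> List[bool]:
    """Mark points whose location deviation score exceeds the threshold (2.4 in Note 1)."""
    return [s > threshold for s in location_deviation_scores(points, k)]


# Usage: a straight synthetic track with one abrupt jump at index 5
track = [(10.0 * i, 0.0) for i in range(11)]
track[5] = (50.0, 1000.0)
print(flag_outliers(track))
```

Because the neighborhood excludes p itself, a single jump does not inflate its own reference spread; it does widen the spread seen by nearby points, which keeps them below the threshold in this example.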
2. A sigmoid function was used because it allows the range of possible values to be controlled through a central value and a slope that reflects the level of uncertainty. The central-value and slope parameters were set from summary statistics of a 15% random sample of the collected data. A comparison of the defined membership functions with logistic regression indicates that they perform well (84% vs. 64%). An automated process for optimizing membership functions is described in Biljecki et al. (2013, p. 20).
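A sigmoid membership function of the kind described in Note 2 might look as follows. The note does not say which summary statistics were used, so taking the median as the central value and deriving the slope from the interquartile range (so that membership rises from 0.1 to 0.9 across one IQR centred on the median) is a hypothetical choice, as are the names sigmoid_membership and fit_from_sample and the synthetic feature values. Only the 15% sampling fraction comes from the note.

```python
import math
import random
import statistics
from typing import Callable, Sequence, Tuple


def sigmoid_membership(center: float, slope: float) -> Callable[[float], float]:
    """Return mu(x) = 1 / (1 + exp(-slope * (x - center))); mu(center) == 0.5."""
    def mu(x: float) -> float:
        z = -slope * (x - center)
        if z > 700:  # exp(z) would overflow; membership is effectively 0
            return 0.0
        return 1.0 / (1.0 + math.exp(z))
    return mu


def fit_from_sample(values: Sequence[float], fraction: float = 0.15,
                    seed: int = 0) -> Tuple[float, float]:
    """Hypothetical calibration: central value and slope from a random sample."""
    if len(values) < 4:
        raise ValueError("need at least 4 values")
    rng = random.Random(seed)
    k = min(len(values), max(4, int(len(values) * fraction)))
    sample = rng.sample(list(values), k)  # e.g. a 15% random sample
    q1, _, q3 = statistics.quantiles(sample, n=4)
    center = statistics.median(sample)
    iqr = q3 - q1
    # Slope chosen so mu goes from 0.1 to 0.9 over one IQR centred on the median
    slope = 2 * math.log(9) / iqr if iqr > 0 else 1.0
    return center, slope


# Usage with a synthetic feature (e.g. dwell time in minutes; made-up values)
feature = [float(v) for v in range(1, 101)]
c, s = fit_from_sample(feature)
mu = sigmoid_membership(c, s)
print(round(mu(c), 3), round(mu(c + math.log(9) / s), 3))
```

For a feature where low values indicate a stop (for example, speed), the complementary membership 1 - mu(x), or equivalently a negative slope, would be used instead.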
3. To calculate precision and recall, an inferred segment is considered to match an actual segment if it lies within 100 m (SR) and 1 min (MinSegDur) of that segment.
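The matching rule in Note 3 can be sketched as follows. How the 100 m and 1 min tolerances are applied is not fully specified, so this sketch assumes that each segment is summarised by a start time, an end time and a representative location in projected metres; that both the start and end times must lie within MinSegDur of the actual segment; and that each actual segment can be matched at most once, using greedy first-fit matching (a simplification of optimal one-to-one assignment). The Segment class, the function names and the example data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    x: float      # representative location, projected metres (assumption)
    y: float


def is_match(inferred: Segment, actual: Segment,
             sr: float = 100.0, min_seg_dur: float = 60.0) -> bool:
    """Hypothetical rule: within SR metres in space and MinSegDur seconds at both ends."""
    near = math.hypot(inferred.x - actual.x, inferred.y - actual.y) <= sr
    timely = (abs(inferred.start - actual.start) <= min_seg_dur
              and abs(inferred.end - actual.end) <= min_seg_dur)
    return near and timely


def precision_recall(inferred: List[Segment], actual: List[Segment],
                     sr: float = 100.0, min_seg_dur: float = 60.0) -> Tuple[float, float]:
    """Greedy one-to-one matching; each actual segment is matched at most once."""
    used = set()
    matched = 0
    for seg in inferred:
        for j, ref in enumerate(actual):
            if j not in used and is_match(seg, ref, sr, min_seg_dur):
                used.add(j)
                matched += 1
                break
    precision = matched / len(inferred) if inferred else 0.0
    recall = matched / len(actual) if actual else 0.0
    return precision, recall


# Usage: two ground-truth stops and three inferred stops (synthetic values)
actual = [Segment(0, 600, 0, 0), Segment(1200, 1800, 1000, 0)]
inferred = [Segment(30, 630, 50, 0),        # within 30 s and 50 m of the first stop
            Segment(1500, 1800, 1000, 0),   # starts 5 min late: no match
            Segment(3000, 3600, 0, 0)]      # no corresponding actual stop
print(precision_recall(inferred, actual))
```

Here one of three inferred stops matches, giving a precision of 1/3, and one of two actual stops is recovered, giving a recall of 0.5.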