194
Views
6
CrossRef citations to date
0
Altmetric
Original Articles

Modified Jaccard index analysis and adaptive feature selection for location fingerprinting with limited computational complexity

ORCID Icon &
Pages 128-157 | Received 28 Jun 2018, Accepted 26 Jan 2019, Published online: 26 Mar 2019
 

ABSTRACT

We propose an approach for fingerprinting-based positioning which reduces the data requirements and computational complexity of the online positioning stage. It is based on a segmentation of the entire region of interest into subregions, identification of candidate subregions during the online-stage, and position estimation using a preselected subset of relevant features. The subregion selection uses a modified Jaccard which quantifies the similarity between the features observed by the user and those available within the reference fingerprint map. The adaptive feature selection is achieved using an adaptive forward-backward greedy search which determines a subset of features for each subregion, relevant with respect to a given fingerprinting-based positioning method. In an empirical study using signals of opportunity for fingerprinting the proposed subregion and feature selection reduce the processing time during the online-stage by a factor of about 10 while the positioning accuracy does not deteriorate significantly. In fact, in one of the two study cases, the 90th percentile of the circular error increased by 7.5% while in the other study case we even found a reduction of the corresponding circular error by 30%.

Acknowledgments

We thank anonymous reviewers for their useful comments and questions which helped significantly improving the paper.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. In other publications, subregion selection is called spatial filtering (Kushki, Plataniotis, and Venetsanopoulos Citation2007), location-clustering (Youssef, Agrawala, and Shankar Citation2003), or coarse localisation (Feng et al. Citation2012).

2. The forward part of the algorithm is referred to as matching pursuit for least square regression in the signal processing community (Barron et al. Citation2008; Mallat and Zhang Citation1993). In machine learning, it is known as boosting (Bühlmann Citation2006).

3. Herein the initial loss is defined as the mse by using the median of all rp as the estimated position when no candidate feature is selected.

4. For MAP, the precomputed data are the likelihood of the selected features of each subregion and prior probability of each candidate location. For kNN, the precomputed data are the observations of the selected relevant features of each subregion.

5. We implemented MAP as proposed in Youssef and Agrawala (Citation2008). The implementation can be sped up using algebraic factorisation (Bisio et al. Citation2016). For kNN, tree-based methods (e.g. kd-tree) are used in the scikit-learn implementation (Pedregosa et al. Citation2011).

6. Herein we carried out all experiments using only one device. It is to be expected that also the quality of the results obtained using our approach depends on the mobile device(s) used for data collection, see (Bisio et al. Citation2013). However, we leave a related investigation for future work.

7. All APs and beacons had been deployed independently of this experiment and long before it for the purpose of internet access. Each AP supports two frequencies (2.4 GHz and 5 GHz) and has a built-in ble beacon. There is no automatic power adjustment of the APs.

8. An upgrade of the WLAN (e.g. change APs) has been carried out after the data collection of RoI 1.

9. Assessing the impact of different uniform or non-uniform subregion shapes and sizes as well as alternative interpolation strategies is beyond the scope of this paper and left for future work.

10. Due to the constraints (e.g. furnitures and decorations) of accessibility of the RoI, several subregions have no observations. The numbers in parentheses denote the numbers of subregions containing at least one observation.

11. The training data were obtained from the densified RFM obtained through kernel smoothing of raw RFM data, while the test data are separately collected raw data. By coincidence, the size of it is larger than that of the test data.

12. We used Python to implement the proposed method and evaluate the processing time using the time package (https://docs.python.org/3/library/time.html#module-time).

Additional information

Funding

The China Scholarship Council (CSC) financially supports the first author’s doctoral research.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.