519
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Bootstrap bandwidth selection for the pair correlation function of inhomogeneous spatial point processes

ORCID Icon, ORCID Icon & ORCID Icon
Pages 3329-3361 | Received 30 Jan 2022, Accepted 26 May 2023, Published online: 13 Jun 2023

Abstract

This work focuses on kernel estimation of the pair correlation function (PCF) for inhomogeneous spatial point processes. We propose a bootstrap bandwidth selector based on minimizing the mean integrated squared error (MISE). The variance term is estimated by nonparametric bootstrap, and the bias by a plug-in approach using a pilot estimator of the PCF. Kernel estimators of the PCF also require a pilot estimator of the first-order intensity. We test the performance of the bandwidth selector and the role of the pilot intensity estimator in a simulation study. The bootstrap bandwidth selector is competitive with cross-validation procedures, but the contribution of the bandwidth parameter to the goodness-of-fit of the kernel PCF estimator is minor in comparison with that of the pilot intensity function. The data-based kernel intensity estimator leads to biased kernel PCF estimators, while both kernel and parametric covariate-based intensities provide accurate estimators of the PCF.

Mathematics Subject Classifications:

1. Introduction

A main interest in the analysis of spatial point processes is understanding and modelling the amount of interaction between events. Second-order properties such as Ripley´s K-function and the pair correlation function (PCF) have been widely used to characterize the dependence structure of spatial point processes [Citation1–4]. The theory and estimation of these summary statistics have been mainly developed under stationarity, i.e. for spatial point processes with constant intensity function. Baddeley et al. [Citation5] pointed out that this assumption can be very restrictive in practice and introduced second-order reweighted stationary (SOIRS) point processes as those with non-constant intensity function and translation invariant second-order characteristics, and second-order reweighted isotropic (SOIRI) point processes as those with isotropic second-order characteristics. They defined the inhomogeneous K-function of SOIRS and SOIRI point processes as extensions of Ripley´s K-function, and proposed nonparametric estimators for the inhomogeneous K-function and pair correlation functions.

Kernel smoothing is the main nonparametric estimator for the pair correlation function. As for any smoothing procedure, the statistical properties of the kernel estimator of the pair correlation function are highly dependent on the bandwidth parameter. Several data-driven procedures have been developed for stationary point processes, including fixed [Citation6] and adaptive [Citation7] rule of thumb, composite likelihood cross-validation [Citation8], a semiparametric bootstrap procedure [Citation9], and an AMSE-based adaptive plug-in bandwidth selector [Citation10]. However, there are no evidences about the validity of these methods when dealing with inhomogeneous point processes. Guan [Citation11] proposed a least squares cross-validation bandwidth selector for SOIRS point processes, which was adapted by Jalilian and Waagepetersen [Citation12] to improve its computational efficiency.

In addition to bandwidth selection, kernel estimation of the pair correlation function requires an estimator of the first-order intensity function, if it is unknown. In the stationary framework, the consistency of the empirical intensity estimator guarantees the good performance of the kernel estimator of the pair correlation function. In the SOIRS framework, as pointed out by Diggle et al. [Citation13], we can not estimate simultaneously the first and second-order structure of an inhomogeneous point process from a single realization without any additional information, such as covariates or the specification of a parametric model.

In this work we propose a semiparametric bootstrap bandwidth selector for the kernel estimator of the pair correlation function of SOIRI point processes. This proposal is inspired on the bootstrap bandwidth selector introduced by Loh and Jang [Citation9] for the two-point correlation function of homogeneous point processes. We have conducted a simulation study with two main aims: (a) analyse the role of the first-order intensity estimator on the performance of the kernel estimator of the pair correlation function, and (b) test the performance of the bootstrap bandwidth selector by comparison with the rule of thumb and cross-validation procedures currently available. The plan of this paper is the following. Section 2 provides some background on the pair correlation function and its kernel estimators for SOIRI and SOIRI point processes. In Section 3, we introduce the bootstrap bandwidth selector. Section 4 outlines the results of the simulation study. The paper ends with a discussion in Section 5.

2. Kernel estimator of the pair correlation function

Consider a spatial point process X observed on a bounded domain WR2, and let |B| and N(B) denote the area and number of events observed in any Borel set BR2. For any k1 we say that X has kth-order intensity function, λk:(R2)k(0,), if for any disjoint Borel sets, B1,Bk E[N(B1),,N(Bk)]=B1Bkλk(x1,,xk)dx1dxk<.Intuitively, λk(x1,,xk)|B1|,,|Bk| gives the probability of observing one event in each Borel set. In particular for k = 1 and k = 2, we have the first and second-order intensities of X, which are also defined as follows [Citation2] λ(x)=lim|dx|0E[N(dx)]|dx|,which measures the expected number of events per unit area, and λ2(x,y)=lim|dx|,|dy|0E[N(dx)N(dy)]|dx||dy|.The pair correlation function (PCF) is defined as the ratio between the second and first-order intensity functions (1) g(x,y)=λ2(x,y)λ(x)λ(y),(1) and measures how the occurrence of an event in x affects the intensity at location y. A stationary point process has constant intensity, λ(x)=λ, and its second-order intensity and pair correlation function depend on the distance between points λ2(x,y)=λ2(xy) and g(x,y)=g(xy). For stationary and isotropic point processes λ2(x,y)=λ2(xy)=λ2(r), and g(x,y)=g(r) for any r>0. An inhomogeneous spatial point process with first-order intensity bounded away from 0 is second-order intensity reweighted stationary (SOIRS) if g(x,y)=g(xy), and it is second-order intensity reweighted isotropic (SOIRI) when g(x,y)=g(xy) [Citation5].

A typical kernel estimator of the PCF for a SOIRS point process is (2) gˆb(t)=x,yXkb(tx+y)D(x,y)λ(x)λ(y),(2) where kb(r)=b2k(r/b) is a two-dimensional kernel function with scalar bandwidth parameter b, and D(x,y) is the edge correction. Among the edge correctors proposed for second-order properties [Citation1,Citation3], in this work we use the translation factor D(x,y)=|(Wx)(Wy)|=|W(Wx+y)|. Under isotropy, i.e. for a SOIRI point process, expression (Equation2) reduces to (3) gˆb(r)=12πx,yXkb(rxy)D(x,y)λ(x)λ(y)xy,(3) where kb(r)=b1k(r/b) is a one-dimensional kernel function with bandwidth parameter b. Typical choices of k include the uniform kernel, k(r)=I(|r|1/2), recommended by Illian et al. [Citation3] and the Epanechnikov kernel k(x)=34(1x2)+I(|r|1),where I() is the indicator function, and (x)+=max{0,x} being the latter the most widely used.

The kernel estimator of the PCF is computationally fast and has a good performance for any rb, but the estimators in expressions (Equation2)–(Equation3) have a large negative bias when r<b. To overcome this drawback, Guan [Citation11] proposed a bias-corrected estimator which, for SOIRI point processes, is given by (4) gˆc,b(r)=gˆb(r)cb(r);cb(r)=1r/bk(t)dt.(4) The denominator in (Equation4) is equal to 1 if rb, thus gˆc,b(r)=gˆb(r), and smaller than 1 if 0<r<b. Using the second-order Campbell formula, a second-order Taylor expansion, and the symmetry of the kernel function, the bias of gˆb(r) for rb is (5) B[gˆb(r)]=b22g(r)11t2k(t)dt+O(b4),(5) which tends to 0 as b0, if g(r) is bounded. In particular, gˆb(r) is unbiased for Poisson point processes, as in this case g(r)=1.

On the other hand, for r<b we have (6) B[gˆc,b(r)]=cb(r)1[bg(r)1r/btk(t)dt+b22g(r)1r/bt2k(t)dt+O(b3)],(6) which, as in (Equation5), tends to 0 as b0, if g(r) and g(r) are bounded. Therefore, gˆc,b(r) is asymptotically unbiased for all r>0 (see details in Appendix A.1).

Using second to fourth order Campbell formulas, Taylor expansions, and ignoring higher-order interactions, as done by Stoyan and Stoyan [Citation6] for homogeneous point processes (see details in Appendix A.2), the variance of the kernel PCF for any rb is (7) Var[gˆb(r)]=12π2u<r+bk2((ru)/b)GW(2)(u)u2b2g(u)du=1bπ11k2(t)GW(2)(rbt)(rbt)g(rbt)dt,(7) and for r<b we have (8) Var[gˆc,b(r)]=12π2cb(r)2u<r+bk2((ru)/b)GW(2)(u)u2b2g(u)du=1bπcb(r)21r/bk2(t)GW(2)(rbt)(rbt)g(rbt)dt.(8) Due to the term (rbt) in the denominator of the right hand side of expression (Equation7), the variance increases as r0. On the other hand, if the first-order intensity is bounded, λ(x)<λmax, for some λmax>0, then GW2(u)1/(|WWu|λmax), which tends to ∞ as r tends to the diameter of W. Therefore we need to establish an upper limit, rmax, up to which we estimate the pair correlation function. Given that in most cases interaction between events acts at small scales, this restriction does not imply any limitation in practice.

3. Bootstrap bandwidth selector

As we can see in expressions (Equation5)–(Equation8) the goodness-of-fit of the kernel PCF estimator depends on the bandwidth parameter, which has to balance the well known bias-variance trade-off. For this reason, a global error measure such as the weighted mean integrated squared error (9) MISE(b)=0RE[gˆb(r)g(r)]2w(r)dr,(9) where R=rmax is the upper lag, and w(r) a weighting factor, was proposed as a risk function to select the optimal bandwidth for the kernel estimator of the pair correlation function [Citation11,Citation12]. Jang and Loh [Citation10] used a closed-form expression of the asymptotic mean squared error (AMSE) to propose a plug-in adaptive bandwidth selector for the two point correlation function of homogeneous point processes.

Considering w(r)=1 in expression (Equation9), we obtain the asymptotic mean integrated squared error as follows (10) AMISE(B)=obAMSEc(r,b)dr+bRAMSE(r,b)dr,(10) where AMSE and AMSEc refer to the asymptotic mean squared error and bias-corrected mean squared errors (details in Appendix A.3). Integration of expressions (Equation5) and (Equation7) yields the following expression for the second term in (Equation10) (11) bRAMSE(r,b)dr=bRb44(g(r)μ2(k))2dr+1bπbR11k2(t)GW(2)(rbt)(rbt)g(rbt)dtdr,(11) but integration of AMSEC(r,b) does not provide a tractable closed-form expression for the bias-corrected AMISE when r<b. In addition to the complex expression of the bias-corrected AMISE, the potential development of a plug-in bandwidth selector based on the AMISE, as done by Jang and Loh [Citation10] for homogeneous point processes, is also hindered by the need of pilot estimators of g(r), g(r) and λ(x).

Following the proposal of Loh and Jang [Citation9] for the two point correlation function of stationary point processes, here we introduce a semiparametric bootstrap bandwidth selector for the kernel estimator of the pair correlation function of SOIRI point processes. We use the MISE in expression (Equation9) with w(r)=1 as criterion to select the bandwidth parameter, and decompose it into its bias and variance terms. The variance term is estimated by the marked point bootstrap algorithm [Citation14,Citation15] using random sampling [Citation16] instead of block bootstrap as in Loh [Citation15]. The bias term is estimated through a plug-in approach based on a parametric pilot model, given that the nonparametric bootstrap provides unbiased estimators of the pair correlation function. Therefore, the bandwidth parameter is selected by the following semiparametric bootstrap algorithm:

  1. For a given bandwidth, b, compute the bias-corrected kernel estimator of the pair correlation function, gˆc,b(r).

  2. Generate J bootstrap resamples gb,j(r),j=1,J of gˆc,b(r) implementing the marked point bootstrap algorithm as follows:

    (2.1)

    mark each event, {xi}i=1N, with the kernel estimator of the local PCF, mi=gˆb(r,xi), i.e. its contribution to the kernel estimator of the PCF gˆb(r,xi)=12πyX,yxikb(rxiy)D(xi,y)λ(xi)λ(y)xiy.

    (2.2)

    Generate J independent resamples of size N by random sampling with replacement on 1,,N, {i1,j,,iN,j}j=1J, and define the jth bootstrap PCF as the mean of the corresponding local PCFs gb,j(r)=1cb(r)Nk=1Ngˆb(r,xik,j).

  3. Fit a parametric model to the observed pattern and compute the corresponding PCF estimator gˆ(r;θˆ)

  4. Compute the bootstrap MISE MISE(b)=E[0R[gb(r)gˆ(r;θˆ)]2dr]=0RBias2[gb(r)]dr+0RVar[gb(r)dr],where Var[gb(r)] is the variance with respect to the sampling bootstrap distribution, and the bias can be estimated through two alternative procedures.

    1. Asymptotic bias: using the first and second-order derivatives of gˆ(r;θˆ) as plug-in functions in expressions (Equation5) and (Equation6).

    2. Empirical bootstrap bias: Bias[gb(r)]=g¯b(r)gˆ(r;θˆ),where g¯b(r) is the empirical mean of the J bootstrap pair correlation functions.

It should be noted that for inhomogeneous point processes we need to estimate the first-order intensity function prior to obtaining the kernel or parametric estimator of the pair correlation function. Therefore, we need to check in which extent the intensity estimator affects bandwidth selection and, consequently, the performance of the kernel estimator of the pair correlation function.

Parametric or nonparametric approaches can be used to estimate the intensity function. We may know or assume that the intensity belongs to certain parametric family, and estimate the unknown parameters by maximum pseudolikelihood [Citation2,Citation17]. However, unreliable estimates can be obtained if the assumed parametric model deviates from the true intensity function. For this reason, nonparametric approaches, such as the kernel intensity estimator with fixed [Citation2,Citation18] or adaptive bandwidth [Citation19], should be used when the observed pattern is the only information available. Finally, a kernel estimator based on covariates can be used if the intensity function depends on some spatial covariates but the parametric relationship among them is unknown. A rule of thumb [Citation16] and bootstrap [Citation20] bandwidth selectors have been proposed for this estimator.

4. Simulation study

We have conducted a simulation study with two main aims: (i) to test whether the estimator of the first-order intensity affects the performance of the kernel estimator of the pair correlation function, and (ii) to check the performance of the bootstrap bandwidth selector proposed in this work by comparison with the rule of thumb [Citation6], least squares cross-validation [Citation11,Citation12] and composite likelihood cross-validation [Citation8] bandwidth selectors. For this purpose, we have simulated 500 realizations of inhomogeneous Poisson and clustered point processes in the unit square. We generated inhomogeneous Poisson point processes with m=λ(x)dx=200 and first-order intensity defined by two models:

Model 1

Using the simulation study in [Citation15] as reference, λ(x) is a log-Gaussian intensity function with a Gaussian Random field as spatial covariate, see Figures  and (top).

Model 2

λ(x) is a log-Gaussian intensity function with wave model Z(x,y)=10.5cos2(5x) as spatial covariate [Citation21], see Figures  and (bottom).

Figure 1. Model 1: Gaussian random field used as spatial covariate (left), first-order intensity function of the Poisson point process (center), and pair correlation function of the Thomas cluster point processes with parent points intensity proportional to λ(x) and μ=5.

Figure 1. Model 1: Gaussian random field used as spatial covariate (left), first-order intensity function of the Poisson point process (center), and pair correlation function of the Thomas cluster point processes with parent points intensity proportional to λ(x) and μ=5.

Figure 2. Model 2: Spatial covariate (left), first-order intensity function of the Poisson point process (center), and pair correlation function of the Thomas cluster point processes with parent points intensity proportional to λ(x) and μ=5.

Figure 2. Model 2: Spatial covariate (left), first-order intensity function of the Poisson point process (center), and pair correlation function of the Thomas cluster point processes with parent points intensity proportional to λ(x) and μ=5.

Figure 3. Poisson (left) and Thomas cluster point processes generated by Model 1 (top) and Model 2 (bottom).

Figure 3. Poisson (left) and Thomas cluster point processes generated by Model 1 (top) and Model 2 (bottom).

We generated Thomas Cluster point processes with σ=0.05 and σ=0.025 to generate different degrees of clustering and inhomogeneous parent points processes with κ=40 and first-order intensity proportional to those of the Poisson point processes Figures .

The simulation study was conducted with the aid of spatstat [Citation22], ks [Citation23] and RandomFields [Citation24] packages of R [Citation25].

4.1. Effect of the first-order intensity estimators

We have considered three different scenarios to analyse the effect of the first-order intensity estimator on the goodness-of-fit of the kernel estimator of the PCF. In particular, we consider the following three scenarios for the intensity estimation:

  • Parametric intensity estimator with known covariate, estimated by maximum pseudolikelihood [Citation17].

  • Kernel intensity estimator with plug-in matrix bandwidth selector [Citation18,Citation26] and scalar adaptive bandwidth [Citation27].

  • Kernel intensity estimator based on a known spatial covariate with rule of thumb [Citation16] and bootstrap [Citation20] bandwidth.

For each realization of the point processes, we estimated the pair correlation function using the bias-corrected kernel estimator in expression (Equation4) using the three intensity estimators outlined above, and selecting the optimal bandwidth as that minimizing the empirical ISE (12) ISEθ(b)=0R[gˆb(r)gˆθˆ(r)]2dr(12) where gˆθˆ(r) is the pilot parametric PCF estimator obtained using the same first-order intensity used in the kernel PCF. We have applied a Monte Carlo test based on the inhomogeneous L-function with B = 99 realizations of the null hypothesis to test the accuracy of the intensity estimators and the ISE (13) ISE(b)=0R[gˆb(r)g(r)]2dr(13) where g(r) is the theoretical PCF, as discrepancy measure to compare the kernel PCF estimators.

Figure  shows the first-order intensity estimators and the inhomogeneous L-tests for the Poisson point processes generated by Model 1. Comparison between the target (Figure , center) and the intensity estimators shows that the parametric and covariate-based kernel estimators provide accurate estimators of λ(x), whereas the data-based kernel estimator with both fixed and adaptive bandwidth tends to oversmooth the intensity function. The inhomogeneous L-test shows that the five intensity estimators capture the second-order structure of the IPP. Figure (left) shows that, as expected, given that the theoretical PCF is constant, the optimal bandwidth tends to the upper limit of the selection interval. Although similar bandwidth parameters were obtained with the five intensity estimators, comparison between the ISEs Figure (right) shows that the kernel PCFs with covariate-based estimators and adaptive data-based kernel estimator perform better than that with the raw kernel intensity estimator with plug-in bandwidth matrix. PCF estimators with parametric and covariate-based kernel estimator with rule-of-thumbs outperformed those with data-based kernel estimator and covariate-based kernel with bootstrap bandwidth for Model 2 Figures  and . In fact, deviations from the Poisson model were detected at large distances when using the raw and covarate-based kernel intensity estimators with bootstrap bandwidth, whereas the rule of thumbs covariate-based estimators reported a better performance. Figures  and  show the intensity estimators and inhomogeneous L-test for the Thomas cluster point processes with parent point processes generated by Model 1 and 2, respectively. The parametric and covariate-based kernel estimators capture the first and second-order structures of the simulated patterns for the four point processes considered. The data-based kernel estimators, with both fixed and adaptive bandwidth, show a bias towards undersmoothing the intensity function, λ(x), as these approaches interpret the positive interaction between events as hetereogeneity. Consequently, as shown in the inhomogeneous L-tests, we obtain negatively biased kernel estimators of the PCF with larger bias and ISE for clustered point processes with σ=0.025.

Figure 4. Model 1: First-order intensity estimators (top), parametric estimator (λˆθ), kernel smoothing with plug-in bandwidth matrix (λˆH) and adaptive scalar bandwidth (λˆh(x)), covariate-based kernel estimator with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot). Inhomogeneous L-test for the Poisson point processes (bottom), confidence envelope (grey), mean inhomogeneous L-function of the null hypothesis (red), and inhomogeneous L-functon of the observed pattern.

Figure 4. Model 1: First-order intensity estimators (top), parametric estimator (λˆθ), kernel smoothing with plug-in bandwidth matrix (λˆH) and adaptive scalar bandwidth (λˆh(x)), covariate-based kernel estimator with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot). Inhomogeneous L-test for the Poisson point processes (bottom), confidence envelope (grey), mean inhomogeneous L-function of the null hypothesis (red), and inhomogeneous L-functon of the observed pattern.

Figure 5. Model 1: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel with plug-in bandwidth matrix (λˆH) and adaptive scalar bandwidth (λˆh(x)), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators for Poisson point processes.

Figure 5. Model 1: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel with plug-in bandwidth matrix (λˆH) and adaptive scalar bandwidth (λˆh(x)), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators for Poisson point processes.

Figure 6. Model 2: First-order intensity estimators (top), and 1nhomogeneous L-test for the Poisson point processes (bottom), confidence envelope (grey), mean inhomogeneous L-function of the null hypothesis (red), and inhomogeneous L-functon of the observed pattern. See details in the caption of Figure .

Figure 6. Model 2: First-order intensity estimators (top), and 1nhomogeneous L-test for the Poisson point processes (bottom), confidence envelope (grey), mean inhomogeneous L-function of the null hypothesis (red), and inhomogeneous L-functon of the observed pattern. See details in the caption of Figure 4.

Figure 7. Model 2: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators for Poisson point processes.

Figure 7. Model 2: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators for Poisson point processes.

Figure 8. Model 1: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.05 (bottom). See details in the caption of Figure .

Figure 8. Model 1: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.05 (bottom). See details in the caption of Figure 4.

Figure 9. Model 1: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.05.

Figure 9. Model 1: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.05.

Figure 10. Model 1: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.025 (bottom). See details in the caption of Figure .

Figure 10. Model 1: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.025 (bottom). See details in the caption of Figure 4.

Figure 11. Model 1: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.025.

Figure 11. Model 1: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.025.

Figure 12. Model 2: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.05 (bottom). See details in the caption of Figure .

Figure 12. Model 2: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.05 (bottom). See details in the caption of Figure 4.

Figure 13. Model 2: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.05.

Figure 13. Model 2: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.05.

Figure 14. Model 2: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.025 (bottom). See details in the caption of Figure .

Figure 14. Model 2: First-order intensity estimators (top), and 1nhomogeneous L-test for Thomas cluster point processes with μ=5 and σ=0.025 (bottom). See details in the caption of Figure 4.

Figure 15. Model 2: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.025.

Figure 15. Model 2: Optimal bandwidth (left) and ISE (right) with parametric (λˆθ), kernel (λˆH), covariate-based kernel with rule of thumb (λˆZh), and bootstrap bandwidth (λˆZhboot) intensity estimators Thomas cluster point processes with μ=5 and σ=0.025.

4.2. Comparison of bandwidth selectors

In the second stage of the simulation study we focus on testing the performance of the semiparametric bandwidth selector with asymptotic (ABOOT) and empirical (EBOOT) bias. For this purpose, we have applied both bandwidth selectors to the simulated patterns with J = 200 using the five intensity estimators tested in Section 4.1. Finally, we considered the Thomas Cluster model, used to generate the simulated patterns, and a Matérn cluster model to analyse the role of the parametric pilot model needed to estimate the bias in the bootstrap algorithm.

In addition to comparing the two bootstrap algorithms proposed in this work, we have also compared the bootstrap bandwidth selector with the rule of thumb [Citation6], the composite likelihood (CLCV, [Citation8]) and least squares (LSCV, [Citation11]) cross-validation bandwidth selectors currently available in the spatatst package of R [Citation22]. The cross-validation bandwidth selectors use the bias-corrected PCF estimator in (Equation4), and the LSCV selector incorporates the fast algorithm by Jalilian and Waagepetersen [Citation12]. As above, we use the ISE (expression (Equation13)) to compare the goodness-of-fit of the kernel PCF estimators.

Tables  and  outline the mean optimal bandwidth (bopt) and MISE of the kernel PCF estimators obtained for the different models, intensity estimators and bandwidth selectors tested in this work. As expected, in view of the results obtained in Section 4.1, the PCF estimator with raw kernel intensity estimators have a poor performance in comparison with those with covariate-based intensity estimators, mainly for the clustered point processes with adaptive bandwidth. Therefore, in advance we focus on the PCF estimators with covariate-based intensity estimators to compare the bandwidth selectors. It should also be noted that the rule of thumb, whose optimal bandwidth depends just on the expected number of events, provides the same optimal bandwidth regardless of the structure of the spatial point process and first-order intensity estimator used. The kernel PCF estimators obtained with this bandwidth tend to undersmooth the target PCF and have the largest ISE, with larger bias and variance than the remainder criteria, in particular at small distances (r0) (see details in Figures ).

For Poisson point processes, the cross-validation and bootstrap bandwidth selector provide large optimal bandwidths, as expected for a constant theoretical PCF, although the cross-validation bandwidths have larger variability than the bootstrap ones and tend to underestimate the optimal bandwidth. As a consequence, the ISE of kernel PCF estimators with cross-validation bandwidth are larger than those with bootstrap bandwidth Figures  and . Comparison between the bootstrap procedures does not report significant differences between using the asymptotic or empirical bias, or any effect of the wrong parametric model (MC) on the performance of the bandwidth selector Figures .

Figure 16. Model 1, IPP: Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc, EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 16. Model 1, IPP: Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc, EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 17. Model 1, IPP: Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 17. Model 1, IPP: Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 18. Model 2, IPP: Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc, EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 18. Model 2, IPP: Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc, EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 19. Model 2, IPP: Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 19. Model 2, IPP: Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Table 1. Model 1: Optimal bandwidth (bopt) and MISE of the kernel PCF.

Table 2. Model 2: Optimal bandwidth (bopt) and MISE of the kernel PCF.

For Thomas cluster point processes with intensity defined by Model 1 Figures  – , all bandwidth selectors but the asymptotic bootstrap with Thomas cluster pilot model underestimate the optimal bandwidth, but this bias in the bandwidth selector does not affect the performance of the kernel PCF. In contrast, the bootstrap bandwidth selector with asymptotic bias (ABOOT) tend to overestimate the optimal bandwidth, resulting in a larger ISE in the most clustered point process, σ=0.025, whereas the bootstrap bandwidth selector with empirical bias (EBOOT) provides a good estimator of the optimal bandwidth. We have not found any negative impact of using a wrong parametric pilot model in the bootstrap bandwidth selector. Finally, Figures  and show that the covariate-based kernel estimator with rule of thumb bandwidth performs slightly worse than that with bootstrap bandwidth and the parametric estimator.

Figure 20. Model 1, TC(μ=5,σ=0.05): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 20. Model 1, TC(μ=5,σ=0.05): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 21. Model 1, TC(μ=5,σ=0.05): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 21. Model 1, TC(μ=5,σ=0.05): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 22. Model 1, TC(μ=5,σ=0.025): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 22. Model 1, TC(μ=5,σ=0.025): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 23. Model 1, TC(μ=5,σ=0.025): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 23. Model 1, TC(μ=5,σ=0.025): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

For Thomas cluster point processes with intensity defined by Model 2 and σ=0.05 Figures  and , CLCV underestimates the optimal bandwidth and the bootstrap bandwidth selector with asymptotic bias (ABOOT) tends to overestimate the optimal bandwidth. For point processes with σ=0.025 Figures  and all selector but the asymptotic bootstrap with Thomas cluster pilot model, which overestimates the optimal bandwidth, have a slight bias towards underestimating the target bandwidth. However, these biases in the optimal bandwidth have a minor impact on the goodness-of-fit of the kernel PCF estimator, as we do not observe differences between the respective ISEs. As observed for Model 1, PCF estimators with the rule of thumbs covariate-based kernel intensity estimator have larger ISE than those with bootstrap bandwidth in the covariate-based kernel and parametric intensity estimator, which have similar performance.

Figure 24. Model 2, TC(μ=5,σ=0.05): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 24. Model 2, TC(μ=5,σ=0.05): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 25. Model 2, TC(μ=5,σ=0.05): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 25. Model 2, TC(μ=5,σ=0.05): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 26. Model 2, TC(μ=5,σ=0.025): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 26. Model 2, TC(μ=5,σ=0.025): Optimal bandwidth and MISE of the kernel PCF. Theoretical bandwidth (ISEopt), Stoyan, least-squares (LSCV) and composite likelihood (CLIK) cross-validation, and bootstrap bandwidth selector with asymptotic (BOOT) and empirical (EBOOT) bias with Thomas cluster and Matérn (BOOTmc and EBOOTmc) pilot models, for parametric and covariate-based kernel intensity estimators.

Figure 27. Model 2, TC(μ=5,σ=0.025): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

Figure 27. Model 2, TC(μ=5,σ=0.025): Nonparametric pair correlation function (left), bias and variance of the estimators for the different bandwidth selectors and intensity estimators considered.

5. Discussion

This work introduces a semiparametric bootstrap bandwidth selector for the kernel estimator of the pair correlation function of inhomogenoeus spatial point processes, which can be seen as an extension of the bootstrap bandwidth selector introduced by Loh and Jang [Citation9] for the two points correlation function of homogeneous point processes. To implement this bandwidth selector we split the MISE of the kernel PCF estimator into its bias and variance terms. The bias is estimated by a plug-in procedure using a pilot parametric model for the PCF, and the variance is estimated through a marked point bootstap algorithm [Citation14,Citation15].

This procedure has larger computational cost than plug-in algorithms, such as the AMSE based selector in Jang and Loh [Citation10] for homogeneous point processes. However, as argued in Section 3, the untractable AMISE expression for the bias-corrected kernel PCF estimator and the need of a pilot estimator of the first-order intensity, in addition to the pilot PCF function, do not allow the development of a plug-in estimator, in the inhomogeneous framework. On the other hand, the nonparametric bootstrap algorithm provides a consistent estimator of the variance but, as the bootstrap estimate of the PCF is unbiased [Citation15], we cannot use a nonparametric algorithm.

The simulation study, in addition to testing the performance of the bandwidth selector, addresses a key problem in nonparametric estimation of second-order properties, the need of a pilot estimator of the first-order intensity. As expected, given that we cannot distinguish between heterogeneity and dependence in a single point pattern [Citation2], kernel PCF estimators with a kernel intensity estimator have a large negative bias which increases with the degree of clustering of the simulated point processes. Notice that, in absence of additional information, the kernel intensity function interprets clustering as inhomogeneity providing an estimator of the conditional intensity instead of the first-order intensity. Kernel PCF estimators with parametric intensity estimator and kernel covariate-based intensity estimators have similar performance and, consequently, the latter may be a good choice in real data applications when the covariates driving the spatial distribution of events are known, but not the model that determines the relationship between covariate and events.

Focusing on the bootstrap bandwidth selector, we have seen that the assumption of a wrong pilot model in the bias component has a minor impact on the performance of the bandwidth selector. We have also seen that the asymptotic and empirical plug-in bias estimators provide similar results. The semiparametric bootstrap bandwidth selector outperforms the computationally efficient rule of thumbs [Citation6], which estimates the optimal bandwidth as a function of the number of events. Our proposal is competitive with cross-validation algorithms, but none of them outperforms the other. In fact, even when different optimal bandwidths are obtained with the cross-validation and bootstrap selectors, these differences do not affect the goodness-of-fit of the kernel PCF estimator.

In summary, we have developed a semiparametric bootstrap bandwidth selector for the kernel PCF estimator which is competitive with cross-validation algorithms. We have seen that the bandwidth parameter has a minor impact on the goodness-of-fit of the kernel PCF estimator in comparison with the pilot intensity required to compute the estimator. Recently, Shaw et al. [Citation21] proposed a globally intensity-weighted estimator of the PCF which reduces the bias of PCF estimator with data-based kernel intensity. Comparison of this new approach with those with covariate-based pilot intensity function and adaptation of the semiparametric bootstrap to this new approach would be useful contributions to this research line.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work has been supported by the Agencia Estatal de Investigación (AEI)/FEDER, UE [project numbers MTM2016-76969-P, PID2019-107392RB-I00 and PID2020-116587GB-I00].

References

  • Baddeley A, Rubak E, Turner R. Spatial point patterns: methodology and applications with R. New York: CRC press; 2015.
  • Diggle PJ. Statistical analysis of spatial and spatio-temporal point patterns. 3rd ed. Boca Raton, FL: CRC Press; 2013.
  • Illian DJ, Penttinen PA, Stoyan DH, et al. Statistical analysis and modelling of spatial point patterns. Chichester: John Wiley & Sons; 2008.
  • Møller J, Waagepetersen RP. Statistical inference and simulation for spatial point processes. Florida: Chapman and Hall/CRC; 2003.
  • Baddeley A, Møller J, Waagepetersen R. Non- and semiparametric estimation of interaction in inhomogeneous point patterns. Stat Neerl. 2000;54(3):329–350.
  • Stoyan D, Stoyan H. Fractals, random shapes, and point fields: methods of geometrical statistics. Vol. 302. New York: John Wiley & Sons Inc; 1994.
  • Stoyan D, Stoyan H. Improving ratio estimators of second order point process characteristics. Scand J Stat. 2000;27(4):641–656.
  • Guan Y. A composite likelihood cross-validation approach in selecting bandwidth for the estimation of the pair correlation function. Scand J Stat. 2007;34(2):336–346.
  • Loh JM, Jang W. Estimating a cosmological mass bias parameter with bootstrap bandwidth selection. J R Stat Soc Ser C Appl Stat. 2010;59(5):761–779.
  • Jang W, Loh JM. Bandwidth selection for estimating the two-point correlation function of a spatial point pattern using amise. Stat Sin. 2017;27:1401–1418.
  • Guan Y. A least-squares cross-validation bandwidth selection approach in pair correlation function estimations. Stat Probab Lett. 2007;77(18):1722–1729.
  • Jalilian A, Waagepetersen R. Fast bandwidth selection for estimation of the pair correlation function. J Stat Comput Simul. 2018;88(10):2001–2011.
  • Diggle PJ, Gómez-Rubio V, Brown PE, et al. Second-order analysis of inhomogeneous spatial point processes using case–control data. Biometrics. 2007;63(2):550–557.
  • Loh J, Stein M. Bootstrapping a spatial point process. Stat Sin. 2004;14:69–101.
  • Loh JM. Bootstrapping an inhomogeneous point process. J Stat Plan Inference. 2010;140(3):734–749.
  • Baddeley A, Chang YM, Song Y, et al. Nonparametric estimation of the dependence of a spatial point process on spatial covariates. Stat Interface. 2012;5(2):221–236.
  • Baddeley A, Turner R. Practical maximum pseudolikelihood for spatial point patterns: (with discussion). Aust N Z J Stat. 2000;42(3):283–322.
  • Fuentes-Santos I, González-Manteiga W, Mateu J. Consistent smooth bootstrap kernel intensity estimation for inhomogeneous spatial poisson point processes. Scand J Stat. 2016;43(2):416–435.
  • Davies TM, Hazelton ML. Adaptive kernel estimation of spatial relative risk. Stat Med. 2010;29(23):2423–2437.
  • Borrajo MI, González-Manteiga W, Martínez-Miranda M. Bootstrapping kernel intensity estimation for inhomogeneous point processes with spatial covariates. Comput Stat Data Anal. 2020;144:Article ID: 106875.
  • Shaw T, Møller J, Waagepetersen RP. Globally intensity-reweighted estimators for k-and pair correlation functions. Aust N Z J Stat. 2021;63(1):93–118.
  • Baddeley A, Turner R. Spatstat: an R package for analyzing spatial point patterns. J Stat Softw. 2005;12(6):1–42.
  • Duong T. ks: Kernel Smoothing; 2013. R package version 1.8.13. Available from: http://CRAN.R-project.org/package=ks
  • Schlather M, Menck P, Singleton R, et al. Randomfields: simulation and analysis of random fields. R Package Version. 2013;2:66.
  • R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020. Available from: https://www.R-project.org/
  • Diggle P. A kernel method for smoothing point process data. J R Stat Soc Ser C Appl Stat. 1985;34(2):138–147. Available from: http://www.jstor.org/stable/2347366
  • Davies TM, Baddeley A. Fast computation of spatially adaptive kernel estimates. Stat Comput. 2018;28(4):937–956.

Appendix

Appendix 1

Performance of the kernel PCF estimator

The mean integrated squared error (MISE) is a common global error measure for the performance of a kernel estimator and, consequently, has been widely used as criterion to find a bandwidth parameter that balances the trade-off between bias and variance. In particular, a weighted mean integrated squared error (A1) MISE(b)=0RE[gˆb(r)g(r)]2w(r)dr(A1) where R=rmax is the upper lag, and w(r) a weight factor, was proposed as a risk function to select the optimal bandwidth for the kernel estimator of the PCF [Citation11,Citation12]. In the same line as Guan [Citation11] and Jalilian and Waagepetersen [Citation12], this section analyses the bias and variance of the kernel PCF estimator and provides an expression for its AMISE.

A.1. Bias

The kth-order Campbell formula [Citation4] states that for any Borel function f:(R2)k[0,) (A2) E[x1,,xkXf(x1,,xk)]=f(x1,,xk)λk(x1,,xk)dx1,dxk=f(x1,,xk)[j=1kλ(xj)]gk(x1,,xk)dx1,dxk<(A2) where gk() is the kth-order normalized joint intensity. Thus, considering f(x,y)=IW(x)IW(y)kb(rxy)|WWxy|λ(x)λ(y)xyand using the second-order Campbell formula (k = 2 in (EquationA2)), we can write the mean of the kernel PCF in (Equation3) for any r0 and b>0 as (A3) E[gˆb(r)]=12πR2u<r+bI(xWWu)kb(ru)|WWu|ug(u)dudx=u<r+bk((ru)/b)bug(u)du=1min{1,r/b}k(t)g(rbt)dt(A3) Thus, E[gˆb(r)]g(r)1r/bk(t)dtg(r)11k(t)dt=g(r)and, consequently, gˆb(r) is asymptotically unbiased for rb and has a negative bias at small distances, i.e. for r<b. This drawback can be overcome using the bias-corrected estimator introduced by Guan [Citation11] (A4) gˆc,b(r)=gˆb(r)cb(r);cb(r)=1r/bk(t)dt(A4) The denominator in (EquationA4) is equal to 1 if rb, thus gˆc,b(r)=gˆb(r), and smaller than 1 if 0<r<b.

Going back to expression (EquationA3), a second-order Taylor expansion yields E[gˆb(r)]=1min{1,r/b}k(t)(g(r)btg(r)+(bt)22g(r))dtThus, given the symmetry of the kernel function, the bias of gˆb(r) for rb is (A5) B[gˆb(r)]=b22g(r)11t2k(t)dt+O(b4)(A5) which tends to 0 as b0, if g(r) is bounded. In particular, gˆb(r) is unbiased for Poisson point processes, as in this case g(r)=1.

On the other hand, for r<b we have (A6) B[gˆc,b(r)]=cb(r)1[bg(r)1r/btk(t)dt+b22g(r)1r/bt2k(t)dt+O(b3)](A6) which, as in (EquationA5), tends to 0 as b0, if g(r) and g(r) are bounded. Therefore, gˆc,b(r) is asymptotically unbiased for all r>0.

A.2. Variance

The variance of the kernel PCF is simply (A7) Var[gˆb(r)]=E[gˆb2(r)]E2[gˆb(r)](A7) The first term in (EquationA7) is (A8) E[gˆb2(r)]=14π2E[(f(x,y))2]=14π2(2E[x,yXf2(x,y)]+4E[x,y,wXf(x,y)f(x,w)]+E[x,y,w,zXf(x,y)f(w,z)])(A8) Using the second-order Campbell formula (k = 2 in (EquationA2)), the mean in the first term of expression (EquationA8) is (A9) E[x,yXf2(x,y)]=E[x,yXIW(x)IW(y)k2((rxy)/b)|WWxy|2λ2(x)λ2(y)xy2b2]=R2u<r+bI(xWWu)k2((ru)/b)|WWu|2λ(x)λ(x+u)u2b2g(u)dudx=u<r+bk2((ru)/b)GW(2)(u)u2b2g(u)du(A9) where GW(2)(u)=1|WWu|2I(xWWu)λ(x)λ(x+u)dxA third-order Campbell formula (k = 3 in (EquationA2)) yields the following expression for the second term in the right hand side of (EquationA8) (A10) E[x,y,wXf(x,y)f(x,w)]=E[x,y,wXk((r|xy|)/b)k((r|xw|)/b)|WWxy||WWxw|λ2(x)λ(y)λ(w)xyxwb2]=R2u,v<r+bI(xWWu)I(xWWv)k((ru)/b)k((rv)/b)|WWu||WWv|λ(x)uvb2g3(u,v)dudvdx=u,v<r+bk((ru)/b)k((rv)/b)GW(3)(u,v)uvb2g3(u,v)dudv(A10) where ge(u,v) is the translation invariant third-order normalized joint intensity function, and GW(3)(u,v)=1|WWu||WWv|I(xWWu)I(xWWv)λ(x)dxFinally, a fourth-order Campbell formula (k = 4 in (EquationA2)) yields the following expression for the last term in the right hand side of Equation (EquationA8) (A11) E[x,y,w,zXf(x,y)f(w,z)]=E[x,y,w,zXk((r|xy|)/b)k((r|wz|)/b)|WWxy||WWwz|λ(x)λ(y)λ(w)λ(z)xywzb2]=R2u,v<r+bI(xWWu)I(xWWv)k((ru)/b)k((rv)/b)|WWu||WWv|uvb2g4(u,x+u,x+v)dudvdx(A11) where g4(u,x+u,x+v) is the translation invariant fourth-order normalized joint intensity.

On the other hand, from expression (EquationA3) the squared mean of gˆb(r) is (A12) E2[gˆb(r)]=14π2[R2u<r+bI(xWWu)kb(ru)|WWu|ug(u)dudx]2=(R2)2uv<r+b×I(xWWu)I(yWWv)k((ru)/b)k((rv)/b)|WWu||WWv|uvb2g(u)g(v)dudvdxdy(A12) Ignoring higher-order interactions, as done by Stoyan and Stoyan [Citation6] for homogeneous point processes, the variance of the kernel PCF for any r>b is (A13) Var[gˆb(r)]12π2u<r+bk2((ru)/b)GW(2)(u)u2b2g(u)du=1bπ11k2(t)GW(2)(rbt)(rbt)g(rbt)dt(A13) and for r<b we have (A14) Var[gˆc,b(r)]12π2cb(r)2u<r+bk2((ru)/b)GW(2)(u)u2b2g(u)du=1bπcb(r)21r/bk2(t)GW(2)(rbt)(rbt)g(rbt)dt(A14) Due to the term (rbt) in the denominator of the right hand side of expression (Equation7), the variance increases as r0. On the other hand if the first-order intensity is bounded, λ(x)<λmax, for some λmax>0, then GW2(u)1/(|WWu|λmax), which tends to ∞ as r tends to the diameter of W. Therefore we need to establish an upper limit, rmax, up to which we estimate the pair correlation function. Given that in most cases interaction between events acts at small scales, this restriction does not imply any limitation in practice. Finally, if we assume an infill asymptotic framework in which m=Wλ(x)dx, b0 and mb0, where m is the expected number of events observed in W, then 1/(|WWu|λmax)0.

A.3. Mean integrated squared error

Considering expressions (EquationA5) and (EquationA13) we obtain the following expression for the asymptotic mean squared error (AMSE) of gˆb(r) for rb (A15) AMSE(r,b)=b44(g(r)μ2(k))2+1bπ11k2(t)GW(2)(rbt)(rbt)g(rbt)dt(A15) where μ2(k)=11t2k(t)dt. Also, the AMSE of gˆc,b(x) for r<b is (A16) cb(r)2AMSEc(r,b)=[bg(r)1r/btk(t)+b22g(r)1r/bt2k(t)dt]2+1bπ1r/bk2(t)GW(2)(rbt)(rbt)g(rbt)dt(A16) Considering w(r)=1 in expression (Equation9), we obtain the asymptotic mean integrated squared error as follows (A17) AMISE(b)=obAMSEc(r,b)dr+bRAMSE(r,b)dr(A17) Integrating expression (EquationA15) we obtain the following expression for the second term in (Equation10) (A18) bRAMSE(r,b)dr=bRb44(g(r)μ2(k))2dr+1bπbR11k2(t)GW(2)(rbt)(rbt)g(rbt)dtdr(A18) but integration of expression (EquationA16) does not provide a tractable closed-form expression for the bias-corrected AMISE when r<b.