470
Views
3
CrossRef citations to date
0
Altmetric
Stochastic Process

Local Inhomogeneous Weighted Summary Statistics for Marked Point Processes

ORCID Icon, , & ORCID Icon
Pages 588-602 | Received 04 Jul 2022, Accepted 19 Apr 2023, Published online: 21 Jun 2023

Abstract

We introduce a family of local inhomogeneous mark-weighted summary statistics, of order two and higher, for general marked point processes. Depending on how the involved weight function is specified, these summary statistics capture different kinds of local dependence structures. We first derive some basic properties and show how these new statistical tools can be used to construct most existing summary statistics for (marked) point processes. We then propose a local test of random labeling. This procedure allows us to identify points, and consequently regions, where the random labeling assumption does not hold, for example, when the (functional) marks are spatially dependent. Through a simulation study we show that the test is able to detect local deviations from random labeling. We also provide an application to an earthquake point pattern with functional marks given by seismic waveforms.

1 Introduction

The analysis of a point pattern, given as a collection of points in a region, typically begins with computing an estimate of some summary statistic which may be used to find specific structures in the data and suggest suitable models (Van Lieshout Citation2000; Daley and Vere-Jones Citation2008; Illian et al. Citation2008; Gelfand et al. Citation2010; Chiu et al. Citation2013).

The choice of summary statistic depends both on the pattern at hand and on the feature or hypothesis of interest.

A widely used summary statistic for descriptive analyses and diagnostics, which is obtained as an instance of the so-called reduced second moment measure (Cressie and Collins Citation2001; Møller Citation2003; Chiu et al. Citation2013), is Ripley’s K-function (Ripley Citation1976), which is based on the assumption of a non-marked stationary and isotropic point process. In the marked case, assuming discrete marks and stationarity, cross versions of the K- or nearest neighbor distance distribution functions have been proposed (Diggle Citation2013). For real-valued marks, the mark correlation type-functions in Penttinen and Stoyan (Citation1989) and Illian et al. (Citation2008) are widely used and such second order statistics have been studied in more detail and reformulated by Schlather (Citation2001), in order to obtain a more rigorous formulation. However, although the assumption of stationarity is mathematically appealing, it can rarely be justified in practice since the intensity tends to change over the study region. This is to say that the underlying point process is inhomogeneous and, in the unmarked case, Baddeley, Møller, and Waagepetersen (Citation2000) proposed an inhomogeneous extension of the K-function for a class of point processes, which are referred to as second order intensity-reweighted stationary. Their ideas were extended to spatio-temporal point processes in Gabriel and Diggle (Citation2009) and Møller and Ghorbani (Citation2012). Further, Møller and Waagepetersen (Citation2003) proposed an extension of this K-function to second order intensity-reweighted stationary multivariate point processes. As indicated in Cronie and van Lieshout (Citation2016) and Iftimi, Cronie, and Montes (Citation2019), this structure may be extended to K-functions for general marked point processes. To analyze higher order interactions in general stationary marked point processes, Van Lieshout (Citation2006) proposed marked versions of the nearest neighbor distance distribution functions, the empty space function and the J-function. These summary statistics, which allow us to study spatial interactions between different mark groupings of the points, were later extended to the inhomogeneous setting by Cronie and van Lieshout (Citation2016) and Iftimi, Cronie, and Montes (Citation2019). In particular, to test for random labeling, Cronie and van Lieshout (Citation2016) proposed inhomogeneous Lotwick-Silverman-type Monte Carlo tests based on their new summary statistics, while Iftimi, Cronie, and Montes (Citation2019) proposed second order Monte Carlo tests based on permuting the attached marks. Further details on the random shift-type testing considered in Lotwick-Silverman-type tests can be found in Mrkvička et al. (Citation2021).

Despite the relatively long history of point process theory (see e.g., Stoyan and Stoyan Citation1994; Daley and Vere-Jones Citation2008; Diggle Citation2013), few approaches have been proposed to analyze spatial point patterns where the features of interest are functions/curves instead of qualitative or quantitative variables. Examples of point patterns with associated functional data include forest patterns where for each tree we have a growth function, curves representing the incidence of an epidemic over a period of time, and the evolution of distinct economic parameters such as unemployment and price rates, all for distinct spatial locations. The study of such configurations allows analyzing the effects of the spatial structure on individual functions. Illian et al. (Citation2006) consider for each point a transformed Ripley’s (Citation1976) K-function to characterize spatial point patterns of ecological plant communities, while Mateu, Lorenzo, and Porcu (Citation2007) build new marked point processes formed by spatial locations and curves defined in terms of Local Indicators of Spatial Association (LISA) functions, which describe local characteristics of the points. They use this approach to classify and discriminate between points belonging to a clutter and those belonging to a feature. Finally, the idea of analyzing point patterns with attached functions has been presented coherently by Comas, Delicado, and Mateu (Citation2011) and Ghorbani et al. (Citation2021).

Ghorbani et al. (Citation2021) introduced a very broad framework for the analysis of Functional Marked Point Processes (FMPPs), indicating how they connect the point process framework with both Functional Data Analysis (FDA; Ramsay and Silverman (Citation2002)) and geostatistics. In particular, they defined a new family of summary statistics, so-called weighted nth order marked inhomogeneous K-functions, together with their nonparametric estimators, which they exploited to analyze Spanish population structures, such as demographic evolution and sex ratio over time. This summary statistic family can be used to run a Monte Carlo test of random labeling, for example, by means of global envelopes test (GET; Myllymäki et al. Citation2017), to assess whether the functional marks of the analyzed pattern are spatially dependent. However, this procedure is essentially global, since it does not provide information on the points which mostly contributed to the rejection of the random labeling hypothesis. Therefore, motivated by the need of detecting such points, and thus the regions in which they are located, where the functional marks really do depend on the surrounding structure, in this article we introduce a new class of summary statistics, local t-weighted marked nth order inhomogeneous K-functions. These are used to propose a local test of random labeling. Here t refers to a function which governs how much weight we put on different aspects of the marked point process/pattern.

Further, we use the developed tools to analyze seismic data. Note that while the spatial (and temporal) locations of the epicenters of earthquakes are typically analyzed within the framework of point processes, the associated seismic waveforms are commonly investigated in separate analyses through FDA. Applying the local test allows us to identify where one would expect waveforms (i.e., functional marks) to be similar to those of nearby points.

All the performed analyses are carried out through the R Core Team (Citation2022) software, and the codes are available from the first author. Preliminary data manipulation is performed through the software Python (Van Rossum and Drake Jr Citation1995).

The structure of the article is as follows. In Section 2, the motivation of this article is presented, showing the dataset and problem that will be further analyzed along the article. Section 3 contains some preliminaries on functional marked point processes. In Section 4, we present our proposed local t-weighted nth order inhomogeneous K-functions and their main properties, also relating them to their global counterparts. Section 5 outlines the main steps to run a local test of random labeling. In Section 6, we present a motivating example to show the further advantages of a local test, compared to a global one. To have a comprehensive understanding of the performance of the proposed local test, we show simulation results under different scenarios. Section 7 provides an application to seismic data. Finally, conclusions are drawn in Section 8.

2 Data and Motivation

Earthquakes’ detection provides a whole set of data which are usually studied separately, that is, spatial (and temporal) occurrence of points through point process theory (Siino et al. Citation2017; Iftimi, Cronie, and Montes Citation2019; D’Angelo et al. Citation2022, to cite just a few recent works), and the analysis of waveforms through FDA (Adelfio et al. Citation2011, Citation2012; Chiodi et al. Citation2013).

A recently released set of data on Italian seismic activity encompasses both of these data types. The Italian seismic dataset for machine learning (INSTANCE) is a dataset of seismic waveforms data and associated metadata (Michelini et al. Citation2021), which includes 54,008 earthquakes for a total of 115,9249 three-channel waveforms. It also contains 132,330 three-channel noise waveforms. For each of these waveforms, 115 metadata (i.e., statistical variables) are available, providing information on station, trace, source, path and quality. Overall, the data are collected by 19 networks which consist of 620 seismic stations. The dataset is available on http://www.pi.ingv.it/instance/.

The earthquake list in the dataset is based on the Italian seismic bulletin (http://terremoti.ingv.it/bsi) of the “Istituto Nazionale di Geofisica e Vulcanologia,” includes events which occurred between January 2005 and January 2020, and in the magnitude range between 0.0 and 6.5. The waveform data have been recorded primarily by the Italian National Seismic Network. depict the earthquake locations and the seismic stations which recorded the events.

Fig. 1 The Italian seismic dataset for machine learning (INSTANCE). (a) Earthquake locations; (b) Seismic stations used for waveforms extraction. The symbol sizes are proportional to earthquake magnitude and number of arrival phases recorded by stations, respectively; (c) Seismic waveforms of some events with magnitude in the range [2,4]. Vertical lines indicate the seismic waves’ arrival times. Source: Michelini et al. (Citation2021).

Fig. 1 The Italian seismic dataset for machine learning (INSTANCE). (a) Earthquake locations; (b) Seismic stations used for waveforms extraction. The symbol sizes are proportional to earthquake magnitude and number of arrival phases recorded by stations, respectively; (c) Seismic waveforms of some events with magnitude in the range [2,4]. Vertical lines indicate the seismic waves’ arrival times. Source: Michelini et al. (Citation2021).

In , some waveforms contained in the dataset are represented. All the waveform traces have a length of 120 sec, are sampled at 100 Hz, and are provided both in counts and ground motion physical units after deconvolution of the instrument transfer functions. The waveform dataset is accompanied by metadata consisting of more than 100 variables providing comprehensive information on the earthquake source, the recording stations, the trace features, and other derived quantities.

3 Preliminaries on Marked Point Processes

Throughout the article, we consider a marked point process Y={(xi,mi)}i=1N (Daley and Vere-Jones Citation2008, Definition 6.4.1), with ground points xi in the d-dimensional Euclidean space Rd,d1, which is equipped with the Lebesgue measure |A|=Adz for Borel sets AB(Rd); a closed Euclidean r-ball around xRd will be denoted by b[x,r]. By definition, the ground process Yg={xi}i=1N, obtained from Y by ignoring the marks, is a well-defined point process on Rd in its own right. Note that, formally, Y is a random element in the measurable space (Nlf,N) of locally finite point configurations/patterns x={((x1,m1),,(xn,mn))},n0 (Van Lieshout Citation2000; Daley and Vere-Jones Citation2008). We assume that the mark space M is Polish and equipped with a finite reference measure ν on the Borel σ-algebra B(M). The Borel σ-algebra B(Rd×M)=B(Rd)B(M) is endowed with the product measure A×E|A|ν(E),A×EB(Rd×M). We will let Y(A×E)=(x,m)Y1{(x,m)A×E}, where 1 is the indicator function, denote the cardinality of the random set Y(A×E). We assume that Y is simple, that is, it almost surely (a.s.) does not contain multiple points in the sense that P(Y({(x,m)})=0 or 1)=1 for all (x,m)Rd×M.

Given this general setup, one may obtain various forms of marked point processes, most notably multivariate/multitype point processes with M={1,,k} (Diggle Citation2013) and functional marked point processes with M given by a suitable function space (Ghorbani et al. Citation2021).

3.1 Functional Marked Point Processes

In this section, we provide the definition of functional marked point processes following Ghorbani et al. (Citation2021).

In classical FDA, one analyses a collection of functions {f1(t),,fn(t)},tT[0,),n1, which take values in some Euclidean space Rk,k1, and belong to some suitable function space, typically an L2-space. Although t usually represents time, it could also represent some other quantity, for example, spatial distance. Classically, one would assume that such collections of functions constitute realizations or samples of some collection of iid random functions or stochastic processes {F1(t),,Fn(t)},tT. Such an assumption may, however, be questioned in certain settings. For example, two functions fi and fj, which are spatially close to each other in Rk, could gain (or lose) from being close to each other. Accordingly, it seems natural to relax the iid assumption for F1,,FN. A natural way to handle such a scenario is to generate F1,,FN conditionally on some collection of (dependent) random spatial locations. Note that the conditional distribution of F1,,FN could render them either independent or dependent.

To facilitate such a setting, we consider a functional marked point process (Ghorbani et al. Citation2021), which is defined as a marked point process where the marks are random elements in some (Polish) function space, M, most notably the space of L2-functions f:TRk. Realizations of FMPPs are called functional marked point patterns. It is noteworthy that the original formal construction of functional marked point processes by Ghorbani et al. (Citation2021) also included an additional nonfunctional mark, so that each ground process point would be marked by a pair which consists of a function and a nonfunctional variable. We here do not consider such auxiliary nonfunctional marks.

3.2 Product Densities

Provided that it exists, the nth order intensity/product density function ρ(n),n1, which is the density of the nth order factorial moment measure α(n), may be specified through the nth order Campbell formula. It states that, for any nonnegative measurable function h on (Rd×M)n, the expectation of the random sum of h satisfies (1) E[(x1,m1),,(xn,mn)Yh((x1,m1),,(xn,mn))]=h((x1,m1),,(xn,mn))ρ(n)((x1,m1),,(xn,mn))i=1ndxiν(dmi),(1) where indicates that the sum is over n-tuples of distinct points of Y. Heuristically, ρ(n)((x1,m1),,(xn,mn))dx1ν(dm1) dxnν(dmn) gives the probability that Y has points in infinitesimal neighborhoods d(xi,mi)(xi,mi)Rd×M with measures dxiν(dmi),i=1,,n. Moreover, we retrieve α(n)((A1×E1)××(An×En)),(Ai×Ei)B(Rd×M), i=1,,n, by letting h be given by the indicator function 1{(x1,m1)(A1×E1),,(xn,mn)(An×En)}. It further follows that ρ(n)((x1,m1),,(xn,mn)) =fx1,,xn(m1,,mn)ρg(n)(x1,,xn), where ρg(n) is the nth order product density of Yg and fx1,,xn(·) is a conditional density function on Mn which governs the joint distribution of n marks, given that their associated ground process points are given by x1,,xnRd. These, in turn, yield the corresponding mark distributions Mx1,,xn(E1,,En) =E1Enfx1,,xn(m1,,mn)i=1nν(dmi), which govern the joint distribution on n marks, given the associated ground process locations.

The intensity measure of Y, which coincides with the first order factorial moment measure, here satisfies (2) α(A×E)=E[Y(A×E)]=AEρ(x,m)dxν(dm)=AEfx(m)ρg(x)dxν(dm)=AMx(E)ρg(x)dx,(2) where the first-order intensity functions ρ=ρ(1) and ρg=ρg(1) are typically referred to as the intensity functions of Y and Yg. Note that ρ may be viewed as a “heat map” which reflects the infinitesimal chance of having a point of Y at/around an arbitrary location in Rd×M. When the intensity function (of the ground process) is constant, we say that the (ground) process is homogeneous, otherwise it is called inhomogeneous.

When, conditional on the ground process, all marks have the same marginal univariate distribution, so that Mz(E)=Efz(m)dν(dm)=Ef(m)dν(m)=M(E), we say that X has a common (marginal) mark distribution. This holds for example, when Y is stationary, that is, when its distribution is invariant under translations of the ground points; here α(A×E)=ρgM(E)|A| and ρg>0 is the constant intensity of the ground process. We will see that, at times, it is particularly convenient to have here that the reference measure ν coincides with the common mark distribution M, which implies that the common mark density f is set to 1 and ρ(x,m)=ρg(x).

When Y is independently marked, that is, when the marks are independent conditional on the ground process, fx1,,xn(m1,,mn)=fx1(m1)fxn(mn) for any n1 and if, in addition, there is a common mark distribution, whereby the marks are iid conditional on the ground process, we say that Y is randomly labeled and note that fx1,,xn(m1,,mn)=f(m1)f(mn).

3.2.1 Intensity Reweighted Stationarity

We next turn to the notion of a kth order marked intensity reweighted stationary (k-MIRS) marked point process Y (Ghorbani et al. Citation2021). We say that Y is k-MIRS, k{1,2,}, if ρ is bounded away from 0 and the correlation functions g(n)((x1,m1),,(xn,mn)) =ρ(n)((x1,m1),,(xn,mn))ρ(x1,m1)ρ(xn,mn) =fx1,,xn(m1,,mn)fx1(m1)fxn(mn)ρg(n)(x1,,xn)ρg(x1)ρg(xn),n1,satisfy g(n)((x1,m1),,(xn,mn))=g(n)((z+x1,m1),,(z+xn,mn)) for any zRd and any nk. Note that g(1)(·)1 and that the second ratio on the right-hand side is the nth order correlation function, gg(n), of the ground process. Provided that the product densities of all orders exist, stationarity implies k-MIRS for all orders k1. Note further that g(n)(·)1,n1, for a Poisson process and when g(n)((x1,m1),,(xn,mn))>1 points of Yg in infinitesimal neighborhoods of x1,,xn with marks in infinitesimal neighborhoods of m1,,mn tend to cluster/aggregate. Similarly, g(n)((x1,m1),,(xn,mn))<1 indicates inhibition/regularity.

3.3 Palm Distributions

Let Y be a simple marked point process whose intensity function exists. Many of the summary statistics we will consider can be expressed in terms of reduced Palm distributions. These satisfy the reduced Campbell–Mecke formula which states that, for any nonnegative measurable function h on the product space (Rd×M)×Nlf, (3) E[(z,m)Yh((z,m),Y\{(z,m)})]=E[h((x,m),Y!(x,m))]ρ(x,m)dxν(dm)=E!(x,m)[h((x,m),Y)]ρ(x,m)dxν(dm).(3)

Here Y!(x,m) is the reduced Palm process at (x,m)Rd×M, which we interpret as Y conditioned on the null event that there is a point in (x, m), which is removed upon realization. The probability distribution P!(x,m)(·)=P!(x,m)(Y·)=P(Y!(x,m)·) on (Nlf,N), which corresponds to E!(x,m), is called the reduced Palm distribution at (x, m).

4 Local Weighted Marked Summary Statistics

Global summary statistics have had a prominent role in the statistical analysis of point processes. More precisely, their nonparametric estimators are typically used to characterize the degree of spatial interaction present in the underlying data-generating point process. In Section 1, we have reviewed a few such examples, for instance K-functions.

The individual contributions to a global statistic, which are commonly called Local Indicators of Spatial Association (LISA) functions, can be used to identify outlying components measuring the influence of each contribution to the global statistic (Anselin Citation1995). This is the case of the scatterplot based on the local Moran index (Anselin Citation1996). On the other hand, the individual contributions can be used to test for specific local structures, such as spatial association and hot spot detection in areal data (Getis and Ord Citation1992). Basically, the local statistics mentioned so far are often used to analyze areal data but Getis and Franklin (Citation1987) introduced a local version of the K-function for spatial point processes to show that trees exhibit different kinds of heterogeneity when examined at different scales of analysis. The notion of individual functions for certain statistics has also been studied in Stoyan and Stoyan (Citation1994) and Mateu, Lorenzo, and Porcu (Citation2010) showed that the local product density function (Cressie and Collins Citation2001) is more sensitive to identifying different local structures and unusual points than the local K-function. Applications of LISA functions range from detection of features in images with noise (Mateu, Lorenzo, and Porcu Citation2007) to detection of disease clusters (Moraga and Montes Citation2011). In Siino et al. (Citation2018) the authors extend local indicators of spatial association to the spatio-temporal context (LISTA functions) based on the second order product density, and these local functions have been used to define a proper statistical test for clustering detection. Recently, LISTA functions have been used both for diagnostic (Adelfio et al. Citation2020) and fitting purposes (D’Angelo, Adelfio, and Mateu Citation2023). Finally, D’Angelo, Adelfio, and Mateu (Citation2021) extended LISTA functions to spatio-temporal point processes living on linear networks.

As we have clearly indicated, an alternative to studying the aforementioned global summary statistics for marked point processes is considering local summary statistics which describe the spatial interaction in the vicinity of a given marked point. In order to do so here in the marked context, we introduce the function (4) L((x,m),x)=Ln((x,m),x;t˜,ρ˜)=(x1,m1),,(xn1,mn1)xt˜((x,m),(x1,m1),,(xn1,mn1))ρ˜(x,m)ρ˜(x1,m1)ρ˜(xn1,mn1),(4) for (x,m)Rd×M, point pattern xNlf, and measurable t˜:(Rd×M)nR, n2. Note that, formally, the argument ρ˜ does not need to be the true intensity function ρ of Y, it could for example, be a plug-in estimator. We will exploit Definition 1, and thereby (4), to define proper notions of (mark-weighted nth order inhomogeneous) local summary statistics.

Definition 1.

Given a marked point process Y, we refer to L((x,m),Y{(x,m)};t˜,ρ˜),(x,m)Y, as the family of nth order local marked cumulative summary statistics of Y associated with t˜ and ρ˜.

The construction of a specific local statistic is obtained by identifying when, for some function family {t˜r}, (5) G(r,Y)=(x,m)YLn((x,m),Y{(x,m)};t˜r,ρ˜)(5) forms an estimator of an existing global summary statistic.

Using nth order local marked cumulative summary statistics to quantify local spatial interactions for a point pattern x entails inserting an estimate ρ̂(x,m)=f̂z(m)ρ̂g(x) for the unknown intensity ρ(x,m)=fz(m)ρg(x), that is, setting ρ˜=ρ̂. When we assume that there is a common mark distribution which coincides with the mark reference measure ν, we obtain that ρ̂(x,m)=ρ̂g(x), that is, the intensity estimate does not depend on the mark values. Imposing this assumption is particularly convenient when dealing with functional marks since estimation of the mark density, which here is a density on a function space, is rather challenging and beyond the scope of this article. Note that when Y is randomly labeled, it has a common mark distribution and in this setting the assumption ρ̂(x,m)=ρ̂g(x) thus makes sense.

Turning to the distributional properties of the nth order local marked cumulative summary statistics, we next derive their expectations under the assumption of k-MIRS. Note, in particular, that the choice of t˜ plays a significant role here.

Theorem 1.

When Y is k-MIRS and ρ˜=ρ, for any WB(Rd) the expectation of L((x,m),Y{(x,m)}W×M;t˜,ρ),(x,m)YW×M, is almost everywhere given by WxWx(MMt˜((x,m),(x1+x,m1),, (xn1+x,mn1))f0,x1,,xn1(m,m1,,mn1)f0(m)fx1(m1)fxn1(mn1) ν(dm1)ν(dmn1))gg(n)(0,x1,,xn1)dx1dxn1,when 2nk. Moreover, the expectation of G(r,YW×M) is obtained by replacing t˜ by t˜r in the expression above and integrating it over W×M with respect to the reference measure on Rd×M.

Proof.

Note first that the expectation coincides with E!(x,m)[Ln((x,m),YW×M;t˜,ρ)]=E!(x,m)[(x1,m1),,(xn1,mn1)YW×M      t˜((x,m),(x1,m1),,(xn1,mn1))ρ(x,m)ρ(x1,m1)ρ(xn1,mn1)].

Hence, our starting point will be the reduced Campbell-Mecke formula. Consider an arbitrary bounded A×EB(Rd×M). It follows that E[(x,m)YA×E(x1,m1),,(xn1,mn1)Y{(x,m)}W×Mt˜((x,m),(x1,m1),,(xn1,mn1))ρ(x,m)ρ(x1,m1)ρ(xn1,mn1)]=A×EE!(x,m)[(x1,m1),,(xn1,mn1)YW×Mt˜((x,m),(x1,m1),,(xn1,mn1))ρ(x1,m1)ρ(xn1,mn1)]dxν(dm).

On the other hand, by the Campbell formula we have that E[(x,m)YA×E(x1,m1),,(xn1,mn1)Y{(x,m)}W×Mt˜((x,m),(x1,m1),,(xn1,mn1))ρ(x,m)ρ(x1,m1)ρ(xn1,mn1)]=A×ERd×MRd×M1{x1,,xn1W}t˜((x,m),(x1,m1),,(xn1,mn1))g(n)((x,m),(x1,m1),,(xn1,mn1))dx1ν(dm1)dxn1ν(dmn1)dxν(dm)=A×ERd×MRd×Mi=1n11{uiWx}t˜((x,m),×(u1+x,m1),,(un1+x,mn1))g(n)((0,m),(u1,m1),,(un1,mn1))du1ν(dm1)dun1ν(dmn1)dxν(dm)by the imposed k-MIRS and a change of variables, ui+x=xi. Hence, since A×EB(Rd×M) was arbitrary, for almost every (x, m) we have that E!(x,m)[Ln((x,m),Y;t˜,ρ)] =(Wx)×M(Wx)×Mt˜((x,m),(u1+x,m1),, (un1+x,mn1))g(n)((0,m),(u1,m1),, (un1,mn1))du1ν(dm1)dun1ν(dmn1) =WxWx(MMt˜((x,m),(u1+x,m1),, (un1+x,mn1))f0,u1,,un1(m,m1,,mn1)f0(m)fu1(m1)fun1(mn1) ν(dm1)ν(dmn1))gg(n)(0,u1,,un1)du1dun1, by Fubini’s theorem. □

The first thing we note is that when Y is independently marked then the density ratio in the expression for the expectation vanishes. In addition, if Y is a Poisson process on Rd×M which satisfies being a marked point process with mark space M, then the expectation reduces to an integral with t˜ as integrand. These observations may be used as benchmarks for when Y exhibits mark (in)dependence and spatial interaction locally.

4.1 Special Cases

We next illustrate how (5), through Definition 1 and (4), reduces to several existing summary statistic estimators by varying t˜ and ρ˜.

4.1.1 Ground K-functions

First, set n = 2 and t˜ to t˜r((x,m),(x1,m1))=w(x,x1)1{x1x+C}/|W|,r0, where WRd,|W|>0, and w(·) is an edge correction term. If the ground process is stationary with intensity ρg>0 and ρ˜(x,m)ρg, then (5) with Y set to YW×M reduces to an estimator of Ripley’s K-function when x+C=x+b[0,r]=b[x,r] whereas if the ground process is inhomogeneous and we set ρ˜(x,m)=ρg(x), it follows that (5) reduces to an estimator of the inhomogeneous K-function (Baddeley, Møller, and Waagepetersen Citation2000) for Yg. The extension to space-time is straightforward; replace the Euclidean ball b[0,r] by C={(x,s):xr,|s|t}B(Rd+1), where · denotes the Euclidean norm (Gabriel and Diggle Citation2009; Cronie and Van Lieshout Citation2015; Iftimi, Cronie, and Montes Citation2019).

4.1.2 Marked K-Functions

When n = 2, by instead letting t˜r((x,m),(x1,m1))=w(x,x1)1{x1x+C}1{mE,m1E1}/(|W|ν(E)ν(E1)) and ρ˜=ρ in (4), using a suitable edge correction function w(·), then G(r,YW×M) in (5) reduces to an estimator of the marked second-order reduced moment measure KEE1(C) of Iftimi, Cronie, and Montes (Citation2019), which measures the intensity reweighted interactions between points with marks in E and points with marks in E1, when their separation vectors belong to CB(Rd). We note that measures of this kind are in general not symmetric, that is, KEE1(·)KE1E(·) (Iftimi, Cronie, and Montes Citation2019). Furthermore, choosing C to be the closed origin-centered ball b[0,r] of radius r0, we consider the marked inhomogeneous K-function KinhomEE1(r) of Cronie and van Lieshout (Citation2016), which measures pairwise intensity reweighted spatial dependence within distance r between points with marks in E and points with marks in E1.

By additionally letting n > 2, we obtain a definition of a marked nth order reduced moment measure, KE×i=1n1Ei(C1××Cn1), which measures the intensity reweighted spatial interaction between an arbitrary point with mark in E and distinct (n1)-tuples of other points, where the separation vectors between the E-marked point and these n – 1 points, which have marks in E1,,En1, belong to C1,,Cn1. We note that Ci=b[0,r],i=1,,n1,r0, yields an n-point version of the marked inhomogeneous K-function KinhomE×i=1n1Ei(r) of Cronie and van Lieshout (Citation2016), which may be used to analyze intensity reweighted interactions between a point with mark in E and n – 1 of its r-close neighbors, which have marks belonging to the respective sets E1,,En1.

4.1.3 Weighted Marked Reduced Moment Measures and K-functions

Finally, by letting ρ˜=ρ and t˜((x,m),(x1,m1),, (xn1,mn1)) be given by the product of (6) t˜(m,m1,,mn1)=t(m,m1,,mn1)1{mE}ν(E)i=1n11{miEi}ν(Ei),w˜(x,x1,,xn1)=w(x,x1,,xn1)i=1n11{xi(x+Ci)},(6) for EB(M),ν(E)>0, and Ci×EiB(Rd)×B(M)=B(Rd×M),ν(Ei)>0,i=1,,n1, we obtain an unbiased estimator K̂tE×i=1n1Ei(C1××Cn1)=G(r,YW×M) of the t-weighted marked nth order reduced moment measure of Ghorbani et al. (Citation2021), (7) KtE×i=1n1Ei(C1××Cn1)=1|W|ν(E)i=1n1ν(Ei)E[(x,m)YW×E(x1,m1),,(xn1,mn1)Y{(x,m)}t(m,m1,,mn1)ρ(x,m)i=1n11{xixCi}1{miEi}ρ(xi,mi)](7) assuming that the edge correction function w is such that unbiasedness holds. Examples of such w include the minus sampling edge correction and the translational edge correction (Ghorbani et al. Citation2021). Note here that one just as well could have merged the scaled indicators in the expression for t˜ with t so that t˜=t; Ghorbani et al. (Citation2021) included this mark set filtering to highlight that their summary statistic generalizes previously proposed ones.

4.2 Local t-Weighted Marked nth Order Inhomogeneous K-Function

In this section, we provide the estimator corresponding to the local contributions of (7) and discuss its properties.

Definition 2.

Let t˜ be (up to indicator-scaling) as in (6) and consider (8) K̂t(x,m)×i=1n1Ei(C1××Cn1)=Ln((x,m),Y{(x,m)}W×M;t˜,ρ˜)=1ρ˜(x,m)ν(E)i=1n1ν(Ei)(x1,m1),,(xn1,mn1)Y{(x,m)}W×Mw(x,x1,,xn1)t(m,m1,,mn1)i=1n11{xixCi}1{miEi}ρ˜(xi,mi),(x,m)YW×M,(8) for some suitable edge correction w in (6), WB(Rd),EB(M),ν(E)>0, and Ci×EiB(Rd×M),ν(Ei)>0,i=1,,n1. We refer to K̂t(x,m)×i=1n1Ei(r)=K̂t(x,m)×i=1n1Ei(b[0,r]n1),r0, as a local t-weighted marked nth order inhomogeneous K-function. In particular, K̂t,n(x,m)(r)=K̂t(x,m)×Mn1(r) does not perform any explicit mark set filtering.

Note first that when there is a common mark distribution which coincides with the reference measure on M, setting ρ˜=ρ we, for instance, obtain K̂t,n(x,m)(r)=(x1,m1),,(xn1,mn1)Y{(x,m)}(b[x,r]W)×M t(m,m1,,mn1)w(x,x1,,xn1)ρg(x)ρg(x1)ρg(xn1)since ν must be a probability measure here.

Regarding the distributional properties of (8), when Y is k-MIRS, Theorem 1 tells us that the expectation is given by 1ν(E)i=1n11ν(Ei)RdRdw(x,x1+x,,xn1+x) i=1n11{xi(x+Ci)(Wx)} (E1En1t(m,m1,,mn1) f0,x1,,xn1(m,m1,,mn1)f0(m)fx1(m1)fxn1(mn1)ν(dm1)ν(dmn1)) gg(n)(0,x1,,xn1)dx1dxn1.

In particular, under independent marking the mark related integral within brackets reduces to E1En1t(m,m1,,mn1)ν(dm1)ν(dmn1), whereby (8) is given by the product of this term and a term measuring intensity reweighted spatial interaction.

4.2.1 Test Functions for FMPPs

Turning to the FMPP case, by choosing different test functions t(·) for the functional marks, we may extract different features. We here focus on pairwise interactions, that is, n = 2.

The test function t is intended to reflect similarities between functions. Hence, a natural starting point would be a metric t(f1,f2)=d(f1,f2) on the function space M, which does not necessarily need to be the underlying assumed metric on M. The first candidate that comes to mind is an Lp-distance: (9) t(f1,f2)=(ab|f1(t)f2(t)|pdt)1/p,1p,(9) where p= represents the supremum metric. For any choice of p in (9), similarity between functions implies a small value of the test function. Other tentative functions are semi-metrics based on the Lp distance between the sth derivatives of the functions, for different combinations of p and s, with the L1 and L2 distances being particular cases, and semi-metrics based on functional principal component analysis.

A further alternative is the functional marked counterpart of the test function for the classical variogram, given by (10) t(f1,f2)=ab(f1(t)F¯¯(t))(f2(t)F¯¯(t))dt,(10) with F¯¯(t)=(1/n)i=1nfi(t) being the average functional mark at time t for the observed functional part of the point pattern; such averaging is motivated by the assumption of a common mark distribution.

5 Local Test for Random labeling

Simple hypotheses for spatial point patterns, such as Complete Spatial Randomness, are commonly tested using an estimator of a global summary statistic, for example, Ripley’s K-function. In this context, one typically resorts to Monte Carlo testing. The first step is then to generate Q simulations under the null hypothesis, and to estimate the chosen summary statistic for both the observed pattern and the simulations. In order to study whether there is random labeling in a (functional) marked point process, the simulations are obtained by permuting the (functional) marks, that is, randomly assigning them to the spatial points of the ground pattern, which are kept fixed. Then, the chosen summary statistic is estimated for each of these permutations and global envelopes at a given nominal level are generated based on them. The result of the test can be assessed graphically: if the summary statistic estimate for the observed pattern exits the envelopes, we proceed with the assumption that the underlying FMPP is not randomly labeled. Furthermore, it is possible to calculate a p-value based on the position of the observed summary statistic within the qth envelopes, following Myllymäki et al. (Citation2017). We know, however, that the conclusion drawn from the application of the above-mentioned global test pertains to the whole analyzed process, indicating whether all the functional marks are randomly labeled or not. Motivated by the will to further detect the specific points, and regions, where the functional marks really do depend on the other marked points, we propose a local test for random labeling. The main idea is to run a global envelope test on each point of the analyzed pattern by means of the previously proposed local t-weighted marked inhomogeneous K-functions, to draw different conclusions about the individual points, based on the obtained p-values. In Algorithm 1 we outline the proposed local test. Note that we alternatively may use sampling without replacement in step 5 of Algorithm 1. Moreover, if convinced that multiple testing issues are present here, one may adjust the Type I error probability α by using for example, the Holm-Bonferroni method.

Algorithm 1

Local test of random labeling

1: Set a fixed nominal value α for Type I error;

2: Consider a (functional) marked point pattern x={(xj,mj)}j=1k,k1;

3: Set a number of simulations, Q1;

4: for each q=1,,Q: do

5: Randomly sample k (functional) marks, with replacement, from the original k ones;

6: Denote the resulting point pattern by xq={(xj,mjq)}j=1k;

7: end for

8: for each j=1,,k, do

9: Compute Ln(j,q)={K̂t(xj,mjq)×i=1n1Ei(r;xq)}r[0,rmax] for all q=1,,Q;

10: Apply global envelope testing, using the functions Ln(j,q),q=1,,Q, to generate the envelopes;

11: Obtain a p-value pj from the test;

12: Reject the null hypothesis for the j th point if pjα.

13: end for

6 Motivating Example and Simulation Study

This section is dedicated to simulation studies to assess the performance of our proposed local test. First, Section 6.1 provides a motivating example of the use of such a test, by means of simulated data resembling seismic events, which in turn have motivated this work. In particular, this means simulating the functional marks as seismic waveforms, following the typical abrupt change in variance of the signal in correspondence with the arrivals of the first P- and S-waves. Then, Section 6.2 presents an extensive simulation study, showing diverse and more general settings. Specifically, we assess the performance of the test by summarizing the results in terms of classification rates.

6.1 The Need for a Local Test

We simulate a homogeneous spatial point pattern with 250 points on the unit square, W=[0,1]×[0,1], which represents the ground pattern. For each ground point xi, we simulate a functional mark of the from fi(t)=y(t)=μ(t)+ϵ(t),tT=[0,1], ϵ(t)N(0,σ(t)2), σ(t)2=0.2+7.51{t>0.4}51{t>0.6},where the mean signal μ(t) is taken to be zero. The spatial ground point pattern and the corresponding waveform for a given point are shown in . Since the marks/waveforms are simulated from the same model, and independently of each other and the spatial locations of the points, we see that such a process is indeed randomly labeled.

Fig. 2 (a) Simulated earthquake locations. (b) Simulated waveform marking the highlighted point on panel (a). (c) Result of the global test.

Fig. 2 (a) Simulated earthquake locations. (b) Simulated waveform marking the highlighted point on panel (a). (c) Result of the global test.

Having generated the data, we first run a global envelope test for random labeling, by randomly permuting the simulated waveforms, that is, the functional marks, keeping the location of the points fixed. We run the test by means of the t-weighted marked nth order inhomogeneous K-function of Ghorbani et al. (Citation2021), with n = 2, making it a second order summary statistic, and t given by the test function (10), that is, the functional marked counterpart of the test function for the classical variogram. As previously mentioned, we assume that there is a common mark distribution which coincides with the reference measure on the mark space so that the intensity function is estimated by the ground process intensity estimate. To be as objective as possible, we do not use the homogeneous intensity estimator ρ̂g(·)=Yg(W)/|W| here but instead we use a kernel intensity estimator, as in practice it would be unknown to us whether the actual ground process is (in)homogeneous. We use a Gaussian kernel intensity estimator ρ̂g(·), where we select the bandwidth, h, according to Cronie and Van Lieshout (Citation2018). More specifically, we minimize the discrepancy between the area of the observation window and the sum of reciprocal estimated intensity values at the points of the point pattern, that is, we minimize CvL(h)=(|W|i1/ρ̂g(xi;h))2, where the sum is taken over all the data points xi and ρ̂g(xi;h) is the kernel intensity estimate with bandwidth h, evaluated in xi. Then, once the bandwidth has been selected, the intensity estimate is corrected for edge effects through global edge correction (the option diggle = FALSE in the spatstat function density.ppp), that is, dividing the estimate by the convolution of the Gaussian kernel with the window of observation (Diggle Citation1985). Finally, for w we use Ripley’s isotropic edge correction in the summary statistic to correct for edge effects. We repeated the procedure 39 times, obtaining the result depicted in . We stress that our approach seems to be robust with respect to the bandwidth specification in this particular scenario setting, that is, the choice of bandwidth selection approach plays a minor role in the final result.

As evident from , the observed summary statistic completely lies within the envelopes, and this confirms the expected result of lack of spatial dependence/structure of the functional marks. This result is further corroborated by the nonsignificant p-value, equal to 0.25.

6.1.1 Simulating Spatially Dependent Functional Marks

To make the functional marks spatially dependent, we then superimpose a homogeneous spatial point pattern with 50 points, generated in the [0,0.5]×[0,0.5] square, that is, the bottom left region of the entire study region W. For these additional points, we generate different functional marks than before, namely with the underlying trend μ(t)=10+6sin(3πzt). Consequently, we have simulated a FMPP with spatially varying functional marks, that is, not randomly labeled. We therefore expect a global test of random labeling to confirm this.

We first run the same global test of random labeling as before. Here, the K-function is based on a kernel intensity estimate whose bandwidth is selected by Diggle’s (Citation2013) rule. It represents a good alternative to Cronie and Van Lieshout’s (Citation2018) one, being slightly faster to compute. We use Q = 39 and obtain a global p-value of 0.025. This, together with the observed K-function lying outside the envelopes (), indicates the ability of the global test to correctly detect the spatial dependence of the functional marks.

Fig. 3 (a) Result of global test for the spatially dependent simulated data. (b) Output of the local test: the black triangles are the significant points for which the hypothesis of random labeling is rejected.

Fig. 3 (a) Result of global test for the spatially dependent simulated data. (b) Output of the local test: the black triangles are the significant points for which the hypothesis of random labeling is rejected.

We know, however, that this conclusion should not be drawn for each point of the pattern, if we consider local restrictions of it, but specifically for those in the vicinity of the [0,0.5]×[0,0.5] square. We therefore proceed by running our proposed local test, based on the proposed second order local K-function K̂t,2(x,m)(r),r[0,rmax], in Definition (2), with the same choice of test function t(·) and the same intensity estimation scheme as for the global one. depicts the points of the simulated point pattern, and it displays as black triangles those points for which the local test came out significant. Hence, this illustrates that the proposed local test is able to correctly identify some of the points, and consequently some parts of the region, where the hypothesis of random labeling does not hold locally. Note that a universally preferable option for rmax does not exist. In this article, it is set to min(xW,yW)/4, where xW and yW represent the maximum width and height of the observation region W, respectively; note that this rule of thumb is supported by Diggle (Citation2013). Indeed, changing the value of rmax has an impact on the final results, and we found that our choice provided the best compromise among the options.

6.2 Extended Simulation Study

This section aims to study the proposed method’s performance in terms of classification rates considering different scenarios, concerning both the ground processes and the functional marks’ structures. To this end, we simulate under different such scenarios, to obtain a comprehensive understanding of the results of the local test in different settings.

In detail, we consider three types of ground process structures, all with an expected point count of 200: (a) a homogeneous Poisson process; (b) an inhomogeneous Poisson process with intensity function ρg(x)=ρg(x1,x2)=exp(3.5+3x2),xW; (3) a Thomas process, with intensity of the Poisson process of cluster centeres equal to 25, standard deviation of random displacement of a point from its cluster center equal to 0.05, and mean number of points per cluster equal to 7. They are all generated in W, that is, the unit square, and will be referred to as the base patterns. Then, we superimpose additional simulated patterns in the [0,0.5]×[0,0.5] square, coming from the same generating processes, but with an expected number of points of 50; hereby the expected total number of points on [0,0.5]×[0,0.5] is 50+200/4=100 and on its complement it is 150. These additional patterns will be referred to as feature patterns. A graphical representation of these three ground patterns comes in .

Fig. 4 Simulation scenarios. (a)–(c) Spatial ground patterns; (d)–(f) Functional marks of model (11) (in gray) and of the marking models in item 1, 2, and 3, respectively (in black).

Fig. 4 Simulation scenarios. (a)–(c) Spatial ground patterns; (d)–(f) Functional marks of model (11) (in gray) and of the marking models in item 1, 2, and 3, respectively (in black).

As for the functional marks, we consider the time domain T=[0,10] and, practically, we sample each simulated mark function in 100 equally spaced time points in T. We assume that each functional mark satisfies fi(t)=Z(xi,t), where xi is the ith ground point and (11) Z(x,t)=μ+ξ(x,t),(x,t)W×T,(11) for a zero-mean stationary Gaussian random field ξ with covariance function C(h, u); here h and u denote the spatial and the temporal lags, respectively. For the base patterns, we consider μ = 5 and a pure nugget effect model with covariance function C(h,u)=σ21{h=0},σ2=0.01. In other words, each fi is random noise with mean 5 and variance 0.01 and all fi’s are iid; see the grey curves in . For the feature patterns, we consider three different marking models:

  1. Shifted base model: We here let ξ have the same form as in the base model but let μ=5.5.

  2. Decreased variance base model: We here let ξ have the same form as in the base model but let σ2=0.001.

  3. Nonseparable space-time model: We here let μ = 5 and consider a space isotropic covariance function given by C(h,u)=(ψ(u)+1)δ/2ϕ(h/ψ(u)+1). Here, ϕ is a normal mixture and the corresponding covariance function only depends on the distance between two points, while ψ is a variogram model, which we choose according to a fractal Brownian motion with fractal dimension α = 1; this is an intrinsically stationary isotropic variogram model.

We note that the first two of these scenarios represent independent but not identically distributed marks, whereas in the third scenario we additionally have that the marks are also dependent. In , the functional marks corresponding to the marking models in item 1, 2, and 3 are depicted.

We show the results of the local test in terms of true-positive rate (TPR), false-positive rate (FPR), and accuracy (ACC), averaging over 100 simulated point patterns in . The rates are defined as TPR=true positivespositives, FPR=false negativesnegatives, ACC=true positives and negativespositives and negatives.

Table 1 Results of the local test averaged over 100 simulated point patterns with an expected point count of 250 each.

We of course wish to have TPR and ACC close to 1 and FPR close to 0.

As shown in , the performance of the local test in terms of classification rates strongly depends on the difference in the functional marks. Specifically, changing only the mean of the underlying random field is not enough for properly identifying the points of the feature patterns. This sufficiently improves when changing the variance only, but the best result is obtained when the whole model is changed, that is, changing the correlation structure. The effect of the type of ground pattern is less evident but still present. The inhomogeneous Poisson scenario reports the best classification rates, followed by the Thomas and homogeneous Poisson ones.

Finally, we found that the test function t(·) based on the L2 distance in (9) gave the better results overall. To further explore how the choice of test function influences the test, we also compared to a test function incorporating a derivative function accounting for the shape of the functional marks. This yielded similar results but turned out to be more computationally demanding.

7 Real Seismic Data Analysis

We analyze data coming from the ISTANCE dataset, presented in Section 2. More specifically, we analyze a sample dataset provided at http://www.pi.ingv.it/instance/. The observed point pattern consists of 300 seismic events which occurred in a period ranging from 21st July 2012 to the 9th December 2016. As shown in , the observation area is [6.729,18.002]×[36.64,46.46], including also seismic events occurring around Italy. They tend to gather into two main clusters. The northernmost originated in May 2012, when two major earthquakes struck Northern Italy, causing 27 deaths and widespread damage. The events are known in Italy as the 2012 Emilia earthquakes, because they mainly affected the Emilia region. Then, the Central Italy seismic sequence began in August 2016, and it is now defined by the INGV as the Amatrice-Norcia-Visso seismic sequence. The analyzed events’ magnitudes vary between 0.5 to 4.8.

Fig. 5 Earthquake locations.

Fig. 5 Earthquake locations.

We first compute the proposed local K-function. depicts the estimated local summary statistics. In particular, the steady black lines represent the global statistics, while the grey ones represent the individual contributions. In dashed lines we also represent the theoretical value. In panel (a), the K-function is based on a kernel intensity estimate whose bandwidth is selected by Diggle’s (Citation2013) rule, while in panel (b) the bandwidth is chosen as in Cronie and Van Lieshout (Citation2018). We observe some relevant differences: while with Cronie and Van Lieshout’s (Citation2018) rule we depict different local K-functions deviating from the global one, following Diggle (Citation2013), we find a unique outlying local K-function. This may be explained by the fact that Cronie and Van Lieshout’s (Citation2018) approach tends to yield a bit too large bandwidths when large parts of the study region contain no points, while Diggle’s (Citation2013) approach tends to yield too small bandwidths in general; see Cronie and Van Lieshout (Citation2018) for details. Note that by increasing the bandwidth we decrease the intensity estimate and, as a consequence, the summand denominators in (8) are decreased. Therefore, we run the proposed local test of random labeling with both options for the bandwidth selection and, as expected, the differences observed in the computation of the local K-functions are reflected in the results of the test.

Fig. 6 Local K-functions. (a) The K-function is based on a kernel intensity estimate whose bandwidth is selected by Diggle’s (Citation2013) rule. (b) The bandwidth is chosen as in Cronie and Van Lieshout (Citation2018).

Fig. 6 Local K-functions. (a) The K-function is based on a kernel intensity estimate whose bandwidth is selected by Diggle’s (Citation2013) rule. (b) The bandwidth is chosen as in Cronie and Van Lieshout (Citation2018).

displays the significant points (black triangles) and the nonsignificant ones (grey points). Panel (a) shows the results with Diggle’s (Citation2013) bandwidth while the ones in panel (b) are obtained with Cronie and Van Lieshout’s (Citation2018) bandwidth. For both choices, we selected a significance level of 0.1. We observe that the significant points tend to be similar in both cases, therefore, the choice of bandwidth (selection method) does not seem to be crucial. We note that such bandwidth-induced differences were missing in the previously run simulation study. We attribute this sensitivity of the procedure to the shapes of the functional marks, that are obviously more variable, if compared to the simulated ones.

Fig. 7 Results of the local test at α=0.1. Nonsignificant events are displayed as gray points and significant events are the black triangles. (a) The K-function is based on a kernel intensity estimate whose bandwidth is selected by Diggle’s (Citation2013) rule. (b) The bandwidth is chosen as in Cronie and Van Lieshout (Citation2018).

Fig. 7 Results of the local test at α=0.1. Nonsignificant events are displayed as gray points and significant events are the black triangles. (a) The K-function is based on a kernel intensity estimate whose bandwidth is selected by Diggle’s (Citation2013) rule. (b) The bandwidth is chosen as in Cronie and Van Lieshout (Citation2018).

Nevertheless, both bandwidths lead to significant events belonging to important well known Italian seismic sequences. Of course, these sequences are likely generated by different underlying processes, giving rise to long-term and highly correlated aftershocks. The implication of this result is twofold. On one hand, we have been able to correctly identify seismic events belonging to important well-known Italian seismic sequences. On the other hand, we have found that the shocks related to these sequences exhibit different local dependence structure and therefore, these events are likely generated by different underlying processes, corresponding to different seismic sources.

8 Conclusions

In this work, we have proposed a general form for local summary statistics for marked point processes, which has been exploited to define the family of local inhomogeneous mark-weighted summary statistics for spatial point processes with functional marks, that is, Functional Marked Point Processes (FMPP). We have employed such local summary statistics to construct a local test for random labeling, that is, to identify points, as well as regions, where this hypothesis does not hold.

More specifically, we first introduce a general local function for marked point patterns. With this specification, we are able to show that this function may be exploited to generate most summary statistics established in the literature. With particular reference to the functional marked context, we define the family of local t-weighted marked nth order inhomogeneous summary statistics based on the K-function, which is a local contribution to a global summary statistic estimator. We obtain a result for the expectation of the general local summary statistic and exploit it to derive an expression for the expectation of our t-weighted local statistics.

Having access to these tools, we have proposed a local test of random labeling, resorting to the second order version of our proposed local estimator, obtaining a local test useful for identifying specific regions where a global test would not detect atypical behavior of the points.

To study the performance of the test in terms of classification rates, we have conducted a simulation study, considering a number of scenarios with different ground processes and structures for the functional marks. Such simulations have shown that in many settings, the local test performs well in identifying points of a pattern where the hypothesis of random labeling is not verified.

We can draw a number of future work paths. Nevertheless, the local functions proposed in this article can be considered as a very informative synthesis of the local second-order behavior, useful for characterizing the study area by an extended marked model, based on the FMPP theory. Incorporating local characteristics as functional marks would become part of the so called Constructed Functional Marks (CFMs), which are marks reflecting the geometries of point configurations in neighborhoods of the individual points.

Concerning the application to seismic data, we aim at including also auxiliary (nonfunctional) marks into the analysis. These could contain synthetic information about the waveforms, such as the arrival times of the seismic event, or the inter-time between the two. The achievement of the unification of earthquake data and the FMPP theory would result in building a framework where it would be possible to exploit the available information of the seismic point process altogether.

A final comment concerns the possible extension of this article’s tools to spatio-temporal ground processes, which of course are of importance for processes which typically exhibit spatio-temporal interactions, such as the seismic one. Undoubtedly, such extensions would be crucial for accounting for the temporal dimension of the seismic events, whose realization depends on their past history, as proved by the existence of aftershocks. This would mean to consider a spatio-temporal marked point process Y={(xi,mi)}i=1N, with ground points xi in the three-dimensional space R2×R+ and exploit the methodological framework introduced in this article. Moreover, local summary statistics in space and time are well established, both theoretically (Siino et al. Citation2018; Adelfio et al. Citation2020) and computationally (Gabriel et al. Citation2021). Although such an extension could be straightforwardly achieved by essentially having our summary statistic functions incorporate an additional argument, t, which controls the temporal lags (see Iftimi, Cronie, and Montes Citation2019), this adds another level of complexity which we believe is out of the scopes of this article, but it surely represents an interesting path to cover in future.

Supplementary Materials

Supplementary material contains the source codes to reproduce experimental results.

Open Access

The authors would like publish open access, making use of the Bibsam deal which Swedish universities have with Taylor & Francis.

Supplemental material

Supplemental Material

Download Zip (9.3 MB)

Disclosure Statement

The authors have no potential conflict of interest to report.

Additional information

Funding

This work was supported by “FFR 2023 - Giada Adelfio,” “FFR 2023 - Nicoletta D’Angelo.” We acknowledge financial support from the Italy’s National Recovery and Resilience Plan (PNRR), grant agreement No PE0000018 - GRINS - Growing Resilient, INclusive and Sustainable.

References

  • Adelfio, G., Chiodi, M., D’Alessandro, A., and Luzio, D. (2011), “FPCA Algorithm for Waveform Clustering,” Journal of Communication and Computer, 8, 494–502.
  • Adelfio, G., Chiodi, M., D’Alessandro, A., Luzio, D., D’Anna, G., and Mangano, G. (2012), “Simultaneous Seismic Wave Clustering and Registration,” Computers & Geosciences, 44, 60–69. DOI: 10.1016/j.cageo.2012.02.017.
  • Adelfio, G., Siino, M., Mateu, J., and Rodríguez-Cortés, F. J. (2020), “Some Properties of Local Weighted Second-Order Statistics for Spatio-Temporal Point Processes,” Stochastic Environmental Research and Risk Assessment, 34, 149–168. DOI: 10.1007/s00477-019-01748-1.
  • Anselin, L. (1995), “Local Indicators of Spatial Association-Lisa,” Geographical Analysis, 27, 93–115. DOI: 10.1111/j.1538-4632.1995.tb00338.x.
  • Anselin, L. (1996), “The Moran Scatterplot as an ESDA Tool to Assess Local Instability in Spatial Association,” in Spatial Analytical (Vol. 4), eds. M. M. Fischer, p. 121, London: Routledge.
  • Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000), “Non-and Semi-Parametric Estimation of Interaction in Inhomogeneous Point Patterns,” Statistica Neerlandica, 54, 329–350. DOI: 10.1111/1467-9574.00144.
  • Chiodi, M., Adelfio, G., D’Alessandro, A., and Luzio, D. (2013), “Clustering and Registration of Multidimensional Functional Data,” in Statistical Models for Data Analysis, eds. P. Giudici, S. Ingrassia, and M. Vichi, pp, 89–97, Heidelberg: Springer.
  • Chiu, S. N., Stoyan, D., Kendall, W. S., and Mecke, J. (2013), Stochastic Geometry and its Applications (3rd ed.), Chichester: Wiley.
  • Comas, C., Delicado, P., and Mateu, J. (2011), “A Second Order Approach to Analyse Spatial Point Patterns with Functional Marks,” Test, 20, 503–523. DOI: 10.1007/s11749-010-0215-1.
  • Cressie, N., and Collins, L. B. (2001), “Analysis of Spatial Point Patterns Using Bundles of Product Density LISA Functions,” Journal of Agricultural, Biological, and Environmental Statistics, 6, 118–135. DOI: 10.1198/108571101300325292.
  • Cronie, O., and Van Lieshout, M. (2015), “A J-Function for Inhomogeneous Spatio-Temporal Point Processes,” Scandinavian Journal of Statistics, 42, 562–579. DOI: 10.1111/sjos.12123.
  • Cronie, O., and van Lieshout, M. N. M. (2016), “Summary Statistics for Inhomogeneous Marked Point Processes,” Annals of the Institute of Statistical Mathematics, 68, 905–928. DOI: 10.1007/s10463-015-0515-z.
  • Cronie, O., and van Lieshout, M. N. M. (2018), “A Non-Model-based Approach to Bandwidth Selection for Kernel Estimators of Spatial Intensity Functions,” Biometrika, 105, 455–462.
  • Daley, D. J., and Vere-Jones, D. (2008), An Introduction to the Theory of Point Processes. Volume II: General Theory and Structure (2nd ed.), New York: Springer-Verlag.
  • D’Angelo, N., Adelfio, G., and Mateu, J. (2021), “Assessing Local Differences between the Spatio-Temporal Second-Order Structure of Two Point Patterns Occurring on the Same Linear Network,” Spatial Statistics, 45, 100534. DOI: 10.1016/j.spasta.2021.100534.
  • D’Angelo, N., Adelfio, G., and Mateu, J. (2023), “Locally Weighted Minimum Contrast Estimation for Spatio-Temporal Log-Gaussian Cox Processes,” Computational Statistics & Data Analysis, 180, 107679.
  • D’Angelo, N., Siino, M., D’Alessandro, A., and Adelfio, G. (2022), “Local Spatial Log-Gaussian Cox Processes for Seismic Data,” Advances in Statistical Analysis, 106, 633–671. DOI: 10.1007/s10182-022-00444-w.
  • Diggle, P. (1985), “A Kernel Method for Smoothing Point Process Data,” Journal of the Royal Statistical Society, Series C, 34, 138–147. DOI: 10.2307/2347366.
  • Diggle, P. (2013), Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. Boca Raton, FL: CRC Press.
  • Gabriel, E., and Diggle, P. J. (2009), “Second-Order Analysis of Inhomogeneous Spatio-Temporal Point Process Data,” Statistica Neerlandica, 63, 43–51. DOI: 10.1111/j.1467-9574.2008.00407.x.
  • Gabriel, E., Diggle, P. J., Rowlingson, B., and Rodriguez-Cortes, F. J. (2021), stpp: Space-Time Point Pattern Simulation, Visualisation and Analysis, R package version 2.0-5.
  • Gelfand, A. E., Diggle, P., Guttorp, P., and Fuentes, M. (2010), Handbook of Spatial Statistics, Boca Raton, FL: CRC Press.
  • Getis, A., and Franklin, J. (1987), “Second-Order Neighborhood Analysis of Mapped Point Patterns,” Ecology, 68, 473–477. DOI: 10.2307/1938452.
  • Getis, A., and Ord, J. K. (1992), “The Analysis of Spatial Association by Use of Distance Statistics,” Geographical Analysis, 24, 189–206. DOI: 10.1111/j.1538-4632.1992.tb00261.x.
  • Ghorbani, M., Cronie, O., Mateu, J., and Yu, J. (2021), “Functional Marked Point Processes: A Natural Structure to Unify Spatio-Temporal Frameworks and to Analyse Dependent Functional Data,” Test, 30, 529–568. DOI: 10.1007/s11749-020-00730-2.
  • Iftimi, A., Cronie, O., and Montes, F. (2019), “Second-Order Analysis of Marked Inhomogeneous Spatiotemporal Point Processes: Applications to Earthquake Data,” Scandinavian Journal of Statistics, 46, 661–685. DOI: 10.1111/sjos.12367.
  • Illian, J., Benson, E., Crawford, J., and Staines, H. (2006), “Principal Component Analysis for Spatial Point Processes–Assessing the Appropriateness of the Approach in an Ecological Context,” in Case Studies in Spatial Point Process Modeling, eds. A. Baddeley, P. Gregori, J. Mateu, R. Stoica, and D. Stoyan, pp. 135–150, New York: Springer.
  • Illian, J., Penttinen, A., Stoyan, H., and Stoyan, D. (2008), Statistical Analysis and Modelling of Spatial Point Patterns (Vol. 70), Chichester: Wiley.
  • Mateu, J., Lorenzo, G., and Porcu, E. (2007), “Detecting Features in Spatial Point Processes with Clutter via Local Indicators of Spatial Association,” Journal of Computational and Graphical Statistics, 16, 968–990. DOI: 10.1198/106186007X258961.
  • Mateu, J., Lorenzo, G., and Porcu, E. (2010), “Features Detection in Spatial Point Processes via Multivariate Techniques,” Environmetrics, 21, 400–414.
  • Michelini, A., Cianetti, S., Gaviano, S., Giunchi, C., Jozinović, D., and Lauciani, V. (2021), “Instance–The Italian Seismic Dataset for Machine Learning,” Earth System Science Data, 13, 5509–5544. DOI: 10.5194/essd-13-5509-2021.
  • Møller, J. (2003), “Shot Noise Cox Processes,” Advances in Applied Probability, 35, 614–640. DOI: 10.1239/aap/1059486821.
  • Møller, J., and Ghorbani, M. (2012), “Aspects of Second-Order Analysis of Structured Inhomogeneous Spatio-Temporal Point Processes,” Statistica Neerlandica, 66, 472–491. DOI: 10.1111/j.1467-9574.2012.00526.x.
  • Møller, J., and Waagepetersen, R. P. (2003), Statistical Inference and Simulation for Spatial Point Processes, Boca Raton: Chapman and Hall/CRC.
  • Moraga, P., and Montes, F. (2011), “Detection of Spatial Disease Clusters with LISA Functions,” Statistics in Medicine, 30, 1057–1071. DOI: 10.1002/sim.4160.
  • Mrkvička, T., Dvořák, J., González, J. A., and Mateu, J. (2021), “Revisiting the Random Shift Approach for Testing in Spatial Statistics,” Spatial Statistics, 42, 100430. DOI: 10.1016/j.spasta.2020.100430.
  • Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H., and Hahn, U. (2017), “Global Envelope Tests for Spatial Processes,” Journal of the Royal Statistical Society, Series B, 79, 381–404. DOI: 10.1111/rssb.12172.
  • Penttinen, A., and Stoyan, D. (1989), “Statistical Analysis for a Class of Line Segment Processes,” Scandinavian Journal of Statistics, 16, 153–168.
  • R Core Team (2022), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
  • Ramsay, J. O., and Silverman, B. W. (2002), Applied Functional Data Analysis: Methods and Case Studies (Vol. 77), New York: Springer.
  • Ripley, B. D. (1976), “The Second-Order Analysis of Stationary Point Processes,” Journal of Applied Probability, 13, 255–266. DOI: 10.2307/3212829.
  • Schlather, M. (2001), “On the Second-Order Characteristics of Marked Point Processes,” Bernoulli, 7, 99–117. DOI: 10.2307/3318604.
  • Siino, M., Adelfio, G., Mateu, J., Chiodi, M., and D’alessandro, A. (2017), “Spatial Pattern Analysis Using Hybrid Models: An Application to the Hellenic Seismicity,” Stochastic Environmental Research and Risk Assessment, 31, 1633–1648. DOI: 10.1007/s00477-016-1294-7.
  • Siino, M., Rodríguez-Cortés, F. J., Mateu, J., and Adelfio, G. (2018), “Testing for Local Structure in Spatiotemporal Point Pattern Data,” Environmetrics, 29, e2463. DOI: 10.1002/env.2463.
  • Stoyan, D., and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields: Methods of Geometrical Statistics, Chichester: Wiley.
  • Van Lieshout, M. (2000), Markov Point Processes and their Applications, Singapore: World Scientific.
  • Van Lieshout, M. (2006), “A j-function for Marked Point Patterns,” Annals of the Institute of Statistical Mathematics, 58, 235–259.
  • Van Rossum, G., and Drake Jr, F. L. (1995), Python Reference Manual, Amsterdam: Centrum voor Wiskunde en Informatica.