Abstract
Topological Data Analysis (TDA) is a novel statistical technique, particularly powerful for the analysis of large and high-dimensional data sets. Much of TDA is based on the tool of persistent homology, represented visually via persistence diagrams. In an earlier article we proposed a parametric representation for the probability distributions of persistence diagrams and, based on it, provided a method for their replication. Since the typical situation for big data is that only one persistence diagram is available, these replications allow for conventional statistical inference, which, by its very nature, requires some form of replication. In the current paper we continue this analysis, and further develop its practical statistical methodology, by investigating a wider class of examples than treated previously.
Notes
1 In all our persistence diagrams, the ‘point at infinity’ is the highest, leftmost point in the H0 diagram. In essence, removing it from the analysis is much like working with reduced rather than standard homology, and has the effect of removing one generator from the H0 diagram. Thus, in the statistical analysis to follow, it must be added back, at the end, to the set of significant points found in the diagram.
2 In some of these simulations the sums were identically zero for all k = 1, 2, 3 simultaneously, since there were no k-th nearest neighbors at distance less than δ. In these cases the parameters θ1, θ2, and θ3 are meaningless, and so these simulations (33 of them) were deleted from this part of the analysis. We shall do the same later on, in similar cases, without further comment.
3 Recall that the Betti number βk of a set is the rank of the free part of its k-th homology group or, heuristically, the number of k-dimensional ‘holes’ in the set.
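To make the heuristic in note 3 concrete, the following minimal sketch computes β0 and β1 in the simplest possible setting: a finite graph, viewed as a one-dimensional simplicial complex. Here β0 is the number of connected components and β1 = E − V + β0 is the number of independent cycles. The function name `betti_numbers_of_graph` and the edge-list input format are assumptions made for illustration only; this is not the computation used in the paper, which works with persistent homology of general data sets.

```python
def betti_numbers_of_graph(num_vertices, edges):
    """Return (beta_0, beta_1) for a graph seen as a 1-dimensional complex.

    Hypothetical helper for illustration: vertices are 0..num_vertices-1 and
    edges is an iterable of (u, v) pairs.
    """
    # Keep each undirected edge once and drop self-loops, so the input
    # describes a genuine simplicial complex.
    unique_edges = {tuple(sorted((u, v))) for u, v in edges if u != v}

    # Union-find structure to count connected components (beta_0).
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = num_vertices
    for u, v in unique_edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    beta_0 = components
    # Euler characteristic of a graph: V - E = beta_0 - beta_1.
    beta_1 = len(unique_edges) - num_vertices + beta_0
    return beta_0, beta_1


# The boundary of a triangle: one component, one 1-dimensional hole.
triangle = betti_numbers_of_graph(3, [(0, 1), (1, 2), (2, 0)])

# Two triangles sharing a vertex (a "figure eight"): one component, two holes.
figure_eight = betti_numbers_of_graph(
    5, [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
)
```

For higher-dimensional complexes, the same quantities are obtained from the ranks of boundary matrices rather than from the Euler characteristic shortcut used here.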