Full article: Latent Space Modeling of Hypergraph Data

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

The increasing prevalence of relational data describing interactions among a target population has motivated a wide literature on statistical network analysis. In many applications, interactions may involve more than two members of the population and this data is more appropriately represented by a hypergraph. In this article, we present a model for hypergraph data that extends the well-established latent space approach for graphs and, by drawing a connection to constructs from computational topology, we develop a model whose likelihood is inexpensive to compute. A delayed acceptance MCMC scheme is proposed to obtain posterior samples and we rely on Bookstein coordinates to remove the identifiability issues associated with the latent representation. We theoretically examine the degree distribution of hypergraphs generated under our framework and, through simulation, we investigate the flexibility of our model and consider estimation of predictive distributions. Finally, we explore the application of our model to two real-world datasets. Supplementary materials for this article are available online.

Keywords:

1 Introduction

A hypergraph describes interactions which occur among arbitrary subsets of a population of interest. This data structure is comprised of a node set, which indexes the population, and a hyperedge set, which indicates which members of the population interact. Examples of hypergraphs occur in image co-tagging (see ), where hyperedges indicate users who appear in the same photograph in an online platform, and coauthorship (see ), where hyperedges represent the set of authors who collaborated on an academic article. A key feature of a hypergraph representation is that can express higher-order interactions and, whilst these data could be analyzed according to an extensive graph modeling literature (see Kolaczyk Citation2009), this results in a loss of structural information which cannot be recovered (see ). This represents the main motivation of this article where we develop a novel model for hypergraph data, detail a procedure for inference and present theoretical results on the degree distribution.

Fig. 1 Examples of hypergraph datasets. (b) shows co-tagging data and (c) shows a subsample of the coauthorship network of Ji and Jin (Citation2016). The figures were made with R packages HyperG (Marchette Citation2021) and igraph (Csardi and Nepusz Citation2006).

More formally, a hypergraph $G = (V, E)$ consists of a set of N node labels V and M hyperedges E, where $e \in E$ contains no repeated elements and $e \subseteq V$ (see ). This data type can also be equivalently represented as a bipartite graph with an $(N \times M)$ adjacency tensor in which each hyperedge is indexed by a node and an edge from a population node to a hyperedge node indicates presence in a hyperedge (see ). Although a significant portion of the existing literature relies on the bipartite representation, we focus on the former representation since it allows us to develop a model which allows for an arbitrary number of hyperedges by avoiding conditioning on M.

We consider extending the latent distance model of Hoff, Raftery, and Handcock (Citation2002) to the hypergraph setting using the representation in . In this approach, a low-dimensional coordinate is associated with each node and nodes whose latent coordinates are close in terms of Euclidean distance are assumed more likely to share a tie. This modeling choice is appealing since the latent space offers an intuitive visualization of the data, encourages transitive relationships due to the metric property of Euclidean distance and allows control over the joint distribution of subgraph counts. Models of this type also allow quantification of uncertainty in interaction probabilities and prediction of future interactions. In our model, we wish to harness hypergraph analogues of these properties and we focus on the following inferential questions.

(Q1)How can we determine an intuitive visualization of an observed hypergraph?
(Q2)How do we expect new nodes to interact with the observed hypergraph relationships?

There is a growing literature concerned with hypergraph modeling (Battiston et al. Citation2020), and several graph models have been extended to the bipartite setting. Examples include exponential random graphs (Wang, Pattison, and Robins Citation2013), blockmodels (Doreian, Batagelj, and Ferligoj Citation2004) and latent space network models (Friel et al. Citation2016). However, in contrast to our work, models for bipartite representations assume the number of hyperedges is known and fixed.

Examples of models for general hypergraphs include the β model of Stasi et al. (Citation2014), preferential attachment models (Wang et al. Citation2010) and the clustering-based model of Ng and Murphy (Citation2021). In a m-uniform hypergraph all hyperedges contain exactly m elements, and extensions of stochastic blockmodels (see Ghoshdastidar and Dukkipati Citation2017; Chien, Lin, and Wang Citation2018) and graphons (Balasubramanian Citation2021) have been studied for this hypergraph type. Simplicial hypergraphs, or rather simplicial complexes (see Edelsbrunner and Harer Citation2010), are a significant class of hypergraphs in which the presence of a hyperedge implies the presence of all subsets of that hyperedge. Proposed models for simplicial complexes include an analogue of exponential random graphs (Zuev, Eisenberg, and Krioukov Citation2015), preferential attachment based models (Bianconi and Rahmede Citation2017), simplicial configuration models (Young et al. Citation2017) and the recursive models of Kahle (Citation2014). A discussion of random simplicial complexes is presented in Kahle (Citation2014) and work on geometric simplicial complexes (Bobrowski and Kahle Citation2018) is most related to our proposed procedure. Simplicial complexes also appear more broadly in the statistics literature (Chazal and Michel Citation2017; Lunagómez et al. Citation2017) and our terminology “simplicial hypergraph” is nonstandard.

Our focus on inference for the full generality of hypergraphs marks a distinction between our work and much of the existing literature. By considering general hypergraphs, we avoid restrictions inherent to simplicial and uniform representations and this allows us to consider how hyperedges of different orders interact in the data. Non-simplicial hypergraph arise in many settings and particular instances of non-simplicial coauthorship networks are presented in and Section 7.2.

The main contributions of this article are as follows. First, using the representation shown in , we develop a computationally tractable latent space model for non-simplicial hypergraph data. Second, we explore the novel application of Bookstein coordinates to address non-identifiability in the latent representation. Third, we perform inference on hypergraph data that is outside the reach of competing methods and consider a novel application of delayed acceptance MCMC methodology. Finally, we study properties of the degree distribution for our model and present approximate theoretical results.

The rest of this article is organized as follows. Section 2 provides the background for the hypergraph model presented in Section 3. Section 4 considers the degree distribution of our model and Section 5 describes a procedure for obtaining posterior samples. The simulation studies and real data examples are presented in Sections 6 and 7, respectively, and we conclude with a discussion in Section 8.

2 Background

2.1 Latent Space Network Modeling

Latent space models were introduced for network data in Hoff, Raftery, and Handcock (Citation2002). This modeling framework associates the nodes of a network with low-dimensional latent coordinates and expresses the probability of edges forming as a function of the latent representation.

To describe this model for a N node network, we let $Y = {y_{i j}}_{i, j = 1, 2, \dots, N}$ denote the observed $(N \times N)$ adjacency matrix, where y_ij represents the connection between nodes i and j. For binary interactions, we take y_ij = 1 if i and j share an edge and y_ij = 0 otherwise. We also let $u_{i} \in R^{d}$ denote the d-dimensional latent coordinate associated with the ith node, for $i = {1, 2, \dots, N} = [N]$ . The presence of an edge is then given by (1) $\begin{matrix} Y_{i j} & \sim Bernoulli (p_{i j}), p_{i j} = P (y_{i j} = 1 | u_{i}, u_{j}, θ) \\ = 1 / (1 + \exp {- f (u_{i}, u_{j}, θ)}), \end{matrix}$ (1) for $i, j \in [N]$ , where θ represents additional model parameters and f controls how u_i and u_j affect the propensity for ties to form. Typically, f is specified as monotonically decreasing in a measure of similarity between u_i and u_j . For example, Hoff, Raftery, and Handcock (Citation2002) take (2) $f (u_{i}, u_{j}, θ) = α - ‖ u_{i} - u_{j} ‖,$ (2) where $‖ \cdot ‖$ is the Euclidean distance, and $θ = (α)$ represents the base-rate tendency for edges to form. The function f may also be adapted to incorporate covariate information so that nodes which share certain characteristics are more likely to be connected. The likelihood, conditional on $U = {u_{i}}_{i = 1}^{N}$ and θ, is given by (3) $\begin{matrix} L (U, θ; Y) \propto \prod_{i < j} P {(y_{i j} = 1 | u_{i}, u_{j}, θ)}^{y_{i j}} \\ \times {[1 - P (y_{i j} = 1 | u_{i}, u_{j}, θ)]}^{1 - y_{i j}} . \end{matrix}$ (3)

Properties of this model are well-understood and we refer to Rastelli, Friel, and Raftery (Citation2016) for details.

2.2 Random Geometric Graphs

Random geometric graphs (RGGs) (Penrose Citation2003) can be viewed as a special case of the latent space network model outlined in Section 2.1. To express the connection probabilities we let $B_{r} (u_{i}) = {u \in R^{d} | ‖ u - u_{i} ‖ \leq r}$ , where $‖ \cdot ‖$ is Euclidean distance, denote a ball of radius r with center u_i . The presence of an edge is then expressed deterministically as (4) $\begin{matrix} p_{i j} = P (y_{i j} = 1 | u_{i}, u_{j}, r) = 1 (B_{r} (u_{i}) \cap B_{r} (u_{j}) \neq \emptyset) \\ = 1 (‖ u_{i} - u_{j} ‖ \leq 2 r), \end{matrix}$ (4) so that nodes i and j share a tie only if their corresponding sets have a nonempty intersection (see ) or, equivalently, when $‖ u_{i} - u_{j} ‖ \leq 2 r$ .

Fig. 2 Example of a Čech complex. Left: $B_{r} (u_{i})$ for ${u_{i} = (u_{i 1}, u_{i 2})}_{i = 1}^{7}$ in $R^{2}$ . Middle: the graph obtained by taking pairwise intersections. Right: the hypergraph obtained by taking intersections of arbitrary order. The shaded region indicates an order 3 hyperedge.

We can now express the likelihood of observing Y conditional on U and r as (5) $\begin{matrix} L (U, r; Y) \propto \prod_{i < j} 1 {(‖ u_{i} - u_{j} ‖ \leq 2 r)}^{y_{i j}} \\ \times {[1 - 1 (‖ u_{i} - u_{j} ‖ \leq 2 r)]}^{1 - y_{i j}} . \end{matrix}$ (5)

Since a RGG is deterministic conditional on U and r, (5) is equal to 1 only when there is a perfect correspondence between the observed connections Y and the connections induced by the latent positions and radius, and is otherwise equal to 0.

2.3 Random Geometric Hypergraphs

We can extend the RGG from Section 2.2 to hypergraphs by considering the full intersection pattern of convex sets. To begin, we introduce the concept of a nerve (Edelsbrunner and Harer Citation2010, sec. 3.2) in the following definition.

Definition 2.1.

(Nerve) Let $A = {A_{i}}_{i = 1}^{N}$ represent a collection of non-empty sets. The nerve of $A$ is given by $Nrv (A) = {σ \subseteq {1, 2, \dots, N} | \underset{j \in σ}{\cap} A_{j} \neq \emptyset}$ .

The nerve represents the sets of indices whose corresponding regions have a non-empty intersection. The sets ${1}, {2}, \dots, {N}$ are included in $Nrv (A)$ and $| σ | \leq N$ for $σ \in Nrv (A)$ , where $| σ |$ is the order of the set. The nerve defines a hypergraph where $σ \in Nrv (A)$ represents a hyperedge, and for sets $σ_{1} \in Nrv (A)$ and $σ_{2} \subset σ_{1}$ , it follows immediately that $σ_{2} \in Nrv (A)$ implying that the hypergraph is simplicial (Kahle Citation2017).

Taking $A_{i} = B_{r} (u_{i})$ , as for the RGG in Section 2.2, gives rise to the well-studied Čech complex (Edelsbrunner and Harer Citation2010, sec. 3.2). This is defined below.

Definition 2.2

(Čech Complex). For a set of coordinates $U = {u_{i}}_{i = 1}^{N}$ and a radius r, the Čech complex $C_{r} (U)$ is given by $C_{r} (U) = Nrv ({B_{r} (u_{i})}_{i = 1}^{N})$ .

We now introduce a subset of the Čech complex known as the k-skeleton.

Definition 2.3

(k-skeleton of the Čech Complex). Let $C_{r} (U)$ denote the Čech complex, as given in Definition 2.2. The k-skeleton of $C_{r} (U)$ is given by $C_{r}^{(k)} (U) = {σ \in C_{r} (U)$ $| | σ | \leq k}$ .

Example 2.1.

For the Čech complex in , $C_{r} (U) = {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {2, 4}$ , {2, 7}, {3, 5}, {3, 6}, {5, 6}, ${3, 5, 6}}, C_{r}^{(1)} (U) = {{1}, {2}, {3}, {4}, {5}, {6}, {7}}$ , $C_{r}^{(2)} (U) = C_{r}^{(1)} \cup {{2, 4}, {2, 7},$ ${3, 5}, {3, 6}, {5, 6}}$ , and $C_{r}^{(3)} (U) = C_{r}^{(2)} (U) \cup {3, 5, 6}$ .

3 Latent Space Hypergraphs

3.1 Motivation and Notation

In the motivating examples from Section 1, we observed arbitrary hypergraph interactions. Since the models discussed in Section 2 express either pairwise or simplicial interactions, they are not appropriate for this data. This motivates the development of a non-simplicial latent space hypergraph model. We aim to propose a model which has (i) a convenient and computationally tractable likelihood with (ii) a support that is simple to describe.

We consider general hypergraphs on N nodes and let K denote the maximum hyperedge order, where $2 \leq K \leq N$ . Let $G_{N, K} = (V_{N}, E_{N, K})$ denote the space of hypergraphs on N and K, where $V_{N} = {1, 2, \dots, N}$ and $E_{N, K}$ represent the node labels and hyperedges up to order K on N nodes, respectively. Additionally, let $E_{N}^{k}$ represent hyperedges of exactly order k on N nodes, so that $E_{N, K} = \cup_{k = 2}^{K} E_{N}^{k}$ , and let $e_{k} \in E_{N}^{k}$ denote an order k interaction, where $| e_{k} | = k$ and e_k contains no repeated elements.

3.2 Combining k-skeletons

To extend the model in Section 2.3 to express non-simplicial hypergraphs we allow the radii to differ for each hyperedge order. We generate a hypergraph by isolating the order k hyperedges from the complex $C_{r_{k}} (U)$ and combining these for $k = 2, 3, \dots, K$ , where $r_{k} \neq r_{k'}$ for $k \neq k'$ . For example, when K = 3, we take radii r₂ and r₃ along with their Čech complexes $C_{r_{2}} (U)$ and $C_{r_{3}} (U)$ . Then, we consider the order 2 hyperedges in $C_{r_{2}} (U)$ and the order 3 hyperedges in $C_{r_{3}} (U)$ , as shown in . We refer to a hypergraph constructed in this way as a non-simplicial random geometric hypergraph (nsRGH).

Fig. 3 Example of a nsRGH (see Definition 3.1) with $U = {u_{i}}_{i = 1}^{7} = {(u_{i 1}, u_{i 2})}_{i = 1}^{7}$ . Left: $C_{r_{2}} (U)$ . Middle: $C_{r_{3}} (U)$ . Right: $\cup_{k = 2}^{3} D_{r_{k}}^{(k)} (U)$ .

Definition 3.1

(nsRGH). Take $r = (r_{2}, r_{3}, \dots, r_{K})$ such that $r_{k} > r_{k - 1} > 0$ for $k = 3, 4, \dots, K$ . We define a nsRGH on N nodes as the hypergraph $g_{N, K} (U, r)$ with hyperedges given by $\cup_{k = 2}^{K} D_{r_{k}}^{(k)} (U)$ , where $D_{r_{k}}^{(k)} (U) = C_{r_{k}}^{(k)} (U) ∖ C_{r_{k}}^{(k - 1)} (U)$ denotes the hyperedges of exactly order k in the Čech complex with radius r_k and $C_{r_{k}}^{(k)} (U)$ is as in Definition 2.3.

Example 3.1.

For the nsRGH shown in the right panel of we have $g_{7, 3} (U, r) = D_{r_{2}}^{(2)} (U) \cup D_{r_{3}}^{(3)} (U)$ , where $D_{r_{2}}^{(2)} (U) = {{2, 4}, {2, 7}, {3, 5}, {5, 6}}$ , and $D_{r_{3}}^{(3)} (U) = {3, 5, 6}$ .

In Definition 3.1 constraints are imposed on the radii $r = (r_{2}, r_{3}, \dots, r_{K})$ to ensure that the hypergraphs are non-simplicial since for example, if $r_{3} < r_{2}$ and the hyperedge {i, j, k} is present in the hypergraph, it follows that ${i, j}, {i, k}$ , and {j, k} must also be present.

3.3 Generative Model and Likelihood

Our procedure for generating a nsRGH $g_{N, K} (U, r)$ (see Definition 3.1) is deterministic conditional on U and r. Practical application of this model is therefore challenging since (i) the probability observing a hypergraph $h_{N, K} \in G_{N, K}$ is binary (similarly to the RGG in Section 2.2) and (ii) it is difficult to characterize the space of hypergraphs which can be expressed as a nsRGH. These issues hinder estimation via likelihood methods and make it difficult to say whether a latent representation can be found a priori.

To address these issues we consider a modification of the nsRGH $g_{N, K} (U, r)$ , denoted $g_{N, K}^{*} (U, r, φ)$ , obtained by independently modifying the state of each order k hyperedge in $g_{N, K} (U, r)$ with probability $φ_{k} \in [0, 1]$ . More precisely, if $y_{e_{k}}^{(g)} \in {0, 1}$ is the state of e_k in $g_{N, K}^{*} (U, r, φ)$ where $y_{e_{k}}^{(g)} = 1$ indicates presence, we take $y_{e_{k}}^{(g^{*})} = (y_{e_{k}}^{(g)} + s_{k}) \mod 2$ , where $y_{e_{k}}^{(g^{*})} \in {0, 1}$ is the state of hyperedge e_k in $g_{N, K}^{*} (U, r, φ)$ and $S_{k} \sim Bernoulli (φ_{k})$ . This means that $y_{e_{k}}^{(g)}$ is modified from either present to absent or absent to present with probability $φ_{k}$ to obtain $y_{e_{k}}^{(g^{*})}$ . The noise terms ${φ_{k}}_{k = 2}^{K}$ control the amount of modification of $g_{N, K} (U, r)$ , and so setting $φ_{k}$ close to 0 obtains a hypergraph that is similar to a nsRGH.

To fully specify a generative model, we assign a probability distribution on the latent coordinates U. We let $u_{i} \overset{iid}{\sim} N (μ, Σ)$ , for $i \in [N]$ to reflect the intuition that nodes placed more centrally in the latent representation are likely to have high degrees and neighbors with high degrees. A hypergraph can now be generated by the procedure given in Algorithm 1, and details of efficient implementation of this are given in supplements C and E.3.

Algorithm 1

Sample a hypergraph $g_{N, K}^{*}$ given $N, K, r, φ, μ$ , and Σ.

Sample $U = {u_{i}}_{i = 1}^{N}$ such that $u_{i} \overset{iid}{\sim} N (μ, Σ)$ , for $i = 1, 2, \dots, N$ .

For $k = 2, 3, \dots, K$ ,

a) Given U and r_k , check which $e_{k} = {i_{1}, i_{2}, \dots, i_{k}} \in E_{N}^{k}$ satisfy $y_{e_{k}}^{(g)} = 1$ .

To determine if $y_{e_{k}}^{(g)} = 1$ , check that $\cap_{l = 1}^{k} B_{r_{k}} (u_{i_{l}}) \neq \emptyset$ .

b) For all $e_{k} \in E_{N}^{k}$ , sample $S_{k} \sim Bernoulli (φ_{k})$ .

Let $y_{e_{k}}^{(g^{*})} = (y_{e_{k}}^{(g)} + s_{k}) \mod 2$

To express the likelihood of observing the hypergraph $h_{N, K} \in G_{N, K}$ , we let (6) $d_{k} (g_{N, K} (U, r), h_{N, K}) = \sum_{e_{k} \in E_{N}^{k}} | y_{e_{k}}^{(g)} - y_{e_{k}}^{(h)} |$ (6) denote the discrepancy between the order k hyperedges in $g_{N, K} (U, r)$ and $h_{N, K}$ , where $k = 2, 3, \dots, K$ and $y_{e_{k}}^{(h)} \in {0, 1}$ indicates the state of e_k in $h_{N, K}$ . Intuitively, the distance (6) corresponds to the number of hyperedges whose state differs between $h_{N, K}$ and $g_{N, K} (U, r)$ . We note that evaluating this distance does not require the $\sum_{k = 2}^{K} (\begin{matrix} N \\ k \end{matrix})$ computations suggested by (6), and refer to the supplement for details.

Given this notion of hypergraph distance the likelihood of observing $h_{N, K}$ , conditional on $U, r$ , and $φ$ , can be written as (7) $\begin{matrix} L (U, r, φ; h_{N, K}) \propto \prod_{k = 2}^{K} φ_{k}^{d_{k} (g_{N, K} (U, r), h_{N, K})} \\ \times {(1 - φ_{k})}^{(\begin{matrix} N \\ k \end{matrix}) - d_{k} (g_{N, K} (U, r), h_{N, K})} . \end{matrix}$ (7)

This follows by considering which hyperedges in $g_{N, K} (U, r)$ must have their state modified to match the hyperedges in $h_{N, K}$ , and which hyperedges are the same as in $h_{N, K}$ . Since our likelihood is of the same form as (Lunagómez, Olhede, and Wolfe Citation2020, proof of Proposition 3.1), it follows that hypergraphs with a greater number of hyperedge modification are less likely for $0 < φ_{k} < 1 / 2$ and so (7) behaves in an intuitive way.

The model specification is complete with the following priors for $k = 2, 3, \dots, K$ (8) $\begin{matrix} μ \sim N (m_{μ}, Σ_{μ}) Σ_{μ} \sim W^{- 1} (Φ, ν), r_{k} \sim \exp (λ_{k}), and \\ φ_{k} \sim B (a_{k}, b_{k}), \end{matrix}$ (8) where $W^{- 1} (\cdot, \cdot)$ and $B (\cdot, \cdot)$ denote an Inverse-Wishart and Beta, respectively.

3.4 Can We Improve Model Flexibility?

In a nsRGH, the constraint $r_{k} > r_{k - 1}$ , for $k = 3, \dots, K$ , ensures that the interactions are non-simplicial. However, this implies r_k will impact the higher-order hyperedges and, to improve model flexibility, we introduce additional modification parameters. In Algorithm 1 the noise $φ_{k}$ is applied independently across all hyperedges of order k and, alternatively, we can modify each hyperedge depending on its state in $g_{N, K} (U, r)$ . For $k = 2, 3, \dots, K$ , let $ψ^{0} = (ψ_{2}^{0}, ψ_{3}^{0}, \dots, ψ_{K}^{0}) \in [0, 1]$ denote the probability of modifying the state of a hyperedge in $g_{N, K} (U, r)$ from absent to present, and let $ψ^{1} = (ψ_{2}^{1}, ψ_{3}^{1}, \dots, ψ_{K}^{1}) \in [0, 1]$ denote the probability of modifying the state of a hyperedge in $g_{N, K} (U, r)$ from present to absent. Crucially, for large k, the term $ψ_{k}^{1}$ may increase to account for the discrepancy between the density of order k hyperedges in $h_{N, K}$ and $g_{N, K} (U, r)$ . This generative model is summarized in Algorithm A.1 in the supplement and, as commented in Section 3.3, we can implement this without the suggested $\sum_{k = 2}^{K} (\begin{matrix} N \\ k \end{matrix})$ likelihood computations. Further details of this are given in the supplement, as highlighted in Section 3.3.

As in Section 3.3, the likelihood of observing $h_{N, K}$ is based on a distance metric, (9) $d_{k}^{(a b)} (g_{N, K} (U, r), h_{N, K}) = # {e_{k} \in E_{N}^{k} | y_{e_{k}}^{(g)} = a \cap y_{e_{k}}^{(h)} = b},$ (9) which records the number of hyperedges that have state $a \in {0, 1}$ in $g_{N, K} (U, r)$ and state $b \in {0, 1}$ in $h_{N, K}$ . For example, $d_{k}^{(01)} (g_{N, K} (U, r), h_{N, K})$ represents the number of hyperedges absent in $g_{N, K} (U, r)$ and present in $h_{N, K}$ . Efficient evaluation of (9) is discussed in supplement E.3.2 and the likelihood conditional on $U, r, ψ^{1}$ , and $ψ^{0}$ is given by (10) $\begin{matrix} L (U, r, ψ^{1}, ψ^{0}; h_{N, K}) \propto \prod_{k = 2}^{K} [{(ψ_{k}^{1})}^{d_{k}^{(10)} (g_{N, K} (U, r), h_{N, K})} \\ \times {(1 - ψ_{k}^{1})}^{d_{k}^{(11)} (g_{N, K} (U, r), h_{N, K})} \\ \times {(ψ_{k}^{0})}^{d_{k}^{(01)} (g_{N, K} (U, r), h_{N, K})} \\ \times {(1 - ψ_{k}^{0})}^{d_{k}^{(00)} (g_{N, K} (U, r), h_{N, K})}] . \end{matrix}$ (10)

We obtain (10) in a similar way to (7), and note that (10) is equivalent to (7) when $ψ_{k}^{1} = ψ_{k}^{0} = φ_{k}$ , for $k = 2, 3, \dots, K$ .

The model specification is complete with the following priors for $k = 2, 3, \dots, K$ (11) $\begin{matrix} μ \sim N (m_{μ}, Σ_{μ}), Σ_{μ} \sim W^{- 1} (Φ, ν), r_{k} \sim \exp (λ_{k}), \\ ψ_{k}^{0} \sim B (a_{k}^{0}, b_{k}^{0}), ψ_{k}^{1} \sim B (a_{k}^{1}, b_{k}^{1}), \end{matrix}$ (11) where $W^{- 1} (\cdot, \cdot)$ and $B (\cdot, \cdot)$ denote an Inverse-Wishart and Beta, respectively.

3.5 Identifiability

For the model presented in Section 3.3, the joint distribution of the hypergraph $g_{N, K}^{*} (U, r, φ)$ and its associated nsRGH (see Definition 3.1) $g_{N, K} (U, r)$ can be decomposed as follows. (12) $\begin{matrix} p (g_{N, K}^{*} (U, r, φ), g_{N, K} (U, r) | μ, Σ, φ, r) = \\ p (g_{N, K}^{*} (U, r, φ) | g_{N, K} (U, r), φ) p (g_{N, K} (U, r) | μ, Σ, r) . \end{matrix}$ (12)

Note that a similar decomposition exists for the model given in Section 3.4.

The hyperedges expressed by the conditional distribution $p (g_{N, K} (U, r) | μ, Σ, r)$ are determined through the positions of their relative latent coordinates and are therefore unchanged given distance-preserving transformations of U and scaling of U and r. We address this non-identifiability by defining U on the Bookstein space of coordinates (see Bookstein Citation1986; Dryden and Mardia Citation1998, sec. 2.3.3 and details in Supplement B) which determine a translation, rotation and rescaling of U with respect to a set of anchor points. These anchor points are fixed throughout the posterior sampling procedure (see Section 5) and also address the non-identifiability associated with rescaling r.

Removing non-identifiability in U is typically achieved via Procrustes analysis (Dryden and Mardia Citation1998) in which a post-processing step is included to remove the effect of distance-preserving transformations of U (Hoff, Raftery, and Handcock Citation2002). This approach can be used for our model, providing that scaling between U and r is accounted for. By using Bookstein coordinates, we instead work directly with our posterior samples on a quotient space (Dryden and Mardia Citation1998) which accounts for distance-preserving transformations. This approach avoids the additional computation associated with the Procrustes transformation.

Finally, a hyperedge in our model can be expressed by either the geometric or noise component. To maintain the properties imposed by $g_{N, K} (U, r)$ , we wish to keep the noise $φ$ small. When there are few observed hyperedges, it will become increasingly difficult to distinguish between these competing sources. This implies an additional source of non-identifiability which we address by ensuring the noise terms $φ$ are sufficiently small via prior distributions.

4 Theoretical Results

We now study the behavior of the node degree in the hypergraph model detailed in Algorithm 1. Since the nodes in our hypergraph model are exchangeable, it is sufficient to study the degree properties of the ith node, and it is straightforward to extend the results in this section to the model detailed in Algorithm A.1.

To begin, we make explicit the following properties of our hypergraph model: (P1) A hypergraph generated from our model is a modification of a nsRGH (see Definition 3.1), and (P2) Conditional on U and r, hyperedges of each order occur independently in our model. We also make the following assumptions: (A1) The number of nodes N and maximum hyperedge order K are fixed, and (A2) The covariance of the latent coordinates Σ is diagonal, where $Σ_{l l} = σ_{l l}^{2}$ for $l = 1, 2,$ $\dots, d$ . Note that (A2) is not restrictive since we can obtain this by applying a distance-preserving transformation to our point cloud in $R^{d}$ .

Following Section 3, we denote a hypergraph obtained by modifying the nsRGH $g_{N, K} (U, r)$ as $g_{N, K}^{*} (U, r, φ)$ , and we let $y_{e_{k}}^{(g)}$ and $y_{e_{k}}^{(g^{*})}$ indicate the presence or absence of the hyperedge e_k in $g_{N, K} (U, r)$ and $g_{N, K}^{*} (U, r, φ)$ , respectively. The degree of node i in a hypergraph is given by the number of hyperedges which contain i and, for node i in $g_{N, K} (U, r)$ , we denote the order k degree as ${Deg}_{(i, k)}^{(g)} = \sum_{{e_{k} \in E_{N}^{k} | i \in e_{k}}} y_{e_{k}}^{(g)}$ and the full degree as ${Deg}_{(i)}^{(g)} = \sum_{k = 2}^{K} {Deg}_{(i, k)}$ . Analogous expressions for $g_{N, K}^{*} (U, r, φ)$ are obtained by replacing g with $g^{*}$ . To organize our results, we separate the discussion into hyperedges with order k = 2 and $k \geq 3$ . Proofs can be found in Supplement F.

4.1 Degree Properties for k = 2

We begin by restating a result from Rastelli, Friel, and Raftery (Citation2016) which allows us to express the degree distribution and average degree for order k = 2 hyperedges in our model. Throughout, we write hyperedges of order 2 that contain node i as {i, j}.

Theorem 4.1

(Reproduced from (Rastelli, Friel, and Raftery Citation2016, Theorem 1)). Let $p_{{i, j}} (u_{i} | θ)$ denote the probability of node i with latent coordinate u_i belonging to an order 2 hyperedge in the model detailed in Section 3.3, conditional on $θ = (r_{2}, φ_{2}, μ, Σ)$ . The degree distribution of order 2 hyperedges for the ith node for our hypergraph model, conditional on θ, is given by (13) $\begin{matrix} p ({Deg}_{i, 2}^{(g^{*})} = l | θ) = \int p (u_{i} | μ, Σ) (\begin{matrix} N - 1 \\ l \end{matrix}) p_{{i, j}} \\ {(u_{i} | θ)}^{l} {(1 - p_{{i, j}} (u_{i} | θ))}^{N - 1 - l} d u_{i}, \end{matrix}$ (13) where ${Deg}_{(i, 2)}^{(g)} = \sum_{{e_{2} \in E_{N, 2} | i \in e_{2}}} y_{e_{2}}^{(g)}$ , and the average degree is given by (14) ${\bar{d}}_{2}^{(g^{*})} (θ) = (N - 1) \int p (u_{i} | μ, Σ) p_{{i, j}} (u_{i} | θ) d u_{i} .$ (14)

Proof.

See Supplement F. □

In order to apply this result, we must derive an expression for $p_{{i, j}} (u_{i} | θ)$ . This is given in the following proposition.

Proposition 4.1

(Probability of an order k = 2 hyperedge). Let $e_{2} = {i, j}$ denote an order k = 2 hyperedge. The probability of e₂ occurring in $g^{*}$ , given that node i has latent position u_i , can be expressed as (15) $\begin{matrix} p_{{i, j}} (y_{i j}^{(g^{*})} = 1 | θ_{1}) = p (y_{{i, j}}^{(g)} = 1 | θ_{2}) (1 - φ_{2}) \\ + (1 - p (y_{{i, j}}^{(g)} = 1 | θ_{2})) φ_{2}, \end{matrix}$ (15) where $θ_{1} = (u_{i}, r_{2}, φ_{2}, μ, Σ), θ_{2} = (u_{i}, r_{2}, μ, Σ)$ and $p (y_{{i, j}}^{(g)} = 1 | θ_{2})$ denotes the probability of $e_{2} = {i, j}$ being present in $g_{N, K} (U, r)$ when node i is positioned at u_i . It follows that $p (y_{{i, j}}^{(g)} = 1 | θ_{2}) = p ({(U_{j}^{*})}^{T} U_{j}^{*} \leq 4 r_{2}^{2})$ , where $U_{j}^{*} = U_{j} - u_{i} \sim N (μ - u_{i}, Σ)$ . Then (16) ${(U_{j}^{*})}^{T} U_{j}^{*} \sim \sum_{l = 1}^{d} σ_{l}^{2} χ_{1, {({(μ - u_{i})}_{l} / σ_{l})}^{2}}^{2},$ (16) where ${(μ - u_{i})}_{l}$ is the lth component of $(μ - u_{i})$ , for $l = 1, 2, \dots, d$ and $Σ = diag (σ_{1}^{2}, σ_{2}^{2}, \dots, σ_{d}^{2})$ .

Proof.

See Supplement F. □

Theorem 4.1 and Proposition 4.1 allow us to evaluate properties of the node degree of order k = 2 hyperedges, where the intractable integrals (13) and (14) are evaluated using numerical methods. It is straightforward to calculate (16) when the variances are constant, but for general diagonal Σ, we require numerical evaluation as implemented in the R package CompQuadForm (Duchesne and de Micheaux Citation2010). The expression for the connection probabilities in Proposition 4.1 is validated for an example in .

Fig. 4 Comparison of theoretical (solid) and Monte Carlo (dashed) estimates of $p (y_{e_{k, i}}^{(g)} = 1 | u_{i}, r_{k}, μ, Σ)$ for varying r_k . We take $Σ = diag (1, 1), μ = (0, 0)$ and consider connection probabilities for k = 2, 3, 4. In the left plot $u_{i} = μ$ and in the right plot $u_{i} = (1, 2)$ . The same study with $Σ = diag (1, 2)$ is provided in Supplement F.3.

4.2 Degree Properties for $k \geq 3$

We now present some results for the order k degree distribution. First, note that the expression for the degree distribution of order k = 2 hyperedges, given in Theorem 4.1, arises since hyperedges {i, j} occur independently given u_i . This is no longer true for higher-order hyperedges since, for example, if k = 3 it is clear that hyperedges {i, j, l} and {i, j, m} are not independent given i. Deriving an exact expression for the order k degree distribution is therefore challenging, and we consider an approximation of this in the following proposition where we denote an order k hyperedge containing node i as $e_{k, i}$ .

Proposition 4.2

(Approximate order k degree distribution). Let $p_{e_{k, i}} (y_{e_{k, i}}^{(g^{*})} = 1 | θ_{k})$ denote the probability of node i with latent position u_i belonging to an order k hyperedge, denoted as $e_{k, i}$ , in the model detailed in Section 3.3, conditional on $θ_{k} = (u_{i}, r_{k}, φ_{k}, μ, Σ)$ . Conditional on $r_{k}, φ_{k}, μ$ , and Σ, the degree distribution of order $k \geq 3$ hyperedges for the ith node for our hypergraph model is approximately (17) $\begin{matrix} p ({Deg}_{i, k}^{(g^{*})} = l | θ) \\ \approx \int p (u_{i} | μ, Σ) \frac{{(λ (N, k, θ_{k}))}^{l} \exp (- λ (N, k, θ_{k}))}{l!} d u_{i} \end{matrix}$ (17) where $λ (N, k, θ_{k}) = (\begin{matrix} N - 1 \\ k - 1 \end{matrix}) p (y_{e_{k, i}}^{(g^{*})} = 1 | θ_{k})$ and ${Deg}_{(i, k)}^{(g^{*})} = \sum_{{e_{k} \in E_{N}^{k} | i \in e_{k}}} y_{e_{k}}^{(g^{*})}$ .

Proof.

See Supplement F. □

This proposition relies on a Poisson approximation to the sum of dependent Bernoulli trials. Bounds for the quality of this approximation have been studied (e.g., see Teerapabolarn Citation2014), and it is well-understood that this approximation will degrade as either the number of possible hyperedges, as controlled by N and k, or the connection probability $p_{e_{k, i}} (u_{i} | r_{k}, φ_{k}, μ, Σ)$ grows, or both.

As in the previous section, we require an expression for $p_{e_{k, i}} (u_{i} | r_{k}, φ_{k}, μ, Σ)$ to explore this distribution. As in Proposition 4.1, an order k hyperedge can be expressed either by the geometric or noise component of our model, where the probability of occurrence in $g_{N, K} (U, r)$ is the more challenging to determine. Recall that this quantity is given by the probability that the balls of radius r_k corresponding to the nodes in $e_{k, i}$ have a non-empty intersection, so that the hyperedge $e_{k, i} = {i, i_{2}, i_{3}, \dots, i_{k}}$ is present in $g_{N, K} (U, r)$ if $\cap_{j \in e_{k, i}} B_{r_{k}} (u_{j}) \neq \emptyset$ . This condition is equivalent to the coordinates ${u_{j}}_{j \in e_{k, i}}$ lying within a ball of radius r_k (see Supplement E.3.1). Since the latent coordinates are assumed to follow a Normal distribution, evaluating this probability requires the evaluation of an intractable integral (see Gilliland Citation1962). To address this we consider an approximation of this quantity in which the region of integration is approximated by a square of side length s_k , denoted as $S_{s_{k}}$ . We choose $s_{k} = \sqrt{π} r_{k}$ so that it has an equal area to a disk of radius r_k . This choice simplifies the region of integration by allowing each dimension to be considered independently. Alternative choices for the distribution on the latent coordinates, such as the uniform, may be more amenable to analytic expressions for the degree distribution and we leave exploration of this to future work.

Lemma 4.1

(Probability of an order $k \geq 3$ hyperedges). We can express the probability of an order k hyperedge containing node i, denoted as $e_{k, i}$ , being present in $g^{*}$ conditional on the latent coordinate u_i as (18) $\begin{matrix} p (y_{e_{k, i}}^{(g^{*})} = 1 | θ_{1, k}) = p (y_{e_{k, i}}^{(g)} = 1 | θ_{2, k}) (1 - φ_{k}) \\ + (1 - p (y_{e_{k, i}}^{(g)} = 1 | θ_{2, k})) φ_{k}, \end{matrix}$ (18) where $θ_{1, k} = (u_{i}, r_{k}, φ_{k}, μ, Σ), θ_{2, k} = (u_{i}, r_{k}, μ, Σ)$ and $p (y_{e_{k, i}}^{(g)} = 1 | θ_{2, k})$ is the probability of the order k hyperedge $e_{k, i} = {i, i_{2}, i_{3},$ $\dots, i_{k}}$ being present in $g_{N, K} (U, r)$ given u_i .

The probability of $e_{k, i}$ occurring in $g_{N, K} (U, r)$ with $U_{j} \sim N (μ, Σ)$ , conditional on u_i , can then be approximated as follows. (19) $\begin{matrix} p (y_{e_{k, i}}^{(g)} = 1 | θ_{2, k}) \approx \prod_{l = 1}^{d} [(k - 1) \int f (u | μ_{l}, σ_{l}) \\ \times {[F (u + \sqrt{π} r_{k} | μ_{l}, σ_{l}) - F (u | μ_{l}, σ_{l})]}^{k - 2} \\ \times 1 (u_{i} \in (u, u + \sqrt{π} r_{k})) d u \\ + {[F (u_{i} + \sqrt{π} r_{k} | μ_{l}, σ_{l}) - F (u_{i} | μ_{l}, σ_{l})]}^{k - 1} ] , \end{matrix}$ (19) where $f (\cdot | μ_{l}, σ_{l})$ and $F (\cdot | μ_{l}, σ_{l})$ denote the pdf and cdf of the random variable with distribution $N (μ_{l}, σ_{l}^{2})$ , μ_l denotes the lth element of μ and $Σ = diag (σ_{1}^{2}, σ_{2}^{2}, \dots, σ_{d}^{2})$ .

Proof.

See Supplement F. □

These results allow us to examine the behavior of the degree distribution. We consider the quality of the $p (y_{e_{k, i}}^{(g)} = 1 | u_{i}, r_{k}, μ, Σ)$ approximation and the Poisson approximation in and , respectively. Since hyperedges in our model may come from the latent coordinates and or the random noise, similar connection probabilities can occur through different parameter combinations. However, we should find such combinations have notable differences in the degree distribution and other properties such as, for instance, motif counts and measures of transitivity. The above calculations suggest that examining such properties theoretically is likely to be challenging. Techniques such as those used in Kahle (Citation2017) may be applied here, however, our choice of underlying distribution on the latent coordinates is likely to limit their applicability.

Fig. 5 Comparison between empirical (dashed line) and Poisson approximation (points) of the order k = 3 degree distribution conditional on the latent coordinate u_i . We take $N = 10, μ = (0, 0), Σ = diag (1, 1)$ and evaluate the distribution for $r_{3} \in (0.1, 0.4, 1.0)$ . The left plot shows $u_{i} = μ$ and the right plot shows $u_{i} = (1, 2)$ . The equivalent Figures with $Σ = diag (1, 2)$ and N = 20 are given in Supplement F.3.

5 Posterior Sampling

Here we outline our posterior sampling scheme for the model detailed in Section 3.3 and note that this can be modified to sample from the model in Section 3.4. Posterior samples are obtained via a Metropolis-Hastings-within-Gibbs MCMC scheme (Gamerman and Lopes Citation2006, sec. 6.4.2) with a delayed acceptance (DA) update (Banterle et al. Citation2019) for the latent positions and algorithmic details of our MCMC scheme are given in Supplement E.

First, recall that we define U on the Bookstein space of coordinates (Section 3.5) so that a set of anchor points remain fixed throughout estimation. For $u_{i} \in R^{d}$ , let the d anchor points be denoted by ${u_{B_{1}}, u_{B_{2}}, \dots, u_{B_{d}}}$ . Then, for $i \in {1, 2, \dots, N} \ {B_{1}, B_{2}, \dots, B_{d}}$ we propose $U^{*}$ with ith entry $u_{i}^{*} = u_{i} + ϵ_{u}$ where $ϵ_{u} \sim N (0, σ_{u} I_{d})$ and $u_{j}^{*} = u_{j}$ for $j \neq i$ . This proposal is then accepted as a sample from $p (U | μ, Σ, r, ψ^{0}, ψ^{1}, h_{N, K})$ with probability (20) $A R_{u_{i}} = \min {1, \frac{L (U^{*}, r, ψ^{1}, ψ^{0}; h_{N, K}) p (u_{i}^{*} | μ, Σ)}{L (U, r, ψ^{1}, ψ^{0}; h_{N, K}) p (u_{i} | μ, Σ)}} .$ (20)

Calculating the hypergraph $g_{N, K} (U, r)$ presents a computational bottleneck in evaluating the likelihood (20). We utilize the GUDHI library (The GUDHI Project Citation2021) to perform this calculation and find that the cost of calculating the order k hyperedges in $g_{N, K} (U, r)$ grows with both N and k. This suggests our target is a natural candidate for delayed acceptance in which terms of the likelihood are sequentially included in order of increasing computational cost, allowing for a faster and computationally cheaper rejection of poor proposals. Let $h_{N}^{k}$ denote the observed hyperedges of order k and write (21) $\begin{matrix} A R_{u_{i}} = \min {1, \underset{ρ_{2} (U, U^{*})}{\underset{⏟}{\frac{p (h_{N, 2} | U^{*}, r_{2}, ψ_{2}^{1}, ψ_{2}^{0}) p (u_{i}^{*} | μ, Σ)}{p (h_{N, 2} | U, r_{2}, ψ_{2}^{1}, ψ_{2}^{0}) p (u_{i} | μ, Σ)}}} \\ \times \underset{ρ_{k} (U, U^{*})}{\underset{⏟}{\prod_{k = 3}^{K} \frac{p (h_{N}^{k} | U^{*}, r_{k}, ψ_{k}^{1}, ψ_{k}^{0})}{p (h_{N}^{k} | U, r_{k}, ψ_{k}^{1}, ψ_{k}^{0})}}}} , \end{matrix}$ (21) where $p (h_{N}^{k} | U, r_{k}, ψ_{k}^{1}, ψ_{k}^{0})$ denotes the contribution of the order k hyperedge terms in (10), for $k = 2, 3, \dots, K$ . Following Banterle et al. (Citation2019), we take $ρ_{k} (U, U^{*})$ as specified in (21) and accept $U^{*}$ if each intermediary rate $\min {1, ρ_{k} (U, U^{*})}$ indicates acceptance as outlined in Algorithm E.1 in the supplement.

The kth radii is updated according to a standard random-walk MH step, where the proposal $r_{k}^{*} = r_{k} + ϵ_{r}$ with $ϵ_{r} \sim N (0, σ_{r})$ , is accepted with probability (22) $\min {1, \frac{p (h_{N}^{k} | U, r_{k}^{*}, φ_{k}) p (r_{k}^{*} | λ_{k})}{p (h_{N}^{k} | U, r_{k}, φ_{k}) p (r_{k} | λ_{k})}} .$ (22)

The remaining parameters can be sampled directly from their full conditionals and updated via a Gibbs step. Full details are presented in the supplement and we note here that our implementation also incorporates adaptive updates for U and r as detailed in Vihola (Citation2012) and implemented in the R library ramcmc (Helske Citation2021).

6 Simulations

Sections 6.1 evaluate the efficacy of our approach by investigating predictive distributions, and Section 6.2 explores the scalability of our sampling procedure. Additional simulation studies on model-depth and misspecification are presented in supplements H and K.

6.1 Assessing Model Fit

For this study, we simulate a hypergraph from the model outlined in Algorithm 1 with $N = 50, d = 2, K = 3, r = (0.35, 0.45), φ_{0} = φ_{1} = (0.001, 0.001), μ = (0, 0)$ and $Σ = (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix})$ and examine the performance of our MCMC scheme through inspection of the predictive degree distribution. After 5000 post burn-in iterations we obtain posterior mean estimates to (2 significant figures) $\hat{U}$ , $\hat{r} = (0.11, 0.15), {\hat{ψ}}^{0} = {\hat{ψ}}^{1} = (0.00108, 0.00011), \hat{μ} = (0.081, 0.12)$ and $\hat{Σ} = (\begin{matrix} 0.082 & 0.015 \\ 0.015 & 0.091 \end{matrix})$ . Since the anchor coordinates specify a scaling on the estimated latent positions, we cannot make a direct comparison between the true and estimated model parameters (see and I.9).

We can instead examine the posterior predictive distribution and compare this to the simulated hypergraph. If the sampling procedure has produced meaningful estimates, we expect a reasonable correspondence between these two quantities. Since a hypergraph is a complex object, we make a comparison by summarising each hypergraph by its degree distribution. shows this separately for the degree distributions of order k = 2 and k = 3 hyperedges. Overall we see that there is reasonable overlap between these distributions. For completeness, we also summarize the output of the MCMC scheme in Supplement I where we see that the sampling procedure appears to have converged and the estimated latent coordinates correspond reasonably to the truth.

Fig. 6 Comparison of true and posterior predictive degree distributions for hyperedges of order k = 2 (left) and k = 3 (right). Vertical lines and black dots correspond to the range and median of the probabilities of observing each degree as calculated via the posterior predictive. The observed degree is shown as green triangles.

6.2 Scalability Study

Our MCMC scheme (see Section 5) updates the latent coordinates via a delayed acceptance (DA) step that allows poor proposals to be quickly rejected on a subset of the likelihood. Here we substantiate our claim that this offers computational advantages over a standard Metropolis-Hastings update on data simulated according to Algorithm 1 by comparing average per-iteration cost times for an MCMC scheme implemented with and without DA. Since the computational cost is closely connected to the number of nodes N and the density of the hypergraphs, we consider timings for hypergraphs with increasing N under two different density regimes, referred to as (R1) and (R2). We specify the parameters for each regime so that (R1) generates sparser hypergraphs than (R2). Details of the specific parameters are given in Supplement J alongside plots of the density of the simulated data.

This study allows us to examine the scalability of our approach as N grows and to demonstrate the reduction in computational cost offered by DA. reports average per-iteration times for the DA MCMC scheme for both regimes in addition to the ratio of average per-iteration times for a DA and non-DA MCMC scheme. This figure confirms our intuitions that hypergraphs with a larger N are more computationally challenging, and this cost increases as the density of the hypergraphs grows. By inspecting the right-hand plot of , we observe that incorporating DA into our MCMC offers a significant improvement in average per-iteration cost. We note that this relative improvement increases with N and is largely unaffected by the density of the hypergraphs. Whilst K remains fixed throughout this section, we do expect the improvements offered by DA to increase with K.

Fig. 7 Summary of average per-iteration costs after 200 iterations for an MCMC implemented with and without delayed acceptance (DA) for 10 datasets sampled from data regimes (R1) (black, circle) and (R2) (red, triangle). Left: average per-iteration cost in seconds for a DA scheme. Right: ratio of average per-iteration cost for a DA scheme and a scheme without DA. Numbers less than 1 imply DA offers a computational speed-up.

7 Real Data Examples

In this section we analyze two hypergraph datasets using the model from Algorithm A.1 of the supplement. These examples describe grocery co-purchases and academic coauthorship ().

Fig. 8 Visualization of the datasets considered in Section 7. The hyperedges correspond to the observations for each dataset. Each figure shows the full hypergraph (left) and a subset sampled according to a random walk (right).

7.1 Grocery Items

Here we consider a subset of the grocery transaction data available as part of the R package arules (Hahsler et al. 2007). Here, each node corresponds to a category of items and each hyperedge indicates whether a set of items were bought together. To analyze this data we take a random subset of size N = 41 with K = 6 and hold back $N^{*} = 4$ nodes for validation of the predictive distribution for additional nodes. Since the hyperedges can be expressed through the geometric or noise component of our model, we impose constraints on the noise terms to ensure that the set of hyperedges explained by the latent positions is non-empty. More specifically, we limit the order k noise terms so that they cannot exceed 90% of the density of the order k hyperedges in the observed hypergraph. With these constraints, we implemented our MCMC scheme for 100,000 iterations and discarded the first three quarters as burn-in. Figure L.14 in the supplement depicts the observed hypergraph with the nodes positioned at posterior mean of the latent coordinates $\hat{U}$ and we obtain the posterior mean estimate of the radii $\hat{r} = (1.096, 1.097, 1.101, 1.107, 1.11)$ . Traceplots for each parameter are shown in Supplement L and the average per-iteration cost was 0.012 min.

To select the $N^{*}$ nodes for which we evaluate the predictive distribution, we take the nodes with the smallest eigen-centralities in the graph obtained by connecting all node pairs who share a hyperedge. To reflect how these nodes were selected, we then sample additional latent coordinates from the posterior predictive by first obtaining $N + N^{*}$ samples from $N (μ^{(i)}, Σ^{(i)})$ and then selecting the $N^{*}$ coordinates that are furthest from the mode $μ^{(i)}$ in terms of Euclidean distance. Similarly to Section 6.1, we summarize the predictive hypergraphs by considering node degree and motif counts and reports the results of this. In particular, for each measure, we report the proportion of predictive samples which have discrepancy D from the true observed value where D denotes the absolute difference between the prediction and the truth. Overall, we see that the majority of samples have either zero or close to zero discrepancy from the observed values, suggesting that we reasonably capture the unobserved interaction patterns. We believe that the uptick in the upper tail for some of the motif counts does not have a structural explanation and is simply an artifact of this particular data example. Note that we do not observe this behavior for the coauthorship dataset presented in Section 7.2.

Fig. 9 Summary of predictive distributions for $N^{*}$ additional nodes for the Grocery dataset. For each measure we report the proportion of predictions which are distance D from the truth, where D is the absolute difference between the predictive and the truth.

7.2 Coauthorship for Statisticians

We now repeat the analysis from the previous section for a hypergraph with larger N. In this section we consider a coauthorship hypergraph constructed from the DBLP dataset,Footnote¹ presented in Mao, Sarkar, and Chakrabarti (Citation2020) and available online.Footnote² From this dataset we construct a subset of size N = 99 with K = 4 via a random walk and, similarly to Section 7.1, we hold out $N^{*} = 10$ of these nodes according to the graph eigen-centrality and use these to validate predictive inference. We fit the model outlined in Section 3.4 by implementing our MCMC for 400,000 iterations and taking the initial 3/4th of these as burn-in. Given equivalent restrictions on the noise parameters as described for the previous example, we obtain the posterior mean of the radii $\hat{r} = (0.23, 0.25, 0.31)$ and the posterior means of the latent position as shown in Figure L.14 in the supplement. The average per-iteration cost for our scheme was 0.012 min and traceplots showing convergence for the model parameters are given in supplement L. A summary of the predictive distribution of the $N^{*}$ additional nodes is given in and, for all summaries of the predicted interactions, we find that the majority of the discrepancies between the predicted and true values are zero or close to zero. This suggests the predicted interaction patterns are close to the truth.

Fig. 10 Summary of predictive distributions for $N^{*}$ additional nodes for the coauthorship dataset. For each measure we report the proportion of predictions which are distance D from the truth, where D is the absolute difference between the predictive and the truth.

To provide further justification for our assumption of increasing radii we compare our model to a restricted version with a simplicial geometric component in which r_k = r for $k = 2, 3, \dots, K$ . Since this data example is non-simplicial, we expect that the latent representation in the restricted model will be less informative for the hyperedges than the latent representation for our model. To verify this, we present the latent coordinate traceplots for a subset of the data in for both models. By comparing the middle and right panels of this figure, we see that the latent coordinates are more constrained for our model with a non-simplicial geometric component. This behavior occurs because the restricted model expresses the observed interactions by deleting hyperedges from a simplicial motif whilst our model aims to express the non-simplicial motif directly through the geometric component. Since a simplicial motif occurs when all latent positions are sufficiently close, this means that the traceplot does not allow us to distinguish between the latent positions in the restricted model. Contrast this with our model in which the node labeled “22” is distinctly separated from the nodes labeled “27” and “31”.

Fig. 11 Comparison between latent positions for subset of the coauthorship network on nodes {22, 27, 31}. Left: observed interactions. Middle: traceplot of latent positions for model with r_k = r. Right: traceplot of latent positions for our model with $r_{k} > r_{k - 1}$ .

8 Discussion

Our proposed model allows hyperedges to be expressed by a geometric or noise component, and we keep the noise term small to impose desirable properties on the hypergraphs. The geometric component of our model may be unable to express certain hypergraph relationships and this is made clear by considering, for example, that the maximum number of leaves in a star is limited by the latent space dimension. This may be mediated by choosing an alternative geometric simplicial complex to generate the nerve, increasing the probability of hyperedge modification, or taking a different specification for the latent positions.

There exist several modeling extensions which could be considered for our framework, such as modeling nonbinary hyperedges, the addition of community structure analogously to Handcock, Raftery, and Tantrum (Citation2007), or the introduction of covariate information. Additional avenues for future work follow from considering other choices for the nerve construction, latent geometry and restrictions on the radii. For example, we could instead use the more scalable Vietoris-Rips complex (Edelsbrunner and Harer Citation2010), consider non-Euclidean embeddings (Krioukov et al. Citation2010) or explore models with node-specific radii. However, it is important to stress that estimation for a given modification of our model may be challenging. Even for our proposed model, scalability remains a key limitation and this may be exacerbated by certain extensions. While we found that delayed acceptance offered computational improvements, it remains an open line of enquiry to consider how methodology to improve scalability for latent space networks (Raftery et al. Citation2012; Salter-Townshend and Murphy Citation2013) may be adapted to this setting.

Supplemental material

Supplemental Material

Download Zip (17.2 MB)

Supplemental Material

Download PDF (120 KB)

Acknowledgments

Thank you to Tin Lok James Ng for providing the Star Wars dataset analyzed in a previous version of this work and Matthew Kahle, Sandipan Roy, Brendan Murphy and Marco Battiston for helpful comments. We acknowledge the High End Computing facility at Lancaster University and are grateful to Daniel Grose for creating the associated R package.

Supplementary Materials

The supplementary materials contain additional details on Bookstein coordinates.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

We acknowledge the support of the EPSRC funded EP/L015692/1 STOR-i CDT and EPSRC grants EP/S00159X/1, EP/R01860X/1 and EP/V022636/1. This work was supported, in part, by NSF awards CAREER IIS-1149662 and IIS-1409177, by ONR awards YIP N00014-14-1-0485 and N00014-17-1-2131.

Notes

1 https://dblp.uni-trier.de/xml/

2 https://xueyumao.github.io/coauthorship

References

Balasubramanian, K. (2021), “Nonparametric Modeling of Higher-Order Interactions via Hypergraphons,” Journal of Machine Learning Research, 22, 1–35.
Web of Science ®Google Scholar
Banterle, M., Grazian, C., Lee, A., and Robert, C. P. (2019), “Accelerating Metropolis-Hastings Algorithms by Delayed Acceptance,” Foundations of Data Science, 1, 103–128. DOI: 10.3934/fods.2019005.
Google Scholar
Battiston, F., Cencetti, G., Iacopini, I., Latora, V., Lucas, M., Patania, A., Young, J.-G., and Petri, G. (2020), “Networks Beyond Pairwise Interactions: Structure and Dynamics,” Physics Reports, 874, 1–92. DOI: 10.1016/j.physrep.2020.05.004.
Web of Science ®Google Scholar
Bianconi, G., and Rahmede, C. (2017), “Emergent Hyperbolic Network Geometry,” Scientific Reports, 7, 41974. DOI: 10.1038/srep41974.
PubMed Web of Science ®Google Scholar
Bobrowski, O., and Kahle, M. (2018), “Topology of Random Geometric Complexes: A Survey,” Journal of Applied and Computational Topology, 1, 331–364. DOI: 10.1007/s41468-017-0010-0.
Google Scholar
Bookstein, F. L. (1986), “Size and Shape Spaces for Landmark Data in Two Dimensions,” Statistical Science, 1, 181–222. DOI: 10.1214/ss/1177013696.
Google Scholar
Chazal, F., and Michel, B. (2017), “An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists,” Frontiers in Artificial Intelligence, 4, 667963. DOI: 10.3389/frai.2021.667963.
Google Scholar
Chien, I., Lin, C.-Y., and Wang, I.-H. (2018), “Community Detection in Hypergraphs: Optimal Statistical Limit and Efficient Algorithms,” in International Conference on Artificial Intelligence and Statistics (Vol. 84), pp. 871–879, PMLR.
Google Scholar
Csardi, G., and Nepusz, T. (2006), “The igraph Software Package for Complex Network Research,” InterJournal, complex systems, 1695.5, 1–9.
Google Scholar
Doreian, P., Batagelj, V., and Ferligoj, A. (2004), “Generalized Blockmodeling of Two-Mode Network Data,” Social Networks, 26, 29–53. DOI: 10.1016/j.socnet.2004.01.002.
Web of Science ®Google Scholar
Dryden, I. L., and Mardia, K. V. (1998), Statistical Shape Analysis, Chichester: Wiley.
Google Scholar
Duchesne, P., and de Micheaux, P. L. (2010), “Computing the Distribution of Quadratic Forms: Further Comparisons between the Liu-Tang-Zhang Approximation and Exact Methods,” Computational Statistics and Data Analysis, 54, 858–862. DOI: 10.1016/j.csda.2009.11.025.
Web of Science ®Google Scholar
Edelsbrunner, H., and Harer, J. (2010), Computational Topology - an Introduction, Providence, RI: American Mathematical Society.
Google Scholar
Friel, N., Rastelli, R., Wyse, J., and Raftery, A. E. (2016), “Interlocking Directorates in Irish Companies Using a Latent Space Model for Bipartite Networks,” Proceedings of the National Academy of Sciences, 113, 6629–6634. DOI: 10.1073/pnas.1606295113.
PubMed Web of Science ®Google Scholar
Gamerman, D., and Lopes, H. F. (2006), Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (2nd ed.), London: Chapman and Hall/CRC.
Google Scholar
Ghoshdastidar, D., and Dukkipati, A. (2017), “Consistency of Spectral Hypergraph Partitioning Under Planted Partition Model,” The Annals of Statistics, 45, 289–315. DOI: 10.1214/16-AOS1453.
Web of Science ®Google Scholar
Gilliland, D. C. (1962), “Integral of the Bivariate Normal Distribution over an Offset Circle,” Journal of the American Statistical Association, 57, 758–768. DOI: 10.1080/01621459.1962.10500813.
Web of Science ®Google Scholar
Hahsler, C. B., Buchta, C., Gruen, B., Hornik, K., Borgelt, C., Johnson, I., and Ledmi, M. (2021), arules: Mining Association Rules and Frequent Itemsets, R package version 1.6-8.
Google Scholar
Handcock, M. S., Raftery, A. E., and Tantrum, J. M. (2007), “Model-based Clustering for Social Networks,” Journal of the Royal Statistical Society, Series A, 170, 301–354. DOI: 10.1111/j.1467-985X.2007.00471.x.
Web of Science ®Google Scholar
Helske, J. (2021), ramcmc: Robust Adaptive Metropolis Algorithm, R package version 0.1.1.
Google Scholar
Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002), “Latent Space Approaches to Social Network Analysis,” Journal of the American Statistical Association, 97, 1090–1098. DOI: 10.1198/016214502388618906.
Web of Science ®Google Scholar
Ji, P., and Jin, J. (2016), “Coauthorship and Citation Networks for Statisticians,” The Annals of Applied Statistics, 10, 1779–1812. DOI: 10.1214/15-AOAS896.
Web of Science ®Google Scholar
Kahle, M. (2017), “Random Simplicial Complexes,” in Handbook of Discrete and Computational Geometry, eds. J. E. Goodman, J. O’Rourke, and C. D. Tóth, pp. 581–604, Boca Raton, FL: CRC Press.
Google Scholar
Kahle, M. (2014), “Topology of Random Simplicial Complexes: A Survey,” AMS Contemporary Mathematics, 620, 201–222.
Google Scholar
Kolaczyk, E. D. (2009), Statistical Analysis of Network Data: Methods and Models (1st ed.), New York: Springer.
Google Scholar
Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., and Boguñá, M. (2010), “Hyperbolic Geometry of Complex Networks,” Physical Review E, 82, 036106. DOI: 10.1103/PhysRevE.82.036106.
Web of Science ®Google Scholar
Lunagómez, S., Mukherjee, S., Wolpert, R. L., and Airoldi, E. M. (2017), “Geometric Representations of Random Hypergraphs,” Journal of the American Statistical Association, 112, 363–383. DOI: 10.1080/01621459.2016.1141686.
Web of Science ®Google Scholar
Lunagómez, S., Olhede, S. C., and Wolfe, P. J. (2020), “Modeling Network Populations via Graph Distances,” Journal of the American Statistical Association, 116, 2023–2040. DOI: 10.1080/01621459.2020.1763803.
Web of Science ®Google Scholar
Mao, X., Sarkar, P., and Chakrabarti, D. (2020), “Estimating Mixed Memberships with Sharp Eigenvector Deviations,” Journal of the American Statistical Association, 116, 1928–1940. DOI: 10.1080/01621459.2020.1751645.
Web of Science ®Google Scholar
Marchette. (2021), HyperG: Hypergraphs in R, R package version 1.0.0.
Google Scholar
Ng, T. L. J., and Murphy, T. B. (2021), “Model-based Clustering for Random Hypergraphs,” Advances in Data Analysis and Classification, 16, 691–723. DOI: 10.1007/s11634-021-00454-7.
PubMed Web of Science ®Google Scholar
Penrose, M. (2003), Random Geometric Graphs. Oxford Studies in Probability. Oxford: Oxford University Press.
Google Scholar
Raftery, A. E., Niu, X., Hoff, P. D., and Yeung, K. Y. (2012), “Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood,” Journal of Computational and Graphical Statistics, 21, 901–919. DOI: 10.1080/10618600.2012.679240.
PubMed Web of Science ®Google Scholar
Rastelli, R., Friel, N., and Raftery, A. E. (2016), “Properties of Latent Variable Network Models,” Network Science, 4, 407–432. DOI: 10.1017/nws.2016.23.
Web of Science ®Google Scholar
Salter-Townshend, M., and Murphy, T. B. (2013), “Variational Bayesian Inference for the Latent Position Cluster Model for Network Data,” Computational Statistics & Data Analysis, 57, 661–671. DOI: 10.1016/j.csda.2012.08.004.
Web of Science ®Google Scholar
Stasi, D., Sadeghi, K., Rinaldo, A., Petrović, S., and Fienberg, S. E. (2014), “β Models for Random Hypergraphs with a Given Degree Sequence,” ArXiv e-prints.
Google Scholar
Teerapabolarn, K. (2014), “An Improvement of Poisson Approximation for Sums of Dependent Bernoulli Random Variables,” Communications in Statistics - Theory and Methods, 43, 1758–1777. DOI: 10.1080/03610926.2012.674606.
Web of Science ®Google Scholar
The GUDHI Project (2021), GUDHI User and Reference Manual. GUDHI Editorial Board, 3.4.1 edition.
Google Scholar
Vihola, M. (2012), “Robust Adaptive Metropolis Algorithm with Coerced Acceptance Rate,” Statistics and Computing, 22, 997–1008. DOI: 10.1007/s11222-011-9269-5.
Web of Science ®Google Scholar
Wang, J.-W., Rong, L.-L., Deng, Q.-H., and Zhang, J.-Y. (2010), “Evolving Hypernetwork Model,” The European Physical Journal B, 77, 493–498. DOI: 10.1140/epjb/e2010-00297-8.
Web of Science ®Google Scholar
Wang, P., Pattison, P., and Robins, G. (2013), “Exponential Random Graph Model Specifications for Bipartite Networks—A Dependence Hierarchy,” Social Networks, 35, 211–222. DOI: 10.1016/j.socnet.2011.12.004.
Web of Science ®Google Scholar
Young, J.-G., Petri, G., Vaccarino, F., and Patania, A. (2017), “Construction of and Efficient Sampling from the Simplicial Configuration Model,” Physical Review E, 96, 032312. DOI: 10.1103/PhysRevE.96.032312.
PubMed Web of Science ®Google Scholar
Zhong, Y., Arandjelovic, R., and Zisserman, A. (2018), “Compact Deep Aggregation for Set Retrieval,” in Proceedings of the European Conference on Computer Vision Workshops.
Google Scholar
Zuev, K., Eisenberg, O., and Krioukov, D. (2015), “Exponential Random Simplicial Complexes,” Journal of Physics A: Mathematical and Theoretical, 48, 465002. DOI: 10.1088/1751-8113/48/46/465002.
Web of Science ®Google Scholar

Latent Space Modeling of Hypergraph Data

Abstract

1 Introduction