The Journal of Mathematical Sociology Volume 41, 2017 - Issue 4

Regression of directed graphs on independent effects for density and reciprocity

Bonne J.H. Zijlstra, Child Development and Education, Universiteit van Amsterdam, Faculteit der Maatschappij- en Gedragswetenschappen, Amsterdam, Netherlands. Correspondence: [email protected]

http://orcid.org/0000-0001-9924-4387

Pages 185-192 | Published online: 25 Oct 2017

https://doi.org/10.1080/0022250X.2017.1387858

ABSTRACT

In common models for dyadic network regression, the density and reciprocity parameters are dependent on each other. Here, the j₁ and j₂ models are introduced with a density parameter that represents the log odds of a single tie. Consequently, the density and reciprocity parameters are independent and the interpretation of both parameters more straightforward. Estimation procedures and simulation results for these new models are discussed as well as an illustrative example.

KEYWORDS:

MCMC
network density
network reciprocity
network regression
social networks
social support

1. Introduction

In social network analysis, a network is commonly represented by a directed graph, or digraph, reflecting the presence and absence of social relations in a given set of actors. A digraph can be partitioned into dyads, representing pairs of actors and their relations. In a null dyad, the actors are mutually unconnected; an asymmetric dyad consists of a directed tie from one actor to the other that is not returned; in a symmetric or reciprocal dyad, both actors send a tie to each other.

One of the first statistical models for directed graphs was the p₁ model by Holland and Leinhardt (1981). It estimates a density and a reciprocity parameter from the data. The density parameter models the odds of an asymmetric dyad versus a null dyad and is subdivided into an overall estimate and sender and receiver effects. The reciprocity parameter models the ratio between the odds of a symmetric dyad versus an asymmetric dyad and the odds of an asymmetric dyad versus a null dyad. The p₂ model (Van Duijn 1995; Van Duijn, Snijders, and Zijlstra 2004) is a successor to the p₁ model. It enables the inclusion of predictor variables through the introduction of random sender and receiver parameters. Predictors can be included at the actor level for sender and receiver effects and at the dyad level for density and reciprocity.

The p₁ and p₂ models are based on a multinomial representation of the four possible dyadic outcomes. Consequently, the modeled probability of one outcome affects the probability of the other outcomes, and the density and reciprocity parameters are dependent on each other. This complicates the interpretation of both.

In this paper, the j₁ and j₂ models will be presented, in which the density parameter reflects the odds of a single tie, resulting in independent density and reciprocity parameters. Since these parameters are unconfounded, their interpretation is more straightforward. In addition, the density parameter is now comparable to parameters in more familiar models like logistic regression.

The j₂ model has random parameters for sending and receiving ties (activity and popularity). This sets it apart from the well-known class of exponential random graph models (ERGMs) (Robins et al. 2007). Even though both can be expressed as exponential models, ERGMs are more focused on, and very flexible in, modeling structure and predicting subgraphs. The j₂ model specifically aims at predicting ties.

In the remainder of this paper, the p₁ model will be further explained and the j₁ and j₂ models will be introduced in Section 2. Since estimates of the reciprocity parameter in the j₂ model are relatively likely to become unstable, specific adaptations to the estimation procedure are discussed in Section 3. This is followed by simulation results in Section 4. An illustrative example using empirical data is presented in Section 5, contrasting the p₂ and j₂ models. Finally, concluding remarks are given in Section 6.

2. Models

2.1 The p₁ model

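The p₁ model (Holland and Leinhardt, 1981) describes the probabilities of two directed dichotomous tie observations in dyads of actors i and j as

$$P(Y_{ij} = y_1, Y_{ji} = y_2) = \frac{\exp(y_1 \theta_{ij} + y_2 \theta_{ji} + y_1 y_2 \rho_{ij})}{1 + \exp(\theta_{ij}) + \exp(\theta_{ji}) + \exp(\theta_{ij} + \theta_{ji} + \rho_{ij})}, \tag{1}$$

where Y_ij is a directed tie from actor i to actor j taking value 1 in the presence of a tie and 0 in the absence, and

$$\theta_{ij} = \theta + \alpha_i + \beta_j, \qquad \rho_{ij} = \rho. \tag{2}$$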

In Equation (1), θ_ij is the density parameter for a specific dyad, solving for which yields

$$\exp(\theta_{ij}) = \frac{P(Y_{ij} = 1, Y_{ji} = 0)}{P(Y_{ij} = 0, Y_{ji} = 0)}, \tag{3}$$

the odds of an asymmetric dyad versus a null dyad. In Equation (2), θ_ij is substituted by θ for the overall log odds and by α_i and β_j, reflecting individual deviations in sending and receiving ties in an asymmetric dyad for actors i and j, respectively. The sender and receiver effects α_i and β_j sum to zero. Since an asymmetric tie never involves the same actor as sender and receiver, the sender and receiver effects tend to have a (slightly) negative relation.

Solving Equation (1) for ρ_ij, the reciprocity parameter for a specific dyad, results in

$$\exp(\rho_{ij}) = \frac{P(Y_{ij} = 1, Y_{ji} = 1) \,/\, P(Y_{ij} = 0, Y_{ji} = 1)}{P(Y_{ij} = 1, Y_{ji} = 0) \,/\, P(Y_{ij} = 0, Y_{ji} = 0)}, \tag{4}$$

the ratio of the odds of a reciprocal dyad versus an asymmetric dyad and the odds of an asymmetric dyad versus a null dyad. In Equation (2), this ratio is substituted by the overall estimate ρ.

An implication of the multinomial representation of the four possible outcomes for the dyads in Equation (1) is that a change in the modeled probability of one outcome affects the probability of the other outcomes. As a result, the parameters θ_ij and ρ_ij in the p₁ model (Holland and Leinhardt, 1981) are dependent on each other. This can also be observed by recognizing that the denominator in the expression for ρ_ij (Equation (4)) equals the expression for θ_ij (Equation (3)). A visual representation of the dependence between θ_ij and ρ_ij for different values of the probability of a single tie, P(Y_ij = 1) = p, is given in Figure 1.

Figure 1. Relationship between theta (θ_ij) and rho (ρ_ij) in the p₁ model for given probabilities P(Y_ij = 1) = p.

As Figure 1 illustrates, in the p₁ model the parameters θ_ij and ρ_ij have differently shaped negative relations for different values of the probability of a tie p. Both θ_ij and ρ_ij are also related to p, since both increase for larger values of p. When ρ_ij = 0, θ_ij equals the log odds of p. For values ρ_ij < 0, θ_ij is larger than the log odds of p, while the opposite holds for ρ_ij > 0.
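The dependence between the two p₁ parameters can be checked numerically. The following is a minimal sketch rather than the author's software: it evaluates the four dyad probabilities of the p₁ model and, under the simplifying assumption that both actors share the same density parameter (theta_ij = theta_ji = theta), solves for the rho that keeps the single-tie probability at a chosen p. The function names are hypothetical.

```python
import math


def p1_dyad_probs(theta_ij, theta_ji, rho):
    """Probabilities of the four dyadic outcomes (y_ij, y_ji) under p1."""
    weights = {
        (0, 0): 1.0,
        (1, 0): math.exp(theta_ij),
        (0, 1): math.exp(theta_ji),
        (1, 1): math.exp(theta_ij + theta_ji + rho),
    }
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


def rho_for_tie_probability(theta, p):
    """rho giving P(Y_ij = 1) = p when theta_ij = theta_ji = theta.

    Follows from p = (e^theta + e^(2 theta + rho))
                     / (1 + 2 e^theta + e^(2 theta + rho)).
    """
    numerator = p - (1.0 - 2.0 * p) * math.exp(theta)
    if numerator <= 0.0:
        raise ValueError("no rho yields this tie probability for the given theta")
    return math.log(numerator) - math.log(1.0 - p) - 2.0 * theta


# For a fixed tie probability p, a larger theta forces a smaller rho.
p = 0.3
for theta in (-2.0, -1.5, -1.0):
    rho = rho_for_tie_probability(theta, p)
    probs = p1_dyad_probs(theta, theta, rho)
    tie_prob = probs[(1, 0)] + probs[(1, 1)]
    print(f"theta={theta:+.2f}  rho={rho:+.3f}  P(Y_ij=1)={tie_prob:.3f}")
```

At theta equal to the log odds of p the solved rho is zero, and it turns positive for smaller theta and negative for larger theta, in line with the relation described for Figure 1.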

2.2 The j₁ model

The new j₁ model assigns probabilities to the different dyadic outcomes through the density parameter m_ij and an auxiliary parameter that is used indirectly to model the reciprocity:

(5)

with

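The density parameter m_ij models the log odds of a single tie, which can be seen by solving Equation (5) for exp(m_ij),

$$\exp(m_{ij}) = \frac{P(Y_{ij} = 1)}{P(Y_{ij} = 0)}. \tag{6}$$

The reciprocity parameter in the j₁ model, r_ij, is similar to the expression for ρ_ij in the p₁ model (Equation (4)),

$$\exp(r_{ij}) = \frac{P(Y_{ij} = 1, Y_{ji} = 1) \,/\, P(Y_{ij} = 0, Y_{ji} = 1)}{P(Y_{ij} = 1, Y_{ji} = 0) \,/\, P(Y_{ij} = 0, Y_{ji} = 0)}. \tag{7}$$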

The value of the auxiliary parameter in Equation (5), and thus the probabilities for the dyadic outcomes, can be determined by solving Equation (7) for it.

In the j₁ model, m_ij only depends on P(Y_ij = 1) and consequently is independent of r_ij. Any differences in m_ij as a result of differences in P(Y_ij = 1) are subsequently also independent of r_ij. In summary, the j₁ model has a density parameter for the log odds of a single tie and an independent reciprocity parameter for the log odds ratio (Equation (7)).
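The independence of the two parameters can be illustrated with a small sketch. It does not reproduce the article's Equation (5) or its auxiliary parameter. Instead, as an assumption made here, it solves directly for the probability of a reciprocal dyad from the two single-tie probabilities and the odds ratio, which is a standard way to construct a 2 × 2 table with given margins and odds ratio. The names `j1_dyad_probs`, `m_ij`, `m_ji` and `r_ij` are chosen for illustration.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def j1_dyad_probs(m_ij, m_ji, r_ij):
    """Dyad probabilities with tie log odds m_ij, m_ji and log odds ratio r_ij."""
    a = expit(m_ij)       # P(Y_ij = 1)
    b = expit(m_ji)       # P(Y_ji = 1)
    psi = math.exp(r_ij)  # odds ratio of the 2 x 2 table
    # P(1, 1) = x is the valid root of
    #   (psi - 1) x^2 - s x + psi a b = 0,  s = 1 + (psi - 1)(a + b),
    # written in a form that stays stable when psi is close to 1.
    s = 1.0 + (psi - 1.0) * (a + b)
    disc = s * s - 4.0 * psi * (psi - 1.0) * a * b
    p11 = 2.0 * psi * a * b / (s + math.sqrt(disc))
    return {
        (1, 1): p11,
        (1, 0): a - p11,
        (0, 1): b - p11,
        (0, 0): 1.0 - a - b + p11,
    }


# Changing the reciprocity leaves the single-tie log odds untouched.
for r in (-2.0, 0.0, 2.0):
    pr = j1_dyad_probs(0.3, -0.5, r)
    tie = pr[(1, 1)] + pr[(1, 0)]
    log_or = math.log(pr[(1, 1)] * pr[(0, 0)] / (pr[(1, 0)] * pr[(0, 1)]))
    print(f"r={r:+.1f}  log odds of tie={math.log(tie / (1 - tie)):+.3f}  "
          f"log odds ratio={log_or:+.3f}")
```

With the reciprocity set to zero the two ties are independent and the probability of a reciprocal dyad reduces to the product of the two single-tie probabilities.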

2.3 The j₂ model

The p₂ model (Van Duijn 1995; Van Duijn et al., 2004) took a major step forward by extending the p₁ model (Holland and Leinhardt, 1981) to allow predictor variables for the sender and receiver effects and for density and reciprocity. This was achieved by introducing random sender and receiver effects, freeing up parameter space for the coefficients of the predictors. The j₂ model extends the j₁ model in the same way that the p₂ model extends the p₁ model.

The density parameter m_ij is substituted by random sender and receiver effects A_i and B_j and an overall parameter m for the log odds of a single tie. The random effects are assumed to have a bivariate normal distribution with variances σ²_A and σ²_B and covariance σ_AB. The density parameter is further substituted by the regression parameters γ₁, γ₂, and γ₃ for sender, receiver, and density predictors X₁, X₂, and X₃, respectively,

$$m_{ij} = m + A_i + B_j + X_{1i}\gamma_1 + X_{2j}\gamma_2 + X_{3ij}\gamma_3.$$

The sender and receiver predictors are variables at the actor level and the density predictors are variables at the dyad level.

The parameter for reciprocity (Equation (7)) is substituted by an overall estimate of reciprocity r and regression coefficients γ₄ for symmetric dyadic predictors X₄,

$$r_{ij} = r + X_{4ij}\gamma_4.$$
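As a rough illustration of how these substitutions combine, the sketch below builds dyad-level density and reciprocity predictors from actor-level and dyad-level covariates. It is not the author's estimation code. All names (`draw_random_effects`, `j2_linear_predictors`, `x1` to `x4`) and all numeric values are hypothetical, and a single predictor per effect is assumed.

```python
import math
import random


def draw_random_effects(n, var_a, var_b, cov_ab, rng):
    """Draw n (sender, receiver) effect pairs from a bivariate normal."""
    sd_a = math.sqrt(var_a)
    slope = cov_ab / sd_a
    resid_sd = math.sqrt(var_b - slope * slope)
    effects = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        effects.append((sd_a * z1, slope * z1 + resid_sd * z2))
    return effects


def j2_linear_predictors(m, r, gammas, x1, x2, x3, x4, effects):
    """Return (m_ij, r_ij) as n x n nested lists; diagonal entries are None."""
    g1, g2, g3, g4 = gammas
    n = len(x1)
    m_ij = [[None] * n for _ in range(n)]
    r_ij = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a_i, b_j = effects[i][0], effects[j][1]
            # Density: overall level, random effects, actor and dyad predictors.
            m_ij[i][j] = m + a_i + b_j + g1 * x1[i] + g2 * x2[j] + g3 * x3[i][j]
            # Reciprocity: overall level plus a symmetric dyadic predictor.
            r_ij[i][j] = r + g4 * x4[i][j]
    return m_ij, r_ij


rng = random.Random(42)
effects = draw_random_effects(3, var_a=1.0, var_b=0.5, cov_ab=-0.2, rng=rng)
x1 = [0.0, 1.0, 0.0]                      # sender predictor (actor level)
x2 = [1.0, 0.0, 1.0]                      # receiver predictor (actor level)
x3 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]    # density predictor (dyad level)
x4 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]    # symmetric reciprocity predictor
m_ij, r_ij = j2_linear_predictors(-1.0, 0.5, (0.2, -0.3, 0.1, 0.4),
                                  x1, x2, x3, x4, effects)
print(m_ij[0][1], r_ij[0][1])
```

Because the reciprocity predictor is symmetric, `r_ij[i][j]` equals `r_ij[j][i]`, whereas the density predictor generally differs between the two directions of a dyad.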

3. Estimation and prior distributions for the j₂ model

In logistic regression, estimates can become highly unstable due to a phenomenon called separation (Albert and Anderson 1984). This stems from sparse data in which one of the outcomes is (nearly) perfectly identified. This is also observed in the j₁ and j₂ models, in particular with the extended specification of the reciprocity parameters r and γ₄ in the j₂ model. Since these are independent of the probability of a tie P(Y_ij = 1), they are relatively unstable. One way to deal with this problem is to apply an appropriate prior distribution in Bayesian simulation methods. For this reason the j₂ model is estimated with a Markov chain Monte Carlo (MCMC) algorithm (see e.g. Gelman et al. 2014), successively sampling the random effects, their covariance matrix, and the regression parameters. The prior distribution applied to the covariance matrix is an inverse Wishart distribution. Since the j₂ model closely resembles logistic regression for the parameters m, γ₁, γ₂, and γ₃ in the density parameter, their prior distribution is based on assuming half an observation for each outcome in logistic regression, as suggested by Gelman et al. (2008). This results in a likelihood closely resembling a t-distribution with 7 degrees of freedom and scale 2.5 (see Figure 2), which was taken as the actual prior distribution.

In a similar manner, a prior distribution for the parameters r and γ₄ in the reciprocity parameter was derived. However, the prior based on assuming half an observation for all dyads still resulted in unstable estimates. Therefore, the prior distribution was based on the likelihood assuming one observation for each dyad. This distribution is close to a t-distribution with 7 degrees of freedom and scale 2 (see Figure 2), which was applied as prior distribution.

Figure 2. Likelihoods serving as reference for the prior distributions for m, γ₁, γ₂ and γ₃ (left) and r and γ₄ (right) and approximating t-distributions with df = 7.
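The closeness of the reference likelihood for the density parameters to a t-distribution can be checked with a few lines of code. The sketch below covers only the prior with scale 2.5. It assumes that the half-observation likelihood has the form e^(β/2) / (1 + e^β), that is, half a success and half a failure in a single binomial trial with logit β. The reference likelihood behind the scale-2 prior for the reciprocity parameters is not reconstructed here. Function names are hypothetical.

```python
import math


def half_observation_likelihood(beta):
    """Normalised likelihood of half a success and half a failure at logit beta."""
    # e^(beta/2) / (1 + e^beta) = 1 / (2 cosh(beta/2)); its integral over beta is pi.
    return 1.0 / (2.0 * math.pi * math.cosh(beta / 2.0))


def scaled_t_pdf(x, df, scale):
    """Density of a zero-centred Student t distribution with the given scale."""
    z = x / scale
    log_norm = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1.0) / 2.0 * math.log1p(z * z / df))


# Compare the two curves on a grid from -10 to 10 in steps of 0.1.
grid = [k / 10.0 for k in range(-100, 101)]
max_gap = max(abs(half_observation_likelihood(b) - scaled_t_pdf(b, 7, 2.5))
              for b in grid)
print(f"largest absolute difference on [-10, 10]: {max_gap:.4f}")
```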


For the regression parameters γ₁, γ₂, γ₃, and γ₄, the above priors were further adjusted to the variation in the predictors by dividing the scale of the t-distributions by the observed standard deviation of the corresponding predictors. Following Raftery (1996) and Gelman (2008), the predictors were centered to an average of zero to ensure that the center of the prior distributions can be set at zero as well.
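The adjustment of the priors to the predictors amounts to two small operations, sketched below. This is an illustration under assumptions, not the author's code: the helper names are hypothetical, and the sample standard deviation is used because the text does not say whether the sample or the population version was applied.

```python
import statistics


def centre(values):
    """Shift a predictor so that its average is zero."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def prior_scale(base_scale, values):
    """Scale of the t prior for a coefficient: base scale / sd of its predictor."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return base_scale / statistics.stdev(values)


predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
centred = centre(predictor)
print(centred)
print(prior_scale(2.5, centred))  # base scale 2.5 for a density coefficient
print(prior_scale(2.0, centred))  # base scale 2 for a reciprocity coefficient
```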

4. Simulations

To verify the effectiveness of the prior distributions against separation, simulations were run. Data were simulated according to model 3 from Zijlstra et al. (2009), containing predictors at the actor level (for receiving) as well as at the dyad level (for density and reciprocity). The variance of the random sender effects () was taken twice as large as the variance of the random receiver effects () with a negative covariance (). The density and reciprocity were and , corresponding to . A receiver effect with coefficient was added for a predictor drawn from a binary (0,1) distribution with equal probabilities. There are two density predictors; is the coefficient for a predictor drawn from a multinomial distribution with five outcomes (1, 2, 3, 4, 5) with equal probability and is the coefficient for a random digraph with . Finally, the coefficient for the predictor for reciprocity, γ₄, represents the effect of the absolute differences of the multinomial outcomes used as a predictor for the density. Different values for γ₄ were applied. A very small value of was based on Zijlstra et al. (2009). To have a more complete test for separation, a substantial value of , implying a predicted range of 4 for the log of Equation (4), and a large value , implying a predicted range of 8 for the log of Equation (4), were used as well. All models were simulated 1,000 times for relatively small networks of 20 actors and medium-sized networks of 40 actors. The t-distribution with df = 7 was used as prior, with scales as described in Section 3. Results are presented in Table 1. Here, only γ₄ is reported for the models with simulated values and because results for the other parameters were highly similar to those in the model with .

Table 1. Mean parameter estimates, posterior standard deviations, and coverage rates of the 95% and 99% coverage intervals over 1,000 replicated networks.

For the regression parameters, all coverage rates in Table 1 are relatively close to the nominal values of 95 and 99. This indicates that even for small networks of 20 actors the prior distributions for the regression parameters yield acceptable intervals for inference, avoiding separation. However, for the small networks the average point estimates for and are relatively far from their simulated values. This appears to be mainly due to the large standard deviations for these parameters. For the medium-sized networks of 40 actors, the standard deviations are smaller and the point estimates are relatively close to their simulated values.

5. Application: reported emotional support between Dutch high school pupils

To further illustrate the j₂ model, it was applied to a network from the Dutch Social Behavior Study (Baerveldt and Snijders 1994; Baerveldt et al. 2004). The network was measured on a year group of 62 high school students with 37 boys and 27 girls aged between 16 and 18 years. Each student was asked from whom they had received emotional support when they were depressed, for instance, after the end of a relationship or in a conflict with someone else. Two models were estimated. In model 1, a sender and receiver effect for being a boy is included as well as a density effect for (dyads of) different gender. In model 2, a reciprocity effect for different gender is added.

Results are presented in Table 2. For the j₂ model, a negative receiver effect is found for being a boy. This indicates that the probability of reporting having received emotional support from boys is smaller compared to girls. There is an additional negative density effect for “different gender,” indicating a smaller probability of receiving emotional support in dyads with different gender. Thus, given the receiver effect of “boy,” receiving emotional support from someone of similar gender is more likely. In model 2, no further effect of different gender on the reciprocity is found.

Table 2. Estimates with posterior standard deviations for reported emotional support for the j₂ and p₂ models.

For p₂, estimates for analogous models are presented in Table 2 as well. A similar receiver effect of “boy” and density effect of “different gender” was found on the odds of an asymmetric versus a null dyad. However, in model 2, the density effect of different gender is no longer significant, while there is now a significant negative reciprocity effect (assuming a significance level of 0.05). This is unexpected since the observed log odds ratio for different gender (Equation (2)) is actually higher (6.14 compared to 5.03 for same gender). In the p₂ model, such effects are possible since the sender, receiver, density, and reciprocity effects are all dependent on each other and jointly model the observed data (see also Figure 1).
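An observed log odds ratio of this kind can be obtained from a dyad census. The sketch below shows one common convention and is not necessarily the computation behind the values 6.14 and 5.03: every dyad is counted in both orders, so that with M mutual, A asymmetric and N null dyads the odds ratio becomes 4MN / A². The function names and the toy network are hypothetical.

```python
import math


def dyad_census(adj):
    """Count mutual, asymmetric and null dyads in a 0/1 adjacency matrix."""
    n = len(adj)
    mutual = asymmetric = null = 0
    for i in range(n):
        for j in range(i + 1, n):
            ties = adj[i][j] + adj[j][i]
            if ties == 2:
                mutual += 1
            elif ties == 1:
                asymmetric += 1
            else:
                null += 1
    return mutual, asymmetric, null


def observed_log_odds_ratio(adj):
    """Log odds ratio of the 2 x 2 table of (Y_ij, Y_ji) over ordered pairs."""
    mutual, asymmetric, null = dyad_census(adj)
    # Counting each dyad in both orders gives cells 2M, A, A and 2N.
    return math.log(4.0 * mutual * null / asymmetric ** 2)


# Toy network of four actors: one mutual, two asymmetric and three null dyads.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(dyad_census(adj))
print(observed_log_odds_ratio(adj))
```

The ratio is undefined when a network has no asymmetric dyads and equals minus infinity in the limit of no mutual or no null dyads, so sparse subgroups may need separate handling.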

Comparing between models, the standard deviation for the coefficient of the predictor for reciprocity, γ₄, is particularly large in the j₂ model. This originates from the reciprocity parameters ρ and γ₄ in p₂ partly modeling the probability P(Y_ij = 1). In contrast, in j₂ the reciprocity parameters are independent of P(Y_ij = 1), resulting in less stable estimates.

6. Concluding remarks

With j₂, a new model for regression of predictors on dichotomous dyadic network data was introduced. This was compared to the most similar model, the p₂ model. Either can be preferred, depending on the situation. An advantage of j₂ is that the density and reciprocity parameters are independent, avoiding ambiguous estimates as seen for the p₂ model in the application in Section 5. In addition, the parameter for density in the j₂ model has a familiar interpretation as the log odds of a tie, comparable to logistic regression. On the other hand, when there is theoretical interest in non-reciprocated ties, the p₂ model seems to be most appropriate because the density parameter in p₂ estimates the odds of asymmetric dyads.

A property of the j₂ model that requires further investigation is that the estimation of the reciprocity parameter is unstable because of its independence from the density parameter. With the introduction of specific prior distributions, a practical solution to this problem was presented. However, an analytical solution may further improve the estimates.

For the p₁ and p₂ models, the dependence between the density and reciprocity also requires further investigation. Even though they apply the same expression for reciprocity (Equation (4)) as the j₁ and j₂ models (Equation (7)), it is not yet clear how specifically the interpretation is affected by its dependence on density. Because the reciprocity parameter in the p₁ and p₂ models is related to the probability of a single tie, the frequency of symmetric dyads, that is, (1,1), is modeled to a greater extent by the reciprocity parameter than the frequency of null dyads, that is, (0,0). In contrast, given the probability of a single tie, the j₁ and j₂ models assign equal weight to both null and symmetric dyads. These models may therefore be of particular value in cases where there is substantial theoretical interest in the extent to which both null and symmetric dyads are observed beyond chance.

Software for estimating j₂ models is freely available from the corresponding author. The R package (R Core Team 2014) dyads is expected to become available in the near future.

References

Albert, A., and J. A. Anderson. (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71(1), 1–10.
Baerveldt, C., and T. A. B. Snijders. (1994). Influences on and from the segmentation of networks: Hypotheses and tests. Social Networks, 16, 213–232.
Baerveldt, C., M. A. J. Van Duijn, L. Vermeij, and D. A. Van Hemert. (2004). Ethnic boundaries and personal choice. Assessing the influence of individual inclinations to choose intra-ethnic relationships on pupils networks. Social Networks, 26, 55–74.
Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27(15), 2865–2873.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. (2014). Bayesian data analysis (Vol. 2). Boca Raton, Florida, USA: Chapman & Hall/CRC.
Gelman, A., A. Jakulin, M. G. Pittau, and Y. S. Su. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383.
Holland, P. W., and S. Leinhardt. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 33–50.
R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, Austria. http://www.R-project.org/
Raftery, A. E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika, 83(2), 251–266.
Robins, G., P. Pattison, Y. Kalish, and D. Lusher. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2), 173–191.
Van Duijn, M. A. J. (1995). Estimation of a random effects model for directed graphs. In: SSS95. Symposium Statistische Software, 7, 113–131.
Van Duijn, M. A. J., T. A. B. Snijders, and B. J. H. Zijlstra. (2004). P2: A random effects model with covariates for directed graphs. Statistica Neerlandica, 58, 234–254.
Zijlstra, B. J. H., M. A. J. Van Duijn, and T. A. B. Snijders. (2009). MCMC estimation for the P2 network regression model with crossed random effects. British Journal of Mathematical and Statistical Psychology, 62, 143–166.
