231
Views
10
CrossRef citations to date
0
Altmetric
Articles

Delay of Social Search on Small-World Graphs

, &
 

Abstract

This article introduces an analytical framework for two small-world network models and studies the delay of targeted social search by considering messages traveling between source and target individuals in these networks. In particular, by considering graphs constructed on different network domains, such as rectangular, circular, and spherical network domains, analytical solutions for the average social search delay and the delay distribution are obtained as functions of source–target separation, distribution of the number of long-range connections and geometrical properties of network domains. Derived analytical formulas are first verified by agent-based simulations and then compared with empirical observations in small-world experiments. These formulas indicate that individuals tend to communicate with one another only through their short-range contacts and the average social search delay rises linearly, when the separation between the source and target is small. As this separation increases long-range connections are more commonly used, and the average social search delay rapidly saturates to a constant value and stays almost the same for all large values of the separation. These results do not require the dimensionality of the social space to be identical to the decay exponent of long-range social connections and are qualitatively consistent with experimental observations made by Travers and Milgram in Citation1969 as well as by others. Moreover, analytical distributions for the delay of social search predicted by the models introduced in this article are also compared with corresponding empirical distributions, and good statistical matches between them are observed. Other somewhat surprising conclusions of the article are that hubs have limited effect in reducing the delay of social search and variation in node degree distribution adversely affects this delay.

ACKNOWLEDGMENT

The authors thank the anonymous reviewers for their helpful feedback and suggestions, which resulted in significant improvements.

Notes

1Equivalently, if we follow the terminology introduced by Schenettler (Citation2009b), we can also state that the main focus and contributions of this article are on the structure and process related dimensions of small-world research, crosscutting algorithmic small-world research framework in the structural dimension and targeted search processes in the process dimension.

2For example, if we consider geographic location as the sole next step message holder selection criterion for illustrative purposes, the task of the source individual in a typical targeted search process is to search her small world at the global scale to advance the message to one of her acquaintances on the same continent as the target if source and target individuals are on different continents. Therefore, the targeted search process at the first step can be deemed to be a search problem at the global scale if source and target individuals are on different continents. Then, it becomes a search problem at the continental scale if the next step message holder and the target are in the same continent but in different countries, and then becomes a search problem at the country scale if the next step message holder and the target are in the same country but in different cities, and so on. The targeted search process continues in this manner on a nested sequence of small worlds spanning various geographical distances until the message is delivered to the target individual.

3In model 1, formation of long-range connections uniformly at random over the network domain is mainly motivated by the recent findings (Goel et al., Citation2009) indicating that the average length of acquaintance chains connecting individuals in experimental small-world studies becomes larger (around 50 individuals) than the common belief of six if the missing data is accounted for correctly. We obtain surprisingly close estimates for our model parameters in Appendix A when two different corrected experimental data sets of Goel et al. (Citation2009) accounting for missing chains are used for parameter estimation.

4We will use the term node as a technical term to refer to a network element modeling an individual in a social network.

5 Randomly in this context means that there is a sample selection process that involves stochastic dynamics while recruiting message originators (i.e., source individuals) and message recipients (i.e., target individuals). For example, Milgram and his colleagues (Milgram, Citation1967; Travers & Milgram, Citation1969; Korte & Milgram, Citation1970) used mail lists and newspaper advertisements for sample recruitment, whereas Dodds et al. (Citation2003) used the World Wide Web. Such a recruitment process may introduce sample selection bias into empirical small-world studies. In this article, we do not run into the sample selection problem since we have complete flexibility in where the source and target individuals are located in the network.

Note. T k  = T(k) for (k−1) r ≤ d < kr, where r is the radius of local friendship circles of individuals, and T(d) is the delay of social search when the social separation between source and target individuals is d. ϕ(t) is the probability generating function of the number of long-range connections, N, per node. It is defined as ϕ(t) = E[t N ].

Note. γ > 1 is the power-law decay exponent. ζ(γ) is the Reimann zeta function defined as . Polylog(γ, β i ) is the polylogarithm function defined as . λ > 0 is the mean value of the Poisson distribution. 0 < p < 1 is the shaping parameter for geometric, binomial and Bernoulli distributions. We also allow it to take values 0 and 1 for Bernoulli networks. M is a nonnegative integer determining the upper bound for binomial, uniform and Bernoulli distributions.

6The technical condition for this is to have the disk centered at the target node with radius ρ(X t , X s ) + r contained inside the network domain.

7A simple estimate for this number can be obtained as follows (Bernard et al., Citation2001; Newman, Citation2003a). Assume each individual has c friends on the average. Then, the number of friends of friends of an individual is equal to c(c − 1)/λ2, where λ is the lead-in-factor to account for triad closure. Estimates for c vary; two most commonly accepted ones being 150 (Dunbar, Citation1993) and 290 (McCarty et al., Citation2001). Taking c = 150 and λ = 1.6 as in Bernard et al. (Citation2001), the number of friends of friends of an individual can be estimated to be 8730.

8For small network sizes R = 50r, 100r and 250r, the number of chains initiated is set to 16000. For R = 500r, the number of chains initiated is set to 1600 due to limitations on simulation run times.

a The number of relay nodes is varied for each network size such that all nodes on the average have approximately 15, 30, 80, 160 and 320 local contacts.

9For example, if the target is a stock broker in Boston, the message ought to be with individuals living in Boston and working in related professions for the very last steps.

10Unlike model 1, we do not require a randomly formed long-range contact to be always outside of the local neighborhood of a node in model 2. This will make the analytical formulas simpler, since otherwise, we will need to consider situations in which a long-range contact of a node lies in the intersection of the local neighborhood of this node and the relevant disks centered around the target node while calculating β i, k 's.

11Sample selection bias, low participation rates and chain attrition rates (Kleinfeld, Citation2002; Schenettler, Citation2009a, Citation2009b) lead to underestimated values for the delay of social search in empirical studies. On the other hand, the fact that individuals do not always remember and choose their best contacts to reach the target (McCarty et al., Citation2001; Killworth, McCarty, Bernard, & House, Citation2006; Bell, Belli-McQueen, & Haider, Citation2007) causes the delay numbers to be overestimated in empirical studies, and therefore the net effect of these conflicting forces are undetermined. Hence, we do not consider the chain attrition rates in the analysis above. If it is taken to be 0.25 as in Watts et al. (Citation2002), we can obtain a p value of 0.22 for model 1 and 0.56 for model 2 with R and d set to 45r and 22r, respectively, in both models, different from the R (and d) values set to 32r (15r) and 28r (13r) above due to attrition.

12In previous studies, it has been observed that even though node degree distributions in real social networks may have heavy tails, they still exhibit quite severe cutoffs (Bernard, Johnsen, Killworth, & Robinson, Citation1991; Kossinets & Watts, Citation2006; Rapoport, Citation1963). As a result, these distributions are never scale-free at all scales. The fundamental reason behind this behavior is that maintaining social relationships is costly, and therefore people cannot have an infinite number friends (Amaral, Scala, Barthelemy, & Stanley, Citation2000). Our models are flexible enough that scale-free distributions for the number of long-range connections can be easily replaced with any right skewed node degree distribution by calculating its probability generating function, and then the same analysis in this section can be repeated for those right skewed distributions.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.