Abstract
Sybil attacks, in which an adversary forges a potentially unbounded number of identities, are a danger to distributed systems and online social networks. The goal of sybil defense is to accurately identify sybil identities.
This article surveys the evolution of sybil defense protocols that leverage the structural properties of the social graph underlying a distributed system to identify sybil identities. We make two main contributions. First, we clarify the deep connection between sybil defense and the theory of random walks. This leads us to identify a community detection algorithm that, for the first time, offers provable guarantees in the context of sybil defense. Second, we advocate a new goal for sybil defense that addresses the more limited, but practically useful, goal of securely white-listing a local region of the graph.
Notes
Although this goal may be more accurately characterized as sybil detection [CitationViswanath et al. 12a], we use here the term sybil defense originally proposed by [CitationYu et al. 08] and widely adopted in the literature.
Henceforth, mentions of sybil defense, unless specified otherwise, refer to techniques that leverage the structure of social networks.
More specifically: conductance is at the heart of social network-based sybil defense [CitationYu et al. 06]; the clustering coefficient has been used for sybil defense in a recent work [CitationYang et al. 11]; node degrees are used as a feature in a recent defense technique based on machine learning [CitationYang et al. 13]; and the distance between nodes plays a fundamental role in other recent defense schemes [CitationXu et al. 10, CitationViswanath et al. 12b].
For a more comprehensive treatment, see [CitationMitzenmacher and Upfal 05] and [CitationDubhashi and Panconesi 09].
Although in practice it is neither necessary nor likely, this assumption, without qualitatively altering our conclusions, leads to simple bounds on the effort required to make attacks undetectable to defenses based on popularity, network diameter, and clustering coefficient. Note that neither the conductance bound nor the theorems about ACL (see Section 6) rely on this assumption.
Note that any hope of using an increase in conductance as an indication of a possible attack is futile, because the adversary can always insure that conductance is below a threshold by creating a sparse cut in S.
The original proposal for Mislove’s algorithm [CitationMislove et al. 10] relies on a normalized conductance metric, but in the context of sybil defense the protocol is evaluated using just conductance [CitationViswanath et al. 10]. For consistency, we follow the approach of the second work.
Furthermore this attack can be modified to withstand also the preprocessing defined in Section 3.2. For instance, to avoid a preprocessing of nodes with degree <5, the attacker can add in the sybil region a series s0, s1, …, sn of sybil nodes as before. Each sybil node si is connected to the previous four sybil nodes si − 1, …, si − 4 (if they exist) and the four consecutive sybil nodes si + 1, …, si + 4 (if they exist). Furthermore s0, s1, and sn are connected to v. In this setting it is possible to see that if initially v picks s0, it will then pick all the nodes in the sybil region in sequence. If node v has no honest neighbor of degree 5 (after preprocessing), then the entire sequence of sybil nodes is admitted before any of his honest neighbors.
The ACL algorithm [CitationAndersen et al. 07] is actually defined in terms of a lazy version of the walk, in which at every step there is a probability of 1/2 of remaining in the same node. For the purpose of this study the two definitions are equivalent up to a simple change in α, so for simplicity here we use the standard random walk.
Note that the theorem would not hold if we used the PPR score directly.