Internet Mathematics Volume 10, 2014 - Issue 3-4: Searching and Mining the Web and Social Networks

Original Articles

Estimating Sizes of Social Networks via Biased Sampling

Liran Katzir, Yahoo Labs, Building 3, Matam Park, Haifa 31905, Israel

Edo Liberty, Yahoo Labs, 111 West 40th Street, New York, NY 10018, USA

Oren Somekh, Yahoo Labs, Building 3, Matam Park, Haifa 31905, Israel

Ioana A. Cosma, Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Street, Ottawa, ON K1N 6N5, Canada

Pages 335-359 | Published online: 15 Sep 2014

https://doi.org/10.1080/15427951.2013.862883

Abstract

This article presents algorithms for estimating the number of users in online social networks. Although such networks sometimes publish these statistics, there are good reasons to validate their reports. The proposed schemes can also estimate the cardinality of network subpopulations. Because this information is seldom divulged voluntarily, such algorithms must operate solely by interacting with the social networks’ public Application Programming Interfaces (APIs); no other external information can be assumed. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed to produce accurate size estimates.

We adopt the standard abstraction of social networks as undirected graphs and perform random walk-based node sampling. By counting the number of collisions, or nonunique nodes, in the sample, we produce a size estimate. We then show analytically that the estimation error vanishes with high probability using fewer samples than prior-art algorithms require. Moreover, although provably correct for any graph, our algorithms excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic and real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborate the theoretical results and demonstrate the effectiveness of the algorithms.
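To make the collision idea concrete, the following is a minimal sketch of a collision-based size estimator of this kind, not the authors' exact formulation. It assumes each sample is drawn independently with probability proportional to its degree (the stationary law of a simple random walk) and that every sampled node's degree is known. Under that assumption, the expected value of the sum of d(x_i)/d(x_j) over ordered pairs i ≠ j, divided by the expected number of ordered pairs that hit the same node, is exactly n. The function name `collision_size_estimate` and the toy degree map are illustrative only.

```python
import random
from collections import Counter


def collision_size_estimate(samples, degree):
    """Estimate the number of nodes from degree-biased samples.

    Assumes every sample was drawn independently with probability
    proportional to its degree. Returns the ratio of two sums over
    ordered pairs (i, j) with i != j:
        sum of degree[x_i] / degree[x_j]  /  number of pairs with x_i == x_j
    """
    r = len(samples)
    psi_1 = sum(degree[x] for x in samples)               # sum of degrees
    psi_minus_1 = sum(1.0 / degree[x] for x in samples)   # sum of inverse degrees
    # Sum of d_i / d_j over ordered pairs i != j: drop the r diagonal terms (each equals 1).
    weighted_pairs = psi_1 * psi_minus_1 - r
    # A node seen k times contributes k * (k - 1) ordered colliding pairs.
    collisions = sum(k * (k - 1) for k in Counter(samples).values())
    if collisions == 0:
        raise ValueError("no collisions yet; draw more samples")
    return weighted_pairs / collisions


if __name__ == "__main__":
    # Toy degree map with 1,000 nodes and degrees 1..10. No explicit graph is
    # needed here because samples are drawn directly in proportion to degree.
    degree = {v: (v % 10) + 1 for v in range(1000)}
    nodes = list(degree)
    rng = random.Random(0)
    samples = rng.choices(nodes, weights=[degree[v] for v in nodes], k=2000)
    print(round(collision_size_estimate(samples, degree)))  # should be close to 1000
```

Note that 2,000 samples suffice here even though the population has 1,000 nodes, because collisions grow quadratically in the number of samples; this is the birthday-paradox effect that collision-counting estimators exploit.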

Notes

Moreover, the published statistics do not provide an estimate for connected sub-graphs, e.g., 20- to 30-year-olds living in the US.

Note that online social networks’ public APIs provide, for every user, the list of connected users; the API thus acts as a neighbor-list representation of the graph.
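Because each neighbor-list request costs one API interaction, a random walk can be driven entirely through the API. The sketch below assumes a hypothetical callable `get_neighbors(user)` standing in for one API call, a connected graph with no isolated users, and illustrative `burn_in` and `thinning` parameters (discarding early steps and keeping one node every few steps to weaken correlation between samples); none of these names come from the paper.

```python
import random


def random_walk_sample(get_neighbors, start, num_samples, burn_in=100, thinning=10, seed=None):
    """Collect node samples from a simple random walk driven by an API.

    get_neighbors(user) -> list of neighbor ids (hypothetical API stand-in).
    burn_in: number of initial steps discarded before sampling starts.
    thinning: keep one node every `thinning` steps after the burn-in.
    Returns (samples, api_calls); each step costs exactly one API call.
    """
    rng = random.Random(seed)
    current = start
    samples = []
    steps = 0
    while len(samples) < num_samples:
        neighbors = get_neighbors(current)  # one API interaction
        current = rng.choice(neighbors)     # move to a uniformly random neighbor
        steps += 1
        if steps > burn_in and (steps - burn_in) % thinning == 0:
            samples.append(current)
    return samples, steps
```

For a small in-memory graph, `graph.__getitem__` can play the role of the API: with `graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}`, calling `random_walk_sample(graph.__getitem__, 0, 10, burn_in=5, thinning=2, seed=1)` returns 10 samples after 5 + 10 × 2 = 25 API calls.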

Sampling uniformly from this table is possible in our setting because random walks on graphs, once at stationarity, sample edges uniformly.
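The edge-uniformity claim can be checked exactly on a small undirected graph. At stationarity a simple random walk sits at node v with probability π(v) = d(v)/2m, and then follows each of its d(v) edges with probability 1/d(v), so every directed edge is traversed with probability 1/2m. The sketch below (the function name `walk_stationary_law` is illustrative) verifies stationarity and computes the per-edge probabilities with exact rational arithmetic; it assumes an adjacency dict in which every undirected edge is listed in both directions.

```python
from fractions import Fraction


def walk_stationary_law(adj):
    """Exact stationary law of a simple random walk on an undirected graph.

    adj: dict mapping each node to its neighbor list (edges listed both ways).
    Returns (pi, edge_probs, is_stationary) where
      pi[v] = d(v) / 2m,
      edge_probs[(u, v)] = probability that one stationary step traverses u -> v,
      is_stationary = whether pi is invariant under one walk step.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # each undirected edge counted twice
    pi = {v: Fraction(len(nbrs), two_m) for v, nbrs in adj.items()}
    # Probability mass flowing into v in one step must equal pi[v].
    is_stationary = all(
        sum(pi[u] / len(adj[u]) for u in adj[v]) == pi[v] for v in adj
    )
    edge_probs = {(u, v): pi[u] / len(adj[u]) for u in adj for v in adj[u]}
    return pi, edge_probs, is_stationary
```

For `adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}` (so 2m = 8), every one of the 8 directed edges gets probability 1/8, while node 2 gets π = 3/8: edges are sampled uniformly, and nodes are therefore sampled in proportion to their degree.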

Other names for this method, or for closely related ones, include capture-recapture, capture-mark-recapture, mark-recapture, and mark-release-recapture.
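For readers unfamiliar with the capture-recapture family, the classic Lincoln–Petersen estimator is sketched below for context only; it is the textbook estimator, not the method of this article. It assumes every individual is equally likely to be caught in each round: if n₁ individuals are marked, n₂ are caught later, and m of those carry a mark, the population is estimated as n₁n₂/m. The helper names are illustrative.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Classic capture-recapture (Lincoln-Petersen) population estimate.

    marked_first: number of individuals caught and marked in round one.
    caught_second: number of individuals caught in round two.
    recaptured: how many round-two individuals were already marked.
    Assumes uniform catch probability in both rounds.
    """
    if recaptured == 0:
        raise ValueError("no recaptures; the estimate is undefined")
    return marked_first * caught_second / recaptured


def lincoln_petersen_from_samples(first, second):
    """Apply Lincoln-Petersen to two samples of node ids (duplicates ignored)."""
    first_set, second_set = set(first), set(second)
    return lincoln_petersen(len(first_set), len(second_set), len(first_set & second_set))
```

For example, marking 100 individuals, later catching 60, and finding 20 marked among them gives 100 × 60 / 20 = 300.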

We make a more general statement later in this paper.

In fact, the authors compare two different search services, but their approach is suitable for this task as well.

Compared to the definitions in [Katzir et al. 11], here R includes an extra −r term. This is due to examining instead of . This is numerically insignificant because Ψ₁Ψ₋₁ is typically Θ(r²), but it makes the analysis slightly simpler.
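One plausible reading of the −r term, offered as an assumption because the expressions being compared were lost in extraction: with Ψ₁ = Σᵢ d(xᵢ) and Ψ₋₁ = Σᵢ 1/d(xᵢ), the product Ψ₁Ψ₋₁ sums d(xᵢ)/d(xⱼ) over all r² ordered pairs, including the r diagonal pairs i = j, each of which equals 1. Restricting to pairs with i ≠ j therefore subtracts exactly r. The sketch below checks this identity with exact arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def psi_product_minus_r(degrees):
    """Psi_1 * Psi_{-1} - r for the sampled degrees, computed exactly."""
    psi_1 = sum(Fraction(d) for d in degrees)
    psi_minus_1 = sum(Fraction(1, d) for d in degrees)
    return psi_1 * psi_minus_1 - len(degrees)


def ordered_pair_sum(degrees):
    """Sum of d_i / d_j over ordered pairs (i, j) with i != j."""
    # permutations(range(r), 2) yields every ordered pair of distinct indices.
    return sum(
        Fraction(degrees[i], degrees[j])
        for i, j in permutations(range(len(degrees)), 2)
    )
```

With sampled degrees `[1, 2, 2, 5]`, Ψ₁ = 10 and Ψ₋₁ = 11/5, so Ψ₁Ψ₋₁ = 22 and both functions return 22 − 4 = 18. Since Ψ₁Ψ₋₁ grows like r², removing r changes the value only negligibly, as the note states.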

Since the random walk sampling is performed on the full graph, which is assumed to be connected, the algorithm is agnostic to the subgraph connectivity property.
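Because the walk runs on the full graph, a subpopulation can be handled by post-processing the same sample, whether or not its members are connected to one another. The sketch below shows one simple approach, which may differ from the paper's own subpopulation scheme: scale a total-size estimate by the inverse-degree-weighted fraction of sampled nodes that fall in the subpopulation. The 1/d weights undo the walk's bias toward high-degree nodes. The predicate `in_subpopulation` and the function name are hypothetical.

```python
def subpopulation_estimate(samples, degree, in_subpopulation, total_estimate):
    """Estimate a subpopulation size from degree-biased samples.

    samples: node ids drawn with probability proportional to degree.
    degree: dict mapping node id -> degree.
    in_subpopulation: hypothetical predicate, True if a node belongs to the subset.
    total_estimate: an estimate of the total number of nodes.
    The subset fraction is estimated as
        sum over sampled subset nodes of 1/d  /  sum over all samples of 1/d.
    """
    weight_all = sum(1.0 / degree[x] for x in samples)
    weight_sub = sum(1.0 / degree[x] for x in samples if in_subpopulation(x))
    return total_estimate * weight_sub / weight_all
```

For instance, with samples `[0, 1, 2]` of degrees 1, 2, and 4, a subset containing nodes 0 and 1, and a total estimate of 100, the weighted fraction is 1.5 / 1.75, giving 600/7 ≈ 85.7.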

The DBLP database can be found at http://dblp.uni-trier.de/xml/.

The IMDB database can be found at ftp://ftp.fu-berlin.de/pub/misc/movies/database/.

The Facebook uniformly sampled crawl can be found at http://odysseas.calit2.uci.edu/research/.

[Katzir et al. 11] L. Katzir, E. Liberty, and O. Somekh. Estimating Sizes of Social Networks via Biased Sampling. In Proceedings of the 20th International Conference on World Wide Web (WWW’11), pp. 597–606. New York, NY: ACM, 2011.
