Methods and Approaches to Using Web Archives in Computational Communication Research


  • Abiteboul, S., Cobéna, G., Masanes, J., & Sedrati, G. (2002). A first experience in archiving the French web. In M. Agosti & C. Thanos (Eds.), Research and advanced technology for digital libraries: 6th European conference, ECDL 2002, Rome, Italy, September 16–18, 2002, proceedings (pp. 1–15). Berlin, Heidelberg: Springer Berlin Heidelberg.
  • Adamic, L. A., & Glance, N. (2005). The political blogosphere and the 2004 U.S. election: Divided they blog. Paper presented at the 3rd International Workshop on Link Discovery, Chicago, IL.
  • Agarwal, S. D., Bennett, W. L., Johnson, C. N., & Walker, S. (2014). A model of crowd enabled organization: Theory and methods for understanding the role of Twitter in the Occupy protests. International Journal of Communication, 8(27), 646–672.
  • Agata, T., Miyata, Y., Ishita, E., Ikeuchi, A., & Ueda, S. (2014, 8–12 September). Life span of web pages: A survey of 10 million pages collected in 2001. Paper presented at the 14th annual international ACM/IEEE joint conference on Digital libraries, London, United Kingdom.
  • Ainsworth, S. G., Alsum, A., Salah Eldeen, H., Weigle, M. C., & Nelson, M. L. (2011). How much of the web is archived? Paper presented at the 11th annual international ACM/IEEE joint conference on Digital libraries, Ottawa, Ontario, Canada.
  • AlNoamany, Y., AlSum, A., Weigle, M. C., & Nelson, M. L. (2014). Who and what links to the Internet Archive. International Journal on Digital Libraries, 14(3–4), 101–115. doi:10.1007/s00799-014-0111-5
  • Arms, W., Huttenlocher, D., Kleinberg, J., Macy, M., & Strang, D. (2006). From Wayback Machine to Yesternet: New opportunities for social science. Proceedings of the 2nd International Conference on e-Social Science (Vol. 1).
  • Atefeh, F., & Khreich, W. (2015). A survey of techniques for event detection in Twitter. Computational Intelligence, 31(1), 132–164. doi:10.1111/coin.12017
  • Bach, J., & Stark, D. (2004). Link, search, interact: The co-evolution of NGOs and interactive technology. Theory, Culture and Society, 21(3), 101–117. doi:10.1177/0263276404043622
  • Bainbridge, W. S. (1995). Sociology on the World Wide Web. Social Science Computer Review, 13(4), 508–523. doi:10.1177/089443939501300406
  • Bean, L. L., May, D. L., & Skolnick, M. (1978). The Mormon historical demography project. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 11(1), 45–53. doi:10.1080/01615440.1978.9955216
  • Beer, D., & Burrows, R. (2007). Sociology and, of and in web 2.0: Some initial considerations. Sociological Research Online, 12(5), 17. doi:10.5153/sro.1560
  • Bode, L., Hanna, A., Yang, J., & Shah, D. V. (2015). Candidate networks, citizen clusters, and political expression. The Annals of the American Academy of Political and Social Science, 659(1), 149–165. doi:10.1177/0002716214563923
  • Boyd, D., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15(5), 662–679. doi:10.1080/1369118X.2012.678878
  • Brügger, N. (2009). Website history and the website as an object of study. New Media & Society, 11(1–2), 115–132. doi:10.1177/1461444808099574
  • Brügger, N. (2012). Historical network analysis of the web. Social Science Computer Review, 31(3), 306–321. doi:10.1177/0894439312454267
  • Brügger, N. (2016). Introduction: The Web’s first 25 years. New Media & Society, 18(7), 1059–1065. doi:10.1177/1461444816643787
  • Brunelle, J. F., Kelly, M., Salah Eldeen, H., Weigle, M. C., & Nelson, M. L. (2014, 8–12 September). Not all mementos are created equal: Measuring the impact of missing resources. Paper presented at the 14th annual international ACM/IEEE joint conference on Digital libraries, London, United Kingdom.
  • Bruns, A. (2007). Methodologies for mapping the political blogosphere: An exploration using the issuecrawler research tool. First Monday, 12(5). doi:10.5210/fm.v12i5.1834
  • Cappella, J. N. (2017). Vectors into the future of mass and interpersonal communication research: Big data, social media, and computational social science. Human Communication Research. Advance online publication. doi:10.1111/hcre.12114
  • Chewning, L. V., Lai, C.-H., & Doerfel, M. L. (2012). Organizational resilience and using information and communication technologies to rebuild communication structures. Management Communication Quarterly. doi:10.1177/0893318912465815
  • Cohen, D. J. (2005). The future of preserving the past. CRM Journal, 2(2), 6.
  • Colleoni, E., Rozza, A., & Arvidsson, A. (2014). Echo chamber or public sphere? Predicting political orientation and measuring political homophily in Twitter using big data. Journal of Communication, 64(2), 317–332. doi:10.1111/jcom.12084
  • Day, M. (2003). Preserving the fabric of our lives: A survey of web preservation initiatives. Lecture Notes in Computer Science, 461–472.
  • Dougherty, M., Meyer, E. T., Madsen, C. M., Van Den Heuvel, C., Thomas, A., & Wyatt, S. (2010). Researcher engagement with web archives: State of the art. Retrieved from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1714997
  • Driscoll, K., & Walker, S. (2014). Big data, big questions| Working within a black box: Transparency in the collection and production of big Twitter data. International Journal of Communication, 8, 20.
  • Erdélyi, M., Benczúr, A. A., Masanés, J., & Siklósi, D. (2009). Web spam filtering in internet archives. Paper presented at the 5th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb ‘09), New York, NY.
  • Foot, K. (2006). Web sphere analysis and cybercultural studies. In D. Silver, & A. Massanari (Eds.), Critical Cyberculture Studies (pp. 88–96). New York, NY: New York University Press.
  • Foot, K., & Schneider, S. M. (2006). Web campaigning (acting with technology). Cambridge, MA: The MIT Press.
  • Foot, K., Warnick, B., & Schneider, S. M. (2005). Web-based memorializing after September 11: Toward a conceptual framework. Journal of Computer-Mediated Communication, 11(1), 72–96. doi:10.1111/j.1083-6101.2006.tb00304.x
  • Freelon, D., Lynch, M., & Aday, S. (2015). Online fragmentation in wartime. The Annals of the American Academy of Political and Social Science, 659(1), 166–179. doi:10.1177/0002716214563921
  • Gomes, D., Miranda, J., & Costa, M. (2011). A survey on web archiving initiatives. Research and Advanced Technology for Digital Libraries, 408–420.
  • Gordon, M., & Pathak, P. (1999). Finding information on the World Wide Web: The retrieval effectiveness of search engines. Information Processing & Management, 35(2), 141–180. doi:10.1016/S0306-4573(98)00041-7
  • Green, B. B., Cook, A. J., Ralston, J. D., Fishman, P. A., Catz, S. L., Carlson, J., … Thompson, R. S. (2008). Effectiveness of home blood pressure monitoring, Web communication, and pharmacist care on hypertension control: A randomized controlled trial. JAMA: The Journal of the American Medical Association, 299(24), 2857–2867. doi:10.1001/jama.299.24.2857
  • Greer, J. D., & Mensing, D. (2006). The evolution of online newspapers: A longitudinal content analysis, 1997–2003. In X. Li (Ed.), Internet newspapers: The making of a mainstream medium (pp. 13–32). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Hockx-Yu, H. (2014). Access and scholarly use of web archives. Alexandria, 25(1–2), 113–127. doi:10.7227/ALX.0023
  • Holzmann, H., Goel, V., & Anand, A. (2016, June 19–23). ArchiveSpark: Efficient Web archive access, extraction and derivation. Paper presented at the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL).
  • Kavanaugh, A. L., Fox, E. A., Sheetz, S. D., Yang, S., Li, L. T., Shoemaker, D. J., … Xie, L. (2012). Social media use by government: From the routine to the critical. Government Information Quarterly, 29(4), 480–491. doi:10.1016/j.giq.2012.06.002
  • Kelleher, C., Sangwand, T. K., Wood, K., & Kamuronsi, Y. (2010). The Human Rights Documentation Initiative at the University of Texas Libraries. New Review of Information Networking, 15(2), 94–109. doi:10.1080/13614576.2010.528342
  • Khannanov, A. (2003). Internet in education: Support materials for educators. Retrieved from http://iite.unesco.org/publications/3214612/
  • Kim, H.-J., & Lee, H.-W. (2007). Development of metadata elements for intensive web archiving. Journal of the Korean Society for Information Management, 24(2), 143–160. doi:10.3743/KOSIM.2007.24.2.143
  • Kimpton, M., & Ubois, J. (2006). Year-by-year: From an archive of the internet to an archive on the internet. In J. Masanés (Ed.), Web archiving (pp. 201–212). Berlin, Heidelberg: Springer Berlin Heidelberg.
  • Kraut, R., Kiesler, S., Boneva, B., Cummings, J., Helgeson, V., & Crawford, A. (2002). Internet paradox revisited. Journal of Social Issues, 58(1), 49–74. doi:10.1111/1540-4560.00248
  • Ksepka, D. T., & Boyd, C. A. (2012). Quantifying historical trends in the completeness of the fossil record and the contributing factors: An example using Aves. Paleobiology, 38(1), 112–125. doi:10.1666/10059.1
  • Lazer, D., Pentland, A., Adamic, L. A., Aral, S., Barabasi, A.-L., Brewer, D., … Van Alstyne, M. (2009). Computational social science. Science, 323, 721–723. doi:10.1126/science.1167742
  • Leighton, H. V., & Srivastava, J. (1999). First 20 precision among World Wide Web search services (search engines). Journal of the Association for Information Science and Technology, 50(10), 870.
  • Lin, J. (2015). Scaling down distributed infrastructure on wimpy machines for personal web archiving. Paper presented at the International Conference on World Wide Web, Florence, Italy.
  • Lin, J., Gholami, M., & Rao, J. (2014). Infrastructure for supporting exploration and discovery in web archives. Paper presented at the WWW ’14 Companion Proceedings of the 23rd International Conference on the World Wide Web, Seoul, Korea.
  • Lin, J., Kraus, K., & Punzalan, R. (2014). Supporting “distant reading” for web archives. Paper presented at Digital Humanities 2014, Lausanne, Switzerland.
  • Lin, J., Milligan, I., Wiebe, J., & Zhou, A. (2017). Warcbase: Scalable analytics infrastructure for exploring web archives. Journal on Computing and Cultural Heritage, 10(4), 1–30. doi:10.1145/3097570
  • Lomborg, S. (2012). Researching communicative practice: Web archiving in qualitative social media research. Journal of Technology in Human Services, 30(3–4), 219–231. doi:10.1080/15228835.2012.744719
  • Lor, P. J., Britz, J., & Watermeyer, H. (2006). Everything, for ever? The preservation of South African websites for future research and scholarship. Journal of Information Science, 32(1), 39–48. doi:10.1177/0165551506059221
  • Lustick, I. S. (1996). History, historiography, and political science: Multiple historical records and the problem of selection bias. American Political Science Review, 90(3), 605–618. doi:10.2307/2082612
  • Maemura, E., Becker, C., & Milligan, I. (2016, December 5–8). Understanding computational web archives research methods using research objects. Paper presented at the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA.
  • Manovich, L. (2011). Trending: The promises and the challenges of big social data. doi:10.5749/minnesota/9780816677948.003.0047
  • Masanès, J. (2006). Web archiving: Issues and methods. Web Archiving, 1–53.
  • Meyer, E. T., Schroeder, R., & Cowls, J. (2016). The net as a knowledge machine: How the Internet became embedded in research. New Media & Society, 18(7), 1159–1189. doi:10.1177/1461444816643793
  • Mi, C., Shan, X., Qiang, Y., Stephanie, Y., & Chen, Y. (2014). A new method for evaluating tour online review based on grey 2-tuple linguistic. Kybernetes, 43(3–4), 601–613. doi:10.1108/k-06-2013-0123
  • Milligan, I. (2016). Lost in the infinite archive: The promise and pitfalls of web archives. International Journal of Humanities and Arts Computing, 10(1), 78–94. doi:10.3366/ijhac.2016.0161
  • Milligan, I., Ruest, N., & Lin, J. (2016). Content selection and curation for Web archiving: The gatekeepers vs. the masses. Paper presented at the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, Newark, NJ.
  • Murata, T. (2003). Visualizing the structure of Web communities based on data acquired from a search engine. IEEE Transactions on Industrial Electronics, 50(5), 860–866. doi:10.1109/TIE.2003.817486
  • Newhagen, J. E., & Rafaeli, S. (1996). Why communication researchers should study the internet: A dialogue. Journal of Computer-Mediated Communication, 1(4). doi:10.1111/j.1083-6101.1996.tb00172.x
  • Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577–8582. doi:10.1073/pnas.0601602103
  • Niu, J. (2012). An overview of web archiving. D-Lib Magazine, 18(3/4). doi:10.1045/dlib.magazine
  • Ogden, J. (2016). Interrogating the politics and performativity of web archives. At JCDL2016: Joint Conference on Digital Libraries 2016: Doctoral Consortium, Newark, NJ, USA.
  • Parks, M. R. (2014). Big data in communication research: Its contents and discontents. Journal of Communication, 64(2), 355–360. doi:10.1111/jcom.12090
  • Penone, C., Davidson, A. D., Shoemaker, K. T., Di Marco, M., Rondinini, C., Brooks, T. M., … Costa, G. C. (2014). Imputation of missing data in life-history trait datasets: Which approach performs the best? Methods in Ecology and Evolution, 5(9), 961–970. doi:10.1111/2041-210X.12232
  • Price, G. (1995). The World Wide Web and the historian. History and Computing, 7(2), 104–108. doi:10.3366/hac.1995.7.2.104
  • Qin, J., Zhou, Y., Reid, E., Lai, G., & Chen, H. (2007). Analyzing terror campaigns on the internet: Technical sophistication, content richness, and Web interactivity. International Journal of Human-Computer Studies, 65(1), 71–84. doi:10.1016/j.ijhcs.2006.08.012
  • Rackley, M. (2000). Internet Archive. In Encyclopedia of library and information science (3rd ed., pp. 2966–2976). London, UK: Taylor & Francis.
  • Reyes, A. (2014). Linguistic anthropology in 2013: Super-new-big. American Anthropologist, 116(2), 366–378. doi:10.1111/aman.12109
  • Robinson, L., Cotten, S. R., Ono, H., Quan-Haase, A., Mesch, G., Chen, W., … Stern, M. J. (2015). Digital inequalities and why they matter. Information, Communication & Society, 18(5), 569–582. doi:10.1080/1369118X.2015.1012532
  • Ruths, D., & Pfeffer, J. (2014). Social media for large studies of behavior. Science, 346(6213), 1063. doi:10.1126/science.346.6213.1063
  • Salah Eldeen, H. M., & Nelson, M. L. (2012). Losing my revolution: How many resources shared on social media have been lost? In Theory and practice of digital libraries (pp. 125–137). Springer.
  • Shah, D. V., Cappella, J. N., & Neuman, W. R. (2015). Big data, digital media, and computational social science. The Annals of the American Academy of Political and Social Science, 659(1), 6–13.
  • Shaw, J. D. (2017). Advantages of starting with theory. Academy of Management Journal, 60(3), 819–822. doi:10.5465/amj.2017.4003
  • Shumate, M. (2003). The coevolution of a population with a community, organizations and the environment: The emergence, evolution, and impact of HIV/AIDS NGOs. (Doctor of Philosophy). Los Angeles, CA: University of Southern California.
  • Shumate, M. (2012). The Evolution of the HIV/AIDS NGO Hyperlink Network. Journal of Computer-Mediated Communication, 17(2), 120–134. doi:10.1111/j.1083-6101.2011.01569.x
  • Shumate, M., Fulk, J., & Monge, P. (2005). Predictors of the international HIV-AIDS INGO network over time. Human Communication Research, 31, 482–511. doi:10.1111/j.1468-2958.2005.tb00880.x
  • Spaniol, M., Denev, D., Mazeika, A., Weikum, G., & Senellart, P. (2009). Data quality in web archiving. Paper presented at the Proceedings of the 3rd workshop on Information credibility on the web, Madrid, Spain.
  • Stevens, J. (2004). Long-term literary E-zine stability: Issues and access in libraries. Technical Services Quarterly, 22(1), 21–32. doi:10.1300/J124v22n01_03
  • Stucchi, M., Albini, P., Mirto, M., & Rebez, A. (2004). Assessing the completeness of Italian historical earthquake data. Annals of Geophysics, 47(2–3). doi:10.4401/ag-3330
  • Taylor, M., & Doerfel, M. L. (2003). Building interorganizational relationships that build nations. Human Communication Research, 29(2), 153–181. doi:10.1111/j.1468-2958.2003.tb00835.x
  • Topps, D., Helmer, J., & Ellaway, R. (2013). YouTube as a platform for publishing clinical skills training videos. Academic Medicine, 88(2), 192–197. doi:10.1097/ACM.0b013e31827c5352
  • Weber, M., Ognyanova, K., & Kosterich, A. (2017). Imitation in the quest to survive: Lessons from news media on the early web. International Journal of Communication, 11(2017), 5068–5092.
  • Weber, M. S. (2012). Newspapers and the long-term implications of hyperlinking. Journal of Computer-Mediated Communication, 17(2), 187–201. doi:10.1111/j.1083-6101.2011.01563.x
  • Weber, M. S., & Monge, P. (2014). Industries in turmoil: Driving transformation during periods of disruption. Communication Research, 1–30. doi:10.1177/0093650213514601
  • Weber, M. S., & Nguyen, H. (2015). Big Data? Big Issues: Degradation in Longitudinal Data and Implications for Social Sciences. Paper presented at the WebSci 2015, Oxford, UK.
  • Welles, B. F., & Contractor, N. (2015). Individual motivations and network effects. The Annals of the American Academy of Political and Social Science, 659(1), 180–190. doi:10.1177/0002716214565755
  • Williams, K. C. M. (2000). Reproduced and emergent genres of communication on the World Wide Web. The Information Society, 16(3), 201–215. doi:10.1080/01972240050133652
  • Wouters, P., Hellsten, I., & Leydesdorff, L. (2004). Internet time and the reliability of search engines. First Monday, 9, 10.
  • Zeng, R., & Greenfield, P. M. (2015). Cultural evolution over the last 40 years in China: Using the Google Ngram Viewer to study implications of social and political change for cultural values. International Journal of Psychology, 50(1), 47–55. doi:10.1002/ijop.12125
  • Zhou, Y., Reid, E., Qin, J., Chen, H., & Lai, G. (2005). US domestic extremist groups on the Web: Link and content analysis. IEEE Intelligent Systems, 20(5), 44–51. doi:10.1109/MIS.2005.96
