Storing, preprocessing and analyzing tweets: finding the suitable noSQL system

Souad Amghar, Safae Cherdal & Salma Mouline
Pages 586-595 | Received 28 May 2020, Accepted 01 Nov 2020, Published online: 17 Nov 2020
 

Abstract

In the past few years, Tweets have been widely used for Big Data analysis. However, the huge volume of data captured by Twitter must be stored for further processing, which can be a challenging task for many database systems. NoSQL is a generation of databases designed to handle large volumes of data. However, there are many NoSQL systems, each with its own characteristics, so choosing a suitable one for handling Tweets is difficult. Motivated by this, this work seeks the most suitable NoSQL system for managing Tweets. The paper presents the requirements of managing Tweets and provides a detailed comparison of five NoSQL systems, namely Redis, Cassandra, MongoDB, Couchbase and Neo4j, with respect to these requirements. The five systems are compared in a real scenario in which we collect and analyze 1,000,000 Tweets. The chosen scenario makes it possible to evaluate not only the performance of read and write operations but also other requirements related to Tweet management, such as scalability, analysis tool support and analysis language support. The results show that Couchbase is the most suitable NoSQL system for managing Tweets.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Souad Amghar

Souad Amghar has been a PhD student since December 2016 in the Computer Science and Telecommunications Research Laboratory (LRIT) at the Faculty of Sciences, Mohammed V University in Rabat, Morocco. Her research focuses on Big Data management and integration in the context of NoSQL systems. Her research areas include data science, data management, NoSQL databases, the Internet of Things, data integration, and model-driven engineering.

Safae Cherdal

Safae Cherdal holds a Ph.D. in Computer Science and is a collaborating researcher at the Computer Science and Telecommunications Research Laboratory (LRIT), located in the Faculty of Sciences of Rabat, Mohammed V University, Morocco. She is a part-time professor at Mohammed V University and at other institutions such as INSEA and IGA. Her research interests include NoSQL databases, Big Data, the Internet of Things, model-driven engineering, modeling languages, formal specification and verification, and Petri nets.

Salma Mouline

Salma Mouline is a Professor of Computer Science at Mohammed V University in Rabat, Morocco, where she is a member of the Computer Science and Telecommunications Research Laboratory. She received her Ph.D. in Computer Science from the University of Grenoble, France. Her research interests include modeling languages, domain-specific modeling languages, model-driven engineering, and formal modeling and analysis of critical systems. Fields of application include e-health and smart environments.
