Abstract
Structured visualization offers an efficient way to explore and analyze data. Some data sets need a hierarchical visualization to be more understandable. However, large volumes of heterogeneous big data raise several visualization challenges. The numerous patterns cannot all be visualized simultaneously; they must be presented incrementally and effectively. Moreover, the visualization must be produced with low latency while respecting real-time and scalability constraints. This paper proposes a new approach to prepare large data sets for multidimensional interactive visualizations. Based on the greedy algorithm, it offers mechanisms to prioritize potential hierarchical patterns, select the first patterns to be visualized, and manage multidimensional visualization cases. The proposed approach was implemented using Apache Spark to run the computing tasks in parallel. Experiments show its efficiency with respect to the considered factors: scalability constraints, complexity, multidimensionality, and hierarchy.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Data-set consulted on 3 December 2020, URL: https://bchi.bigcitieshealth.org/indicators/1827/searches/34444. It is also accessible in the GitHub repository of this paper’s project.
2 URL of the GitHub repository: https://github.com/Mus-Kah/spark_dataset_preprocessing/releases/tag/v0.1.
Additional information
Notes on contributors
Moustafa Sadek Kahil
Moustafa Sadek Kahil is a PhD student in Computer Science at the LAMIS laboratory, Larbi Tebessi University, Tebessa, Algeria. He received his bachelor's degree in Computer Science and his master's degree in Distributed Architectures, both from Larbi Ben Mhidi University, Oum El Bouaghi, Algeria. His research interests include software engineering, web development, data science (including retrieval and visualization), Big Data analytics, and cloud computing.
Abdelkrim Bouramoul
Dr Abdelkrim Bouramoul is an Associate Professor at the Department of Fundamental Computer Science and Applications, Faculty of New Technologies of Information and Communication, University of Constantine 2, Algeria. He is also a researcher in the OSIG research group of the MISC Laboratory at the same university. He obtained his University Habilitation (HDR) from the University of Constantine 2 in April 2017 and his PhD in Computer Science from the same university in September 2011. He has published many articles in international journals and conferences. His research interests include information processing in the Big Data context, semantic and context-based information retrieval, and social and collaborative information retrieval.
Makhlouf Derdour
Prof. Makhlouf Derdour received his Engineering degree in computer science from the University of Constantine, Algeria, in 2004, his Magister degree in computer science from the University of Tebessa, and his Ph.D. degree in computer networks from the University of Pau and Pays de l’Adour (UPPA), France, in 2012. He is currently a full professor in the Computer Science Department of the University of Oum El Bouaghi, Algeria. His research interests include software architecture, multimedia applications, adaptation and self-adaptation of applications, design and modelling of systems, and systems security. He is a general chair of the International Conference on Pattern Recognition and Intelligent Systems (PAIS).