Search in:

Journal of Information and Telecommunication Volume 2, 2018 - Issue 3

Submit an article Journal homepage

Open access

38,607

Views

CrossRef citations to date

Altmetric

Listen

Articles

Usage Apriori and clustering algorithms in WEKA tools to mining dataset of traffic accidents

Faisal Mohammed Nafie AliDepartment of Computer Science, College of Science and Humanities at Alghat, Majmaah University, Majmaah, Saudi ArabiaCorrespondence[email protected]

Abdelmoneim Ali Mohamed HamedDepartment of Mathematical, College of Science and Humanities at Alghat, Majmaah University, Majmaah, Saudi Arabia

Pages 231-245 | Received 01 Dec 2017, Accepted 01 Mar 2018, Published online: 13 Apr 2018

Cite this article
https://doi.org/10.1080/24751839.2018.1448205
CrossMark

In this article

ABSTRACT
1. Introduction
2. WEKA
3. Related work
4. Methodology
5. Results and discussions
6. Conclusion
Disclosure statement
References

Full Article
Figures & data
References
Citations
Metrics
Licensing
Reprints & Permissions
View PDF PDF

ABSTRACT

The aim of this study is finding approaches for investigating association rules mining algorithms and clustering to offer new rules from a broad set of discovered rules which taken from traffic accident data at Alghat Provence in KSA. Several tools are applying in data mining to extracting data. WEKA provides applications of learning algorithms that can efficiently execute any dataset. In WEKA tools, there are many algorithms used to mining data. Apriori and cluster are the first-rate and most famed algorithms. Apriori is the simple algorithm, which applied for mining of repeated the patterns from the transaction dataset to find frequent itemsets and association between various item sets. A cluster is a technique used to group a collection of items having similar features. Association rules applied to find the connection between data items in a transactional database. Association rules data mining algorithms used to discover frequent association. WEKA tools were used to analysing traffic dataset, which composed of 946 instances and 8 attributes. Apriori algorithm and EM cluster were implemented for traffic dataset to discover the factors, which causes accidents. Through the results, shows that the Apriori algorithm is better than the EM cluster algorithm.

KEYWORDS:

Data mining
association rules
clustering
traffic accidents
Apriori
EM algorithm

1. Introduction

There is a significant amount of data stored in the databases, and with the rapid spread of the data warehouse, it is necessary to find techniques to extract information and knowledge by exploiting these data stored for used in problem-solving and decision-making using modern computer applications, the current smart technology famous as artificial intelligence. Data mining is an analytical process that combines artificial intelligence, statistics, and machine learning. It is considered a step of knowledge in databases. Data mining and machine learning are topics in artificial intelligence that focus on pattern discovery, prediction, and forecasting based on possessions of gathered data (Witten, Frank, Hall, & Pal, Citation2016).

Data mining is repeated process within which progress as the operation is defined by discovery, through either automatic or manual method. It is potential to put data mining actions into one of two classes: predictive and descriptive. The function of predictive is produced the system explained by the given data set. Predictive is generate new, not trivial information based on the available data collections (Han, Pei, & Kamber, Citation2011). Several techniques are using in data mining to extracting data such as R-programing, SPSS, IBM Clementine, WEKA, Knime, and Orange. presents the compression between several data mining tools and shown advantages and disadvantages of these tools (Solanki, Citation2013).

Table 1. Advantages and disadvantages of data mining tools.

Download CSV Display Table

This research aims to suggest an approach for employ association rules mining algorithms and clustering by using data mining tool to offering new rules from a broad set of discovered rules which taken from Traffic accident data at Alghat Provence in KSA within four years (1432, 1434, 1435, and 1436).

Clustering is the assignment of appointing a set of items to groups so that the elements in the same cluster are more like to any other than to these in another one. Clustering is an essential mission of explorative data mining, and a combined method for statistical data analysis used in such fields, containing machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. It offers the best interface to the user than comparing the other data mining tools (Han et al., Citation2011). It is a technique to group a set of items having similar features.

Association rules applied to find the connection between data items in a transactional database. Association rules data mining algorithms used to discover frequent association (Amira, Pareek, & Araar, Citation2015).

There are many algorithms used to mining data. In this paper, authors attempted to find the best association rules using WEKA data mining tools. Apriori and cluster are the first-rate and most famed algorithms. Apriori is the simple algorithm, which applied for mining of repeated the patterns from the transaction database. The Apriori reaches good performance by decreasing the size of candidate sets. However, in states with very many recurrent item sets, large item sets, or little minimum support; it still suffers from the cost of generating a massive number of candidate sets (Wu et al., Citation2008). The objective of using Apriori algorithm is to find frequent itemsets and association between different itemsets, that is, association rule. Apriori is Easy implementation. The algorithm applies information from previous steps to produce the frequent itemsets (Shweta & Garg, Citation2013). Apriori is the most uncomplicated algorithm, which is employed for mining of repetitive patterns from the transaction database. We have aimed to execute the Apriori algorithm for adequate study work, and we have applied WEKA for mentioning the process of association rule mining. The benefits of using Apriori algorithm are usages large item set property. Easily parallelized, simply and easy to implement, Apriori algorithm is an efficient algorithm for finding all frequent itemsets.

EM algorithm is a general method of finding the maximum likelihood estimate of data distribution when data is partially missing or hidden (Prajwala & Sangeeta, Citation2014).The advantages of using EM algorithm are to give a beneficial result for the real world data set. Moreover, use this algorithm when you want to perform cluster analysis of a small scene or region-of-interest, and are not satisfied with the results obtained from the other algorithms (Sharma, Bajpai, & Litoriya, Citation2012). EM algorithm is an essential algorithm for data mining. We used this algorithm when we are satisfied the result of other algorithms methods. EM is chosen to cluster data for the many reasons: first, It has a robust statistical basis. Second, it is linear in database size. Third, it is healthy to noisy data. Fourth, it can accept the desired number of the cluster as input. Fifth, it can handle high dimensionality, and final, it converges fast given a proper initialization (Abbas, Citation2008), guarantees about optimality, easily explainable results (Ordonez & Omiecinski, Citation2002). Also has several disadvantages, Algorithm is highly involved, it is hard to initialize, and the quality of the ﬁnal solution depends on the quality of the initial solution (Slimani & Lazzez, Citation2014).

2. WEKA

WEKA term is a set of modern machine learning ways and data pre-handling tools. It is identified as a set of machine learning approaches for data extraction tasks (Seppelt, Voinov, & Lange, Citation2012). It designed so that users can speedily test out existing machine learning modes on new datasets in very flexible ways (Frank et al., Citation2009). The workbench contains techniques for the first data mining troubles: regressions, classiﬁcation, clustering, and association rule mining, conception, and attribute selection (Seppelt et al., Citation2012). It is an excellent appropriate for improving new machine learning methods (Hall et al., Citation2009). The user can access components through JAVA programming or command line interfaces. It affords graphical user interface in an application named the WEKA Knowledge Flow Environment featuring visual programming and WEKA explorer (Parikh & Tirkha, Citation2013).

There are three additional graphical user interfaces to WEKA. The Knowledge Stream interface allows the user to design conﬁgurations for flowed data handling. WEKA’s third interface, the Experimenter intended to help the user to answer a fundamental practical question when implementing classiﬁcation and regression methods: Which techniques and parameter values work better for the given problem. The fourth interface, so-called the Workbench, is a uniﬁed graphical user interface that combines the other three into one application (Witten et al., Citation2016). In this study, we chose WEKA from other software tools on the market because it is the package that would be recommended for people who are beginners to such software to those who are very adept. The software merely is very robust with built-in features. WEKA is contained many built-in features that require no programming or coding knowledge. Then WEKA has become very common with the academic and industrial researchers, also widely applied for teaching aims. WEKA is better suited for mining association rules, powerful in machine learning techniques and Suited for machine learning. WEKA is user-friendly with a graphical interface that allows for fast setup and operation. WEKA work on the prediction that the user data is gained as a flat file or relation. In another word, that means each data object described by a stable number of attributes that ordinarily are of a specified type, normal alpha-numeric or numeric values (Ramamohan, Vasantharao, Chakravarti, & Ratnam, Citation2012).

WEKA offers applications of learning algorithms that you can efficiently use to your dataset. It contains a diversity of tools for converting datasets, such as the algorithms for discretization and sampling (Witten et al., Citation2016).

WEKA makes it easy to compare different solution strategies based on the same evaluation method and identify the one that is most appropriate for the problem at hand. It is implemented in JAVA and runs on almost any computing platform (Frank, Hall, Trigg, Holmes, & Witten, Citation2004). WEKA provides applications of learning algorithms that can quickly implement any dataset. It also contains a diversity of tools for transforming dataset (Frank et al., Citation2009). WEKA is an open source software tool for implementing machine-learning algorithms.

3. Related work

Bansal and Bhambhu (Citation2013) reported that association rule transacts with frequent itemsets as done by much association algorithms like Apriori algorithm, which used in widely real vitality applications. In this paper, authors contain the use of association rule mining in extracting pattern that frequently happened within a dataset and explanation the implementation of the Apriori algorithm WEKA technique from a dataset, which is gathering of demeaning crimes against women in Session court. This paper studies two association rule algorithms Apriori algorithm and Predictive Apriori algorithm and matches the result of both the algorithms using WEKA tool. Therefore, the result of rules together algorithms visibly shows that Apriori algorithm achieves better and faster than the Predictive Apriori algorithm. The study uses a comprehension of recurrent pattern matching based on support and confidence measures produced excellent results in various fields. The paper indicates that investigation of repetitive pattern matching based on support and confidence measures provided excellent results in multiple areas.

Apriori Algorithm can saw as a two-stage operation:

All item sets having support factor greater than or equal to, the user-specified minimum support.
All rules are having the confidence factor more significant than or similar to the user-specified minimum confidence.

Research explained that association rule in data mining shows a critical key in the process of mining data for repeated item sets. Apriori algorithm is applied to find out and comprehend the underlying patterns involved in the court’s records from their data contains in various sections.

Amira, Pareek, and Araar (Citation2015) offered association rule-mining algorithms are commonly applied to find all rules in the database to pleasing some minimum support and minimum confidence restrictions. The number of generated rules reduced the adaptation of the association rule-mining algorithm to mine only a particular subset of association rules where the classification class attribute is assigned to the right-hand-side was investigated in past research. In this study, a dataset about traffic accidents was gathered from Dubai Traffic Department, UAE. After data preprocessing, Apriori and Predictive Apriori association rules algorithms were applied to the dataset to explore the link between recorded accidents factors to accident acuity in Dubai area. Two sets of class association rules were generated using the two algorithms and summarized to get the most interesting rules using technical measures. Experimental results showed that the class association rules generated by Apriori algorithm were more effective than those generated by Predictive Apriori algorithm were. More associations between accident factors and accident severity level were explored when applying Apriori algorithm. This paper showed that when applying rule covers method on the generated class association rules using Apriori and Predictive Apriori algorithms, many class association rules produced by Apriori algorithm were eliminated, and more effective rules than those generated by Predictive Apriori algorithm were obtained.

Shrivastava and Panda (Citation2014) explained there are several algorithms developed to mine the association rules from the huge databases. Authors offered the Apriori algorithm is the best common algorithm to mine the association rules from the dataset. Various tools are existing to execute the Apriori algorithm. WEKA is an open source software tool for implementing machine-learning algorithms. A study defined WEKA is the gathering or a collection of the implements for execution data mining with the application of the association rules in it. Association rules formed by analysing data for various samples and using the standard support and dependability to identify the most important relationships. They are differed into separate classes in data mining and used in the WEKA to perform the operations. The result in Apriori algorithm generates the best association rule for the dataset after operating the WEKA tool. The implementation of Apriori algorithm, it can be more compatible and purposeful in future, by the implementation of the new association algorithms for some other new operations and analysis in this WEKA tool.

Agrawal and Agrawal (Citation2017) explained details description about Analysis of Clustering Algorithm of WEKA Tools. Paper defined clustering is a method used in several areas such as image analysis, pattern recognition, and statistical data analysis. Clustering is a partition of data into sets of similar items. Every cluster contains various items that are analogous to them and unlike compared to objects of other sets. Some clustering algorithms represent to produce clusters (Chauhan, Kaur, & Alam, Citation2010). WEKA tool used to compare different clustering algorithms. It used because it provides a better interface to the user than compare to other data mining tools. In this paper, algorithms are analysing and comparing the various clustering algorithm by using WEKA tool to find out which algorithm will be more comfortable for the operators for execution clustering algorithm. This present the applications of data mining WEKA tool it provides the cluster’s huge data set and clustering that provide making a hand in the optimizing of the search engine.

Verma, Srivastava, Chack, Diswar, and Gupta (Citation2012) demonstrated EM algorithm is a reiterative procedure for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical paradigm, wherever the model consists of hidden implicit variables. The paper dealt with that the EM iteration rotates among implementing an expectation step, which calculates the expectation of the log-probability estimated using the present evaluate for the parameters, and maximization(M)step, those count parameters maximizing the expected log-probability found on the expectation (E) step. These parameters estimations used to determine the allocation of the potential variables in the sequent expectation step.

Tanna and Ghodasara (Citation2014) discussed the using of Apriori through WEKA for repeated pattern mining. Paper described Apriori algorithm is very an effective for extracting repeated groups for Boolean association rules. The conclusion of this paper that Apriori is the simple algorithm, which applied for mining of repeated patterns from the transaction database. Paper presented the used of WEKA implements for association rule to applying Apriori algorithm. Authors exercised the Apriori Algorithm to get the association rules that have minSupport = 50% and min confidence = 50% by using WEKA GUI. They have tried to implement the Apriori algorithm for sufficient research work, and they have utilized WEKA for referring the process of association rule mining.

AprioriTID: Slimani and Lazzez (Citation2014) reported AprioriTID proposed by Agrawal and Srikant (Citation1994). This algorithm has the extra property that the database is not used at all for counting the support of candidate item set after the first pass. Instead, an encoding of the applicant item sets used in the previous pass is employed for this purpose. The most critical tasks of frequent pattern mining approaches are itemset mining, sequential pattern mining, sequential rule mining, and association rule mining. Apriori algorithm is among the proposed initially structure which deals with association rule problems. In synchronism with Apriori, the AprioriTid and AprioriHybrid algorithms have been offered. For smaller problem sizes, the AprioriTid algorithm is executed equivalently well as Apriori, but the performance degraded two times slower when applied to massive problems. The support counting method included in the Apriori algorithm has involved voluminous research due to the performance of the algorithm. A useful number of ineffective data mining algorithms exist in the literature for frequent mining patterns. This study offered a summary of the status and future directions of frequent pattern mining.

Hierarchical clustering: Steinbach, Karypis, and Kumar (Citation2000) demonstrated the results of experimental research of several general document-clustering techniques: agglomerative hierarchical clustering and K-means. Authors presented Hierarchical clustering is many times represented as the excellent property clustering approach but is limited because of its quadratic time complexity. The runtime of parting K-means is very enticing when compared to that of hierarchical clustering methods. However, through the way of study tests Authors exposed that a simple and efficient variant of K-means, ‘bisecting’ K-means, can produce clusters of documents that are better than those given by ‘regular’ K-means and useful or useful than those yielded by agglomerative hierarchical clustering techniques. The study also has been able to find what we think is a reasonable explanation for this behaviour.

FP-growth: Novitasari, Hermawan, Abdullah, Sembiring, and Herawan (Citation2015) presented the candidate set generation and tests are two major drawbacks in Apriori-like algorithms. Therefore, to deal with this problem, a new data structure called frequent pattern tree (FPTree) was introduced. FP-Growth was then developed based on this data structure and currently is a benchmarked and fastest algorithm for mining frequent itemset Lee, Kim, Cai, Han (Citation2003). The benefits of FP-Growth are, it needs two times of scanning the transaction database. First, it scans the database to calculate a list of various items sorted by descending order and eliminates rare items. Then, it scans to compress the database into an FPTree structure and mines the FP-Tree recursively to construct its conditional FP-Tree.

Mansouri and Javad Kargar (Citation2014) showed driving accidents had always been counted as one of the most likely causes of deaths in the societies today. In this study, the rules and issues motivating the traffic road accidents have been mined along with extracting a local data model after collecting the data from a diversity of sources followed by data collection and combination, data cleaning, and separating the inconsistent data. In this study used data mining methods, such as clustering and decision tree. The objective of this research was to analyse and monitor the road traffic accidents using the data mining techniques in suburban roads in Isfahan Province. The obtained results in this study are interesting and significant which can be considered by authorities as invaluable information to be used for decreasing the road accidents. Furthermore, five algorithms existing in data mining was used in this study for knowledge discovery of the accident dataset. The C5.0 decision tree algorithm proved to generate the best results and performance. Later in this research clustering of the data was also performed but did not result in separation of clusters with a specific meaning. Based on the clustering results, it can be concluded that each route follows its particular pattern and differentiating the data concerning time, vehicle, and the road status is not generalizable to all of the routes. In determining the accident type as Casualty, fatal, and car crash, the most important characteristic was the type of vehicle.

Verma, Srivastava, Chack, Diswar, and Gupta (Citation2012Citation) presented data clustering is a manner of setting similar data into groupings. The paper showed that the cluster algorithm divisions a data set into some groups such that the similarity within a group is larger than among groups. This study revises six types of clustering techniques – k-means clustering, hierarchical clustering, DBS can clustering, density-based clustering, optics, EM algorithm. These clustering techniques are implemented and analysed using a clustering tool WEKA. Performance of the six techniques are obtainable and compared. The paper presented several indicates: The performance of K-means algorithm is better than hierarchical clustering algorithm, all the algorithms have some confusion in some (noisy) data when clustered, K-means and EM algorithm are very sentient for fuss in a dataset. This noise makes it complex for the algorithm to cluster data into convenient clusters while affecting the outcome of the algorithm, K-means algorithm is faster than other clustering algorithm and generates property clusters when applying, a hierarchical clustering algorithm is more sensitive for noisy data.

Prajwala and Sangeeta (Citation2014) demonstrated the two clustering algorithms considered are EM and density-based algorithm. EM algorithm is a common way of discovering the maximum likelihood estimation of data distribution when data are lost or concealed. In density-based clustering, clusters are large areas in the data space, split by sections of lower object density. This paper showed WEKA an open source tool is used for comparing these two algorithms. In conditions of likelihood, EM algorithm is better than a density-based algorithm; the density-based algorithm takes less time than EM algorithm to build the model.

Kumar and Rukmani (Citation2010) proposed this research on the web using mining and in particular, efforts on finding the web usage procedures of websites from the server log files. The study used Apriori algorithm and Frequent Pattern Growth algorithm for evaluation memory practice and time usage. The characterize of using Apriori algorithm are Operates large item set property. It easily parallelized and easy to implement. The research showed some restrictions of Apriori algorithm. Treating a huge number of applicant sets is costly. It is tedious to recurrently scan the database and checked a large set of nominees by pattern identical, which is especially true for mining long patterns. The main distinguishes of the FP-growth algorithm is usages compressed data structure and rejects recurrent database scan. The main obstacle of the FP-growth algorithm is the fulminatory amount of lacks a good candidate generation method. Future research can combine FP-Tree with Apriori nominee to make way to solve the drawbacks of together Apriori and FP-growth.

Krömer et al. (Citation2013) used to investigate a data set describing trafﬁc accidents in Ethiopia and use a machine learning method based on artiﬁcial evolution and fuzzy systems to mine symbolic description of selected features of the data set. Paper demonstrated there are simple fuzzy classiﬁers as well as complex rule-based fuzzy classiﬁcation systems that usually build and maintain sophisticated rule bases. The popularity of fuzzy classiﬁers can be attributed to their ability to perform soft classiﬁcation, to assign multiple labels to data samples, and to the ease of their interpretation. This study compared the ability of evolutionary fuzzy rules to evolve classiﬁers for binary and multi-class attributes. While the rules for a binary attribute were successful, the artiﬁcial evolution as implemented in this work was not able to ﬁnd fuzzy rules that would accurately classify data according to selected multi-class attributes.

Rai, Verma, and Thoke (Citation2012) defined MSApriori is an association rule mining algorithm planned to mine frequent itemsets including rare objects and to give better performance in comparison with approaches that employ single minimum support. In association rule, mining MSApriori algorithm plays an important role as it considers rare item sets. This paper proposed a novel approach MSApriori-T algorithm, which uses total support tree structure to make MSApriori algorithm more efficient. T-tree stores each item in a tree as nodes and links are available to its child node. To beat the drawback of an MSApriori algorithm that needs high storage requirement and processing time, authors proposed an approach that combines the MSA prior algorithm with a total support tree storage structure resulting in a more efficient algorithm in terms of storage requirement and processing time.

4. Methodology

The importance of this research is in suggesting a way using data mining algorithms to determine the causes of accidents in terms of time, road, driver nationality, and type of accident from a large set of discovered rules extracted from Alghat traffic accidents real data. This study was based on traffic accident data which taken from public traffic department in Alghat Provence in KSA within four years (1432, 1434, 1435, and 1436). One of the main obstacles, which researchers faced when collecting data from traffic department that information of the accident in traffic registration form is incomplete. For this reason, many variables had been neglected from the analysis such as the driver age, driver health status, driver behaviour, and weather state. WEKA tools used for preprocessing and analysing data. In WEKA, we implemented two tools, Apriori algorithm in association rules and EM clustering algorithm. A comparison between these algorithms (Ariori and clustering) were made to discover the factors, which caused accidents.

shows an Attributes relationship File Format (ARFF) for the traffic accident dataset after converted it from excel file. The header of the data is started with the name of the relationship (traffic), and a block knows the attributes (year, type of accident, location, number of vehicles, driver, injured, and death). Also, the @data line includes the values entries for each attribute. It is prepared dataset in Attribute relation format file to execute in WIKA interface. ARFF format just gives a dataset; it cannot appoint which of the attributes the one that is supposed to be predicted. It can be applied to locate different algorithms used in WEKA software.

Figure 1. Traffic accident dataset an ARFF file.

In this part displays ARFF file for the traffic accident dataset which pre-processing in WEKA explorer. The file contains 8 attributes and 946 instances. At this stage, the data will be ready for mining and extraction information by using various algorithms supported by WEKA tools.

Figure 2. Opening ARFF file in WEKA explorer.

shows the use of the Apriori algorithm to find best results that have minSupport = 0.4 and minimum confidence = 0.9.

Figure 3. Apply the Apriori algorithm.

. Shows the best results obtained by the EM cluster algorithm.

Figure 4. Apply the EM cluster algorithm in WEKA.

5. Results and discussions

After Apriori algorithm executed, we obtained many results, which based on the size of the set of the large itemsets. explains the results obtained with item sets: 10, the results show that the highest number of accidents occurred in 1434, most of the incidents happened during the day, the most common types of accidents were a collision with another vehicle. The highest accidents appeared in highway the drivers who caused the accidents were non-Saudis. Most accidents happened between two cars or more. The total number of accidents was 946; there were 171 injured and 35 death.

Table 2. Size of a set of large itemsets L (1): 10.

Download CSV Display Table

shows the results taken with item sets: 4, the output display that most of the incidents occurred during the day, the most common types of accidents were a collision with another vehicle. As the results, the highest number of accidents occurred in highway with 51%. Main accidents happened between two vehicles or more.

Table 3. Size of a set of large itemsets L (2): 4.

Download CSV Display Table

displays the results reached with item sets 3, the highest number of accidents occurred in 1434, most of the incidents happened during the day; the most common types of accidents are a collision with another vehicle, and the highway achieved the highest accidents and drivers who caused the accidents were non-Saudis.

Table 4. Size of a set of large itemsets L (2): 13

Download CSV Display Table

shows the best rules found in Apriori algorithm. The results depend on the comparison of confidence, leverage, and convince. All rules in this table support the rules presented in above tables.

Table 5. Best rules using Apriori algorithm.

Download CSV Display Table

represents the number of accidents happened within four years (1432, 1434, 1435, and 1435), the data show that the most accidents and injured occurred in 1434. The highest death in 1435.

Table 6. The distribution of accidents per year.

Download CSV Display Table

represents the summarizes results obtained using the EM clustering algorithm.

Table 7. Summarized results obtained by using EM clustering algorithm.

Download CSV Display Table

Through the results obtained from current study showed that in EM cluster algorithm time taken to build model was (1.58 s), and Log likelihood was (7.46685-). Log-likelihood here refers to the probability of identifying a correct group of data elements. The EM algorithm is a general statistical method of maximum likelihood estimation. EM cluster may converge to a poor locally optimal solution, therefore; it needs an unknown number of iterations to converge to a good solution (Ordonez & Omiecinski, Citation2002). While Appling Apriori algorithm in this research we obtained for the best result because the Apriori algorithm is an efficient algorithm for finding all frequent itemsets. Therefore Apriori algorithm more effective better than the EM cluster algorithm.

6. Conclusion

The aim of the study to present the implementations of the WEKA tools in data mining techniques. Apriori and cluster algorithms used to discover and concept the underlying patterns involved in the traffic accident dataset in Alghat Provence. As result of rules of both algorithms, display that Apriori algorithm performs better and faster than cluster algorithm. The paper presents Apriori algorithm is a simple and efficient tool to analyse the dataset. In general, WEKA interface is a very useful tool in data mining, which allows the user to choose many different algorithm and compare them to reach the accurately required results.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributors

Faisal Mohammed Nafie Ali received the B.Sc. from Omdurman Ahlia University, Faculty of Applied & Computer science in Sudan, in 2001. He got M.Sc. and Ph.D. degree in Computer Science in 2009 and 2014 respectively from Alneelain University Faculty of Computer science and Information Technology in Sudan. He worked in the field of education as Computer Science teacher in Sudan from 2002 to 2013. He worked as Assistant Professor in Majmaah University, Suadia Arabia from 2014 until now; He worked as Oracle Database Administrator in National Pensions Fund in Sudan from 2007 to 2014. He has an experience in Data mining using WEKA and Clementine. He has many Certifications in oracle database Administrator.

Abdelmoneim Ali Mohamed Hamed received the B.Sc. and M.Sc.in mathematics in Sudan, Alnileen University Faculty of science, in 1989 and 2005 respectively. He has received Ph.D. in applied statistics in Sudan, Sudan University for science and technology, 2012. From 1989 to 2008, He worked in the field of education as a mathematics teacher in Sudan and Saudi Arabia. From 2009 to 2013, He worked as a lecturer at Al Ahfad University for Girls. From 2014 until now, he worked as Assistant Professor at Al Majmaah University. He has an experience in statistical analysis using SPSS and WEKA.

References

Abbas, O. A. (2008). Comparisons between data clustering algorithms. International Arab Journal of Information Technology (IAJIT), 5(3), 320–325.
Web of Science ®Google Scholar
Agrawal, R., & Agrawal, J. (2017). Analysis of clustering algorithm of Weka tool on air pollution dataset. International Journal of Computer Applications, 168(13), 1–5. doi: 10.5120/ijca2017914522
Google Scholar
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules, In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, vol. 1215, (pp. 487–499), September.
Google Scholar
Amira, A., Vikas, P., & Abdelaziz, A. (2015). Applying Association Rules Mining Algorithms for Traffic Accidents in Dubai. International Journal of Soft Computing and Engineering (IJSCE), 5(4), 1–12.
Google Scholar
Bansal, D., & Bhambhu, L. (2013). Usage of Apriori algorithm of data mining as an application to grievous crimes against women. International Journal of Computer Trends and Technology, 4(19), 3194–3199.
Google Scholar
Chauhan, R., Kaur, H., & Alam, M. A. (2010). Data clustering method for discovering clusters in spatial cancer databases. International Journal of Computer Applications, 10(6), 9–14. doi: 10.5120/1487-2004
Google Scholar
Frank, E., Hall, M., Holmes, G., Kirkby, R., Pfahringer, B., Witten, I. H., & Trigg, L. (2009). Weka-a machine learning workbench for data mining. In In data mining and knowledge discovery handbook (pp. 1269–1277). Boston, MA: Springer.
Google Scholar
Frank, E., Hall, M., Trigg, L., Holmes, G., & Witten, I. H. (2004). Data mining in bioinformatics using Weka. Bioinformatics (oxford, England), 20(15), 2479–2481. doi: 10.1093/bioinformatics/bth261
PubMed Web of Science ®Google Scholar
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18. doi: 10.1145/1656274.1656278
Google Scholar
Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Elsevier.
Google Scholar
Krömer, P., Beshah, T., Ejigu, D., Snášel, V., Platoš, J., & Abraham, A. (2013, April). Mining traffic accident features by evolutionary fuzzy rules. In Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 2013 IEEE Symposium on (pp. 38–43). IEEE.
Google Scholar
Kumar, B. S., & Rukmani, K. V. (2010). Implementation of web usage mining using APRIORI and FP growth algorithms. International Journal of Advanced Networking and Applications, 1(06), 400–404.
Google Scholar
Lee, Y. K, Kim, W. Y, Cai, Y. D, & Han, J. (2003). CoMine: Efficient Mining of Correlated Patterns. In ICDM, 3, 581–584. November.
Google Scholar
Mansouri, M., & Javad Kargar, M. (2014). Analysis and monitoring of the traffic suburban road accidents using data mining techniques; a case study of Isfahan Province in Iran. The Open Transportation Journal, 8(1), 39–49. doi: 10.2174/1874447801408010039
Google Scholar
Novitasari, W., Hermawan, A., Abdullah, Z., Sembiring, R. W., & Herawan, T. (2015). A method of discovering interesting association rules from student admission dataset. International Journal of Software Engineering and Its Applications, 9(8), 51–66. doi: 10.14257/ijseia.2015.9.8.05
Google Scholar
Ordonez, C., & Omiecinski, E. (2002, November). FREM: Fast and robust EM clustering for large data sets. In Proceedings of the eleventh international conference on Information and knowledge management. (pp. 590-599). ACM.
Google Scholar
Parikh, D., & Tirkha, P. (2013). Data mining & data stream mining—open source tools. International Journal of Innovative Research in Science, Engineering and Technology, 2(10), 5234–5239.
Google Scholar
Prajwala, T. R., & Sangeeta, V. I. (2014). Comparative analysis of EM clustering algorithm and density based clustering algorithm using WEKA tool. International Journal of Engineering Research and Development, 9(8), 19–24.
Google Scholar
Rai, D., Verma, K., & Thoke, A. S. (2012). MSApriori using total support tree data structure. International Journal of Computer Applications, 43(23), 45–49.
Google Scholar
Ramamohan, Y., Vasantharao, K., Chakravarti, C. K., & Ratnam, A. S. K. (2012). A study of data mining tools in knowledge discovery process. International Journal of Soft Computing and Engineering (IJSCE) ISSN, 2(3), 2231–2307.
Google Scholar
Seppelt, R., Voinov, A. A., & Lange, S. (2012). Tools for environmental data mining and intelligent decision support. Iemss. Org.
Google Scholar
Sharma, N., Bajpai, A., & Litoriya, M. R. (2012). Comparison the various clustering algorithms of WEKA tools. Facilities, 4(7), 78–80.
Google Scholar
Shrivastava, A. K., & Panda, R. N. (2014). Implementation of Apriori algorithm using WEKA. KIET International Journal of Intelligent Computing and Informatics, 1(1), 4.
Google Scholar
Shweta, M., & Garg, D. K. (2013). Mining efficient association rules through Apriori algorithm using attributes and comparative analysis of various association rule algorithms. International Journal of Advanced Research in Computer Science and Software Engineering, 3(6), 306–312.
Google Scholar
Slimani, T, & Lazzez, A. (2014). Efficient analysis of pattern and association rule mining approaches. Journal of Information Technology and Computer Science (IJITCS), 6(3), 70–81. doi: 10.5815/ijitcs.2014.03.09
Google Scholar
Solanki, H. (2013). Comparative study of data mining tools and analysis with unified data mining theory. International Journal of Computer Applications, 75(16), 23–28. doi: 10.5120/13195-0862
Google Scholar
Steinbach, M., Karypis, G., & Kumar, V. (2000, August). A comparison of document clustering techniques. In KDD workshop on text mining (Vol. 400, No. 1, pp. 525–526).
Google Scholar
Tanna, P., & Ghodasara, Y. (2014). Using Apriori with WEKA for frequent pattern mining. arXiv preprint arXiv:1406.7371.
Google Scholar
Verma, M, Srivastava, M, Chack, N, Diswar, A. K, & Gupta, N. (2012). A comparative study of various clustering algorithms in data mining. International Journal of Engineering Research and Applications (IJERA), 2(3), 1379–1384.
Google Scholar
Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data mining: Practical machine learning tools and techniques. Cambridge, MA: Morgan Kaufmann.
Google Scholar
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., & Zhou, Z. H. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. doi: 10.1007/s10115-007-0114-2
Web of Science ®Google Scholar

Download PDF

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Your download is now in progress and you may close this window

Did you know that with a free Taylor & Francis Online account you can gain access to the following benefits?

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Have an account?
Login now Don't have an account?
Register for free

Login or register to access this feature

Have an account?
Login now Don't have an account?
Register for free

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Usage Apriori and clustering algorithms in WEKA tools to mining dataset of traffic accidents

ABSTRACT

1. Introduction

Table 1. Advantages and disadvantages of data mining tools.

2. WEKA

3. Related work

4. Methodology

5. Results and discussions

Table 2. Size of a set of large itemsets L (1): 10.

Table 3. Size of a set of large itemsets L (2): 4.

Table 4. Size of a set of large itemsets L (2): 13

Table 5. Best rules using Apriori algorithm.

Table 6. The distribution of accidents per year.

Table 7. Summarized results obtained by using EM clustering algorithm.

6. Conclusion

Disclosure statement

References

Information for

Open access

Opportunities

Help and information

Usage Apriori and clustering algorithms in WEKA tools to mining dataset of traffic accidents

ABSTRACT

1. Introduction

Table 1. Advantages and disadvantages of data mining tools.

2. WEKA

3. Related work

4. Methodology

5. Results and discussions

Table 2. Size of a set of large itemsets L (1): 10.

Table 3. Size of a set of large itemsets L (2): 4.

Table 4. Size of a set of large itemsets L (2): 13

Table 5. Best rules using Apriori algorithm.

Table 6. The distribution of accidents per year.

Table 7. Summarized results obtained by using EM clustering algorithm.

6. Conclusion

Disclosure statement

Notes on contributors

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date