ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login

Performance evaluation of top-k sequential mining methods on synthetic and real datasets

Journal: International Journal of Advanced Computer Research (IJACR) (Vol.7, No. 32)

Publication Date:

Authors : ; ; ;

Page : 176-184

Keywords : Pattern discovery; Top-k; Data mining; Sequential pattern mining; Association rule mining.;

Source : Downloadexternal Find it from : Google Scholarexternal

Abstract

Discovering sequential pattern from a large sequence database is an important problem in the field of sequential pattern mining, which is the well-known data mining technique. Several articles have surveyed the field of sequential pattern mining over the past few years. In those papers major focus was on improving the efficiency of algorithms by employing different techniques. However, the researchers paid less attention to consider the characteristics of the underlying data that the algorithm uses. It is very less investigated. The properties of data incredibly affect the execution of data mining algorithms. This study complemented the top-k sequential pattern mining field by providing further in depth analysis with respect to data properties and characteristics. The performance of top-k sequential pattern mining (TKS) with top-k closed sequential pattern mining (TSP), the state-of-the-art algorithm for top-k sequential pattern mining were evaluated both on synthetic and real databases. Experiments were carried out on real and synthetic datasets having varied characteristics. The impact of different parameters was investigated against the running time and memory usage analysis of each algorithm. Extensive experiments show that TKS and TSP have certain advantages and disadvantages of different types of data. Furthermore, due to the continuous addition of large amounts of data in the databases, the idea of sequential pattern mining (SPAM) is becoming popular. Various algorithms have been developed that are used for mining the sequential patterns in the data. These algorithms have proved to be more effective for smaller databases, but when the size of the database increased, their performance may decline. Hence these methods have to be amended in order to perform the mining processes in a more efficient way.

Last modified: 2017-09-14 18:10:58