Performance evaluation of top-k sequential mining methods on synthetic and real datasets
Journal: International Journal of Advanced Computer Research (IJACR) (Vol. 7, No. 32)
Publication Date: 2017-09-14
Authors: Asima Jamil; Abdus Salam; Farhat Amin
Pages: 176-184
Keywords: Pattern discovery; Top-k; Data mining; Sequential pattern mining; Association rule mining
Abstract
Discovering sequential patterns in a large sequence database is an important problem in sequential pattern mining, a well-known data mining technique. Several articles have surveyed the field over the past few years, focusing mainly on improving the efficiency of algorithms by employing different techniques. However, researchers have paid less attention to the characteristics of the underlying data that an algorithm uses, even though the properties of the data strongly affect the execution of data mining algorithms. This study complements the top-k sequential pattern mining field by providing a further in-depth analysis with respect to data properties and characteristics. The performance of top-k sequential pattern mining (TKS) was compared with that of top-k closed sequential pattern mining (TSP), the state-of-the-art algorithm for top-k sequential pattern mining, on both real and synthetic datasets with varied characteristics. The impact of different parameters on the running time and memory usage of each algorithm was investigated. Extensive experiments show that TKS and TSP each have certain advantages and disadvantages on different types of data. Furthermore, due to the continuous addition of large amounts of data to databases, the idea of sequential pattern mining (SPAM) is becoming popular. Various algorithms have been developed for mining sequential patterns in data. These algorithms have proved to be more effective for smaller databases, but their performance may decline as the size of the database increases. Hence, these methods have to be amended in order to perform the mining process more efficiently.