Improving Efficiency of Apriori Algorithms for Sequential Pattern Mining

Journal: Bonfring International Journal of Data Mining (Vol.4, No. 1)

Publication Date:

Authors : ; ;

Page : 01-06

Keywords : Data mining; Sets; Sequence data; Time series; Intrusion detection system; DoS attacks;

Abstract

Computer systems are exposed to a growing number and variety of security threats as the Internet has expanded in recent years, so detecting network intrusions effectively has become an important security technique. Many intrusions are not single events but a series of attack steps taken in chronological order: an intrusion is a multi-step process in which a number of events must occur in sequence for the attack to succeed. Analyzing the order in which events occur can therefore improve detection accuracy and reduce false alarms. Intrusion detection using sequential pattern mining is a research topic within information security. Sequential pattern mining discovers the frequent sequential patterns in an event dataset. Sequential pattern mining algorithms can be broadly classified as Apriori-based, pattern-growth-based, or a combination of both. The first class relies on the Apriori property and the second uses a pattern-growth approach. The major drawback of the Apriori-based algorithms is that they scan the database multiple times while generating maximal patterns. This paper presents a simulation study of two algorithms, the original AprioriALL algorithm and a modified AprioriALL algorithm that optimizes processing by including set-theory techniques, on a network intrusion dataset from KDD Cup 1999. Experimental results show that the modified algorithm shrinks the dataset and scans the database at most twice. Because the interestingness of the itemsets increases as the dataset shrinks, the resulting sequences are efficient and highly associated. With a smaller database, the time taken to mine sequences also falls, making the modified algorithm faster than the Apriori-based algorithm.
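
The abstract does not give the details of the modified algorithm, so the following is only a minimal sketch of the general idea it describes: level-wise (Apriori-style) mining of frequent event sequences, preceded by a pruning pass that shrinks the dataset. It is not the authors' implementation. The function names, the pruning rule (drop events that are infrequent on their own), and the event labels in the sample sessions are all assumptions made for illustration; they are not KDD Cup 1999 fields.

```python
from collections import Counter


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def shrink(database, min_support):
    """Assumed pruning rule: drop events that are infrequent on their own,
    then drop any sequence left empty."""
    counts = Counter(event for seq in database for event in set(seq))
    frequent = {event for event, n in counts.items() if n >= min_support}
    pruned = ([e for e in seq if e in frequent] for seq in database)
    return [seq for seq in pruned if seq]


def mine_sequences(database, min_support):
    """Level-wise search: extend each frequent pattern by one event per pass."""
    database = shrink(database, min_support)
    events = sorted({e for seq in database for e in seq})
    level = [(e,) for e in events]
    result = {}
    while level:
        next_level = []
        for pattern in level:
            count = support(pattern, database)
            if count >= min_support:
                result[pattern] = count
                # Only frequent patterns are extended (Apriori property).
                next_level.extend(pattern + (e,) for e in events)
        level = next_level
    return result


# Hypothetical event sessions, one list per connection history.
sessions = [
    ["scan", "login_fail", "login_fail", "root_shell"],
    ["scan", "login_fail", "root_shell"],
    ["ping", "scan", "login_fail"],
    ["http_get"],
]

patterns = mine_sequences(sessions, min_support=2)
for pattern, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" -> ".join(pattern), count)
```

In this sketch the pruning pass removes the `ping` and `http_get` events and the session that contained only `http_get`, so every later support count runs over a smaller database. Unlike the algorithm described in the abstract, this version still rescans the pruned database once per pattern length; it makes no attempt to reproduce the two-scan bound.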

Last modified: 2015-01-07 14:47:36