ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login

N-Gram Analysis in SVM Training Phase Reduction Using Dataset Feature Filtering for Malware Detection

Journal: International Journal of Science and Research (IJSR) (Vol.3, No. 9)

Publication Date:

Authors : ; ;

Page : 550-554

Keywords : n-gram analysis; malware variants; kernel trick; SVM; WEKA tool;

Source : Downloadexternal Find it from : Google Scholarexternal


An n-gram is a sub-sequence of n items from a given sequence. Various areas of statistical natural language processing and genetic sequence analysis are using N-gram Analysis. In which sequence analysis is the process of comparing the sequence or series of attributes in order to find the similarity. Malicious software that is designed by attackers for disturbing computers is called as malware. The principal belong to the same family of malware eventhough Malware variants will have distinct byte level representations. The byte level content is different because small changes to the malware source code can result in significantly different compiled object code. In which programs are used as operational code (opcode) density histograms obtained through dynamic analysis. The process of testing and evaluation of application or a program during running time is called as dynamic analysis. A SVM is used for classification or regression problems. Kernel trickis a technique by SVM to transform your data and then based on these transformations it finds an optimal boundary between the possible outputs. We employ static analysis to classify malware which is identified a prefilter stage using hex values of files, that can reduce the feature set and therefore reduce the training effort. The result shows that the relationships between features are complex and simple statistics filtering approaches do not provide a Practical approach. One of the approach, hex decimal based produces a suitable filter. The entire system will be implemented in WEKA tool.

Last modified: 2021-06-30 21:07:44