Study of Dataset Feature Filtering of OpCode for Malware Detection Using SVM Training Phase

Journal: International Journal of Science and Research (IJSR) (Vol.4, No. 12)

Publication Date:

Authors : ;

Page : 474-479

Keywords : SVM; N-gram analysis; obfuscation; area of intersect;

Abstract

Malware can be defined as any type of malicious code that has the potential to harm a computer or network. To detect unknown malware families, the frequency of appearance of opcode (operation code) sequences, obtained through dynamic analysis, is used. Opcode n-gram analysis is used to extract features from the inspected files, and the resulting opcode n-grams serve as features during classification, with the aim of identifying unknown malicious code. A support vector machine (SVM) is used to create a reference model, against which two methods of feature reduction are evaluated: area of intersect and subspace analysis using eigenvectors. The SVM is configured to traverse the dataset, searching for opcodes that have a positive impact on the classification of benign and malicious software. The dataset is constructed by representing each executable file as a set of opcode density histograms. Classification involves separating the dataset into training and test data; the training sets are divided into benign and malicious software. In area of intersect, the characteristics of benign and malicious opcodes are plotted as normal distributions and grouped into a pair of density curves for each single opcode. The key feature to note is the overlapping area of the two density curves. In subspace analysis, the importance of individual opcodes is investigated through the eigenvalues and eigenvectors of the subspace. Principal component analysis (PCA) is used for data compression and mapping. The opcodes selected by the eigenvector filter coincide with the opcode features the SVM uses to classify malware.
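
The area-of-intersect idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `opcode_density` and `area_of_intersect` are hypothetical, the density values are made up for illustration, and fitting each curve with the sample mean and sample standard deviation (via the standard library's `statistics.NormalDist`) is an assumption, since the abstract does not say how the curves are fitted. The sketch also assumes that a smaller overlap marks an opcode as more useful for separating the two classes.

```python
from collections import Counter
from statistics import NormalDist


def opcode_density(opcodes, vocabulary):
    """Return the relative frequency of each vocabulary opcode in one trace.

    `opcodes` is the list of opcodes observed for one executable;
    `vocabulary` fixes which opcodes appear in the resulting histogram.
    """
    counts = Counter(opcodes)
    total = len(opcodes)
    return {op: (counts[op] / total if total else 0.0) for op in vocabulary}


def area_of_intersect(benign_densities, malicious_densities):
    """Overlap area (0.0 to 1.0) of two normal curves fitted to the samples.

    Each argument holds the density of ONE opcode across many files of one
    class. Assumption: each curve is fitted from the sample mean and sample
    standard deviation, so every class needs at least two files and a
    non-zero spread (otherwise statistics.StatisticsError is raised).
    """
    benign = NormalDist.from_samples(benign_densities)
    malicious = NormalDist.from_samples(malicious_densities)
    return benign.overlap(malicious)


if __name__ == "__main__":
    # Illustrative, made-up densities of a single opcode in four files per class.
    benign_mov = [0.10, 0.12, 0.11, 0.13]
    malicious_mov = [0.20, 0.25, 0.22, 0.27]
    print(opcode_density(["mov", "mov", "add", "jmp"], ["mov", "add", "jmp", "xor"]))
    print(area_of_intersect(benign_mov, malicious_mov))
```

Under these assumptions, opcodes could be ranked by overlap and those with the largest overlap dropped before SVM training; the cutoff to use is not given in the abstract.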

Last modified: 2021-07-01 14:28:06