
An Enhanced Model for the Classification of Mined Data

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.8, No. 12)

Publication Date:

Authors : ; ; ;

Page : 46-58

Keywords : Text mining; Text classification; Data mining; KNN; Euclidean Distance Classifier; Mean TF-IDF


Abstract

Text mining is the process of deriving high-quality information from text. It typically involves structuring the input text and then extracting previously unknown patterns from the structured data. Text classification is the process of assigning documents to predefined categories based on their contents. Unstructured data is typically text-heavy and difficult to handle. In this work, we developed an enhanced model for mining and classifying textual data, following the Object-Oriented System Development Methodology (OOSDM). We used the K-Nearest Neighbour (KNN) algorithm and a Euclidean distance classifier for text mining and classification, an approach that requires fewer documents for training. We also employed association rules over words to derive a feature set from pre-classified textual documents, and used the Mean Term Frequency-Inverse Document Frequency (Mean TF-IDF) model for feature selection. The proposed system was implemented in the C# programming language, with a MySQL connector used to store the dataset in the database. The text-to-speech results from the software show that the model's pronunciation shifts away from human pronunciation of the natural language. This work could benefit any organization that deals with language interpretation.
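The abstract does not spell out how Mean TF-IDF selects features. The sketch below assumes one common reading: compute each term's TF-IDF weight in every training document, average that weight across all documents, and keep the k terms with the highest mean. The function names, the whitespace tokenizer, and the TF and IDF formulas (term count divided by document length, and log(N/df)) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return the sorted vocabulary and one {term: tf-idf} dict per document.

    Assumed weighting: TF = count / document length, IDF = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vocab = sorted(df)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vocab, weights


def mean_tfidf(docs):
    """Average each term's TF-IDF weight over all documents (absent terms count as 0)."""
    vocab, weights = tfidf_matrix(docs)
    n_docs = len(docs)
    return {term: sum(w.get(term, 0.0) for w in weights) / n_docs for term in vocab}


def select_features(docs, k):
    """Hypothetical selector: keep the k terms with the highest mean TF-IDF."""
    scores = mean_tfidf(docs)
    # Sort by descending score; break ties alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    print(select_features(docs, 2))  # ['dog', 'ran']
```

With these three toy documents, "the" occurs everywhere, so its IDF and mean score are 0, while the rarer "dog" and "ran" rank highest. A real pipeline would also remove stop words and use a proper tokenizer.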
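KNN and a Euclidean distance classifier both label a document by measuring Euclidean distance between feature vectors, such as the selected TF-IDF weights. The sketch below shows both under stated assumptions. KNN takes a majority vote among the k closest training vectors. The "Euclidean distance classifier" is assumed here to be the common nearest-centroid (minimum-distance) classifier, which the abstract does not confirm. The function names, the toy vectors, and the tie-breaking rule are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    `train` is a list of (vector, label) pairs; distance is Euclidean (math.dist).
    A tied vote goes to the tied label whose member is closest to the query.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    for _, label in neighbours:  # neighbours are ordered nearest first
        if votes[label] == best:
            return label


def nearest_centroid(train, query):
    """Assumed 'Euclidean distance classifier': pick the class whose mean vector is closest."""
    sums, counts = {}, Counter()
    for vector, label in train:
        total = sums.setdefault(label, [0.0] * len(vector))
        for i, x in enumerate(vector):
            total[i] += x
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in total] for label, total in sums.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


if __name__ == "__main__":
    train = [([1.0, 0.0], "sports"), ([0.9, 0.1], "sports"),
             ([0.0, 1.0], "politics"), ([0.1, 0.8], "politics")]
    print(knn_classify(train, [0.8, 0.2]))      # sports
    print(nearest_centroid(train, [0.2, 0.9]))  # politics
```

The nearest-centroid variant stores one mean vector per class, whereas KNN keeps every training vector and compares the query against all of them.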

Last modified: 2019-12-29 21:05:00