A NOVEL APPROACH FOR SEMI SUPERVISED CLUSTERING ALGORITHM

Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) (Vol.6, No. 1)

Publication Date:

Authors : ;

Page : 1-7

Keywords : Data Mining; Clustering; Semi Supervised Clustering; SesProC; SSL; SSC; Data Clustering;

Abstract

Semi-supervised clustering (SSC) is an important research problem in machine learning. Although unlabelled data are usually expected to improve performance, in many cases semi-supervised learning (SSL) is outperformed by supervised learning that uses only labelled data. Constructing a performance-safe SSL method has therefore become a key issue in SSC research. This paper classifies the effect of fast food on the human body by clustering with supervised learning, with the aim of improving the clustering. It also uses feature selection and feature extraction. Clustering is a technique used for data reduction: it divides the data into groups based on pattern similarities, such that each group is abstracted by one or more representatives. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns. This paper explains how text and data mining techniques extract the useful knowledge represented by clusters from the textual information contained in a large number of emails. Email is now becoming the dominant form of inter- and intra-organizational written communication for many companies. The sample texts of two emails are verified for data clustering. The clusters group the similar emails exchanged between users, which are found by identifying similarities in their text. For this purpose, the paper uses pattern similarities, i.e., the similar words exchanged between users, under different threshold values. The threshold value indicates the frequency of the words used. The data are represented using a vector space model.
The semi-supervised projected model-based clustering algorithm (SeSProC) also includes a novel model selection approach, which uses a greedy forward search to estimate the final number of clusters. The quality of SeSProC is assessed using synthetic data, demonstrating its effectiveness under different data conditions, not only at classifying instances with known labels but also at discovering completely hidden clusters in different subspaces.
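
The vector space representation and word-frequency threshold mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the sample email texts, and the choice of cosine similarity as the similarity measure are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def term_frequencies(text, threshold=1):
    """Map a text to a sparse term-frequency vector (hypothetical helper).

    Only words that occur at least `threshold` times are kept, which is one
    possible reading of the frequency threshold described in the abstract.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= threshold}


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors (an assumed measure)."""
    dot = sum(n * vec_b.get(word, 0) for word, n in vec_a.items())
    norm_a = math.sqrt(sum(n * n for n in vec_a.values()))
    norm_b = math.sqrt(sum(n * n for n in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


# Invented sample texts standing in for emails; not data from the paper.
mail_a = "meeting budget meeting friday budget report"
mail_b = "budget report friday budget meeting meeting"
mail_c = "lunch menu pizza lunch"

for threshold in (1, 2):
    vec_a = term_frequencies(mail_a, threshold)
    vec_b = term_frequencies(mail_b, threshold)
    vec_c = term_frequencies(mail_c, threshold)
    print(threshold, cosine_similarity(vec_a, vec_b), cosine_similarity(vec_a, vec_c))
```

Raising the threshold keeps only the most frequent words in each vector, so pairs of texts that share their dominant vocabulary stay similar while incidental overlaps drop out. Texts whose pairwise similarity exceeds a chosen cut-off could then be placed in the same cluster.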

Last modified: 2017-03-12 16:25:43