A BIPARTITE GRAPH MODEL FOR COMPARING CLUSTERING OF SOFTWARE SYSTEM FOR FEATURE LOCATION

Journal: International Journal of Computer Engineering and Technology (IJCET) (Vol.9, No. 4)

Publication Date:

Authors : ; ;

Page : 63-72

Keywords : Cluster Analysis; Cluster Coefficient; Feature Location; Latent Dirichlet Allocation Modeling; Text Mining; Variation of Information


Abstract

Text mining is widely used in many software engineering tasks, and clustering is applied to group related documents for feature location. The literature offers many clustering algorithms, each designed from a perspective that suits a specific kind of application. The text corpus for analysis can be drawn from different sources within a software system. Because clustering algorithms produce inconsistent results and the target text varies, a method for evaluating and comparing clusterings is required to choose an appropriate algorithm for a given application. In this paper, a bipartite graph model is used to compare any two clusterings with the external validation measures Variation of Information, Clustering Coefficient, and F-Measure. Each individual clustering is assessed with the internal validation measure Silhouette Coefficient. Three open source software systems, namely JEdit, ArgoUML, and JabRef, are used for empirical evaluation. Latent Dirichlet Allocation modeling and the K-Means clustering algorithm are applied to the dataset, and the results show that Variation of Information correctly identifies the similarity between similar clusters. The model is designed to integrate the knowledge derived from different clusterings based on this analysis.
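
To make the comparison step concrete, the following is a minimal sketch of how Variation of Information could be computed between two clusterings of the same documents. It is not the paper's implementation: the abstract does not specify how the bipartite graph is built, so the sketch assumes that the clusters of each clustering form the two node sets and that an edge weight is the number of documents a pair of clusters shares. The function name `variation_of_information` and the label-list input format are hypothetical.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A) + H(B) - 2 * I(A; B), measured in nats.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings assign to document i (assumed input format).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same documents")
    n = len(labels_a)
    if n == 0:
        raise ValueError("at least one document is required")

    # Cluster sizes on each side of the (assumed) bipartite graph.
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)

    # Edge weights: number of documents shared by cluster a and cluster b.
    overlap = Counter(zip(labels_a, labels_b))

    vi = 0.0
    for (a, b), shared in overlap.items():
        p_ab = shared / n        # joint probability of the cluster pair
        p_a = size_a[a] / n      # marginal probability of cluster a
        p_b = size_b[b] / n      # marginal probability of cluster b
        # Equivalent form: VI = -sum p_ab * (log(p_ab/p_a) + log(p_ab/p_b))
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi
```

Under these assumptions, a value of zero means the two clusterings group the documents identically (cluster names may differ), and larger values mean greater disagreement. For example, `variation_of_information(lda_labels, kmeans_labels)` would compare a topic-based assignment with a K-Means assignment, where both argument names are placeholders for label lists produced elsewhere.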

Last modified: 2018-12-08 15:13:59