On the Utilization Aspect of Document Data for Mining the Side Information
Journal: International Journal of Science and Research (IJSR) (Vol. 4, No. 4)
Publication Date: 2015-04-05
Authors : N.S. Krishna Prasad; S. Dhana Sekaran;
Page : 3069-3074
Keywords : Data mining; clustering; Text documents; partitioning algorithm;
Abstract
In many text mining applications, side information is available along with the text documents. This side information can take the form of document provenance, links within the document, web logs of user-access behavior, or other non-textual attributes embedded in the text document. Such attributes can contain a considerable amount of information for clustering purposes. However, it is usually difficult to estimate the importance of this side information, especially when it is noisy. In such cases, incorporating side information into the mining process is risky, because it can add noise rather than improve the quality of the results. We therefore need a principled way to perform the mining process so that the advantages of the side information are exploited as fully as possible. In this paper, we propose an algorithm that combines traditional partitioning algorithms with probabilistic models to create an effective clustering approach. We also show how to extend the methodology to the classification problem.
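The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of combining a partitioning step with a probabilistic model of side information. It is not the authors' method. The function names, the Gini-style purity filter for discarding noisy attributes, the additive `weight`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def attribute_posteriors(labels, side, k, min_purity):
    """Estimate P(cluster | attribute) and keep only discriminative attributes."""
    counts = defaultdict(lambda: [0] * k)
    for label, attrs in zip(labels, side):
        for a in attrs:
            counts[a][label] += 1
    kept = {}
    for a, c in counts.items():
        n = sum(c)
        probs = [x / n for x in c]
        # Assumed noise filter: a Gini-style purity score, 1.0 = perfectly pure.
        if sum(p * p for p in probs) >= min_purity:
            kept[a] = probs
    return kept

def cluster_with_side_info(docs, side, seeds, iters=10, weight=0.5, min_purity=0.6):
    """Alternate a text-similarity (partitioning) step with a side-information step."""
    k = len(seeds)
    cents = [dict(docs[i]) for i in seeds]
    # Initial assignment uses text content only.
    labels = [max(range(k), key=lambda j: cosine(d, cents[j])) for d in docs]
    kept = {}
    for _ in range(iters):
        kept = attribute_posteriors(labels, side, k, min_purity)
        new_labels = []
        for d, attrs in zip(docs, side):
            useful = [kept[a] for a in attrs if a in kept]
            scores = []
            for j in range(k):
                s = cosine(d, cents[j])
                if useful:  # add the averaged posterior of informative attributes
                    s += weight * sum(p[j] for p in useful) / len(useful)
                scores.append(s)
            new_labels.append(max(range(k), key=scores.__getitem__))
        for j in range(k):
            members = [d for d, l in zip(docs, new_labels) if l == j]
            if members:
                cents[j] = centroid(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, kept

# Toy data (hypothetical): the last document is textually ambiguous.
docs = [
    {"stock": 2, "market": 1}, {"stock": 1, "trade": 1},
    {"goal": 2, "match": 1}, {"goal": 1, "team": 1},
    {"match": 1, "team": 1}, {"market": 1, "match": 1},
]
side = [
    {"author:fin", "lang:en"}, {"author:fin", "lang:en"},
    {"author:sport", "lang:en"}, {"author:sport", "lang:en"},
    {"author:sport", "lang:en"}, {"author:sport", "lang:en"},
]
labels, kept = cluster_with_side_info(docs, side, seeds=[0, 2])
```

In this toy run, the attribute shared by every document (`lang:en`) fails the purity filter and is ignored, while the author attribute pulls the ambiguous last document into the same cluster as the other sport documents. A real system would need a more careful probabilistic model and a tuned threshold; the sketch only shows why filtering noisy attributes matters.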
Last modified: 2021-06-30 21:44:39