

Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.12, No. 03)

Publication Date:

Authors : ;

Page : 113-119

Keywords : online; offline; linguistic identification; data recovery; opinion mining; indexation



For years, web sources have generated large volumes of text data, both online and offline. Most of this data is incoherent and unstructured, which makes it hard to process with the available computing resources. Unsupervised classification methods can be used to examine large numbers of unknown objects. Text categorization is a learning-based methodology applied in areas such as linguistic identification, data recovery, opinion mining, spam filtering, and e-mail routing. It can also be viewed as a mechanism for labeling documents drawn from a natural-language corpus. Text classification with machine learning methods faces the challenge of high-dimensional feature vectors. Latent semantic indexing can address this problem by replacing individual words with statistically derived conceptual indices. We propose a two-stage feature selection method that aims to improve the accuracy and efficiency of categorization. First, we apply a new selection method to reduce the dimension of the term space; then we build a new semantic space between terms based on latent semantic indexing. We find that the two-stage feature selection method works better in certain applications involving the categorization of a spam database.
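The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which criterion the first stage uses, so a chi-squared score is swapped in here purely as a stand-in; the second stage is latent semantic indexing computed as a truncated SVD with NumPy. The function names (`chi2_scores`, `two_stage_reduce`), the toy vocabulary, and the counts are all hypothetical and are not taken from the paper.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-squared score of each term (column of X) against binary labels y.

    Assumption: this stands in for the paper's unspecified selection method.
    """
    present = X > 0                      # term occurs in the document
    pos = y == 1                         # document belongs to class 1 (spam)
    n = len(y)
    a = present[pos].sum(axis=0).astype(float)   # present, class 1
    b = present[~pos].sum(axis=0).astype(float)  # present, class 0
    c = pos.sum() - a                            # absent, class 1
    d = (~pos).sum() - b                         # absent, class 0
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # Terms that occur in every document (or in none) get a score of 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def two_stage_reduce(X, y, n_terms, n_concepts):
    """Stage 1: keep the n_terms best-scoring terms.
    Stage 2: project the reduced matrix onto n_concepts LSI dimensions."""
    order = np.argsort(chi2_scores(X, y))[::-1]
    keep = np.sort(order[:n_terms])              # indices of selected terms
    X_sel = X[:, keep].astype(float)
    U, s, Vt = np.linalg.svd(X_sel, full_matrices=False)
    k = min(n_concepts, len(s))
    Z = U[:, :k] * s[:k]                         # documents in concept space
    return keep, Z, Vt[:k]                       # Vt[:k] maps terms to concepts

# Hypothetical toy corpus: rows are documents, columns are term counts.
vocab = ["free", "winner", "prize", "meeting", "report", "the"]
X = np.array([
    [2, 1, 0, 0, 0, 1],   # spam
    [1, 0, 2, 0, 0, 1],   # spam
    [1, 1, 1, 0, 0, 2],   # spam
    [0, 0, 0, 2, 1, 1],   # ham
    [0, 0, 0, 1, 2, 1],   # ham
    [0, 0, 0, 1, 1, 2],   # ham
])
y = np.array([1, 1, 1, 0, 0, 0])

keep, Z, concepts = two_stage_reduce(X, y, n_terms=3, n_concepts=2)
print([vocab[i] for i in keep])
print(Z.shape)
```

On this toy data the term "the" appears in every document, so it carries no class information and is dropped in the first stage. The rows of `Z` could then be passed to any classifier in place of the original high-dimensional term vectors.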

Last modified: 2021-03-29 19:46:21