
SOCIAL MEDIA HASHTAG CLUSTERING USING GENETIC ALGORITHM

Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.9, No. 1)

Publication Date:

Authors : ; ;

Page : 12-25

Keywords : Social Media; Hashtag Clustering; Genetic Algorithm; Crowd Sourcing


Abstract

Twitter is one of the most influential microblogging platforms of the social media era. Tweets, the short messages users post to interact with the social world, are an invaluable source of data for trend prediction, timeline generation, community detection, and similar tasks. Extracting useful information from tweets is challenging for two reasons: first, a tweet is short (140 characters), and second, users focus on the meaning of a tweet rather than on grammar or correct spelling. People place the hashtag symbol (#) before a keyword or phrase in a tweet to emphasize the importance of those words when tweets are searched. Hashtag clustering is an important technique for extracting knowledge by grouping tweets into clusters. It is a challenging task for three major reasons: first, the number of clusters is not known in advance; second, domain-related information is not available; and third, different hashtags are created for the same topic (#deelnet, #deeplearning, #dl, etc.). A Genetic Algorithm is an adaptive heuristic search algorithm that mimics the evolutionary process of natural selection and survival of the fittest. To the best of our knowledge, this is the first attempt to cluster hashtags using a Genetic Algorithm. We evaluated our algorithm on a large set of tweets downloaded from popular Indian media Twitter accounts. The results obtained by our model are compared against results obtained through crowdsourcing, as no other source is available to validate their quality. The results achieved by our model are superior to the crowdsourcing results. In addition, the users' validation of the generated clusters confirms the accuracy of the proposed model.
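
The abstract does not describe the chromosome encoding, fitness function, or genetic operators used in the paper, so the following is only a minimal sketch of how a genetic algorithm might cluster hashtags when the number of clusters is unknown. Everything in it is an assumption made for illustration: the character-bigram Jaccard similarity, the 0.2 threshold, the label-per-hashtag encoding, and the function names (`bigrams`, `similarity`, `fitness`, `evolve`) are hypothetical and not taken from the paper.

```python
import random

def bigrams(tag):
    """Character bigrams of a hashtag, ignoring the leading '#' and case."""
    t = tag.lstrip("#").lower()
    return {t[i:i + 2] for i in range(len(t) - 1)} or {t}

def similarity(a, b):
    """Jaccard similarity of two hashtags' bigram sets (an assumed measure)."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y)

def similarity_matrix(tags):
    """Precompute pairwise similarities so fitness evaluation stays cheap."""
    return [[similarity(a, b) for b in tags] for a in tags]

def fitness(labels, sim, threshold=0.2):
    """Sum (similarity - threshold) over pairs sharing a cluster label.

    Pairs above the assumed threshold are rewarded for being together and
    pairs below it are penalized, so the number of clusters is not fixed.
    """
    n = len(labels)
    return sum(sim[i][j] - threshold
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def evolve(tags, pop_size=30, generations=80, mutation_rate=0.1, seed=0):
    """Evolve one cluster label per hashtag; requires at least two hashtags."""
    rng = random.Random(seed)
    n = len(tags)
    sim = similarity_matrix(tags)
    score = lambda labels: fitness(labels, sim)
    # One chromosome puts every hashtag in its own cluster; the rest are random.
    pop = [list(range(n))]
    pop += [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two fittest survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # One-point crossover followed by per-gene mutation.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)
```

Calling `evolve(["#deeplearning", "#deeplearn", "#cricket", "#cricketworldcup", "#dl"])` returns a list with one integer label per hashtag, where equal labels mean the same cluster. A purely string-based similarity cannot link abbreviations such as #dl to #deeplearning, which is one reason a real system would likely need richer features such as co-occurrence in tweets.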

Last modified: 2018-04-06 20:30:09