Analysis of authorship attribution technique on Urdu tweets empowered by machine learning

Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) (Vol.10, No. 3)

Publication Date:

Authors : ;

Page : 2150-2157

Keywords : ;

Abstract

The process of identifying the author of an anonymous document from a group of unknown documents is called authorship attribution. As the world trends towards shorter communications, online criminal activities such as phishing and bullying are also increasing. Criminals hide their identity behind a screen name and connect anonymously, which makes them difficult to trace during a cybercrime investigation. This paper evaluates current techniques of authorship attribution at the linguistic level and compares their accuracy in English and Urdu contexts, using the LDA model with an n-gram technique and cosine similarity applied to stylometric features to identify the writing style of a specific author. Two datasets are used, Urdu_TD and English_TD, based on 180 English and Urdu tweets per author. The overall accuracy achieved is 84.52% on Urdu_TD and 93.17% on English_TD. The task is done without using any labels for authorship.
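
As a rough illustration of the n-gram and cosine-similarity part of the approach described in the abstract, the following is a minimal sketch and not the authors' implementation. It builds one character n-gram profile per author and attributes an anonymous text to the most similar profile. The LDA step is deliberately omitted, and the function names, the trigram size, and the toy tweets are all assumptions made for illustration; none of them come from the paper or from the Urdu_TD and English_TD datasets.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (works on any Unicode script)."""
    text = " ".join(text.lower().split())  # lowercase and normalise whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def build_profiles(tweets_by_author, n=3):
    """Merge each author's tweets into a single n-gram frequency profile."""
    profiles = {}
    for author, tweets in tweets_by_author.items():
        profile = Counter()
        for tweet in tweets:
            profile.update(char_ngrams(tweet, n))
        profiles[author] = profile
    return profiles


def attribute(text, profiles, n=3):
    """Return the best-matching author and the similarity score per author."""
    query = char_ngrams(text, n)
    scores = {author: cosine_similarity(query, profile)
              for author, profile in profiles.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data, invented for this sketch only.
tweets_by_author = {
    "author_a": ["omg cant wait for the match tonight!!!",
                 "omg that goal was unreal!!!"],
    "author_b": ["The committee will publish its findings on Monday.",
                 "The report will be reviewed by the committee."],
}
profiles = build_profiles(tweets_by_author)
best_author, scores = attribute("omg the match was unreal!!!", profiles)
```

Character n-grams are used here because they need no language-specific tokenizer, so the same code applies to Urdu and English text. A fuller pipeline in the spirit of the abstract would presumably pass n-gram counts through a topic model such as LDA and compare the resulting topic vectors rather than the raw counts.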

Last modified: 2021-06-16 19:43:18