Analysis of authorship attribution technique on Urdu tweets empowered by machine learning
Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) (Vol. 10, No. 3)
Publication Date: 2021-06-11
Authors : Zain Ali; Arfan Ali Nagra; Zufishan Hameed; Muhammad Asif
Page : 2150-2157
Abstract
The process of identifying the author of an anonymous document from a group of unknown documents is called authorship attribution. As the world moves toward shorter communications, online criminal activities such as phishing and bullying are also increasing. Criminals hide their identity behind a screen name and connect anonymously, which makes them difficult to trace during cybercrime investigations. This paper evaluates current techniques of authorship attribution at the linguistic level and compares their accuracy in English and Urdu contexts, using the LDA model with the n-gram technique and cosine similarity applied to stylometry features to identify the writing style of a specific author. Two datasets are used, Urdu_TD and English_TD, based on 180 English and Urdu tweets for each author. The overall accuracy achieved is 84.52% on Urdu_TD and 93.17% on English_TD. The task is done without using any labels for authorship.
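To make the n-gram and cosine-similarity part of the approach concrete, the following is a minimal sketch, not the authors' implementation. It omits the LDA step entirely and assumes a few reference tweets per author are available. The function names (`char_ngrams`, `build_profiles`, `attribute`), the choice of character trigrams, and the toy tweets are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no measurable style
    return dot / (norm_a * norm_b)


def build_profiles(tweets_by_author, n=3):
    """Merge each author's tweets into a single n-gram count profile."""
    profiles = {}
    for author, tweets in tweets_by_author.items():
        profile = Counter()
        for tweet in tweets:
            profile.update(char_ngrams(tweet, n))
        profiles[author] = profile
    return profiles


def attribute(tweet, profiles, n=3):
    """Return the author whose profile is most similar to the tweet."""
    query = char_ngrams(tweet, n)
    return max(profiles, key=lambda author: cosine_similarity(query, profiles[author]))


# Toy data (hypothetical, not drawn from Urdu_TD or English_TD).
tweets_by_author = {
    "author_a": ["loving the cricket match today", "cricket again tonight, loving it"],
    "author_b": ["stock markets fell sharply", "markets rallied after the sharp fall"],
}
profiles = build_profiles(tweets_by_author)
predicted = attribute("what a cricket match", profiles)
```

Character n-grams are used here because they need no tokenizer and work the same way on Urdu and English strings; whether the paper uses character or word n-grams is not stated in the abstract.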
Last modified: 2021-06-16 19:43:18