OCR Technology for Detecting Homoglyph Spam

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.13, No. 6)

Publication Date:

Authors : ;

Page : 22-26

Keywords : Homoglyphs; Text recognition; OCR; Spam; Recognition accuracy;

Abstract

This paper examines how effectively Optical Character Recognition (OCR) technology detects spam written with homoglyphs, visually similar characters drawn from different scripts. The research uses a console application that generates all possible homoglyphic representations of specified words; these are then rendered visually and analyzed with the OCR tool Tesseract. The study assesses recognition accuracy for homoglyphs in four text cases: all uppercase, all lowercase, random case, and first letter uppercase only. Recognition accuracy varies across these formats, with uppercase text generally recognized at higher rates. This difference highlights both the challenges and the potential of refining OCR applications to better detect and filter homoglyph-based spam, improving security across digital communication platforms.
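
The generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' console application: the `HOMOGLYPHS` table, the function names `homoglyph_variants` and `case_forms`, and the fixed random seed are all hypothetical, and the table covers only a handful of Latin-to-Cyrillic look-alikes. Rendering and the Tesseract recognition step are left out.

```python
import random
from itertools import product

# Hypothetical, deliberately tiny homoglyph table (assumption, not from the paper):
# each Latin letter maps to Cyrillic letters that look alike.
HOMOGLYPHS = {
    "a": ["\u0430"],  # Cyrillic small letter a
    "c": ["\u0441"],  # Cyrillic small letter es
    "e": ["\u0435"],  # Cyrillic small letter ie
    "o": ["\u043e"],  # Cyrillic small letter o
    "p": ["\u0440"],  # Cyrillic small letter er
}


def homoglyph_variants(word):
    """Return every spelling obtained by keeping or swapping each character."""
    # For each position: the original character followed by its look-alikes.
    choices = [[ch] + HOMOGLYPHS.get(ch, []) for ch in word]
    # The Cartesian product enumerates all combinations of those choices.
    return ["".join(combo) for combo in product(*choices)]


def case_forms(word, seed=0):
    """Return the four text cases named in the abstract for one spelling."""
    rng = random.Random(seed)  # seeded so the random-case form is repeatable
    random_case = "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower() for ch in word
    )
    return {
        "upper": word.upper(),
        "lower": word.lower(),
        "random": random_case,
        "first_upper": word.capitalize(),
    }


if __name__ == "__main__":
    for variant in homoglyph_variants("spam"):
        print(case_forms(variant))
```

With this table, `homoglyph_variants("spam")` yields four spellings, because only `p` and `a` have a substitute (2 × 2 combinations). Each spelling, in each of its four case forms, would then be rendered and passed to the OCR tool to measure recognition accuracy.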

Last modified: 2024-06-27 17:00:14